DNA and Mapping Databases: Understanding Nucleotide Sequence and Genomic Mapping Databases, Study notes of Genetics

This lecture discusses the importance of DNA and mapping databases, focusing on nucleotide sequence databases and their flatfiles, as well as genomic mapping and its challenges. It covers the dataflow of the major DNA databases, the importance of accuracy and ease of use, and the dissection of nucleotide sequence flatfiles. It also introduces mapping databases and their relationship with sequence data.

Typology: Study notes

Pre 2010

Uploaded on 07/28/2009

koofers-user-qgb
1/17/2006, 123 Long Hall

Lecture 2: DNA Databases & Mapping Databases

I. DNA Nucleotide Databases

1. Dataflow of the three major DNA databases

Figure 1. Dataflow for new submissions and updates between the three databases

2. Importance of accuracy and ease of use for nucleotide sequence databases:
a. Sequence comparison: it is often more useful to translate coding DNA into protein and compare against a protein database
b. Avoiding error propagation
c. Facilitating information retrieval

3. Nucleotide sequence flatfiles
a. Most common format: the flatfile
b. A sequence record is represented as a string of nucleotides with tags and identifiers
c. FASTA format: (>) denotes the beginning of a new sequence record; it starts the definition line ('def line'), which carries an identifier (accession ID)
d. Upper- or lower-case letters for DNA sequence; usually 60 characters per line (Courier font is best)
e. Similarly, a protein sequence can use FASTA format

4. Dissection of a nucleotide sequence flatfile:
a. Header: database specific. First item: DDBJ/GenBank (LOCUS), EMBL (ID); it has to be unique within the database. Second: length of the sequence. Third: molecule type, the biological nature of the molecule. Fourth: division code (e.g. INV), historical. Date: the date when the record was last made public.
b. Organismal division (http://www.ncbi.nlm.nih.gov/HTGS/table1.html)
   • BCT - bacterial sequences
   • FUN - fungal sequences
   • HUM - human sequences
   • INV - invertebrate sequences
   • MAM - other mammalian sequences
   • ORG - organelle sequences
   • PHG - bacteriophage sequences
   • PLN - plant, fungal, and algal sequences
   • PRI - primate sequences
   • RNA - structural RNA sequences
   • ROD - rodent sequences
   • SYN - synthetic sequences
   • UNA - unannotated sequences
   • VRL - viral sequences
   • VRT - other vertebrate sequences
c. Functional division:
   • CON - Constructed (or Contigged) records of chromosomes, genomes, and other long DNA sequences
   • EST - EST sequences (expressed sequence tags)
   • GSS - GSS sequences (genome survey sequences)
   • HTC - unfinished high-throughput cDNA sequencing
   • HTG - HTGS sequences (high-throughput genomic sequences)
   • PAT - patent sequences
   • STS - STS sequences (sequence tagged sites)
   • WGS - Whole Genome Shotgun sequences
d. EST - expressed sequence tag
   • Partial DNA sequence ("single-pass") of a cDNA clone
   • Largest and fastest growing division of GenBank
   • Derived from some specific RNA source
   • Source field can be searched
e. Second part of header: definition line (DE in EMBL), a summary of the biological content
f. Accession number: cited in publications. Two formats, '1+5' and '2+6': one upper-case letter followed by five digits, or two upper-case letters followed by six digits. When a record lists more than one accession number, the first one is the primary one. Version, e.g. U54469.1 (ACCESSION.VERSION): the accession stays unchanged, but the version is incremented every time the sequence changes.
g. Source & organism (OS & OC in EMBL)
h. Feature tables: tabular, direct representation of biological information; feature keys, locations, and additional qualifiers. The source feature is the only feature that must be present in all DDBJ/EMBL/GenBank entries. CDS (coding sequence): instructions on how to join sequences together or how to make an amino acid sequence from the indicated coordinates and the inferred genetic code.
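The FASTA conventions in item 3 (a ">" opening each definition line, an identifier at the start of that line, sequence text usually wrapped at 60 characters) can be illustrated with a small reader and writer. This is a minimal sketch, not a reference implementation: the function names `parse_fasta` and `wrap_sequence` are hypothetical, and treating the first whitespace-delimited token of the def line as the identifier is an assumption made for the example.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, description, sequence) for each FASTA record.

    Assumption for this sketch: the identifier is the first
    whitespace-delimited token after '>' on the definition line.
    """
    identifier = None
    description = ""
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A '>' starts a new record, so emit the previous one first.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            parts = line[1:].strip().split(None, 1)
            identifier = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines are concatenated; case is kept as written.
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


def wrap_sequence(seq: str, width: int = 60) -> str:
    """Break a sequence into lines of at most `width` characters."""
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
```

Because the reader only looks for the ">" marker, the same code handles protein FASTA records, which matches the point in item 3e.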
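The header fields listed in item 4a (name, length, molecule type, division code, date) can be pulled out of a LOCUS line by splitting on whitespace. The sketch below assumes a layout of `LOCUS name length bp molecule [topology] division date`; the optional topology token, the `LocusHeader` class, the `parse_locus` function, and the sample line are all illustrative assumptions, and real flatfiles may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocusHeader:
    name: str                 # unique within the database
    length: int               # sequence length
    molecule: str             # molecule type, e.g. DNA or mRNA
    topology: Optional[str]   # optional token; an assumption of this sketch
    division: str             # division code, e.g. INV
    date: str                 # date the record was last made public


def parse_locus(line: str) -> LocusHeader:
    """Split a LOCUS line into its header fields (assumed layout)."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unrecognized LOCUS line: {line!r}")
    extra = tokens[5:-2]  # anything between molecule type and division
    return LocusHeader(
        name=tokens[1],
        length=int(tokens[2]),
        molecule=tokens[4],
        topology=extra[0] if extra else None,
        division=tokens[-2],
        date=tokens[-1],
    )


# Made-up line for illustration only; it is not a real database record.
example = "LOCUS       DEMO0001     120 bp    DNA     linear   INV 17-JAN-2006"
header = parse_locus(example)
```

Reading the division and date from the end of the line keeps the sketch tolerant of whether the optional middle token is present.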
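The accession rules in item 4f can be checked mechanically. The following sketch validates only the two formats named in the notes ('1+5' and '2+6') and splits an ACCESSION.VERSION string such as U54469.1 into its parts. The names `parse_accession`, `VersionedAccession`, and `same_record` are hypothetical, and other identifier styles used by sequence databases are deliberately rejected here.

```python
import re
from typing import NamedTuple, Optional

# One upper-case letter + five digits, or two upper-case letters + six digits.
ACCESSION_RE = re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")


class VersionedAccession(NamedTuple):
    accession: str
    version: Optional[int]  # None when no ".N" suffix is given


def parse_accession(text: str) -> VersionedAccession:
    """Parse 'ACCESSION' or 'ACCESSION.VERSION' into its parts."""
    accession, dot, version = text.strip().partition(".")
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a 1+5 or 2+6 accession: {accession!r}")
    if not dot:
        return VersionedAccession(accession, None)
    if not version.isdigit():
        raise ValueError(f"version must be a number: {version!r}")
    return VersionedAccession(accession, int(version))


def same_record(a: str, b: str) -> bool:
    """True when two identifiers share an accession, whatever the version."""
    return parse_accession(a).accession == parse_accession(b).accession
```

Keeping the accession and version separate reflects the rule in the notes: the accession identifies the record and stays fixed, while the version changes whenever the sequence does.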