Lecture 11 - Expressed Sequence Tags (ESTs)
• Baxevanis and Ouellette, chapter 12 (2nd edition)
• http://www.ncbi.nlm.nih.gov/About/primer/est.html
Genome
Transcriptome
Proteome
Metabolome
Phenome
[Figure: from DNA to polypeptide]
Non-transcribed DNA strand (sense): 5' ATG CGA CAA TAG 3'
Transcribed DNA strand (anti-sense): 3' TAC GCT GTT ATC 5'
Transcription
Messenger RNA (mRNA) with codons: AUG (initiation codon) CGA CAA ... UAG (stop codon)
Transfer RNA (tRNA) with amino acids attached to anti-codons: UAC GCU GUU
Translation
Amino acid sequence of polypeptide (Met, Ala, Glu) as determined by the sequence of
the first nine nucleotides in the anti-sense strand of the gene and
preserved through transcription and translation
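The transcription step in the figure can be sketched in a few lines of Python. This is only an illustration, assuming the anti-sense strand is written 3' to 5' exactly as in the figure (TAC GCT GTT ATC); the function names `transcribe` and `codons` are made up for this example.

```python
# Base pairing used during transcription: each template (anti-sense) DNA base
# is paired with the RNA base built opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') built on a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive three-base codons (incomplete tail dropped)."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


template = "TACGCTGTTATC"   # anti-sense strand from the figure, 3'->5'
mrna = transcribe(template)
print(mrna)                 # AUGCGACAAUAG
print(codons(mrna))         # ['AUG', 'CGA', 'CAA', 'UAG']
```

The result has the same sequence as the sense strand with U in place of T, which is why the sense strand is also called the coding strand.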
Complementary DNA
cDNA
Poly-A tail mRNA
STEP 1
Isolate single-stranded messenger RNA (3' poly-A tail, 5' end)
STEP 2
Anneal poly-dT primer to poly-A tail of mRNA
STEP 3
Add reverse transcriptase enzyme to synthesize complementary strand
STEP 4
Remove mRNA strand with RNAseH enzyme
STEP 5
Synthesize strand complementary to remaining DNA strand using DNA polymerase
STEP 6
Insert double-stranded cDNA into plasmid vector (see Box 4.1) and transform into E. coli
(recombinant plasmid + competent E. coli cell, transformation, transformed E. coli cell)
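The strand-copying steps above can be mimicked on strings. The following is a toy sketch, not a model of the enzymes: it assumes the mRNA is written 5' to 3' and ends in a poly-A tail, and the helper names (`first_strand_cdna`, `second_strand_cdna`) and the `min_tail` threshold are invented for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement of an RNA or DNA string, returned as DNA (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def first_strand_cdna(mrna: str, min_tail: int = 5) -> str:
    """Prime on the poly-A tail with poly-dT and copy the mRNA into DNA."""
    if not mrna.upper().endswith("A" * min_tail):
        raise ValueError("no poly-A tail for the poly-dT primer to anneal to")
    return reverse_complement_dna(mrna)


def second_strand_cdna(first_strand: str) -> str:
    """Copy the remaining DNA strand after the mRNA has been removed."""
    return reverse_complement_dna(first_strand)


mrna = "AUGCGACAAUAG" + "A" * 8          # toy mRNA with an 8-base poly-A tail
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)    # TTTTTTTTCTATTGTCGCAT
print(second)   # ATGCGACAATAGAAAAAAAA
```

The second strand reproduces the mRNA sequence in DNA form, and the first strand begins with the poly-dT primer.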
DNA cloning in bacteria plasmids
STEP 1
E. coli cell containing plasmid to be used as vector
(bacterial cell with plasmid DNA and chromosomal DNA)
STEP 2
Plasmid vector removed and cleaved with EcoRI
STEP 3
Intact foreign DNA to be cloned
STEP 4
Foreign DNA cleaved with EcoRI and fragment isolated from flanking DNA
(isolated DNA fragment)
STEP 5
Foreign DNA fragment ligated with plasmid vector using the enzyme DNA ligase
(recombinant plasmid)
STEP 6
Recombinant plasmid transformed into E. coli
(competent E. coli cell, transformation, transformed E. coli cell)
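The EcoRI cleavage step can be illustrated on a sequence string. EcoRI recognizes GAATTC and cuts between the G and the first A. The sketch below assumes a linear molecule given as its top strand only; the function name `ecori_digest` and the example sequence are made up for illustration.

```python
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1          # the cut falls between G and AATTC on the top strand


def ecori_digest(seq: str) -> list[str]:
    """Split a linear top-strand sequence at every EcoRI site."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(ECORI_SITE)
    while pos != -1:
        cut = pos + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(ECORI_SITE, pos + 1)
    fragments.append(seq[start:])      # whatever follows the last cut
    return fragments


# Hypothetical foreign DNA: a fragment of interest flanked by two EcoRI sites.
foreign = "CCGGAATTCATGCGACAATAGGAATTCTT"
print(ecori_digest(foreign))
# ['CCGG', 'AATTCATGCGACAATAGG', 'AATTCTT']
```

The middle fragment is the piece isolated from the flanking DNA. Its AATT overhang matches the overhang left on the EcoRI-cut plasmid, which is what lets DNA ligase join the two.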
Fig 2. http://www.ncbi.nlm.nih.gov/About/primer/est.html
mRNA
Reverse transcriptase
RNA and cDNA
Ribonuclease degradation of RNA
Synthesis of second strand of DNA
Double-stranded DNA
Forward sequencing primer: 5' EST
Reverse sequencing primer: 3' EST
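A rough sketch of how the two reads relate to the cloned cDNA: the 5' EST is read from the start of the sense strand, and the 3' EST is read from the opposite strand, starting at the other end. It assumes the insert is given as its sense strand and was cloned in a known orientation; `est_reads` and the default read length of 400 bases are illustrative choices, not part of any NCBI tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def est_reads(cdna_insert: str, read_length: int = 400) -> tuple[str, str]:
    """Return (5' EST, 3' EST) for a cDNA insert given as its sense strand."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    insert = cdna_insert.upper()
    # Forward primer: single-pass read into the 5' end of the sense strand.
    five_prime = insert[:read_length]
    # Reverse primer: read from the 3' end, reported on the opposite strand.
    three_prime = insert[-read_length:].translate(COMPLEMENT)[::-1]
    return five_prime, three_prime


insert = "ATGCGACAATAGAAAAAAAA"      # toy double-stranded cDNA, sense strand
five, three = est_reads(insert, read_length=10)
print(five)    # ATGCGACAAT
print(three)   # TTTTTTTTCT
```

In this toy case the 3' EST starts with a run of T, the complement of the poly-A tail.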
NCBI dbEST
http://www.ncbi.nlm.nih.gov/dbEST/index.html
Expressed Sequence Tags
> What is dbEST?
dbEST (Nature Genetics 4:332-3;1993) is a division of GenBank
that contains sequence data and other information on "single-
pass" cDNA sequences, or Expressed Sequence Tags, from a
number of organisms. A brief account of the history of human
ESTs in GenBank is available (Trends Biochem. Sci. 20:295-
6;1995). Also, consult the special "Genome Directory" issue of
Nature (vol. 377, issue 6547S, 28 September 1995).
> Other ways to access dbEST
> How to submit data
dbEST Summary
http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html
dbEST: database of "Expressed Sequence Tags"
dbEST release 101405
Summary by Organism - October 14, 2005
Number of public entries: 30,014,098
Homo sapiens (human) 6,134,812
Mus musculus + domesticus (mouse) 4,686,083
Xenopus tropicalis 1,038,272
Rattus sp. (rat) 704,494
Bos taurus (cattle) 702,434
Ciona intestinalis 684,319
Danio rerio (zebrafish) 673,076
Zea mays (maize) 604,279
Triticum aestivum (wheat) 600,039
Gallus gallus (chicken) 578,416
Sus scrofa (pig) 502,500
Xenopus laevis (African clawed frog) 473,761
Arabidopsis thaliana (thale cress) 420,789
Oryza sativa (rice) 406,790
Hordeum vulgare + subsp. vulgare (barley) 394,996
Drosophila melanogaster (fruit fly) 383,407
Glycine max (soybean) 355,970
Canis familiaris (dog) 349,306
Pinus taeda (loblolly pine) 329,469
Caenorhabditis elegans (nematode) 302,080
http://www.ncbi.nlm.nih.gov/dbEST/how_to_submit.html
Submitting sequences to dbEST and GenBank
Expressed Sequence Tags (ESTs) are short (usually about 300-500 bp), single-pass
sequence reads from mRNA (cDNA). Typically they are produced in large batches.
They represent a snapshot of genes expressed in a given tissue and/or at a given
developmental stage. They are tags (some coding, others not) of expression for a
given cDNA library.
Additional information about ESTs can be found in:
Boguski MS, Lowe TM, Tolstoshev CM. 1993. dbEST--database for "expressed
sequence tags." Nat Genet 4(4):332-333.
Most EST projects develop large numbers of sequences. These are commonly
submitted to GenBank and dbEST as batches of dozens to thousands of entries, with
a great deal of redundancy in the citation, submitter and library information. To
improve the efficiency of the submission process for this type of data, we have
designed a special streamlined submission process and data format.
dbEST also includes sequences that are longer than the traditional ESTs, or are
produced as single sequences or in small batches. Among these sequences are
products of differential display experiments and RACE experiments. The thing that
these sequences have in common with traditional ESTs, regardless of length, quality,
or quantity, is that there is little information that can be annotated in the record.
If a sequence is later characterized and annotated with biological features such as a
coding region, 5'UTR, or 3'UTR, it should be submitted through the regular
GenBank submissions procedure (via BankIt or Sequin), even if part of the sequence
UniGene
ORGANIZED VIEW OF THE TRANSCRIPTOME
UniGene is an experimental system for automatically partitioning GenBank sequences into a
non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent
a unique gene, as well as related information such as the tissue types in which the gene has been
expressed and map location.
Species: UniGene Entries
Chordata
Mammalia
Bos taurus 39,048
Canis familiaris 22,930
Homo sapiens 54,576
Macaca mulatta 4,701
Mus musculus 43,104
Ovis aries 1,714
Rattus norvegicus 38,675
Sus scrofa 32,069
Aves
Gallus gallus 30,221
Amphibia
Xenopus laevis 28,040
Xenopus tropicalis 33,132
Actinopterygii
Danio rerio 31,681
Fundulus heteroclitus 3,154
Oncorhynchus mykiss 24,362
Oryzias latipes 9,848
Salmo salar 8,371
Takifugu rubripes 2,355
Ascidiacea
Ciona intestinalis 14,373
Ciona savignyi 6,315
Molgula tectiformis 7,351
UniGene
http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID=3352
Pinus taeda: UniGene Build #21
Lineage: cellular organisms; Eukaryota; Viridiplantae; Streptophyta;
Charophyta/Embryophyta group; Embryophyta; Tracheophyta;
Euphyllophyta; Spermatophyta; Coniferophyta; Coniferopsida;
Coniferales; Pinaceae; Pinus; Pinus; Pinus taeda
Known genes are from GenBank 06 Sep 2005
ESTs are from dbEST through 06 Sep 2005
108 mRNAs
0 Models
0 HTC
62,100 EST, 3' reads
140,375 EST, 5' reads
27,037 EST, other/unknown
229,620 total sequences in clusters
Transcript Based
Alignments between all transcript sequences are used to generate
clusters of sequences originating from the same gene.
Clusters
14,198 sets total
89 sets contain at least one mRNA
0 sets contain at least one HTC sequence
14,188 sets contain at least one EST
79 sets contain both mRNAs and ESTs
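The idea of grouping transcripts that come from the same gene can be sketched as single-linkage clustering. This is not UniGene's actual procedure: as a loudly simplified stand-in for sequence alignment, two sequences are linked here whenever they share any exact k-mer in the same orientation, and the names `kmers`, `cluster_sequences`, the value k = 8, and the example reads are all hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs: dict[str, str], k: int = 8) -> list[set[str]]:
    """Single-linkage clustering: sequences sharing a k-mer end up together."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    profiles = {name: kmers(seq.upper(), k) for name, seq in seqs.items()}
    for a, b in combinations(seqs, 2):
        if profiles[a] & profiles[b]:        # stand-in for "the two align"
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


reads = {
    "est1": "ATGCGACAATAGGCTTACCGA",
    "est2": "CAATAGGCTTACCGATTGGCA",
    "est3": "TTGACCGGTTAACCGGATCCA",
    "mrna1": "GGATGCGACAATAGGCTT",
}
for cluster in cluster_sequences(reads):
    print(sorted(cluster))
```

Here est1, est2 and mrna1 overlap and form one cluster, while est3 stays alone, which parallels the counts above: most sets contain only ESTs, and a few contain both mRNAs and ESTs. A real pipeline would also compare reverse complements, since 3' reads come from the opposite strand.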
The Institute for Genomic Research (TIGR) -- Gene Indices
http://www.tigr.org/tdb/tgi/plant.shtml