CS5263 Bioinformatics
Guest Lectures: The ENCODE Project and Phylogenetics
Computational Analysis of cis-regulatory regions
Carolina Livi
Computational Biology Initiative (livi@uthscsa.edu)
Nov. 4th, 2008
Regulatory Biology
From gene in genome to protein in proteome
Similar process in different compartments
[Figure: steps from gene to protein in (A) eucaryotes, where transcription yields a primary RNA transcript that then receives a cap and poly(A) tail, and (B) procaryotes. Source: http://www.accessexcellence.org/AB/GG/steps_to_Prot.html]
Eukaryotic transcriptional initiation involves many
general factors, as well as specific enhancers.
[Figure from MBOC: an activator protein bound to an enhancer (the binding site for the activator protein) upstream of the TATA box and the start of transcription. Labeled steps: binding of general transcription factors, RNA polymerase, mediator, chromatin remodeling complexes, and histone acetylases.]
Where to get sequences in FASTA format? HomoloGene.
[Screenshot: HomoloGene search results (items 1 - 20 of 70). The first groups listed are the platelet-derived growth factor genes PDGFB, PDGFA, PDGFC, and PDGFD, each marked as a gene conserved in Amniota, with entries for H. sapiens, P. troglodytes, C. familiaris, M. musculus, R. norvegicus, and G. gallus.]
HomoloGene is a convenient place to get sequences from different species, but it is not phylogenetically accurate! Select the sequence type (mRNA, protein, or genomic) and species. If genomic, specify the length 5’ and 3’ of the exons.
Ensembl
• www.ensembl.org
• Good to look for genomic information
• Use to annotate 5’ and 3’ UTRs
– Upload sequences by “typing in text”
– Copy and paste UTR sequences from exon info
• Will set up 3-way analysis of mRNA
• Look for post-transcriptional regulatory elements
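Sequences downloaded from HomoloGene or Ensembl arrive as FASTA text: a header line starting with ">" followed by one or more sequence lines. The following is a minimal sketch of reading such text into a dictionary. The function name `parse_fasta` and the two short sequences are made up for illustration and are not real PDGFB sequence.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            # sequence lines are concatenated under the current header
            records[header] += line.upper()
    return records


# Made-up example input (not real gene sequence).
example = """>human_fragment (made-up sequence)
ATGAATCGCTGC
TGGGCGCTC
>mouse_fragment (made-up sequence)
ATGAATCGCTGCTGG
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Keeping the full header as the key preserves the species and accession information that later steps, such as choosing which sequences to align, depend on.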
Ensembl Genome Browser
[Screenshot: Ensembl home page (http://www.ensembl.org/index.html, release 41), with links to run a BLAST search, search the Ensembl database, mine data with BioMart, and display, export, or download data, next to the list of mammalian and other species genomes. Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes.]
Chicken Genome Page
[Screenshot: Ensembl Gallus gallus genome page; the browser title shows gene ENSGALG00000001136.]
Assembly
This site presents an annotation of the first draft chicken genome assembly,
March 2004 [NIH press release]. The chicken genome sequence was
determined by whole genome shotgun at the Genome Sequencing Center at
Washington University, St Louis. The analysis of the chicken sequence
involves an international group of scientists including individuals from the US,
UK, Europe and China.
Annotation
The gene set for Chicken was built using a modified version of the standard Ensembl genebuild
pipeline. The majority of gene models are based on genewise alignments of proteins from other
species. Most of the proteins being aligned were from species genetically distant to chicken. To
improve the accuracy of models generated from these proteins, the Genewise alignments were
made to stretches of genomic sequence rather than to 'miniseqs'. The gene models were
assessed by generating sets of potential orthologs to genes from other mammalian species.
Potentially missing predictions and partial gene predictions were identified by examining the
orthologs, and exonerate used to build new gene models for these based on the human ortholog
peptide sequence.
Warning
This release of G. gallus GGAW contains some sequence that is not specific to
chromosome W. A large portion of the sequence assigned to W was done so based on
the presence of W-specific repeats. These repeats have now been shown to be not
specific to chromosome W. Thus, the only portions of GGAW which should currently be
considered specific to W are:
» Chr W, bases [start illegible in source] - 195831
» Chr W, bases 4895452 - 4916845
» All of Chr W_random
Exon Information
[Screenshot of the Ensembl exon table. Exon identifiers, coordinates, and lengths are listed below; the strand, phase, and sequence columns are not legible in the source.]
No.  Exon/Intron           Start        End          Length
     5' upstream sequence
1    ENSGALE0000013174[?]  47,436,158   47,436,314   157
     Intron 1-2            47,436,315   47,440,898   4,584
2    ENSGALE00000131744    47,440,899   47,440,995   97
     Intron 2-3            47,440,996   47,443,763   2,768
3    ENSGALE00000131746    47,443,764   47,443,847   84
     Intron 3-4            47,443,848   47,444,070   223
4    ENSGALE00000131747    47,444,071   47,444,276   206
     Intron 4-5            47,444,277   47,445,969   1,693
5    ENSGALE[?]131742      47,445,970   47,446,111   142
     Intron 5-6            47,446,112   47,446,409   298
6    ENSGALE00000131741    47,446,410   47,446,579   170
     Intron 6-7            47,446,580   47,447,513   934
7    ENSGALE00000131743    47,447,514   47,447,712   199
     3' downstream sequence
Supporting Evidence
The supporting evidence below consists of the sequence matches on which the exon predictions were based, sorted by alignment score.
There are a large number of supporting evidence hits for this transcript. Only the top 10 hits have been shown. Click to view all 15 supporting evidence hits.
[Screenshot: top 10 supporting-evidence hits with a score legend. Legible entries: a RefSeq platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis)...); P31240.1 PDGFB_MOUSE; a Xenopus laevis Pdgfb protein; PDGFB_CANFA; PDGFB_RAT; P01127.1 PDGFB_HUMAN; P12919.1 PDGFB_FELCA; a Felis catus c-sis proto-oncogene entry; a Danio rerio Pdgfa protein; and a Gallus gallus PDGF-B mRNA for platelet-derived growth factor B chain, complete cds. The mammalian UniProt entries are described as platelet-derived growth factor B chain precursor (PDGF B-chain).]
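The coordinates in the exon table above are inclusive, so each reported length equals end - start + 1, and each intron begins one base after the preceding exon ends. A minimal check of that arithmetic, using three rows copied from the table; the function name `feature_length` is introduced here only for illustration.

```python
def feature_length(start: int, end: int) -> int:
    """Length of a feature given inclusive start and end coordinates."""
    return end - start + 1


# (name, start, end, length reported in the exon table)
features = [
    ("Exon 1", 47_436_158, 47_436_314, 157),
    ("Intron 1-2", 47_436_315, 47_440_898, 4_584),
    ("Exon 2", 47_440_899, 47_440_995, 97),
]

computed = [feature_length(start, end) for _, start, end, _ in features]

for (name, start, end, reported), length in zip(features, computed):
    print(name, length, "matches table" if length == reported else "differs")

# Adjacent features should abut: next start == previous end + 1.
abutting = all(
    features[i + 1][1] == features[i][2] + 1 for i in range(len(features) - 1)
)
```

The same check is useful when choosing how much sequence to request 5’ and 3’ of the exons, since an off-by-one error shifts every downstream coordinate.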
Cartwheel/FamilyRelations
• http://family.caltech.edu
• Software program
– Nucleotide sequence alignment
– Consensus motif searching
– Sequence annotation
Mouse X Human PDGFB - PipMaker at 70%
[Screenshot: pair view; top sequence: "Mouse genomi..."; bottom sequence: "Human genomi...".]
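Consensus motif searching, listed above, can be sketched as a pattern match in which IUPAC ambiguity codes expand to sets of bases. This is only an illustration and not how FamilyRelations implements it: the function name `find_motif`, the consensus `TATAWA`, and the test sequence are assumptions chosen for the example, and only the forward strand is scanned.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(seq: str, consensus: str) -> List[Tuple[int, str]]:
    """Return (0-based position, matched site) for each forward-strand hit."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead is used so that overlapping sites are all reported.
    return [
        (m.start(), m.group(1))
        for m in re.finditer(f"(?=({pattern}))", seq.upper())
    ]


# Made-up sequence containing two sites that fit the consensus.
hits = find_motif("GGCTATAAAAGGCTATATAGG", "TATAWA")
print(hits)  # [(3, 'TATAAA'), (13, 'TATATA')]
```

A real search would also scan the reverse complement, because transcription factor binding sites can occur on either strand.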
Figure 2. A "dot-plot" style view of a subregion of the otx comparison (see Figure 1). The top sequence is a zoomed-in view of the otx genomic region from S. purpuratus, as in Figure 1; the region runs from 119.6 kb to 133.0 kb. The side sequence is a zoomed-in view of the orthologous region from L. variegatus, running from 38.5 kb to 51.5 kb. The region surrounding the first exon (in red) of the sp α-otx transcript is selected on the top (S. purpuratus) sequence, and the corresponding TBLASTX matches are highlighted on the left (L. variegatus) sequence in blue. The selection box in the center of the view contains the paircomp matches in this region, showing only 20 bp matches that match at 19/20 or 20/20 (corresponding to a 95% threshold). A closeup view of this region, showing the DNA sequence of the two regions with the corresponding matches, is shown in Figure 3.
Figure 3. A closeup view of the paircomp comparison of the genomic sequence surrounding the first exon of otx in S. purpuratus (top sequence) and L. variegatus (bottom sequence). The top half of the closeup view shows orthologous 2 kb genomic regions (126.2 kb – 128.3 kb in the S. purpuratus BAC, 44.4 kb – 46.5 kb in the L. variegatus BAC). Matches of 19/20 or 20/20 bases are drawn in red between the sequences, and the exon matches from Figure 2 are shown in black on the sequence lines. The bottom half of the closeup view shows the part of the sequence selected in blue on the top half of the view. Lines are drawn in black between individual matching bases, and the matching bases are colored in red. Note that both blocks shown match at 19/20 because of the single mismatch in the middle of the blocks.
VISTA and PIPMAKER
• Phylogenetic Footprinting
– Conservation reflects function
• Transcription Factor Binding Sites
– Motif searching
– SELEX
• Examples of cis-regulatory regions
• Using Vista
• Using PipMaker
sequence2 mouse:1-241717
[Figure: Alignment 1, human sequence versus sequence2. Conservation plot with criteria: 70%, 100 bp; regions: 102; x-axis: sequence2; resolution: 79; window size: 100 bp. The plot is followed by the nucleotide-level alignment of the two sequences, with position labels for each block.]
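The thresholds quoted in the figures above (19 of 20 bases for the paircomp matches, 70% over 100 bp for the conservation plot) are both fixed-width window criteria. The sketch below compares every window of one sequence with every window of another and reports pairs that meet the threshold. It is an ungapped, forward-strand illustration only, not the paircomp, VISTA, or PipMaker algorithm; the function name `window_matches` and the short test sequences are assumptions.

```python
from typing import List, Tuple


def window_matches(
    top: str, bottom: str, window: int = 20, min_identical: int = 19
) -> List[Tuple[int, int, int]]:
    """Return (top_start, bottom_start, identical_bases) for window pairs
    in which at least min_identical of the window positions agree."""
    hits = []
    for i in range(len(top) - window + 1):
        for j in range(len(bottom) - window + 1):
            identical = sum(
                a == b
                for a, b in zip(top[i:i + window], bottom[j:j + window])
            )
            if identical >= min_identical:
                hits.append((i, j, identical))
    return hits


# Made-up sequences; an 8 bp window with a 7/8 threshold keeps the example small.
top_seq = "AAACGTACGTTT"
bottom_seq = "CCACGTTCGTGG"
hits = window_matches(top_seq, bottom_seq, window=8, min_identical=7)
print(hits)
```

Each hit corresponds to one dot in a dot-plot view. The all-against-all loop costs time proportional to the product of the two sequence lengths times the window size, which is why tools built for BAC-sized sequences use faster methods.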
ENCODE Project: Junking the Idea of Junk DNA
• Many more regions of chromosomes are transcribed than previously thought (the genome is pervasively transcribed); their function remains unknown
• Many more transcription initiation sites than previously thought
The above statements are obviously interconnected.
Annotated and unannotated TxFrags detected in different cell lines. The proportion of different types of transcripts detected in the indicated number of cell lines (from 1/11 at the far left to 11/11 at the far right) is shown. The data for annotated and unannotated TxFrags are indicated separately, and also split into different categories based on GENCODE classification: Exonic, Intergenic (Proximal being within 5 kb of a gene and Distal being otherwise), Intronic (Proximal being within 5 kb of an intron and Distal being otherwise), and matching other ESTs not used in the GENCODE annotation (principally because they were unspliced). The y-axis indicates the percent of tiling array nucleotides present in that class for that number of tissues.
Nature. 2007 June 14; 447(7146): 799–816
ENCODE Project: Junking the Idea of Junk DNA
• 5% of genome highly conserved
• Main assumption is that conservation reflects function
• 60% of conserved regions have known function
• We do not know the function of 40% of the conserved regions identified so far
• Some regions not conserved by the criteria used for evaluation have known function
Higher-order functional domains in the genome. The general concordance of multiple data types is illustrated for an illustrative ENCODE region (ENm005). (a) Domains were determined by simultaneous HMM segmentation of replication time (TR50; black), bulk RNA transcription (blue), H3K27me3 (purple), H3ac (orange), DHS density (green), and RFBR density (light blue) measured continuously across the 1.6-Mb ENm005. All data were generated using HeLa cells. The histone, RNA, DHS, and RFBR signals are wavelet-smoothed to an approximately 60 kb scale (see Supplementary Information section S4.7). The HMM segmentation is shown as the blocks labeled “active” and “repressed” and the structure of GENCODE genes (not used in the training) is shown at the end. (b) Enrichment or depletion of annotated sequence features (GENCODE TSSs, CpG islands, different types of repetitive elements, and non-exonic CSs) in active versus repressed domains. Note the marked enrichment of TSSs, CpG islands, and Alus in active domains, and the enrichment of LINE and LTRs in repressed domains.
Nature. 2007 June 14; 447(7146): 799–816
What are regulatory elements regulating?
An introduction to regulatory biology
Outline
• Cell Differentiation
– Genomic Equivalence
• Cell Regulation
– Overview
– Effects
• Central Dogma of Molecular Biology
– Regulation and Modification
• Trans-Regulatory Systems
• Cis-Regulatory Systems
• The Otx story
How Cells Regulate
• Control of Transcription
– Chromosome packing
  • “Packed” DNA cannot be transcribed, therefore the gene is not expressed
  • Unmodified eukaryotic chromatin is non-permissive in nature
Transcription factor proteins bind to DNA in a sequence-specific manner
Cell Comparison
• Smooth Muscle
– Contraction of blood vessels and digestive tract
• Neuron
– Communication and sensory perception
The Central Dogma
[Diagram, built up over three slides: DNA → (Transcription) → RNA → (Translation) → Protein, annotated with four control points: Transcriptional Regulation, Post-Transcriptional Modification, Translational Regulation, and Post-Translational Modification.]
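The two central steps in the diagram, transcription and translation, can be sketched in a few lines. This is a simplified illustration that assumes the input is the coding strand of an already spliced sequence and uses the standard genetic code; it ignores every regulatory and modification step the slides discuss. The function names and the nine-base input are made up for the example.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")  # made-up open reading frame
protein = translate(mrna)
print(mrna, protein)  # AUGGCCUAA MA
```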
Trans-Regulatory Systems
• Transcription factors can be activators/repressors
• Transcription factors are modular proteins
– Distinct domains for each function
• Types of domains
– DNA Binding
– Protein Binding
– Trans-activating
DNA Binding Domains
• These domains are the portion of the protein that actually interacts with the DNA molecule.
• Zinc Finger
– attaches to a groove in the DNA
Protein Binding Domains
• Co-activators or co-repressors can attach to transcription factors to produce a complex and mediate activity.
Allopatric speciation: populations are separated by a barrier. After some time, even if the barrier is removed, the two populations can no longer form hybrids (they are now different species).
Sympatric speciation: the population shares the environment; mate selection effectively separates the gene pools.
Gene duplication
[Diagram: a chromosome before and after duplication, with the duplicated area marked. The three bands are duplicated.]
Useful links
• ENCODE project overview http://www.genome.gov/10005107
• ENCODE project at UCSC http://genome.ucsc.edu/ENCODE/
• ENCODE project at ENSEMBL http://www.ensembl.org/Homo_sapiens/encode.html
• Mammalian Gene Collection (MGC) http://mgc.nci.nih.gov/
• Family Relations http://family.caltech.edu/ http://ged.msu.edu/index.html
• Wold Lab Software http://woldlab.caltech.edu/html/software
Other sites
• http://www.behav.org/GENE/Phylo/pract_tree/!!!course_outline.htm
• The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 June 14; 447(7146): 799–816. PMCID: PMC2212820. NIHMSID: NIHMS27513.