CS5263 Bioinformatics: Guest Lecture on Regulatory Biology and the ENCODE Project

University of Texas - San Antonio Computer Science

Lecture slides from a CS5263 Bioinformatics class, focusing on regulatory biology and the ENCODE project. Topics covered include sequence databases, regulatory biology basics, and computational analysis of cis-regulatory regions using tools like Cartwheel and Ensembl. The ENCODE project is discussed in detail, with an emphasis on its goals, challenges, and current status.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 07/30/2009

CS5263 Bioinformatics
Guest Lectures: The ENCODE Project and Phylogenetics

Computational Analysis of cis-regulatory regions
Carolina Livi
Computational Biology Initiative (livi@uthscsa.edu)
Nov. 4th, 2008

Regulatory Biology
From gene in genome to protein in proteome
Similar process in different compartments
[Figure: steps from gene to protein in (A) eukaryotes, where the primary RNA transcript has a cap and poly(A) tail added, and (B) prokaryotes. Source: http://www.accessexcellence.org/AB/GG/steps_to_Prot.html]

Eukaryotic transcriptional initiation involves many general factors, as well as specific enhancers.
[Figure from MBOC: an activator protein bound to an enhancer (binding site for activator protein), the TATA box and the start of transcription; binding of general transcription factors, RNA polymerase, mediator, chromatin remodeling complexes, and histone acetylases.]

Where to get sequences in FASTA format?
HomoloGene
[Screenshot: HomoloGene search results (items 1-20 of 70) listing platelet-derived growth factor gene groups (PDGFA, PDGFB, PDGFC, PDGFD), each with entries for species such as H. sapiens, P. troglodytes, C. familiaris, M. musculus, R. norvegicus and G. gallus.]
HomoloGene is a convenient place to get sequences from different species, but it is not accurate phylogenetically!
Select sequence type (mRNA, protein, or genomic) and species. If genomic, specify length 5' and 3' of exons.

Ensembl
• www.ensembl.org
• Good to look for genomic information
• Use to annotate 5' and 3' UTRs
  – Upload sequences by "typing in text"
  – Copy and paste UTR sequences from exon info
• Will set up 3-way analysis of mRNA
• Look for post-transcriptional regulatory elements

www.ensembl.org
[Screenshot: Ensembl Genome Browser home page (http://www.ensembl.org/index.html) with links to run a BLAST search, search the Ensembl database, data mining (BioMart), display, export and download data; news items including a new species (Medaka, Oryzias latipes), a new chimp assembly and genebuild (Pan troglodytes), a new genebuild on zebrafish assembly Zv6 (Danio rerio), and the import of WormBase 160 (Caenorhabditis elegans); and a list of available mammalian and other genomes.]
Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes. Ensembl is primarily funded by the Wellcome Trust. This site provides free access to all the data and software from the Ensembl project. Access to all the data produced by the project, and to the software used to analyse and present it, is provided free and without constraints. Some data and software may be subject to third-party constraints.
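Sequences exported from HomoloGene or Ensembl arrive as FASTA text: a header line starting with ">" followed by one or more sequence lines. The following is a minimal Python sketch of reading such text, assuming plain FASTA with no extra conventions. The function name `parse_fasta` and the two toy records are illustrative assumptions, not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them later.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Toy input (hypothetical headers and sequences, for illustration only).
example = """>toy_seq_1 species A
ACGTAC
GTTT
>toy_seq_2 species B
GGGCCC
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

For real work, an established parser such as Biopython's `SeqIO` is usually a safer choice than a hand-written one.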
Chicken Genome Page
[Screenshot: Ensembl Chicken, gene ENSGALG00000001136. Explore the Gallus gallus genome; click on a chromosome for a closer view.]

Assembly
This site presents an annotation of the first draft chicken genome assembly, March 2004 [NIH press release]. The chicken genome sequence was determined by whole genome shotgun at the Genome Sequencing Center at Washington University, St Louis. The analysis of the chicken sequence involves an international group of scientists including individuals from the US, UK, Europe and China.

Annotation
The gene set for Chicken was built using a modified version of the standard Ensembl genebuild pipeline. The majority of gene models are based on genewise alignments of proteins from other species. Most of the proteins being aligned were from species genetically distant to chicken. To improve the accuracy of models generated from these proteins, the Genewise alignments were made to stretches of genomic sequence rather than to 'miniseqs'. The gene models were assessed by generating sets of potential orthologs to genes from other mammalian species. Potentially missing predictions and partial gene predictions were identified by examining the orthologs, and exonerate was used to build new gene models for these based on the human ortholog peptide sequence.

Warning
This release of G. gallus GGAW contains some sequence that is not specific to chromosome W. A large portion of the sequence assigned to W was done so based on the presence of W-specific repeats. These repeats have now been shown to be not specific to chromosome W. Thus, the only portions of GGAW which should currently be considered specific to W are:
» Chr W, bases 4895452 - 4916845
» All of Chr W_random

Exon Information
(Sequence column of the original display not reproduced.)

No.  Exon / Intron            Start        End          Length
     5' upstream sequence
1    ENSGALE0000013174[?]     47,436,158   47,436,314     157
     Intron 1-2               47,436,315   47,440,898   4,584
2    ENSGALE00000131744       47,440,899   47,440,995      97
     Intron 2-3               47,440,996   47,443,763   2,768
3    ENSGALE00000131746       47,443,764   47,443,847      84
     Intron 3-4               47,443,848   47,444,070     223
4    ENSGALE00000131747       47,444,071   47,444,276     206
     Intron 4-5               47,444,277   47,445,969   1,693
5    ENSGALE[...]131742       47,445,970   47,446,111     142
     Intron 5-6               47,446,112   47,446,409     298
6    ENSGALE00000131741       47,446,410   47,446,579     170
     Intron 6-7               47,446,580   47,447,513     934
7    ENSGALE00000131743       47,447,514   47,447,712     199
     3' downstream sequence

Supporting Evidence
The supporting evidence below consists of the sequence matches on which the exon predictions were based and are sorted by alignment score. There are a large number of supporting evidence hits for this transcript. Only the top ten hits have been shown.
[Screenshot: top ten supporting evidence hits, all platelet-derived growth factor sequences: the RefSeq platelet-derived growth factor beta polypeptide (simian sarcoma viral (v-sis) oncogene homolog); PDGF B-chain precursors from mouse (P31240.1, PDGFB_MOUSE), dog (PDGFB_CANFA), rat (PDGFB_RAT), human (P01127.1, PDGFB_HUMAN) and cat (P12919.1, PDGFB_FELCA); Pdgfb protein from Xenopus laevis; c-sis proto-oncogene from cat; Pdgfa protein from zebrafish; and the Gallus gallus PDGF-B mRNA for platelet-derived growth factor B chain, complete cds.]
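The Length values in the Exon Information listing above follow from the inclusive chromosome coordinates: length = end - start + 1. The short Python check below uses the Start and End values from that listing; the labels and variable names are illustrative assumptions.

```python
# (label, start, end): inclusive chromosome coordinates from the listing above.
features = [
    ("exon 1", 47436158, 47436314),
    ("intron 1-2", 47436315, 47440898),
    ("exon 2", 47440899, 47440995),
    ("intron 2-3", 47440996, 47443763),
    ("exon 3", 47443764, 47443847),
    ("intron 3-4", 47443848, 47444070),
    ("exon 4", 47444071, 47444276),
    ("intron 4-5", 47444277, 47445969),
    ("exon 5", 47445970, 47446111),
    ("intron 5-6", 47446112, 47446409),
    ("exon 6", 47446410, 47446579),
    ("intron 6-7", 47446580, 47447513),
    ("exon 7", 47447514, 47447712),
]


def feature_length(start, end):
    """Length in bp of a feature given inclusive start and end coordinates."""
    return end - start + 1


lengths = {label: feature_length(start, end) for label, start, end in features}

# Total exonic and intronic sequence across the transcript.
exon_total = sum(n for label, n in lengths.items() if label.startswith("exon"))
intron_total = sum(n for label, n in lengths.items() if label.startswith("intron"))

# Consecutive features should be adjacent: next start == previous end + 1.
contiguous = all(
    features[i + 1][1] == features[i][2] + 1 for i in range(len(features) - 1)
)

for label, n in lengths.items():
    print(f"{label:12s} {n:6d} bp")
print("exons:", exon_total, "introns:", intron_total, "contiguous:", contiguous)
```

The same inclusive-coordinate rule applies whenever exon boundaries are copied from a genome browser to extract UTR or intron sequence.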
Cartwheel/FamilyRelations
• http://family.caltech.edu
• Software program
  – Nucleotide sequence alignment
  – Consensus motif searching
  – Sequence annotation

[Screenshot: Mouse × Human PDGFB, PipMaker at 70%. Pair view; top sequence: "Mouse genomi..."; bottom sequence: "Human genomi...".]

Figure 2. A "dot-plot" style view of a subregion of the otx comparison (see Figure 1). The top sequence is a zoomed-in view of the otx genomic region from S. purpuratus, as in Figure 1; the region runs from 119.6 kb to 133.0 kb. The side sequence is a zoomed-in view of the orthologous region from L. variegatus, running from 38.5 kb to 51.5 kb. The region surrounding the first exon (in red) of the sp α-otx transcript is selected on the top (S. purpuratus) sequence, and the corresponding TBLASTX matches are highlighted on the left (L. variegatus) sequence in blue. The selection box in the center of the view contains the paircomp matches in this region, showing only 20 bp matches that match at 19/20 or 20/20 (corresponding to a 95% threshold). A closeup view of this region, showing the DNA sequence of the two regions with the corresponding matches, is shown in Figure 3.

Figure 3. A closeup view of the paircomp comparison of the genomic sequence surrounding the first exon of otx in S. purpuratus (top sequence) and L. variegatus (bottom sequence). The top half of the closeup view shows orthologous 2 kb genomic regions (126.2 kb – 128.3 kb in the S. purpuratus BAC, 44.4 kb – 46.5 kb in the L. variegatus BAC). Matches of 19/20 or 20/20 bases are drawn in red between the sequences, and the exon matches from Figure 2 are shown in black on the sequence lines. The bottom half of the closeup view shows the part of the sequence selected in blue on the top half of the view. Lines are drawn in black between individual matching bases, and the matching bases are colored in red. Note that both blocks shown match at 19/20 because of the single mismatch in the middle of the blocks.

VISTA and PIPMAKER
• Phylogenetic Footprinting
  – Conservation reflects function
• Transcription Factor Binding Sites
  – Motif searching
  – SELEX
• Examples of cis-regulatory regions
• Using Vista
• Using PipMaker

[Screenshot: PipMaker output for a human versus mouse comparison. sequence2: mouse:1-241717; Criteria: 70%, 100 bp; Regions: 102; X-axis: sequence2; Resolution: 79; Window size: 100 bp. The plot is followed by the nucleotide-level alignment of the two sequences.]
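The fixed-window comparisons described above (20 bp windows matching at 19/20 or better in the paircomp figures, and a 70% criterion over 100 bp in the PipMaker output) can be illustrated with a naive ungapped window scan. This is a simplified Python sketch under stated assumptions, not the actual paircomp or PipMaker algorithm: it compares only the forward strand, ignores gaps, and the function name `window_matches` and the toy sequences are hypothetical.

```python
def window_matches(top, bottom, window=20, min_matches=19):
    """Return (i, j, matches) for every pair of ungapped windows, starting at
    position i in `top` and j in `bottom`, with at least `min_matches`
    identical bases out of `window`."""
    hits = []
    for i in range(len(top) - window + 1):
        top_win = top[i:i + window]
        for j in range(len(bottom) - window + 1):
            bottom_win = bottom[j:j + window]
            matches = sum(a == b for a, b in zip(top_win, bottom_win))
            if matches >= min_matches:
                hits.append((i, j, matches))
    return hits


# Toy 20-mer and a copy with a single mismatch in the middle (position 10).
core = "ATGCGTACCTGAAGTCCGTA"
core_mut = core[:10] + "C" + core[11:]

print(window_matches(core, core))                      # identical windows
print(window_matches(core, core_mut))                  # 19/20 passes at 95%
print(window_matches(core, core_mut, min_matches=20))  # fails at 100%
```

Calling the function with `window=100, min_matches=70` gives the ungapped analogue of the 70%, 100 bp criterion. The scan is quadratic in sequence length, so it is only suitable for short illustrative inputs.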
ENCODE Project: Junking the Idea of Junk DNA
• Many more regions of chromosomes are transcribed than previously thought (the genome is pervasively transcribed); their function remains unknown
• Many more transcription initiation sites than previously thought
The above statements are obviously interconnected.

[Figure] Annotated and unannotated TxFrags detected in different cell lines. The proportion of different types of transcripts detected in the indicated number of cell lines (from 1/11 at the far left to 11/11 at the far right) is shown. The data for annotated and unannotated TxFrags are indicated separately, and also split into different categories based on GENCODE classification: Exonic, Intergenic (Proximal being within 5 kb of a gene and Distal being otherwise), Intronic (Proximal being within 5 kb of an intron and Distal being otherwise), and matching other ESTs not used in the GENCODE annotation (principally because they were unspliced). The y-axis indicates the percent of tiling array nucleotides present in that class for that number of tissues.
Nature. 2007 June 14; 447(7146): 799–816

ENCODE Project: Junking the Idea of Junk DNA
• 5% of genome highly conserved
• Main assumption is that conservation reflects function
• 60% of conserved regions have known function
• We do not know the function of 40% of conserved regions so far identified
• Some regions not conserved by criteria used for evaluation have known function

[Figure] Higher-order functional domains in the genome. The general concordance of multiple data types is illustrated for an illustrative ENCODE region (ENm005). (a) Domains were determined by simultaneous HMM segmentation of replication time (TR50; black), bulk RNA transcription (blue), H3K27me3 (purple), H3ac (orange), DHS density (green), and RFBR density (light blue) measured continuously across the 1.6-Mb ENm005. All data were generated using HeLa cells. The histone, RNA, DHS, and RFBR signals are wavelet-smoothed to an approximately 60 kb scale (see Supplementary Information section S4.7). The HMM segmentation is shown as the blocks labeled "active" and "repressed" and the structure of GENCODE genes (not used in the training) is shown at the end. (b) Enrichment or depletion of annotated sequence features (GENCODE TSSs, CpG islands, different types of repetitive elements, and non-exonic CSs) in active versus repressed domains. Note the marked enrichment of TSSs, CpG islands, and Alus in active domains, and the enrichment of LINE and LTRs in repressed domains.
Nature. 2007 June 14; 447(7146): 799–816

What are regulatory elements regulating?
An introduction to regulatory biology

Outline
• Cell Differentiation
  – Genomic Equivalence
• Cell Regulation
  – Overview
  – Effects
• Central Dogma of Molecular Biology
  – Regulation and Modification
• Trans-Regulatory Systems
• Cis-Regulatory Systems
• The Otx story

How Cells Regulate
• Control of Transcription
  – Chromosome packing
    • "Packed" DNA cannot be transcribed, therefore the gene is not expressed
    • Unmodified eukaryotic chromatin is non-permissive in nature
Transcription factor proteins bind to DNA in a sequence-specific manner.

Cell Comparison
• Smooth Muscle
  – Contraction of blood vessels and digestive tract
• Neuron
  – Communication and sensory perception

The Central Dogma
[Diagram, built up over three slides: DNA → (Transcription) → RNA → (Translation) → Protein, annotated with Transcriptional Regulation, Post-Transcriptional Modification, Translational Regulation, and Post-Translational Modification.]

Trans-Regulatory Systems
• Transcription factors can be activators/repressors
• Transcription factors are modular proteins
  – Distinct domains for each function
• Types of domains
  – DNA Binding
  – Protein Binding
  – Trans-activating

DNA Binding Domains
• These domains are the portion of the protein that actually interacts with the DNA molecule.
• Zinc Finger
  – attaches to a groove in the DNA

Protein Binding Domains
• Co-activators or co-repressors can attach to transcription factors to produce a complex and mediate activity.

Allopatric speciation: populations are separated by a barrier. After some time, even if the barrier is removed, the two populations can no longer form hybrids (they are now different species).
Sympatric speciation: the population shares the environment; mate selection effectively separates gene pools.
Gene duplication diagram: the three bands are duplicated.
[Diagram labels: duplicated area; before duplication; after duplication]

Useful links
• ENCODE project overview: http://www.genome.gov/10005107
• ENCODE project at UCSC: http://genome.ucsc.edu/ENCODE/
• ENCODE project at ENSEMBL: http://www.ensembl.org/Homo_sapiens/encode.html
• Mammalian Gene Collection (MGC): http://mgc.nci.nih.gov/
• Family Relations: http://family.caltech.edu/ and http://ged.msu.edu/index.html
• Wold Lab Software: http://woldlab.caltech.edu/html/software

Other sites
http://www.behav.org/GENE/Phylo/pract_tree/!!!course_outline.htm

Reference
• The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 June 14; 447(7146): 799–816. PMCID: PMC2212820. NIHMSID: NIHMS27513.
