Gene Prediction: Lecture 26 in BCB 444/544 Fall 07 at ISU by Dobbs - Prof. Drena Leigh Dob, Exams of Bioinformatics

Lecture slides from the Bioinformatics and Computational Biology (BCB) course at Iowa State University (ISU), fall 2007. The lecture, titled "Gene Prediction," covers regulatory element prediction, RNA structure prediction, gene finding, and gene prediction software. The slides include figures and diagrams to illustrate concepts, as well as references for further reading.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

#26 - Gene Prediction 10/22/07 BCB 444/544 Fall 07 Dobbs

BCB 444/544
Lecture 26
Gene Prediction
#26_Oct22

Required Reading (before lecture)
Mon Oct 22 - Lecture 26
Gene Prediction
• Chp 8 - pp 97 - 112
Wed Oct 24 - Lecture 27 (will not be covered on Exam 2)
Regulatory Element Prediction
• Chp 9 - pp 113 - 126
Thurs Oct 25 - Review Session & Project Planning
Fri Oct 26 - EXAM 2

Assignments & Announcements
Sun Oct 21 - Study Guide for Exam 2 was posted
Mon Oct 22 - HW#4 Due (no "correct" answer to post)
Thu Oct 25 - Lab = Optional Review Session for Exam
             544 Project Planning/Consult with DD & MT
Fri Oct 26 - Exam 2 - Will cover:
• Lectures 13-26 (thru Mon Oct 22)
• Labs 5-8
• HW# 3 & 4
• All assigned reading:
  Chps 6 (beginning with HMMs), 7-8, 12-16
  Eddy: What is an HMM
  Ginalski: Practical Lessons…

BCB 544 "Team" Projects
• 544 Extra HW#2 is next step in Team Projects
  • Write ~ 1 page outline
  • Schedule meeting with Michael & Drena to discuss topic
  • Read a few papers
  • Write a more detailed plan
• You may work alone if you prefer
• Last week of classes will be devoted to Projects
  • Written reports due: Mon Dec 3 (no class that day)
  • Oral presentations (15-20') will be: Wed-Fri Dec 5,6,7
  • 1 or 2 teams will present during each class period
See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2 (posted online Thurs?) No - sorry!
sent by email on Sat…
Due: PART 1 - ASAP
     PART 2 - Fri Nov 2 by 5 PM
Part 1 - Brief outline of Project, email to Drena & Michael
after response/approval, then:
Part 2 - More detailed outline of project
• Read a few papers and summarize status of problem
• Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Oct 25 Thur - BBMB Seminar 4:10 in 1414 MBB
  Dave Segal, UC Davis: Zinc Finger Protein Design
• Oct 19 Fri - BCB Faculty Seminar 2:10 in 102 ScI
  Guang Song, ComS, ISU: Probing functional mechanisms by structure-based modeling and simulations

Chp 16 - RNA Structure Prediction
SECTION V STRUCTURAL BIOINFORMATICS
Xiong: Chp 16 RNA Structure Prediction (Terribilini)
• RNA Function
• Types of RNA Structures
• RNA Secondary Structure Prediction Methods
  • Ab Initio Approach
  • Comparative Approach
• Performance Evaluation

Covalent & non-covalent bonds in RNA (This is a new slide)
Fig 6.2, Baxevanis & Ouellette 2005
Primary: Covalent bonds
Secondary/Tertiary: Non-covalent bonds
• H-bonds (base-pairing)
• Base stacking

RNA Pseudoknots & Tetraloops (This is a new slide)
• Often have important regulatory or catalytic functions
[Figures: Pseudoknot; Tetraloop]
Image sources:
http://academic.brooklyn.cuny.edu/chem/zhuang/QD/mckay_hr.gif
http://www.lbl.gov/Science-Articles/Research-Review/Annual-Reports/1995/images/rna.gif

Base Pairing in RNA (This slide has been changed)
G-C, A-U, G-U ("wobble") & many variants
See: IMB Image Library of Biological Molecules
http://www.fli-leibniz.de/ImgLibDoc/nana/IMAGE_NANA.html#basepairs
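
The pairing rules on the Base Pairing in RNA slide (G-C, A-U, and the G-U "wobble" pair) can be sketched in a few lines of Python. This is only an illustration and is not taken from the lecture: the names `PAIRS`, `can_pair`, and `stem_pairs` are hypothetical, and the "many variants" the slide mentions are deliberately ignored.

```python
# Base pairs named on the slide: Watson-Crick G-C and A-U, plus the G-U "wobble" pair.
# Non-canonical variants are left out of this sketch on purpose.
PAIRS = {frozenset("GC"), frozenset("AU"), frozenset("GU")}


def can_pair(a: str, b: str) -> bool:
    """Return True if RNA bases a and b form a G-C, A-U, or G-U pair."""
    return frozenset((a.upper(), b.upper())) in PAIRS


def stem_pairs(five_prime: str, three_prime: str) -> list:
    """Pair two equal-length strands antiparallel, as in a hairpin stem.

    Both strands are written 5'->3', so base i of the first strand is
    matched with base -(i + 1) of the second strand.
    Returns a list of (base, partner, pairs_ok) tuples.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("strands must have the same length")
    return [(a, b, can_pair(a, b))
            for a, b in zip(five_prime, reversed(three_prime))]


# Example: the last pair (U with G) is a wobble pair.
for base, partner, ok in stem_pairs("GGAU", "GUCC"):
    print(base, partner, "pairs" if ok else "does not pair")
```

Using unordered `frozenset` pairs keeps the rule symmetric, so G-U and U-G are treated the same without listing both orientations.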

RNA Secondary Structure Prediction Methods (This slide has been changed)
Two (three, recently) main types of methods:
1. Ab initio - based on calculating most energetically favorable secondary structure(s)
   Energy minimization (thermodynamics)
2. Comparative approach - based on comparisons of multiple evolutionarily-related RNA sequences
   Sequence comparison (co-variation)
3. Combined computational & experimental
   Use experimental constraints when available

RNA Secondary structure prediction - 3 (This is a new slide)
3) Combined experimental & computational
• Experiments: Map single-stranded vs double-stranded regions in folded RNA
• How?
  Enzymes: S1 nuclease, T1 RNase
  Chemicals: kethoxal, DMS, OH•
• Software: Mfold, Sfold, RNAstructure, RNAfold, RNAalifold
[Figure: kethoxal and DMS modification sites (mild, strong) marked on an RNA secondary structure]

Gene Prediction in Prokaryotes vs Eukaryotes
Prokaryotes
• Small genomes: 0.5 - 10 × 10^6 bp
• About 90% of genome is coding
• Simple gene structure
• Prediction success ~99%
Eukaryotes
• Large genomes: 10^7 - 10^10 bp
• Often less than 2% coding
• Complicated gene structure (splicing, long introns)
• Prediction success 50-95%
[Figure: prokaryotic gene with promoter and open reading frame (ORF) running from start codon (ATG) to stop codon (TAA); eukaryotic gene with promoter, 5' UTR, exons, introns, splice sites, and 3' UTR between ATG and TAA]

DNA "Signals" Used by Gene Finding Algorithms
1. Exploit the regular gene structure
   ATG - Exon1 - Intron1 - Exon2 - … - ExonN - STOP
2. Recognize "coding bias"
   CAG-CGA-GAC-TAT-TTA-GAT-AAC-ACA-CAT-GAA-…
3. Recognize splice sites
   Intron - cAGt - Exon - gGTgag - Intron
4. Model the duration of regions
   Introns tend to be much longer than exons, in mammals
   Exons are biased to have a given minimum length
5. Use cross-species comparison
   Gene structure is conserved in mammals
   Exons are more similar (~85%) than introns

Computational Gene Finding Approaches
• Ab initio methods
  • Search by signal: find DNA sequences involved in gene expression
  • Search by content: test statistical properties distinguishing coding from non-coding DNA
• Similarity-based methods
  • Database search: exploit similarity to proteins, ESTs, and cDNAs
  • Comparative genomics: exploit aligned genomes
    • Do other organisms have similar sequence?
• Hybrid methods - best

Examples of Gene Prediction Software
• Ab initio
  Genscan, GeneMark.hmm, Genie, GeneID…
• Similarity-based
  BLAST, Procrustes…
• Hybrids
  GeneSeqer, GenomeScan, GenieEST, Twinscan, SGP, ROSETTA, CEM, TBLASTX, SLAM
• BEST?
  Ab initio - Genscan (according to some assessments)
  Hybrid - GeneSeqer
  But depends on organism & specific task
Lists of Gene Prediction Software
http://www.bioinformaticsonline.org/links/ch_09_t_1.html
http://cmgm.stanford.edu/classes/genefind/

Synthesis & Processing of Eukaryotic mRNA
[Figure: gene in DNA (exon 1, intron, exon 2, intron, exon 3) and its primary (1°) RNA transcript, with steps labeled Transcription, Splicing (remove introns), Capping & polyadenylation, and Export to cytoplasm; the mature mRNA carries a 5' 7MeG cap and a 3' AAAAA tail]

What are cDNAs & ESTs?
cDNA libraries are important for determining gene structure & studying regulation of gene expression
• Isolate RNA (always from a specific organism, region, and time point)
• Convert RNA to complementary DNA (with reverse transcriptase)
• Clone into cDNA vector
• Sequence the cDNA inserts
• Short cDNAs are called ESTs or Expressed Sequence Tags
  ESTs are strong evidence for genes
• Full-length cDNAs can be difficult to obtain
[Figure: cDNA insert cloned into a vector]

UniGene: Unique genes via ESTs
• Find UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many ESTs
• UniGene data come from many cDNA libraries. When you look up a gene in UniGene, you can obtain information re: level & tissue distribution of expression

Gene Prediction
• Overview of steps & strategies
  • What sequence signals can be used?
  • What other types of information can be used?
• Algorithms
  • HMMs, Bayesian models, neural nets
• Gene prediction software
  • 3 major types
  • many, many programs!

Overview of Gene Prediction Strategies
What sequence signals can be used?
• Transcription: TF binding sites, promoter, initiation site, terminator, GC islands, etc.
• Processing signals: Splice donor/acceptors, polyA signal
• Translation: Start (AUG = Met) & stop (UGA, UAA, UAG), ORFs, codon usage
What other types of information can be used?
• Homology (sequence comparison, BLAST)
• cDNAs & ESTs (experimental data, pairwise alignment)

Gene prediction: Eukaryotes vs prokaryotes
Gene prediction is easier in microbial genomes
Why?
• Smaller genomes
• Simpler gene structures
• Many more sequenced genomes!
  (for comparative approaches)
Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available
e.g., GeneMark.hmm
      TIGR Comprehensive Microbial Resource (CMR)
      NCBI Microbial Genomes

Predicting Genes - Basic steps:
• Obtain genomic sequence
• BLAST it!
• Perform database similarity search (with EST & cDNA databases, if available)
• Translate in all 6 reading frames (i.e., "6-frame translation")
• Compare with protein sequence databases
• Use Gene Prediction software to locate genes
• Analyze regulatory sequences
• Refine gene prediction

Predicting Genes - Details:
1. First, mask to "remove" repetitive elements (ALUs, etc.)
2. Perform database search on translated DNA (BlastX, TFasta)
3. Use several programs to predict genes (GENSCAN, GeneMark.hmm, GeneSeqer)
4. Search for functional motifs in translated ORFs (Blocks, Motifs, etc.) & in neighboring DNA sequences
5. Repeat

Spliced Alignment Algorithm (Brendel 2005)
GeneSeqer - Brendel et al. - ISU
http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi
• Perform pairwise alignment with large gaps in one sequence (due to introns)
  • Align genomic DNA with cDNA, ESTs, protein sequences
• Score semi-conserved sequences at splice junctions
  • Using a Bayesian model or Markov model (MM)
• Score coding constraints in translated exons
  • Using a Bayesian model or Markov model (MM)
[Figure: intron bounded by splice sites, with GT at the donor site and AG at the acceptor site]
Brendel et al (2004) Bioinformatics 20: 1157

Brendel - Spliced Alignment II: Compare with protein probes (Brendel 2005)
[Figure: protein aligned against genomic DNA between the start codon and the stop codon]

Splice Site Detection (Brendel 2005)
Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal? YES
• Information content $I_i$:
  $I_i = 2 + \sum_{B \in \{U, C, A, G\}} f_{iB} \log_2 (f_{iB})$
• Extent of splice signal window:
  $I_i \geq \bar{I} + 1.96\,\sigma_{\bar{I}}$
i: ith position in sequence
$f_{iB}$: frequency of base B at position i
$\bar{I}$: avg information content over all positions >20 nt from splice site
$\sigma_{\bar{I}}$: avg sample standard deviation of $\bar{I}$

Information content vs position (Brendel 2005)
[Figure: two plots of information content (y-axis, 0.0 to 0.8) against position (x-axis, -50 to 50), one for Human T2_GT and one for Human T2_AG]
Which sequences are exons & which are introns? How can you tell?
Brendel et al (2004) Bioinformatics 20: 1157

Markov Model for Spliced Alignment (Brendel 2005)
[Figure: state diagram with exon states e_n, e_n+1 and intron states i_n, i_n+1; transition labels are built from PΔG, PD(n+1), and PA(n), including (1-PΔG)PD(n+1), (1-PΔG)(1-PD(n+1)), and 1-PA(n)]
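
The per-position information content from the Splice Site Detection slide, I_i = 2 + Σ_B f_iB log2(f_iB) over B in {U, C, A, G}, can be computed from a set of aligned splice-site sequences. The sketch below is a minimal illustration under stated assumptions, not the GeneSeqer implementation: the function name `information_content` and the toy sequences are hypothetical, frequencies are raw counts with no pseudocounts or small-sample correction, and terms with f_iB = 0 are taken to contribute 0.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="UCAG"):
    """Return [I_0, I_1, ...] with I_i = 2 + sum_B f_iB * log2(f_iB).

    aligned_sites: equal-length sequences aligned on the splice site.
    alphabet: the four bases to count (pass "ACGT" for DNA input).
    """
    if not aligned_sites:
        raise ValueError("need at least one sequence")
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sequences must have the same length")

    profile = []
    for i in range(length):
        counts = Counter(site[i].upper() for site in aligned_sites)
        total = sum(counts[base] for base in alphabet)
        if total == 0:
            raise ValueError(f"no bases from {alphabet!r} in column {i}")
        info = 2.0
        for base in alphabet:
            freq = counts[base] / total
            if freq > 0:  # assumption: 0 * log2(0) is treated as 0
                info += freq * math.log2(freq)
        profile.append(info)
    return profile


# Toy example (made-up sequences): column 0 is invariant, column 1 is uniform.
print(information_content(["GA", "GC", "GG", "GU"]))
```

A fully conserved column gives the maximum of 2 bits and a uniform column gives 0 bits, which is why conserved positions around the GT and AG sites stand out in an information content vs position plot.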