Promoter Prediction: BCB 444/544X Lecture Notes - Prof. Drena Leigh Dobbs, Exams of Bioinformatics

These lecture notes from Iowa State University cover promoter prediction in the context of gene identification. They include information on promoter prediction algorithms and software, and on the differences between promoter prediction in prokaryotes and eukaryotes. The notes also discuss the role of transcription factors and their binding sites in eukaryotic gene transcription.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-3ji

10/24/05 D Dobbs ISU - BCB 444/544X: Promoter Prediction

Promoter Prediction
RNA Structure & Function Prediction

Announcements

Seminar (Mon Oct 24) (several additional seminars listed in email sent to class)
12:10 PM IG Faculty Seminar in 101 Ind Ed II
"Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis"
Coralie Lashbrook, EEOB
http://www.bb.iastate.edu/%7Emarit/GEN691.html

Mark your calendars:
1:10 PM Nov 14 Baker Seminar in Howe Hall Auditorium
"Discovering transcription factor binding sites"
Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine

Announcements

544 Semester Projects
Thanks to all who sent already! Others: Information needed today! ddobbs@iastate.edu
Briefly describe:
• Your background & current grad research
• Is there a problem related to your research you would like to learn more about & develop as project for this course?
or
• What would your ‘dream’ project be?

Announcements

Exam 2 - this Friday
Posted Online:
• Exam 2 Study Guide
• 544 Reading Assignment (2 papers)

Office Hours:
• David: Mon 1-2 PM in 209 Atanasoff
• Drena: Tues 10-11 AM in 106 MBB
• Michael: none this week

Thurs No Lab - Extra Office Hrs instead:
• David: 1-3 PM in 209 Atanasoff
• Drena: 1-3 PM in 106 MBB

Announcements

• Updated PPTs & PDFs for Gene Prediction lectures (covered on Exam 2) will be posted today (changes are minor)
• Is everyone on BCB 444/544 mailing list? Auditors?
Promoter Prediction & RNA Structure/Function Prediction

Mon: Quite a few more words re: Gene prediction
     Promoter prediction
Wed: RNA structure & function
     RNA structure prediction
     2° & 3° structure prediction
     miRNA & target prediction
Thurs: No Lab
Fri: Exam 2

Reading Assignment - previous

Mount Bioinformatics
• Chp 9 Gene Prediction & Regulation
• pp 361-401
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

* Brown Genomes 2 (NCBI textbooks online)
• Sect 9 Overview: Assembly of Transcription Initiation Complex
• http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002
• Sect 9.1-9.3 DNA binding proteins, Transcription initiation
• http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016

* NOTE: Don’t worry about the details!!
• See Study Guide for Exam 2 re: Sections covered

Optional - but very helpful reading:

1) Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709
http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html
2) Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287
http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html

03489059922 (that's a hint!)

Check this out: http://www.phylofoot.org/NRG_testcases/

Reading Assignment (for Wed)

Mount Bioinformatics
• Chp 8 Prediction of RNA Secondary Structure
• pp. 327-355
• Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html

Cates (Online) RNA Secondary Structure Prediction Module
• http://cnx.rice.edu/content/m11065/latest/

Review last lecture: Gene Prediction (formerly Gene Prediction - 3)

• Overview of steps & strategies
• Algorithms
• Gene prediction software

Predicting Genes - Basic steps:

• Obtain genomic DNA sequence
• Translate in all 6 reading frames
• Compare with protein sequence database
• Also perform database similarity search with EST & cDNA databases, if available
• Use gene prediction programs to locate genes
• Analyze gene regulatory sequences

Note: Several important details missing above:
1. Mask to "remove" repetitive elements (ALUs, etc.)
2. Perform database search on translated DNA (BlastX, TFasta)
3. Use several programs to predict genes (GenScan, GeneMark.hmm)
4. Translate putative ORFs and search for functional motifs (Blocks, Motifs, etc.) & regulatory sequences

Gene prediction flowchart

[Figure: Fig 5.15, Baxevanis & Ouellette 2005]

Markov Model for Spliced Alignment

[Figure: state diagram with exon states e_n, e_n+1 and intron states i_n, i_n+1; transitions labeled with P_ΔG, P_A(n) and P_D(n+1). Brendel 2005]

Evaluation of Splice Site Prediction (Brendel 2005)

                  Actual True    Actual False
Predicted True    TP             FP             PP = TP + FP
Predicted False   FN             TN             PN = FN + TN
                  AP = TP + FN   AN = FP + TN

• Misclassification rates: $\alpha = FN / AP$, $\beta = FP / AN$
• Sensitivity: $S_n = TP / AP = 1 - \alpha$ (= Coverage)
• Specificity: $S_p = TP / PP = \frac{1 - \alpha}{1 - \alpha + r\beta}$, where $r = AN / AP$
• Normalized specificity: $\sigma = \frac{1 - \alpha}{1 - \alpha + \beta}$

Performance?

[Figure: Sn and σ curves (vertical axis 0.00 to 1.00) in four panels: Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site. Brendel 2005]

Note: these are not ROC curves (plots of Sn vs (1-Sp))
• But plots such as these (& ROCs) are much better than using a "single number" to compare different methods
• Both types of plots illustrate the trade-off: Sn vs Sp

Evaluation of Splice Site Prediction

What do measures really mean?

[Figure: Fig 5.11, Baxevanis & Ouellette 2005]

Careful: different definitions for "Specificity"

Brendel definitions (same confusion table as above):
• Specificity: $S_p = TP / PP = \frac{1 - \alpha}{1 - \alpha + r\beta}$, where $r = AN / AP$
• Sensitivity: $S_n = TP / AP = 1 - \alpha$ (= Coverage)

cf. Guigó definitions:
• Sn: Sensitivity = TP/(TP+FN)
• Sp: Specificity = TN/(TN+FP) = Sp-
• AC: Approximate Coefficient = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1

Other measures? Predictive Values, Correlation Coefficient

Best measures for comparing different methods?

• ROC curves (Receiver Operating Characteristic?!!)
http://www.anaesthetist.com/mnm/stats/roc/
"The Magnificent ROC" - has fun applets & quotes:
"There is no statistical test, however intuitive and simple, which will not be abused by medical researchers"
• Correlation Coefficient (Matthews correlation coefficient, MCC)
  MCC = 1 for a perfect prediction
        0 for a completely random assignment
       -1 for a "perfectly incorrect" prediction
  Do not memorize this!

Performance of GeneSeqer vs other methods?

• Comparison with ab initio gene prediction (e.g., GENSCAN)
• Depends on:
  • Availability of ESTs
  • Availability of protein homologs
Brendel 2005

Other Performance Evaluations?
Guigó http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html

GeneSeqer vs GENSCAN (Exon prediction)

[Figure: Exon (Sn + Sp) / 2 vs target protein alignment score, for GeneSeqer, NAP and GENSCAN. Brendel 2005]
GENSCAN - Burge, MIT

GeneSeqer vs GENSCAN (Intron prediction)

[Figure: Intron (Sn + Sp) / 2 vs target protein alignment score, for GeneSeqer, NAP and GENSCAN. Brendel 2005]
GENSCAN - Burge, MIT

Other Resources

Current Protocols in Bioinformatics
http://www.4ulr.com/products/currentprotocols/bioinformatics.html

Finding Genes
4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations
4.2 Using MZEF To Find Internal Coding Exons
4.3 Using GENEID to Identify Genes
4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes
4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm
4.6 Eukaryotic Gene Prediction Using GeneMark.hmm
4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome
4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences
4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation
4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences

New Today: Promoter Prediction

• A few more words about Gene prediction
• Predicting regulatory regions (focus on promoters)
  Brief review promoters & enhancers
  Predicting in eukaryotes vs prokaryotes
• Introduction to RNA Structure & function

Predicting Promoters

• What signals are there?
• Algorithms
• Promoter prediction software

What signals are there? Simple ones in prokaryotes

[Figure: Brown Fig 9.17, BIOS Scientific Publishers Ltd, 1999]

Prokaryotic promoters

• RNA polymerase complex recognizes promoter sequences located very close to & on 5’ side (“upstream”) of initiation site
• RNA polymerase complex binds directly to these, with no requirement for “transcription factors”
• Prokaryotic promoter sequences are highly conserved
  • -10 region
  • -35 region

What signals are there? Complex ones in eukaryotes!

[Figure: Fig 9.13, Mount 2004]

Simpler view of complex promoters in eukaryotes:

[Figure: Fig 5.12, Baxevanis & Ouellette 2005]

Eukaryotic genes are transcribed by 3 different RNA polymerases

Recognize different types of promoters & enhancers:
[Figure: Brown Fig 9.18, BIOS Scientific Publishers Ltd, 1999]

Eukaryotic promoters & enhancers

• Promoters located “relatively” close to initiation site (but can be located within gene, rather than upstream!)
• Enhancers also required for regulated transcription (these control expression in specific cell types, developmental stages, in response to environment)
• RNA polymerase complexes do not specifically recognize promoter sequences directly
• Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes
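
In contrast to the eukaryotic case just described, the prokaryotic signals noted earlier (conserved -35 and -10 regions recognized directly by RNA polymerase) are simple enough to search for by plain pattern matching. The Python sketch below is only an illustration under stated assumptions, none of which come from these notes: it uses the commonly cited E. coli consensus hexamers TTGACA (-35) and TATAAT (-10), allows a spacer of 15-19 bp between them, and scores a candidate by its total number of mismatches. The function names are hypothetical, and real promoter prediction software uses weight matrices or probabilistic models rather than this toy rule.

```python
# Toy scan for prokaryotic promoter-like sites: -35 hexamer, spacer, -10 hexamer.
# ASSUMPTIONS (not from the lecture notes): the consensus hexamers, the
# 15-19 bp spacer range, and mismatch counting as the score.

MINUS_35 = "TTGACA"   # assumed -35 consensus
MINUS_10 = "TATAAT"   # assumed -10 consensus
SPACERS = range(15, 20)  # assumed allowed gap lengths between the two hexamers


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a candidate site differs from the consensus."""
    return sum(a != b for a, b in zip(site, consensus))


def scan_promoters(seq: str, max_mismatches: int = 2):
    """Return (pos_35, pos_10, spacer, total_mismatches) tuples, best first.

    Positions are 0-based start indices of each hexamer in seq.
    """
    seq = seq.upper()
    k = len(MINUS_35)
    hits = []
    for i in range(len(seq) - k + 1):
        mm35 = mismatches(seq[i:i + k], MINUS_35)
        if mm35 > max_mismatches:
            continue  # -35 candidate already too far from consensus
        for spacer in SPACERS:
            j = i + k + spacer
            if j + k > len(seq):
                break  # -10 hexamer would run past the end of the sequence
            total = mm35 + mismatches(seq[j:j + k], MINUS_10)
            if total <= max_mismatches:
                hits.append((i, j, spacer, total))
    return sorted(hits, key=lambda h: h[3])


if __name__ == "__main__":
    # Synthetic test sequence: perfect hexamers separated by a 17 bp spacer.
    demo = "GG" + MINUS_35 + "CGCGCGCGCGCGCGCGC" + MINUS_10 + "GG"
    print(scan_promoters(demo))  # -> [(2, 25, 17, 0)]
```

Counting mismatches against a fixed consensus keeps the example short, but it treats every position as equally informative. A eukaryotic promoter, where transcription factors bind first and serve as landmarks, would not be found reliably by a scan of this kind.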