BCB 444/544 Fall 07 Dobbs, ISU
#33 - Genomics 11/09/07

BCB 444/544 Lecture 33
Genomics
#33_Nov09

Required Reading (before lecture)
√ Mon Nov 5 - Lecture 31 Phylogenetics: Parsimony and ML
• Chp 11 - pp 142-169
√ Wed Nov 7 - Lecture 32 Machine Learning
Fri Nov 9 - Lecture 33 Functional and Comparative Genomics
• Chp 17 and Chp 18

Assignments & Announcements
Fri Nov 9 - HW#6 (will be posted this weekend)
HW#6 - More fun with Machine Learning!!
Due: Fri Nov 16 (or sometime before Mon Nov 26)

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
• Nov 7 Wed - BBMB Seminar 4:10 in 1414 MBB
  • Sharon Roth Dent, MD Anderson Cancer Center
  • Role of chromatin and chromatin modifying proteins in regulating gene expression
• Nov 8 Thurs - BBMB Seminar 4:10 in 1414 MBB
  • Jianzhi George Zhang, U. Michigan
  • Evolution of new functions for proteins
• Nov 9 Fri - BCB Faculty Seminar 2:10 in 102 SciI
  • Amy Andreotti, ISU
  • T cell signaling: insights from protein NMR spectroscopy

Chp 11 - Phylogenetic Tree Construction Methods and Programs
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs
• Distance-Based Methods
• Character-Based Methods
• Phylogenetic Tree Evaluation
• Phylogenetic Programs

Machine Learning
• What is learning?
• What is machine learning?
• Learning algorithms
• Machine learning applied to bioinformatics and computational biology
• Some slides adapted from Dr. Vasant Honavar and Dr.
Byron Olson

Examples of Machine Learning Algorithms
• Naïve Bayes (NB)
  • Bayes Theorem
• Neural network (NN) or Artificial Neural Net (ANN)
  • Perceptrons
• Support Vector Machine (SVM)
  • Kernel functions
Lab - WEKA: Decision Trees (DT), NB, SVM

An Application: Predicting RNA Binding Sites in Proteins
• Problem: Given an amino acid sequence, classify each residue as RNA binding or non-RNA binding
• Input to the classifier is a string of amino acid identities
• Output from the classifier is a class label, either binding or not

Bayes Theorem Applied to RNA Binding Site Prediction
$$P(c = 1 \mid X = x) = \frac{P(X = x \mid c = 1)\, P(c = 1)}{P(X = x)}$$
$$P(c = 0 \mid X = x) = \frac{P(X = x \mid c = 0)\, P(c = 0)}{P(X = x)}$$
$$P(\text{binding} \mid \text{aa seq}) = \frac{P(\text{aa seq} \mid \text{binding})\, P(\text{binding})}{P(\text{aa seq})}$$

Naïve Bayes for Binary Classification
Assign c = 1 if
$$\frac{P(c = 1 \mid X = x)}{P(c = 0 \mid X = x)} \geq \theta$$
Otherwise, assign c = 0

Example: Is ARG 6 RNA-binding or not?
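To make the question concrete, the naïve Bayes rule from the preceding slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the classifier used in the lecture or in WEKA: the training windows and labels in `TOY_WINDOWS` and `TOY_LABELS` are invented, the function names (`train`, `log_ratio`, `classify`) are hypothetical, each residue is assumed to be described by an 11-residue sequence window, and add-one (Laplace) smoothing is assumed so that unseen residues do not give zero probabilities.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical toy training data (NOT from the lecture): 11-residue windows,
# label 1 = central residue binds RNA, label 0 = it does not.
TOY_WINDOWS = ["GRKKRRQRRRA", "SKRKGRKRSKK", "ALVLIAGEDLL", "VLDEAGLLIVE"]
TOY_LABELS = [1, 1, 0, 0]


def train(windows, labels, pseudocount=1.0):
    """Estimate P(c) and the per-position P(X_i = a | c) with add-one smoothing."""
    n = len(windows[0])
    totals = {0: 0, 1: 0}
    counts = {c: [dict.fromkeys(ALPHABET, 0.0) for _ in range(n)] for c in (0, 1)}
    for window, c in zip(windows, labels):
        totals[c] += 1
        for i, aa in enumerate(window):
            counts[c][i][aa] += 1
    # Both classes must occur in the training data, otherwise log(prior) fails.
    prior = {c: totals[c] / len(windows) for c in (0, 1)}
    denom = {c: totals[c] + pseudocount * len(ALPHABET) for c in (0, 1)}
    cond = {
        c: [{aa: (pos[aa] + pseudocount) / denom[c] for aa in ALPHABET}
            for pos in counts[c]]
        for c in (0, 1)
    }
    return prior, cond


def log_ratio(window, prior, cond):
    """log of P(c=1) * prod_i P(x_i | c=1) over P(c=0) * prod_i P(x_i | c=0)."""
    score = math.log(prior[1]) - math.log(prior[0])
    for i, aa in enumerate(window):
        score += math.log(cond[1][i][aa]) - math.log(cond[0][i][aa])
    return score


def classify(window, prior, cond, theta=1.0):
    """Assign c = 1 if the posterior ratio is at least theta, otherwise c = 0."""
    return 1 if log_ratio(window, prior, cond) >= math.log(theta) else 0


prior, cond = train(TOY_WINDOWS, TOY_LABELS)
# Window around ARG 6 taken from the example slide.
print(classify("TSKKKRQRGSR", prior, cond))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The shared denominator P(X = x) cancels in the ratio, so it never has to be computed; raising `theta` makes the classifier more reluctant to call a residue RNA-binding.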
ARG 6: T S K K K R Q R G S R
$$\frac{p(X_1 = T \mid c = 1)\, p(X_2 = S \mid c = 1) \cdots}{p(X_1 = T \mid c = 0)\, p(X_2 = S \mid c = 0) \cdots} \geq \theta$$

Predicted vs Actual RNA Binding for Ribosomal protein L15 (PDB ID 1JJ2:K)
[Figure: two panels, Actual and Predicted]

Genomics - for excellent overview lectures, see these posted by NHGRI & Pevsner:
1- Genomic sequencing: Mapping and Sequencing, Eric Green, NHGRI (CTGA2005Lecture1.pdf)
2- Human genome project: The Human Genome, Jonathan Pevsner, Kennedy Krieger Institute (2005-10-19_ch17.pdf)
3- SNPs: Studying Genetic Variation II: Computational Techniques, Jim Mullikin, NHGRI (TGA2005Lecture13.pdf)
4- Comparative Genomics: Comparative Sequence Analysis, Elliott Margulies, NHGRI (CTGA2005Lecture8.pdf)

1- Genomic sequencing
Many thanks to: Eric Green, NHGRI for the following slides extracted from his lecture on: Mapping and Sequencing (CTGA2005Lecture1.pdf)

Genomic Sequencing - Brief Review (E Green 2005)

Comparison of Sequenced Genome Sizes (E Green 2005)

Comparison of Genetic & Physical Maps (E Green 2005)

STSs: Provide common markers for "linking" genetic & physical maps (E Green 2005)

With complete genomes (now), why bother to generate physical maps? (E Green 2005)
Genomic sequencing requires assembly of sequences obtained from cloned DNA (E Green 2005)

Human Genome Sequencing
Two approaches:
• Public (government) - International Consortium (6 countries, NIH-funded in US)
  • "Hierarchical" cloning & BAC-by-BAC sequencing
  • Map-based assembly
• Private (industry) - Celera (Craig Venter)
  • Whole genome random "shotgun" sequencing
  • Computational assembly (took advantage of public maps & sequences, too)
Guess which human genome Celera sequenced?

NIH: "Hierarchical" BAC-by-BAC Sequencing (E Green 2005)

"Hierarchical" Subcloning Strategy (E Green 2005)

Celera: Whole-Genome "Shotgun" Sequencing (E Green 2005)

"Shotgun" Sequencing Strategy (E Green 2005)

Either Strategy: Sequence "Finishing" = Hardest part !! (E Green 2005)

Advances in DNA Sequencing Technology (E Green 2005)

Sequencing Method #1: Maxam-Gilbert "Chemical Degradation" (E Green 2005)

Sequencing Method #2: Sanger "Di-deoxy Chain Termination" (E Green 2005)

Automated Sequencing for Genome Projects: Sanger method - with improvements (E Green 2005)
Another “recent” improvement: rapid & high resolution separation of fragments in capillaries instead of gels (E Yeung, Ames Lab, ISU)
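Both sequencing strategies described in the slides above depend on assembling short reads into longer contiguous sequence. The following minimal sketch illustrates the idea with a greedy overlap-merge heuristic. It is a teaching toy under strong assumptions (error-free reads, a single strand, no repeats longer than the minimum overlap) and is not the assembler used by Celera or by the public consortium. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest suffix-prefix overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while True:
        # Discard reads that are fully contained in another read.
        reads = [r for r in reads if not any(r != s and r in s for s in reads)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # no remaining overlaps: what is left are the contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Hypothetical reads sampled from a made-up 20-base "genome".
toy_reads = ["TAGCCTAGGA", "ATGCGTACG", "TACGTTAGC"]
print(greedy_assemble(toy_reads))
```

Real genomes contain long repeats and reads contain errors, so a greedy merge like this can join the wrong fragments. That is one reason the slides single out "finishing" as the hardest part of either strategy.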