Protein 2' Structure Prediction using Neural Networks and SVMs in Bioinformatics - Prof. D, Study notes of Bioinformatics

A lecture note from a bioinformatics course at Iowa State University (ISU) in fall 2006. The lecture focuses on the use of neural networks (NNs) and support vector machines (SVMs) for predicting protein secondary structures. It includes references to research articles, seminar presentations, and assignments related to the topic.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-n9r-1

#32 - SVMs & NNs & Protein 2' Structure Prediction 11/6/06
BCB 444/544 Fall 06 Terribilini
11/8/06 BCB 444/544 F06 ISU Terribilini #32 - NNs & SVMs / Protein 2' Structure Prediction

BCB 444/544 - Introduction to Bioinformatics
Lecture 32
NNs & SVMs
Secondary Structure Prediction
#32_Nov8

Seminars in Bioinformatics/Genomics

Mon Nov 6
• Sue Lamont (An Sci, ISU)
  Integrated genomic approaches to enhance host resistance to food-safety pathogens
  IG Faculty Seminar, 12:10 PM in 101 Ind Ed II
Thurs Nov 9
• Sean Rice (Biol Sci, Texas Tech)
  Constructing an exact and universal evolutionary theory
  Applied Math/EEOB Seminar, 3:45 in 210 Bessey
Fri Nov 10
• Surya Mallapragada (Chem & Biol Eng, ISU)
  Micropatterned Polymer Substrates for Peripheral Nerve Regeneration and Control of Neural Stem Cell Growth and Differentiation
  BCB Faculty Seminar, 2:10 in Lago W142
Thurs Nov 16
• Hassane Mchaourab (Center for Structural Biology, Vanderbilt)
  Structural dynamics of multidrug transporters
  Baker Center Seminar, 2:10 PM in Howe Hall Auditorium

Assignments: Reading This Week

Mon Nov 6
  Review: Protein Structure Prediction
  Ginalski et al (2005) Nucleic Acids Res. 33:1874 doi:10.1093/nar/gki327
Wed Nov 8
  1) Review: SVMs in Bioinformatics
     Yang (2004) Briefings in Bioinformatics 5:328 doi:10.1093/bib/5.4.328
  2) SVMs http://en.wikipedia.org/wiki/Support_Vector_Machine
  3) ANNs http://en.wikipedia.org/wiki/Artificial_neural_network
Thurs Nov 9
  Lab 10: Protein Structure Prediction
Fri Nov 10
  Chp 8.1 - 8.4 Proteomics (Previously assigned)

Assignments: Due this week

BCB 544 Only:
  Correction: 544Extra#2: Due at Noon, Mon Nov 13
  Teams: Must meet with us this week

Macromolecular interactions mediated by the Rev protein in lentiviruses (HIV & EIAV)

[Figure: Rev activity across Provirus, Nucleus and Cytoplasm. Labeled steps: nuclear import (protein-protein), RNA binding (protein-RNA), multimerization (protein-protein), nuclear export (protein-protein). Other labels: pre-mRNA, Spliceosome, Early: Regulatory Proteins (Tat, Rev), Late: Structural Proteins, Progeny RNA. Credit: Susan Carpenter]

Hypothesis: Rev proteins share structural features critical for function

Approach:
• Computationally model structures of lentiviral Rev proteins
  - using threading algorithm (with Ho et al)
• Predict critical residues for RNA-binding, protein interaction
  - using machine learning algorithms (with Honavar et al)
• Test model and predictions
  - using genetic/biochemical approaches (with Carpenter & Culver)
  - using biophysical approaches (with Andreotti & Yu groups)

Initially: focus on EIAV Rev & RRE

Comparison of Predicted Rev Structures

[Figure: predicted Rev structures. Panel labels: EIAV; HIV; FIV; SIV; Dimer; HIV Dimer. Credit: Yungok Ihm]

Predicting the RNA-binding domain of EIAV Rev

  61         71         81         91
  ARRHLGPGPT QHTPSRRDRW IREQILQAEV LQERLEWRIR …

  121        131        141        151        161
  HFREDQRGDF SAWGDYQQAQ ERRWGEQSSP RVLRPGDSKR RRKHL

[Figure: residues are marked with "+" beneath the sequence on the original slide; highlighted motifs: RRDRW, KRRRK. Credit: Michael Terribilini, Yungok Ihm]

Summary: Predictions vs Experiments

  Position labels 41, 51:
  GPLESDQWCRVLRQSLPEEKISSQTCI
  Position labels 61, 71, 81, 91:
  ARRHLGPGPTQHTPSRRDRWIREQILQAEVLQERLEWRI
  Position labels 131, 141, 151, 161:
  QRGDFSAWGDYQQAQERRWGEQSSPRVLRPGDSKRRRKHL

[Figure: "+" marks beneath each sequence, and a domain map of EIAV Rev with labels NES, RBM, FOLD, NLS/RBM, positions 31, 57, 125, 145, 165, and motifs RRDRW, ERLE, KRRRK]

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Summary

Computational & wet lab approaches revealed that:
◊ EIAV Rev has a bipartite RNA binding domain
◊ Two Arg-rich RBMs are critical
  – RRDRW in central region
  – KRRRK at C-terminus, overlapping the NLS
• Based on computational modeling, the RBMs are in close proximity within the 3-D structure of the protein
• Lentiviral Revs & RRE binding sites may be more similar in structure than has been appreciated
• Future: Identify "predictive rules" for protein-RNA recognition &…

Lee et al (2006) J Virol 80:3844
Terribilini et al (2006) PSB 11:415

Secondary Structure Prediction

• Given a protein sequence a_1 a_2 … a_N, secondary structure prediction aims at defining the state of each amino acid a_i as being either H (helix), E (extended = strand), or O (other). (Some methods have 4 states: H, E, T for turns, and O for other.)
• The quality of secondary structure prediction is measured with a “3-state accuracy” score, or Q3. Q3 is the percent of residues whose predicted state matches “reality” (the state assigned from the X-ray structure).

Quality of Secondary Structure Prediction

Determine secondary structure positions in known protein structures using DSSP or STRIDE:
1. Kabsch and Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577-2637 (1983) (DSSP)
2. Frishman and Argos. Knowledge-based protein secondary structure assignment. Proteins 23: 566-579 (1995) (STRIDE)

Accuracy

• Both Chou and Fasman and GOR have been assessed and their accuracy is estimated to be Q3 = 60-65%.
(Initially, higher scores were reported, but the experiments set up to measure Q3 were flawed, as the test cases included proteins used to derive the propensities!)

Neural networks

The most successful methods for predicting secondary structure are based on neural networks. The overall idea is that neural networks can be trained to recognize amino acid patterns in known secondary structure units, and to use these patterns to distinguish between the different types of secondary structure.

Neural networks classify “input vectors” or “examples” into categories (2 or more). They are loosely based on biological neurons.

Biological Neurons

Dendrites receive inputs, the axon gives output.
[Image from Christos Stergiou and Dimitrios Siganos, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html]

Artificial Neuron – “Perceptron”

[Image from Christos Stergiou and Dimitrios Siganos, http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html]

The perceptron

[Diagram: inputs X1, X2, …, XN, each with a weight w1, w2, …, wN, feed a threshold unit with threshold T, which produces the output]

$$S = \sum_{i=1}^{N} X_i w_i$$

$$\text{Output} = \begin{cases} 1 & \text{if } S > T \\ 0 & \text{if } S < T \end{cases}$$

The perceptron classifies the input vector X into two categories.
If the weights and threshold T are not known in advance, the perceptron must be trained. Ideally, the perceptron must be trained to return the correct answer on all training examples, and perform well on examples it has never seen.
The training set must contain both types of data (i.e. with “1” and “0” output).
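As a minimal sketch of the threshold unit described above (not code from the lecture), the following Python function computes the weighted sum S and compares it with the threshold T. The function name `perceptron_output` and the example weights and threshold, chosen so that the unit behaves like a logical AND, are illustrative assumptions. The slide leaves the case S = T undefined; this sketch maps it to 0.

```python
from typing import Sequence


def perceptron_output(x: Sequence[float], w: Sequence[float], threshold: float) -> int:
    """Return 1 if the weighted sum S = sum(x_i * w_i) exceeds the threshold, else 0."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    s = sum(xi * wi for xi, wi in zip(x, w))
    # The slide defines the output only for S > T and S < T;
    # mapping S == T to 0 is an assumption made for this sketch.
    return 1 if s > threshold else 0


# Hypothetical weights and threshold: with these values the unit fires
# only when both inputs are 1 (logical AND).
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

if __name__ == "__main__":
    for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(inputs, perceptron_output(inputs, AND_WEIGHTS, AND_THRESHOLD))
```

With hand-picked weights no training is needed; training matters when suitable weights and threshold are not known in advance.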
The perceptron

Notes:
- The input is a vector X and the weights can be stored in another vector W.
- The perceptron computes the dot product S = X.W
- The output F is a function of S: it is often set discrete (i.e. 1 or 0), in which case the function is the step function. For continuous output, a sigmoid is often used:

$$F(X) = \frac{1}{1 + e^{-X}}$$

[Plot: sigmoid curve rising from 0 to 1, with value 1/2 at X = 0]

The perceptron

Training a perceptron: find the weights W that minimize the error function:

$$E = \sum_{i=1}^{P} \left( F(W \cdot X_i) - t(X_i) \right)^2$$

P: number of training data
X_i: training vectors
F(W.X_i): output of the perceptron
t(X_i): target value for X_i

Use steepest descent:
- compute gradient: $$\nabla E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \frac{\partial E}{\partial w_3}, \ldots, \frac{\partial E}{\partial w_N} \right)$$
- update weight vector: $$W_{new} = W_{old} - \epsilon \nabla E$$ (ε: learning rate)
- iterate

Biological Neural Network

[Image from http://en.wikipedia.org/wiki/Biological_neural_network]

Artificial Neural Network

A complete neural network is a set of perceptrons interconnected such that the outputs of some units become the inputs of other units. Many topologies are possible!

Neural networks are trained just like a perceptron, by minimizing an error function:

$$E = \sum_{i=1}^{N_{data}} \left( NN(X_i) - t(X_i) \right)^2$$

Neural networks and Secondary Structure prediction

Experience from Chou and Fasman and GOR has shown that:
◊ In predicting the conformation of a residue, it is important to consider a window around it.
◊ Helices and strands occur in stretches
◊ It is important to consider multiple sequences

PHD: Secondary structure prediction using NN

PHD: Input

For each residue, consider a window of size 13: 13x20 = 260 values

PHD: Network 1 (Sequence → Structure)

Input: 13x20 values → Network1 → Output: 3 values, Pα(i), Pβ(i), Pc(i)

PHD: Network 2 (Structure → Structure)

For each residue, consider a window of size 17: 17x3 = 51 values
Input: 3 values Pα(i), Pβ(i), Pc(i) per window position → Network2 → Output: 3 values, Pα(i), Pβ(i), Pc(i)

PHD

• Sequence-Structure network: for each amino acid a_j, a window of 13 residues a_{j-6} … a_j … a_{j+6} is considered. The corresponding rows of the sequence profile are fed into the neural network, and the output is 3 probabilities for a_j: P(a_j, alpha), P(a_j, beta) and P(a_j, other)
• Structure-Structure network: for each a_j, PHD now considers a window of 17 residues; the probabilities P(a_k, alpha), P(a_k, beta) and P(a_k, other) for k in [j-8, j+8] are fed into the second-layer neural network, which again produces probabilities that residue a_j is in each of the 3 possible conformations
• Jury system: PHD has trained several neural networks with different training sets; all neural networks are applied to the test sequence, and results are averaged
• Prediction: for each position, the secondary structure with the highest average score is output as the prediction

PSIPRED

Convert to [0-1] using:

$$\frac{1}{1 + e^{-x}}$$

Add one value per row to indicate if N-ter or C-ter.

Jones. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195-202 (1999)

Performances (monitored at CASP)

CASP    Year   # of Targets   <Q3>   Group
CASP1   1994   6              63     Rost and Sander
CASP2   1996   24             70     Rost
CASP3   1998   18             75     Jones
CASP4   2000   28             80     Jones

Secondary Structure Prediction

- Available servers:
  - JPRED: http://www.compbio.dundee.ac.uk/~www-jpred/
  - PHD: http://cubic.bioc.columbia.edu/predictprotein/
  - PSIPRED: http://bioinf.cs.ucl.ac.uk/psipred/
  - NNPREDICT: http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
  - Chou and Fasman: http://fasta.bioch.virginia.edu/fasta_www/chofas.htm
- Interesting paper:
  - Rost and Eyrich. EVA: Large-scale analysis of secondary structure prediction. Proteins 5:192-199 (2001)
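The jury and prediction steps described for PHD, and the Q3 score used in the CASP comparison, can be illustrated with a short Python sketch. It is not PHD itself: the function names `jury_predict` and `q3`, the state order H/E/O, and the toy probabilities are all assumptions made for illustration. In a real evaluation the per-residue scores would come from trained networks, and the observed states from DSSP or STRIDE.

```python
from typing import List, Sequence

STATES = ("H", "E", "O")  # helix, strand, other (order assumed for this sketch)

# One network's output: for each residue, three scores in STATES order.
NetworkOutput = List[Sequence[float]]


def jury_predict(network_outputs: Sequence[NetworkOutput]) -> str:
    """Average the per-residue scores of several networks and pick the top state."""
    if not network_outputs:
        raise ValueError("at least one network output is required")
    n_residues = len(network_outputs[0])
    if any(len(out) != n_residues for out in network_outputs):
        raise ValueError("all networks must score the same number of residues")
    prediction = []
    for pos in range(n_residues):
        # Jury step: average each state's score over all networks.
        averaged = [
            sum(out[pos][k] for out in network_outputs) / len(network_outputs)
            for k in range(len(STATES))
        ]
        # Prediction step: the state with the highest average score wins.
        best = max(range(len(STATES)), key=lambda k: averaged[k])
        prediction.append(STATES[best])
    return "".join(prediction)


def q3(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy scores for a 4-residue sequence from two hypothetical networks.
NET_A = [(0.8, 0.1, 0.1), (0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7)]
NET_B = [(0.6, 0.2, 0.2), (0.4, 0.4, 0.2), (0.1, 0.7, 0.2), (0.3, 0.3, 0.4)]

if __name__ == "__main__":
    predicted = jury_predict([NET_A, NET_B])
    print(predicted, q3(predicted, "HEEO"))
```

The <Q3> column in the CASP table is an average of such per-target scores over all targets in that round.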