BCB 444/544 Exam 2: HMMs, Gene & Protein Structure, RNA Secondary Structure - Prof. Drena

BCB 444/544 Fall 08 Oct 31 Exam 2 p 1 of 7

BCB 444/544 Exam 2 (100 pts) Name_____________________________________

1. HMM (20 pts TOTAL)

Consider the simplified CpG island HMM example discussed in class. The system has 3 states:

B denotes the start state
In denotes the state when the sequence is in a CpG island
Out denotes the state when the sequence is out of a CpG island

The transition probabilities between these states are shown in the diagram.

[Diagram: B -> In = 0.5, B -> Out = 0.5, In -> In = 0.8, In -> Out = 0.2, Out -> In = 0.6, Out -> Out = 0.4]

The emission probabilities are:
for state Out: eOut(A) = eOut(C) = eOut(G) = eOut(T) = 0.25
for state In: eIn(A) = eIn(T) = 0.1, eIn(C) = eIn(G) = 0.4

1. What is the most probable sequence of states, starting from state B, to produce the sequence of nucleotides CG? For full credit, you must show your work and fill in the table below.

            C                       G
B     1     0                       0
In    0     = 0.5 * 0.4 = 0.2       = 0.4 * max { 0.125 * 0.6, 0.2 * 0.8 } = 0.4 * 0.2 * 0.8 = 0.064
Out   0     = 0.5 * 0.25 = 0.125    = 0.25 * max { 0.125 * 0.4, 0.2 * 0.2 } = 0.25 * 0.125 * 0.4 = 0.0125

The most probable sequence of states is: B -> In -> In

2. What is the total probability of the sequence CG? Show your work and fill in the table below.

            C                       G
B     1     0                       0
In    0     = 0.5 * 0.4 = 0.2       = 0.4 * sum { 0.2 * 0.8, 0.125 * 0.6 } = 0.4 * 0.235 = 0.094
Out   0     = 0.5 * 0.25 = 0.125    = 0.25 * sum { 0.2 * 0.2, 0.125 * 0.4 } = 0.25 * 0.09 = 0.0225

The total probability of the sequence CG is: 0.094 + 0.0225 = 0.1165 (approximately 0.117)

4. RNA Secondary Structure Prediction (20 points total)

Describe 3 general methods for predicting the secondary structure of RNA

1. Ab initio – find the structure of the RNA with the lowest free energy.
The free energy calculations essentially search for the conformation that allows for maximal base pairing, because base pairing lowers the free energy of the structure, mainly due to stacking interactions between adjacent base pairs but also due to the hydrogen bonds in the base pairs.

2. Comparative – use two or more related RNA sequences to find a common secondary structure. There are two ways to use the related RNAs. The first is to look for covariation in the sequences: the secondary structure is likely to be more highly conserved than the sequence, so if one nucleotide involved in a base pair mutates, it is likely that a compensating mutation in the partner nucleotide will be selected for, preserving the ability to base pair. The second is to predict a structure for each RNA, then identify the secondary structures that the related RNAs have in common.

3. Combined computational and experimental – RNA secondary structures can be determined experimentally using various chemicals and enzymes that selectively act on either single-stranded or double-stranded nucleotides. Combined approaches for secondary structure prediction allow you to use experimentally determined constraints and only search for predicted secondary structures that fit the available experimental data.

5. Short Answer (10 points total)

(2 pts) What is the difference between a profile and a PSSM?
The main difference is that profiles allow for gaps and PSSMs do not.

(2 pts) What is the difference between a protein motif and a protein domain?
Motifs are short conserved sequence patterns, whereas domains are longer and form independent structural or functional regions of a protein.

(2 pts) Why is a HMM a more accurate representation of a motif or domain than a regular expression?
HMMs are a full probabilistic model for each position in the motif or domain, whereas regular expressions condense the information into a string.
For example, a regular expression contains things like X for an unknown residue or [S,T] when the residue can be either an S or a T. An HMM would instead contain probabilities for all 20 amino acids at the X position, and a separate probability for S and for T rather than just [S,T].

(2 pts) What is meant by covariation in RNA secondary structure prediction?
Covariation is when two positions in a set of related RNA sequences vary together to preserve a base pairing relationship. When we see covariation in a multiple sequence alignment of RNA sequences, we can be more confident in predicting that these two nucleotides are base paired to each other.

(2 pts) Match the following terms:

__B__ Local structural elements, such as an α-helix or β-sheet
__D__ Multiple subunits (polypeptide chains) assembled into a single functional unit
__C__ The fully "folded" 3-dimensional structure of a single polypeptide chain
__A__ The sequence of amino acids in a protein

a) Primary (1°) structure
b) Secondary (2°) structure
c) Tertiary (3°) structure
d) Quaternary (4°) structure

6. Molecular Biology & Bioinformatics Terms (10 pts total)

(1 pt each) Fill in the box beside each definition with one term or acronym that corresponds to the definition provided. (Some have more than one correct answer.)

     Term                     Definition
1.   Mfold                    A program for predicting RNA secondary structure
2.   Prof                     A program for predicting protein secondary structure
3.   PROSITE                  A database of protein domains and motifs
4.   PDB                      A protein structure database
5.   PyMOL                    A program for visualizing protein structures
6.   Motif                    A nucleotide or amino-acid sequence pattern that is often conserved and has, or is conjectured to have, functional significance
7.   Pfam                     A database of protein families
8.   GeneSeqer                A program for predicting genes in eukaryotes
9.   Domain                   Independent structural or functional unit of a protein
10.  X-ray crystallography    Experimental method for determining the 3-D structure of a macromolecule

Most of these had more than one possible answer.
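
As a check on the arithmetic in question 1, the Viterbi and forward recursions can be sketched in Python. This is a minimal sketch and not part of the original exam. The names STATES, START, TRANS, EMIT, viterbi, and forward are hypothetical. The In/Out transition values are an assumption (In -> In 0.8, In -> Out 0.2, Out -> In 0.6, Out -> Out 0.4), chosen because they are consistent with the Viterbi answers 0.064 and 0.0125 given for question 1.

```python
# Sketch: Viterbi and forward recursions for the 3-state CpG island HMM.
# ASSUMPTION: the In/Out transition values are inferred from the worked
# Viterbi answers in question 1, not read directly from the diagram.

STATES = ["In", "Out"]

# Transitions out of the start state B.
START = {"In": 0.5, "Out": 0.5}

# TRANS[prev][nxt] = probability of moving from state prev to state nxt.
TRANS = {
    "In": {"In": 0.8, "Out": 0.2},
    "Out": {"In": 0.6, "Out": 0.4},
}

# EMIT[state][nucleotide] = emission probability.
EMIT = {
    "In": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "Out": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def viterbi(seq):
    """Return (most probable state path starting at B, its probability)."""
    # v[s] = probability of the best path that ends in state s.
    v = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    backpointers = []
    for x in seq[1:]:
        new_v, ptr = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best_prev = max(STATES, key=lambda p: v[p] * TRANS[p][s])
            ptr[s] = best_prev
            new_v[s] = EMIT[s][x] * v[best_prev] * TRANS[best_prev][s]
        backpointers.append(ptr)
        v = new_v
    # Trace back from the best final state.
    last = max(STATES, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return ["B"] + path, v[last]


def forward(seq):
    """Return the total probability of seq, summed over all state paths."""
    f = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for x in seq[1:]:
        f = {
            s: EMIT[s][x] * sum(f[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(f.values())


if __name__ == "__main__":
    path, prob = viterbi("CG")
    print(" -> ".join(path), prob)
    print(forward("CG"))
```

With these assumed values, viterbi("CG") should give the path B -> In -> In with probability about 0.064, and forward("CG") should give about 0.1165. The only difference between the two recursions is that Viterbi takes the max over predecessor states where forward takes the sum.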
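
The idea in question 4 (method 1) that ab initio prediction favors structures with many base pairs can be illustrated with a Nussinov-style dynamic program that simply maximizes the number of base pairs. This is a deliberate simplification and not the free-energy minimization that programs such as Mfold perform: it ignores stacking energies and loop penalties. The function name max_base_pairs, the PAIRS set (Watson-Crick pairs plus the G-U wobble pair), and the min_loop parameter are assumptions introduced for illustration.

```python
# Sketch: Nussinov-style base-pair maximization for an RNA sequence.
# ASSUMPTION: this counts base pairs only; real ab initio tools minimize
# a free-energy model with stacking and loop terms.

# Allowed pairs: Watson-Crick plus G-U wobble (an illustrative choice).
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases required between
    the two partners of a pair (a hairpin loop cannot be arbitrarily short).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of pairs within seq[i..j], inclusive.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: position j is left unpaired.
            best = dp[i][j - 1]
            # Case 2: position j pairs with some k, leaving >= min_loop bases between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

For the toy sequence GGGAAAUCC, the best nested structure under these assumptions is a hairpin with three pairs (G-C, G-C, G-U) closing the AAA loop.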
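
Covariation, as described in questions 4 and 5, can be quantified in several ways. One common choice is the mutual information between two columns of a multiple sequence alignment: it is high when the two columns vary together and zero when either column is fully conserved. The sketch below is an illustration under that assumption, not a method stated in the exam. The function name mutual_information and the toy alignment TOY_ALIGNMENT are hypothetical.

```python
# Sketch: mutual information between two alignment columns as a
# covariation score. ASSUMPTION: gap-free, equal-length aligned sequences.
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Return the mutual information (in bits) between columns i and j."""
    n = len(alignment)
    col_i = Counter(s[i] for s in alignment)            # counts in column i
    col_j = Counter(s[j] for s in alignment)            # counts in column j
    pairs = Counter((s[i], s[j]) for s in alignment)    # joint counts
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 4 always form a complementary pair
# (G-C, C-G, A-U, U-A), while columns 1 to 3 are fully conserved.
TOY_ALIGNMENT = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]

if __name__ == "__main__":
    print(mutual_information(TOY_ALIGNMENT, 0, 4))
    print(mutual_information(TOY_ALIGNMENT, 1, 2))
```

In the toy alignment, columns 0 and 4 change in every sequence yet always remain complementary, so they score 2 bits, the maximum for four equally frequent nucleotides. The conserved columns score 0, which is why covariation analysis needs sequences that are related but not identical.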