BCB 444/544 Exam 2: HMMs, Gene & Protein Structure, RNA Secondary Structure - Prof. Drena, Exams of Bioinformatics

Information from Exam 2 in the BCB 444/544 course. It covers hidden Markov models (HMMs) for CpG island identification, gene prediction in prokaryotic and eukaryotic organisms, protein structure prediction using homology modeling, threading, and ab initio methods, and RNA secondary structure prediction using ab initio, comparative, and combined computational and experimental methods, with examples and calculations for each topic.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-6rf

BCB 444/544 Fall 08 Oct 31 Exam 2 p 1 of 7

BCB 444/544 Exam 2 (100 pts)

Name_____________________________________

1. HMM (20 pts TOTAL)

Consider the simplified CpG island HMM example discussed in class. The system has 3 states:

B denotes the start state
In denotes the state when the sequence is in a CpG island
Out denotes the state when the sequence is out of a CpG island

The transition probabilities between these states are shown in the diagram:

B -> In = 0.5      B -> Out = 0.5
In -> In = 0.8     In -> Out = 0.2
Out -> In = 0.6    Out -> Out = 0.4

The emission probabilities are:

for state Out: eOut(A) = eOut(C) = eOut(G) = eOut(T) = 0.25
for state In: eIn(A) = eIn(T) = 0.1, eIn(C) = eIn(G) = 0.4

1. What is the most probable sequence of states, starting from state B, to produce the sequence of nucleotides CG? For full credit, you must show your work and fill in the table below.

        (start)   C                     G
B       1         0                     0
In      0         0.5 * 0.4 = 0.2       0.4 * max{0.2 * 0.8, 0.125 * 0.6} = 0.4 * 0.2 * 0.8 = 0.064
Out     0         0.5 * 0.25 = 0.125    0.25 * max{0.2 * 0.2, 0.125 * 0.4} = 0.25 * 0.125 * 0.4 = 0.0125

The most probable sequence of states is: B -> In -> In

What is the total probability of the sequence CG? Show your work and fill in the table below.

        (start)   C                     G
B       1         0                     0
In      0         0.5 * 0.4 = 0.2       0.4 * (0.2 * 0.8 + 0.125 * 0.6) = 0.4 * 0.235 = 0.094
Out     0         0.5 * 0.25 = 0.125    0.25 * (0.2 * 0.2 + 0.125 * 0.4) = 0.25 * 0.09 = 0.0225

The total probability of the sequence CG is: 0.094 + 0.0225 = 0.1165 (≈ 0.117)

4. RNA Secondary Structure Prediction (20 points total)

Describe 3 general methods for predicting the secondary structure of RNA.

1. Ab initio: find the structure of the RNA with the lowest free energy.
   In effect, the free energy calculation favors conformations with extensive base pairing, because base pairing lowers the free energy of the structure. This is mainly due to stacking interactions between adjacent base pairs and also due to the hydrogen bonds within each pair.

2. Comparative: use two or more related RNA sequences to find a common secondary structure. The related RNAs can be used in two ways.
   The first is to look for covariation in the sequences. The secondary structure is likely to be more highly conserved than the sequence, so if one nucleotide in a base pair mutates, a compensating mutation in its partner nucleotide is likely to be selected for, preserving the ability to base pair.
   The second is to predict a structure for each RNA and then identify the secondary structure elements the RNAs have in common.

3. Combined computational and experimental: RNA secondary structure can be probed experimentally with chemicals and enzymes that act selectively on either single-stranded or double-stranded nucleotides. Combined approaches use these experimentally determined constraints and search only for predicted secondary structures that are consistent with the available experimental data.

5. Short Answer (10 points total)

(2 pts) What is the difference between a profile and a PSSM?
The main difference is that profiles allow for gaps and PSSMs do not.

(2 pts) What is the difference between a protein motif and a protein domain?
Domains are longer than motifs and form independent structural or functional regions of a protein.

(2 pts) Why is an HMM a more accurate representation of a motif or domain than a regular expression?
An HMM is a full probabilistic model of each position in the motif or domain, whereas a regular expression condenses that information into a string.
For example, a regular expression uses symbols such as X for an unknown residue, or [S,T] when the residue can be either S or T. An HMM instead stores a probability for each of the 20 amino acids at the X position, and separate probabilities for S and for T rather than just the set S,T.

(2 pts) What is meant by covariation in RNA secondary structure prediction?
Covariation is when two positions in a set of related RNA sequences vary together so as to preserve a base-pairing relationship. When we see covariation in a multiple sequence alignment of RNA sequences, we can be more confident in predicting that these two nucleotides are base paired with each other.

(2 pts) Match the following terms:

__B__ Local structural elements, such as an α-helix or β-sheet
__D__ Multiple subunits (polypeptide chains) assembled into a single functional unit
__C__ The fully “folded” 3-dimensional structure of a single polypeptide chain
__A__ The sequence of amino acids in a protein

a) Primary (1°) structure
b) Secondary (2°) structure
c) Tertiary (3°) structure
d) Quaternary (4°) structure

6. Molecular Biology & Bioinformatics Terms (10 pts total)

(1 pt each) Fill in the box beside each definition with one term or acronym that corresponds to the definition provided. (Some have more than one correct answer.)

Term                        Definition
1. Mfold                    A program for predicting RNA secondary structure
2. Prof                     A program for predicting protein secondary structure
3. Prosite                  A database of protein domains and motifs
4. PDB                      A protein structure database
5. PyMOL                    A program for visualizing protein structures
6. Motif                    A nucleotide or amino-acid sequence pattern that is often conserved and has, or is conjectured to have, functional significance
7. Pfam                     A database of protein families
8. GeneSeqer                A program for predicting genes in eukaryotes
9. Domain                   Independent structural or functional unit of a protein
10. X-ray crystallography   Experimental method for determining the 3-D structure of a macromolecule

Most of these had more than one possible answer.
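The Viterbi and forward tables for question 1 can be checked with a short script. This is a minimal sketch with two stated assumptions. The transition probabilities are taken as B -> In = B -> Out = 0.5, In -> In = 0.8, In -> Out = 0.2, Out -> In = 0.6 and Out -> Out = 0.4, the values consistent with the worked Viterbi table. The emission probabilities are the ones given in the question. The names viterbi, forward, TRANS and EMIT are illustrative and do not come from any library.

```python
# Minimal Viterbi and forward sketch for the 3-state CpG island HMM of question 1.
# Assumed transitions: B->In = B->Out = 0.5, In->In = 0.8, In->Out = 0.2,
# Out->In = 0.6, Out->Out = 0.4 (reconstructed from the worked solution).

STATES = ["In", "Out"]

TRANS = {
    "B": {"In": 0.5, "Out": 0.5},
    "In": {"In": 0.8, "Out": 0.2},
    "Out": {"In": 0.6, "Out": 0.4},
}

EMIT = {
    "In": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "Out": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def viterbi(seq):
    """Return (best path probability, best state path) for seq, starting in B."""
    # First column: leave B and emit the first symbol.
    v = {s: TRANS["B"][s] * EMIT[s][seq[0]] for s in STATES}
    paths = {s: ["B", s] for s in STATES}
    for sym in seq[1:]:
        new_v, new_paths = {}, {}
        for s in STATES:
            # Keep only the single best predecessor of state s (max).
            prev = max(STATES, key=lambda p: v[p] * TRANS[p][s])
            new_v[s] = EMIT[s][sym] * v[prev] * TRANS[prev][s]
            new_paths[s] = paths[prev] + [s]
        v, paths = new_v, new_paths
    best = max(STATES, key=lambda s: v[s])
    return v[best], paths[best]


def forward(seq):
    """Return the total probability of seq, summed over all state paths."""
    f = {s: TRANS["B"][s] * EMIT[s][seq[0]] for s in STATES}
    for sym in seq[1:]:
        # Add up the contributions of every predecessor (sum instead of max).
        f = {s: EMIT[s][sym] * sum(f[p] * TRANS[p][s] for p in STATES)
             for s in STATES}
    # The model has no end state, so sum over the last column.
    return sum(f.values())


if __name__ == "__main__":
    prob, path = viterbi("CG")
    print(" -> ".join(path), round(prob, 4))  # B -> In -> In 0.064
    print(round(forward("CG"), 4))            # 0.1165
```

Both algorithms share the same first column. They differ only in keeping the best predecessor (max) versus adding all predecessors (sum), which is why the two tables diverge at G.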
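Method 1 in question 4 (ab initio prediction) is normally carried out as free-energy minimization, as in Mfold. The sketch below illustrates the same dynamic-programming idea with the simpler Nussinov algorithm. That algorithm maximizes the number of base pairs instead of minimizing a nearest-neighbor free energy, so it ignores the stacking energies mentioned in the answer. It is a teaching sketch, not Mfold's method. The allowed pairs (Watson-Crick plus G-U), the minimum hairpin loop of 3 nucleotides, and the function names are assumptions.

```python
# Nussinov base-pair maximization: a simplified stand-in for free-energy
# minimization that counts base pairs and ignores stacking energies.

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def nussinov(seq):
    """Return (maximum number of base pairs, dot-bracket structure) for seq."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]
    for length in range(MIN_LOOP + 1, n):
        for i in range(n - length):
            j = i + length
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    score = dp[0][n - 1] if n else 0
    return score, traceback(seq, dp)


def traceback(seq, dp):
    """Recover one optimal structure from the filled dp table."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))  # j stays unpaired
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + dp[k + 1][j - 1] + 1:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

A real energy-based predictor replaces the "+1 per pair" score with loop and stacking energies. The table-filling and traceback structure stays the same.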
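Covariation is asked about directly in question 5 and underlies the comparative method in question 4. It is commonly scored with the mutual information between two alignment columns i and j: M(i,j) = sum over x,y of f(xy) * log2(f(xy) / (f(x) * f(y))), where the f values are observed nucleotide frequencies. The sketch below computes this score for a hypothetical four-sequence alignment. It also checks whether the two columns can base pair in every sequence. The alignment, the pair set, and the function names are illustrative assumptions.

```python
import math
from collections import Counter

# Canonical Watson-Crick pairs plus G-U wobble pairs (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Hypothetical toy alignment: columns 0 and 5 covary, columns 1-4 are conserved.
ALIGNMENT = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]


def mutual_information(alignment, i, j):
    """Mutual information, in bits, between columns i and j of an ungapped alignment."""
    n = len(alignment)
    f_i = Counter(seq[i] for seq in alignment)             # single-column counts
    f_j = Counter(seq[j] for seq in alignment)
    f_ij = Counter((seq[i], seq[j]) for seq in alignment)  # joint counts
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


def can_always_pair(alignment, i, j):
    """True if columns i and j form an allowed base pair in every sequence."""
    return all((seq[i], seq[j]) in PAIRS for seq in alignment)


if __name__ == "__main__":
    print(mutual_information(ALIGNMENT, 0, 5), can_always_pair(ALIGNMENT, 0, 5))  # 2.0 True
    print(mutual_information(ALIGNMENT, 1, 4), can_always_pair(ALIGNMENT, 1, 4))  # 0.0 False
```

Perfectly conserved columns score 0 bits even when they pair. Mutual information rewards compensating changes, which is the evidence covariation provides, so it is usually combined with a pairing check like the one above.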