Probability of Nucleotide Sequences in a Hidden Markov Model for CpG Islands - Prof. Drena, Assignments of Bioinformatics

Solutions to homework problems on hidden Markov models (HMMs) for identifying CpG islands in nucleotide sequences. The problems cover calculating the probability of a specific nucleotide sequence given a path of states (inside or outside of a CpG island) and finding the most likely sequence of states that produced a given nucleotide sequence. Students learn how to calculate probabilities with HMMs and what a most likely path is.

Typology: Assignments

Pre 2010

Uploaded on 09/02/2009

koofers-user-yca-1

BCB 444/544 Fall 08 Sept 29 HW3 p1 of 4

BCB 444/544 Homework 3 (20 pts)
Name_________________________________________
Due Mon Oct 6 by 5 pm (please bring to class or deliver to MBB 106)

Objectives:
1. Practice using hidden Markov models to compute probabilities

Notes: You may work together on these problems, but each student must submit answers in his/her own words. It's always best to show all of your calculations and intermediate steps.

Introduction: We learned about hidden Markov models in class, but it's difficult to really understand how they work until you have some practice working with them. This homework will give you practice calculating probabilities from an HMM.

1. Consider this simplified HMM for CpG islands. A CpG island is a region of a nucleotide sequence with a high fraction of C-G dinucleotides (different from CG base pairs). We may want to find CpG islands because they are often found near promoter regions of genes.

The system has 3 states:
B denotes the start state
In denotes the state when we are in a CpG island
Out denotes the state when we are outside of a CpG island

The transition probabilities between these states are shown in the diagram. [Diagram not preserved in this preview. Its labels are 0.2, 0.5, 0.5, 0.8, 0.6 and 0.4. The worked answers use B -> In = 0.5, B -> Out = 0.5, In -> In = 0.4 and Out -> Out = 0.8, which leaves In -> Out = 0.6 and Out -> In = 0.2 so that the outgoing probabilities of each state sum to 1.]

The emission probabilities are:
for state Out, eOut(A) = eOut(C) = eOut(G) = eOut(T) = 0.25
for state In, eIn(A) = eIn(T) = 0.1 and eIn(C) = eIn(G) = 0.4

a) (5 pts) Calculate the probability:

What is the probability of the sequence (A, C, G) given that we are in a CpG island for all three nucleotides?
P(ACG|InInIn) = P(A|In) * P(In -> In) * P(C|In) * P(In -> In) * P(G|In)
= 0.1 * 0.4 * 0.4 * 0.4 * 0.4
= 0.00256

I didn't specify starting in the Begin state or not, so this answer is also acceptable:

P(ACG|BInInIn) = P(B -> In) * P(ACG|InInIn) = 0.5 * 0.00256 = 0.00128

What is the probability of the sequence (A, C, G) given that we are outside of a CpG island for all three nucleotides?

P(ACG|OutOutOut) = P(A|Out) * P(Out -> Out) * P(C|Out) * P(Out -> Out) * P(G|Out)
= 0.25 * 0.8 * 0.25 * 0.8 * 0.25
= 0.01

OR: 0.01 * P(B -> Out) = 0.01 * 0.5 = 0.005
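
The hand calculations above can be checked with a short script. This is only a sketch in Python, not part of the original assignment: the names TRANS, EMIT and path_probability are made up for illustration, and the In -> Out (0.6) and Out -> In (0.2) entries are an assumption inferred from each state's outgoing probabilities summing to 1 (neither value is used in the answers to part a). The function multiplies one emission per nucleotide and one transition per step along a fixed state path, optionally including the B -> first-state transition.

```python
# Transition probabilities. The B, In -> In and Out -> Out values appear in
# the worked answers; In -> Out and Out -> In are ASSUMED so that the
# outgoing probabilities of each state sum to 1.
TRANS = {
    ("B", "In"): 0.5,
    ("B", "Out"): 0.5,
    ("In", "In"): 0.4,
    ("In", "Out"): 0.6,   # assumed (1 - 0.4)
    ("Out", "Out"): 0.8,
    ("Out", "In"): 0.2,   # assumed (1 - 0.8)
}

# Emission probabilities as given in the problem statement.
EMIT = {
    "In": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "Out": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def path_probability(seq, path, include_begin=False):
    """Probability of emitting seq along one fixed state path.

    Multiplies the emission at each position by the transition into that
    position's state. If include_begin is True, the B -> path[0]
    transition is included as well.
    """
    if len(seq) != len(path):
        raise ValueError("seq and path must have the same length")
    prob = TRANS[("B", path[0])] if include_begin else 1.0
    prev = None
    for symbol, state in zip(seq, path):
        if prev is not None:
            prob *= TRANS[(prev, state)]  # transition prev -> state
        prob *= EMIT[state][symbol]       # emission of symbol in state
        prev = state
    return prob


if __name__ == "__main__":
    for state in ("In", "Out"):
        path = [state] * 3
        without_b = path_probability("ACG", path)
        with_b = path_probability("ACG", path, include_begin=True)
        print(f"{state}: {without_b:.5f} without B, {with_b:.5f} with B")
    # In: 0.00256 without B, 0.00128 with B
    # Out: 0.01000 without B, 0.00500 with B
```

Values are printed to five decimal places because floating-point multiplication does not reproduce the hand-computed decimals exactly.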