

Math/Stats/BI 548, Fall 2005: Computations in Biological Sequence Analysis
D. Burns and J. DeWet

Laboratory Worksheet, Monday, Oct. 10.

I. HMM Viterbi Algorithm. This is just to finish off what was started in class. In the coursetools you should find a file of Matlab scripts and data. First, log in to one of the Mac computers in the plaza level lab (they have access to Matlab) and open Matlab from the applications directory. Then download the Matlab scripts onto your desktop. Then download the Kevin Murphy toolbox file from the course Web Resources page (it is the last entry). You will have to follow some links here. Then in Matlab open setpath from the File menu. I will explain this in class; there is a subtlety in that you cannot save the path to the Matlab directory, but you can use it for this session on your desktop.

When this is sorted out, load dicedata.mat into the Matlab workspace. I will show you how to do this. Locate the variables in the workspace. We will first use the command dataOL.m to convert the data string from dicedata into a 2 x 300 matrix of observation likelihoods. Then use this matrix as part of the input to viterbi_path to compute the Viterbi decoding of the HMM.

II. Training Exercise. This time let us assume we do not know the parameters for the HMM. We want to create data to train the HMM, i.e., to find the HMM's probability parameters from data. The data are generated by the script casinorandomizer.m. Open this function file and read what the inputs are. Now create a 10 x 300 matrix of random data generated with the Markov parameters we know from the original dishonest casino problem. Yes, this is a bit circular, strictly speaking, but the idea is to rediscover these parameters with the Baum-Welch (expectation-maximization) method. We will use the function dhmm_em.m from the HMM toolbox.
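To make the two steps above concrete (building a 2 x T matrix of observation likelihoods and decoding it, then generating a batch of 10 sequences of 300 rolls), here is a minimal sketch in plain Python rather than Matlab. It does not use or reproduce dataOL.m, casinorandomizer.m, or the toolbox functions; the names sample_sequence, observation_likelihoods, and viterbi are hypothetical. The transition and emission values are the commonly quoted textbook values for the dishonest casino and are an assumption here, as is the uniform initial distribution; substitute the parameters used in class.

```python
import math
import random

# ASSUMED dishonest-casino parameters (textbook values, not stated in the worksheet).
STATES = ("F", "L")                      # fair die, loaded die
PRIOR = [0.5, 0.5]                       # assumed initial state distribution
TRANS = [[0.95, 0.05],                   # F -> F, F -> L
         [0.10, 0.90]]                   # L -> F, L -> L
EMIT = [[1 / 6] * 6,                     # fair die: faces 1..6 equally likely
        [0.1] * 5 + [0.5]]               # loaded die: a six half the time


def sample_sequence(length, rng):
    """Draw hidden states (0 or 1) and die rolls (faces 1..6) from the HMM."""
    states, rolls = [], []
    state = rng.choices([0, 1], weights=PRIOR)[0]
    for _ in range(length):
        states.append(state)
        rolls.append(rng.choices(range(1, 7), weights=EMIT[state])[0])
        state = rng.choices([0, 1], weights=TRANS[state])[0]
    return states, rolls


def observation_likelihoods(rolls):
    """Return a 2 x T table: entry [s][t] is P(roll at time t | state s)."""
    return [[EMIT[s][r - 1] for r in rolls] for s in range(2)]


def viterbi(prior, trans, obslik):
    """Most probable state path, computed in log space to avoid underflow."""
    n_states, T = len(obslik), len(obslik[0])
    score = [math.log(prior[s]) + math.log(obslik[s][0]) for s in range(n_states)]
    back = []                            # back[t-1][s] = best predecessor of s at time t
    for t in range(1, T):
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + math.log(obslik[s][t]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(range(n_states), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


rng = random.Random(0)
data = [sample_sequence(300, rng) for _ in range(10)]   # 10 sequences of 300 rolls
true_states, rolls = data[0]
decoded = viterbi(PRIOR, TRANS, observation_likelihoods(rolls))
agreement = sum(a == b for a, b in zip(true_states, decoded)) / len(rolls)
print("".join(STATES[s] for s in decoded[:60]))
print(f"fraction of positions decoded correctly: {agreement:.2f}")
```

Because the data here are simulated, the hidden states are known, so the decoded path can be checked against them. That is not possible with real data, which is one reason simulated data are a useful first test of a decoder or a training routine.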
As a write-up for this week, please copy from the screen your best approximation to the parameters we used to generate the data, as learned by the training algorithm dhmm_em. What adjustments seemed to help or harm your getting this result? That is, did changing the threshold number of repetitions help? Did generating more data help? Did insisting on a more stringent threshold for the change in LL (log-likelihood) from one iteration to the next help?

III. p-values and Pairwise Sequence Alignment. We have to transfer back to 2036 PC for this one, because we have the USC alignment package mounted in "our" laboratory (and not in the UM IT lab on the 3rd floor). Go back to the exercise comparing E. coli tRNAs against the 16S subunit of the ribosome. From the 548 Resources page, you can download the data files ECORRD and EctRNAdata. You will have to use the function pvlocal from the command line on the Linux-based lab computers. I will hopefully be able to mount the results of this comparison from an older paper of Waterman's. Be sure to do the comparison involving the tRNA for cysteine. Since we have a lab day knocked out by the Fall Break this year, we will probably try to do this example in class before the (distant!) next lab day.

I have attached two pages from the paper "Hearing Distant Echoes" by Michael Waterman, from Calculating the Secrets of Life, E. Lander and M. Waterman, eds., NAS Press, 1995. It shows an analysis (just the data) of pairwise comparison between E. coli 16S ribosomal RNA and the various tRNAs for the bug. The point is the significance column. The second figure uses a more accurate estimation of the significance. Unfortunately, it is given in standard deviations and not as a straight p-value. The p-value for cysteine's σ = 6.2 is about 10^-3.
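A significance quoted in standard deviations only becomes a p-value once a null distribution is chosen. The sketch below is a generic illustration of that step and makes no attempt to reproduce the estimate in Waterman's paper or the output of pvlocal. It assumes a score that lies z standard deviations above the null mean and evaluates the upper tail under two candidate null models: a normal distribution, and a Gumbel (extreme-value) distribution, which is commonly used for local alignment scores. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant


def normal_tail(z):
    """P(X > mean + z*sd) when X is normally distributed."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def gumbel_tail(z):
    """P(X > mean + z*sd) when X follows a Gumbel (maximum) distribution.

    For Gumbel(mu, beta): mean = mu + gamma*beta and sd = beta*pi/sqrt(6),
    so a point z standard deviations above the mean has reduced variate
    (x - mu)/beta = gamma + z*pi/sqrt(6).
    """
    reduced = EULER_GAMMA + z * math.pi / math.sqrt(6)
    return -math.expm1(-math.exp(-reduced))   # equals 1 - exp(-exp(-reduced))


z = 6.2
print(f"normal tail at z = {z}: {normal_tail(z):.2e}")
print(f"Gumbel tail at z = {z}: {gumbel_tail(z):.2e}")
```

Under these assumptions the two tails differ by several orders of magnitude (roughly 1e-10 for the normal model versus roughly 1e-4 for the Gumbel model), because the extreme-value tail is much heavier. This is why a significance reported in standard deviations cannot be read off a normal table when the scores come from a local alignment.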