Math/Stats/BI 548, Fall 2005: Computations in Biological Sequence Analysis
D. Burns and J. DeWet
Laboratory Worksheet, Monday, Oct. 10

I. HMM Viterbi Algorithm.

This is just to finish off what was started in class. In the coursetools you should find a file of Matlab scripts and data. First, log in to one of the Mac computers in the plaza-level lab (they have access to Matlab) and open Matlab from the Applications directory. Then download the Matlab scripts onto your desktop. Next, download the Kevin Murphy toolbox file from the course Web Resources page (it is the last entry); you will have to follow some links. Then, in Matlab, open Set Path from the File menu. I will explain this in class; there is a subtlety in that you cannot save the path to the Matlab directory, but you can use it for this session from your desktop. When this is sorted out, load dicedata.mat into the Matlab workspace (I will show you how to do this) and locate the variables in the workspace. We will first use the command dataOL.m to convert the data string from dicedata into a 2 x 300 matrix of observation likelihoods. Then use this as part of the input to viterbi_path to obtain the Viterbi decoding of the HMM.

II. Training Exercise.

This time let us assume we do not know the parameters for the HMM. We want to create data to train the HMM, i.e., to find the HMM's probability parameters from data. This is done by the script casinorandomizer.m. Open this function file and read what the inputs are. Now create a 10 x 300 matrix of random data generated with the Markov parameters we knew from the original dishonest-casino problem. Yes, this is a bit circular, strictly speaking, but the idea is to rediscover these parameters by the Baum-Welch (expectation-maximization) method. We will use the function dhmm_em.m from the HMM toolbox.
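For reference, the Viterbi recursion used in part I can be sketched in a few lines of Python (the lab itself uses the Matlab toolbox; this is only an illustration). The transition and emission numbers below are the standard textbook dishonest-casino values from Durbin et al., which may differ from the parameters actually behind dicedata.mat:

```python
import math

# Dishonest-casino parameters. These are the classic textbook values
# (Durbin et al.); the parameters in dicedata.mat may differ.
states = ('F', 'L')                        # fair die, loaded die
start = {'F': 0.5, 'L': 0.5}
trans = {'F': {'F': 0.95, 'L': 0.05},
         'L': {'F': 0.10, 'L': 0.90}}
emit = {'F': {r: 1 / 6 for r in '123456'},            # uniform over faces
        'L': {'1': 0.1, '2': 0.1, '3': 0.1,
              '4': 0.1, '5': 0.1, '6': 0.5}}          # six comes up half the time

def viterbi(rolls):
    """Most probable hidden-state path for a string of die rolls (log space)."""
    V = {s: math.log(start[s]) + math.log(emit[s][rolls[0]]) for s in states}
    back = []
    for r in rolls[1:]:
        new_V, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[p] + math.log(trans[p][s]))
            new_V[s] = V[prev] + math.log(trans[prev][s]) + math.log(emit[s][r])
            ptr[s] = prev
        back.append(ptr)
        V = new_V
    # Trace back from the best final state.
    path = [max(states, key=lambda s: V[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return ''.join(reversed(path))

print(viterbi('31516624646666664613'))
```

A long run of sixes decodes as a block of L's, short uniform-looking stretches as F's, which is exactly the picture viterbi_path should give for the dicedata sequence.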
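The data-generation step in part II amounts to sampling a hidden Markov chain over {fair, loaded} and emitting one die roll per step. The Python sketch below is a hypothetical stand-alone analogue of casinorandomizer.m (whose actual interface is not shown here), again using the textbook parameter values rather than the worksheet's:

```python
import random

def roll_sequences(n_seqs, seq_len, seed=None):
    """Sample n_seqs observation sequences of length seq_len from the
    dishonest-casino HMM. Hypothetical analogue of casinorandomizer.m;
    parameters are the standard textbook values, not necessarily the
    worksheet's. Returns a list of n_seqs strings of die rolls."""
    rng = random.Random(seed)
    trans = {'F': ('F', 0.95, 'L'),    # stay fair w.p. 0.95, else go loaded
             'L': ('L', 0.90, 'F')}    # stay loaded w.p. 0.90, else go fair
    # Weighted face strings: fair is uniform; loaded gives six prob. 0.5
    # and each of 1-5 prob. 0.1.
    emit = {'F': '123456', 'L': '1234566666'}
    data = []
    for _ in range(n_seqs):
        state, rolls = rng.choice('FL'), []
        for _ in range(seq_len):
            rolls.append(rng.choice(emit[state]))
            stay, p_stay, other = trans[state]
            state = stay if rng.random() < p_stay else other
        data.append(''.join(rolls))
    return data

# A 10 x 300 batch, as in the worksheet:
data = roll_sequences(10, 300, seed=0)
```

Baum-Welch then takes such a batch (plus an initial guess for the prior, transition, and emission matrices) and iterates expectation-maximization until the log-likelihood stops improving; the point of the exercise is to see how close the learned matrices come to the ones that generated the data.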
As a write-up for this week, please copy from the screen your best approximation to the parameters we used to generate the data, as learned by the training algorithm dhmm_em. What adjustments seemed to help or harm your getting this result? That is, did changing the maximum number of iterations help? Did generating more data help? Did insisting on a more stringent threshold for the change in log-likelihood (LL) from one iteration to the next help?

III. p-values and Pairwise Sequence Alignment.

We have to transfer back to the 2036 PC lab for this one, because the USC alignment package is mounted in "our" laboratory (and not in the UM IT lab on the 3rd floor). Go back to the exercise comparing E. coli tRNAs against the 16S subunit of the ribosome. From the 548 Resources page, you can download the data files ECORRD and EctRNAdata. You will have to use the function pvlocal from the command line on the Linux-based lab computers. I will hopefully be able to post the results of this comparison from an older paper of Waterman's. Be sure to do the comparison involving the tRNA for cysteine. Since we have a lab day knocked out by the Fall Break this year, we will probably try to do this example in class before the (distant!) next lab day.

I have attached two pages from the paper "Hearing Distant Echoes" by Michael Waterman, from Calculating the Secrets of Life, E. Lander and M. Waterman, eds., NAS Press, 1995. It shows an analysis (just the data) of pairwise comparisons between the E. coli 16S ribosomal RNA and the various tRNAs for the bug. The point is the significance column. The second figure uses a more accurate estimate of the significance. Unfortunately, it is given in standard deviations rather than as a straight p-value. The p-value for cysteine's σ = 6.2 is about 10^-3.
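To see why 6.2 standard deviations corresponds to a p-value anywhere near 10^-3, recall that optimal local alignment scores are conventionally modeled by an extreme-value (Gumbel) distribution, whose upper tail decays only exponentially; a Gaussian tail at 6.2 sigma would be around 10^-10. The Python sketch below is a back-of-the-envelope standardized-Gumbel calculation for illustration only, not Waterman's own (more careful) estimate, so expect it to agree with the quoted figure only to within an order of magnitude:

```python
import math

def gumbel_pvalue(z):
    """P(score > mean + z*sd) under a standardized Gumbel distribution.

    A Gumbel(mu, beta) variable has mean mu + gamma*beta and standard
    deviation beta*pi/sqrt(6), so z standard deviations above the mean
    corresponds to the point x = gamma + z*pi/sqrt(6) on the standard
    Gumbel, whose upper tail is 1 - exp(-exp(-x)).
    """
    gamma = 0.5772156649015329              # Euler-Mascheroni constant
    x = gamma + z * math.pi / math.sqrt(6)
    return 1.0 - math.exp(-math.exp(-x))

def gaussian_pvalue(z):
    """Upper-tail p-value for a standard normal, via the error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(f"Gumbel tail at 6.2 sd:   {gumbel_pvalue(6.2):.1e}")
print(f"Gaussian tail at 6.2 sd: {gaussian_pvalue(6.2):.1e}")
```

The Gumbel tail lands between 10^-4 and 10^-3, in the ballpark of the quoted value, while the Gaussian tail is smaller by six to seven orders of magnitude; that heavy tail is exactly why alignment significance must be judged against extreme-value rather than normal statistics.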