Math/Stats/BI 548, Fall 2005: Computations in Biological Sequence Analysis. D. Burns and J. DeWet. Laboratory Worksheet, Monday, Nov. 7.

I. Randomizing EM Training.

This week we will take the randomized data we generated for the Casino problem and see whether we can improve performance by randomizing. I will just give you the most elementary version today: randomizing the inputs to the EM algorithm. In particular, this does not give you a Metropolis randomization during the calculation of the maximal log likelihood. Look in Ctools for the new function script samplestartEM.m. This uses a special case of the Dirichlet distribution, or Dirichlet prior, on the set of all distributions on a single die. We will discuss that briefly before the lab.

The exercise is to take your 30 by 300 data set generated by casinorandomizer.m and use it as the data input for samplestartEM.m. As always, read the samplestartEM.m file to see what the inputs and outputs are, and the syntax for calling the function. Compare the results with your previous results using dhmm_em.m.

II. Protein Family Profiles III. Training Exercise Revisited.

A quotation from two weeks ago:

“This exercise will be about constructing (“training”) a protein family profile HMM from real data. In this exercise, you will be given a sequence accession number (NP_000671: alpha1 adrenergic receptor). You will pass through some relatively simple steps: BLAST your sequence. Choose a handful of the best hits, but don’t choose overlapping sequences (choose distinct species, if possible). Then submit these protein sequences to CLUSTAL for MSA (multiple sequence alignment). You may do this from the command line using the local installation. Then use the MSA of your “seed” sequences, running this through hmmbuild, the profile HMM construction program in the HMMer suite. Having done this, you can compare to what Pfam has made of your sequence and its relatives.”

This week, do the same thing initially, except that you will now also use the HMMer program hmmalign. This time you will set aside half of the sequences which BLAST found (i.e., half of the top twenty) and use the other half to train the HMM as above. Then use hmmalign to align the remaining sequences to the HMM. This will give you a larger alignment. Does this alignment compare well to an alignment of all 20 sequences done by ClustalW? Does this alignment depend on the sequences used in the “seed”? Notice that your seed is smaller than it was last week.

Added this week, November 7, 2005: there is a command in ClustalW which will enforce the order of the sequences in the alignment, from top to bottom. Thus, at least that component of a comparison between the two results will be easier.

III. Protein Family Profiles IV. Searching with a Profile.

The idea here is to use the HMM profile of I. or III. to search for family members against a
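Returning to the randomized starts of Part I: the sketch below illustrates what drawing random EM starting parameters from a Dirichlet distribution can look like. It is written in Python rather than MATLAB and is not the contents of samplestartEM.m. The function names, the two-state model with six-sided dice, and the choice of the symmetric Dirichlet(1, ..., 1) (the uniform distribution on the set of all die distributions) are all assumptions made for illustration; the worksheet does not say which special case the script uses.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha).

    Standard construction: draw independent Gamma(alpha_i, 1) values
    and normalize them so they sum to 1.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def random_casino_start(rng, n_states=2, n_faces=6):
    """Hypothetical helper: one random starting point for EM.

    Assumes a casino HMM with n_states dice of n_faces faces each and a
    symmetric Dirichlet(1, ..., 1) on every distribution (an assumption,
    not taken from samplestartEM.m).
    """
    flat_states = [1.0] * n_states
    flat_faces = [1.0] * n_faces
    prior = sample_dirichlet(flat_states, rng)
    transitions = [sample_dirichlet(flat_states, rng) for _ in range(n_states)]
    emissions = [sample_dirichlet(flat_faces, rng) for _ in range(n_states)]
    return prior, transitions, emissions


# Several independent starting points; EM would be run once from each,
# keeping the run that reaches the highest log likelihood.
rng = random.Random(0)
starts = [random_casino_start(rng) for _ in range(5)]
```

Each element of `starts` is a (prior, transition rows, emission rows) triple whose rows are probability vectors, so it could be handed to any EM routine that accepts initial guesses in that form.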