Docsity

Exam 1 with Answer Key | Introduction to Bioinformatics | BCB 444, Exams of Bioinformatics

Material Type: Exam; Professor: Dobbs; Class: INTRO BIOINFORMATCS; Subject: BIOINFORMATICS AND COMPUTATIONAL BIOL; University: Iowa State University; Term: Fall 2008;

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-bsa

BCB 444/544 - F08 Exam 1

Name: ANSWER KEY

1. BLAST (20 points)

a. (9 points) Describe the heuristics used by BLAST to avoid aligning the query sequence against every sequence in the database.

BLAST first builds a list of words from the query sequence and then determines which of the possible words match a query word with a score above a threshold. BLAST then searches the database for exact matches to these high-scoring words. Before performing the alignment with a database sequence, BLAST looks for at least two matching words on the same diagonal. By requiring the database sequence to contain at least two exact matches to high-scoring words, BLAST can ignore many of the database sequences and focus only on the ones likely to produce significant alignments.

b. (9 points) You are studying a zebrafish gene that appears to function in regulating cell growth and division. A mutation in this gene causes cancer in the zebrafish. You are very interested to see if a homologous gene exists in humans. Your lab determines the DNA sequence of the zebrafish gene and you, as the bioinformatics expert in the lab, are given the task of finding a homologous gene in humans. Unfortunately, a BLAST search against a database of human DNA sequences returns no significant alignments. Describe three approaches to increase the chance of finding a homologous sequence in humans and why each approach may work.

DNA sequences mutate more rapidly than protein sequences because mutations in the DNA may not cause changes in the amino acid sequence. Also, selection works mainly at the level of proteins, and many amino acid changes may not affect the protein's function.
Therefore, the first approach would be a translated BLAST search, in which the DNA query sequence is translated into the six possible protein sequences (one per reading frame) and used as a query against a protein sequence database. Zebrafish and humans are rather remotely related, so an appropriate substitution matrix should be used for this translated BLAST search: a low-numbered BLOSUM matrix or a high-numbered PAM matrix.

Many other answers are acceptable. A brief list:

Change the gap penalties: remotely related sequences may have more gaps in their alignments, so make gaps cost less.
Change the word size: decrease the word size so that database sequences do not have to match over more than a couple of positions.
Use SSEARCH: BLAST does not perform alignments against all database sequences; SSEARCH does.
Try PSI-BLAST: it may find more remote homologs using the PSSM-based search.

c. (2 points) When using your strategy outlined above, the best result you get from your BLAST search has an E-value of 1. Explain what an E-value of 1 means and tell if it is a significant result or not.

An E-value of 1 means that, in a database the size of the one you searched, you would expect to find 1 sequence that produces an alignment scoring at least as well as this one just by chance. A BLAST result with an E-value of 1 is typically not considered significant.

2. Dynamic programming (30 points)

a. (10 points) Fill out the dynamic programming table for determining an optimal global alignment between the sequences TCAA and TCTGA. All matches are scored +5 and all mismatches and spaces are scored -3.

          T    C    T    G    A
     0   -3   -6   -9  -12  -15
T   -3    5    2   -1   -4   -7
C   -6    2   10    7    4    1
A   -9   -1    7    7    4    9
A  -12   -4    4    4    4    9

b. (1 point) What is the score of the optimal global alignment?

9

c. (4 points) What is the optimal global alignment? For full credit, you must give all optimal alignments and show your traceback arrows in the table above.

TCTGA    TCTGA
TC-AA    TCA-A

d. (10 points) Fill out the dynamic programming table for determining the optimal local alignment between the sequences AGTC and AGACT. All matches are scored +5 and all mismatches and spaces are scored -3.

          A    G    A    C    T
     0    0    0    0    0    0
A    0    5    2    5    2    0
G    0    2   10    7    4    1
T    0    0    7    7    4    9
C    0    0    4    4   12    9

e. (1 point) What is the score of the optimal local alignment?

12

f. (4 points) What is the optimal local alignment? For full credit, you must give all optimal local alignments and show your traceback arrows in the table above.

AGAC
AGTC
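
The tables in question 2 can be checked mechanically. The following is a minimal Python sketch, not part of the exam: the function name `align` and its parameters are illustrative assumptions. It uses the scoring stated in the questions (+5 for a match, -3 for a mismatch or a space, with each space scored independently), fills the table for either a global or a local alignment, and follows every tied traceback path so that all optimal alignments are listed.

```python
def align(rows, cols, match=5, mismatch=-3, gap=-3, local=False):
    """Fill the DP table for rows (down the side) vs. cols (across the top).

    Returns (table, best_score, alignments); each alignment is a pair
    (cols_line, rows_line) with '-' marking a space.
    """
    n, m = len(rows), len(cols)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # Global alignment: leading spaces are penalized in row 0 and column 0.
        for i in range(1, n + 1):
            t[i][0] = i * gap
        for j in range(1, m + 1):
            t[0][j] = j * gap

    def sub(i, j):
        # Score for aligning rows[i-1] with cols[j-1].
        return match if rows[i - 1] == cols[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(t[i - 1][j - 1] + sub(i, j),  # diagonal
                       t[i - 1][j] + gap,            # up (space in cols)
                       t[i][j - 1] + gap)            # left (space in rows)
            # Local alignment never lets a cell drop below zero.
            t[i][j] = max(best, 0) if local else best

    def trace(i, j):
        # Follow every predecessor that explains t[i][j] (all tied paths).
        if (i == 0 and j == 0) or (local and t[i][j] == 0):
            return [("", "")]
        out = []
        if i > 0 and j > 0 and t[i][j] == t[i - 1][j - 1] + sub(i, j):
            out += [(a + cols[j - 1], b + rows[i - 1]) for a, b in trace(i - 1, j - 1)]
        if i > 0 and t[i][j] == t[i - 1][j] + gap:
            out += [(a + "-", b + rows[i - 1]) for a, b in trace(i - 1, j)]
        if j > 0 and t[i][j] == t[i][j - 1] + gap:
            out += [(a + cols[j - 1], b + "-") for a, b in trace(i, j - 1)]
        return out

    if local:
        # Local alignment starts the traceback at every highest-scoring cell.
        score = max(max(row) for row in t)
        starts = [(i, j) for i in range(n + 1) for j in range(m + 1) if t[i][j] == score]
    else:
        # Global alignment starts at the bottom-right cell.
        score, starts = t[n][m], [(n, m)]
    return t, score, [pair for i, j in starts for pair in trace(i, j)]


global_table, global_score, global_alignments = align("TCAA", "TCTGA")
local_table, local_score, local_alignments = align("AGTC", "AGACT", local=True)

for label, score, alignments in [("global", global_score, global_alignments),
                                 ("local", local_score, local_alignments)]:
    print(label, "score:", score)
    for top, bottom in alignments:
        print(top)
        print(bottom)
        print()
```

Run as written, this should reproduce the scores 9 and 12 and the alignments given in parts c and f. A scheme with separate gap-open and gap-extend penalties would need a different recurrence and is not covered by this sketch.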