Two Sequence Alignment and Scoring Models for BME 110: CompBio Tools, Assignments of Chemistry

University of California-Santa Cruz Chemistry

An overview of sequence alignment and scoring models used in bioinformatics, specifically for BME 110: CompBio Tools. Topics include understanding E-values and P-values, homology and its types, dynamic programming, and scoring matrices. The document also covers the importance of selecting the appropriate scoring model and the limitations of BLAST hit transitivity.

Typology: Assignments

Pre 2010

Uploaded on 09/17/2009

koofers-user-chb 🇺🇸


Two Sequence Alignment & Scoring Models
BME 110: CompBio Tools
Todd Lowe
April 13, 2006

Admin
• Reading:
  – Finish Claverie, Chapters 7, 8
  – NCBI Blast Guide / info page http://www.ncbi.nlm.nih.gov/BLAST/Why.shtml
• Homework #1 now online, due in 1 week (Thursday, April 20)
• Today
  – Finish BLAST overview
  – Two sequence alignments
  – In-class BLAST exercises

What is Reliable?
• In biology, a P-value of 0.05 would be expected to be “good enough” (5 chances in 100 of not being correlated)
• Due to BLAST’s estimation of significance, shouldn’t trust P or E values > 1x10^-4
• Note: they may still be paralogs with different function! Examine alignment!
• For good measure, I don’t have great confidence unless < 1x10^-8

Beware Hit Transitivity!
• “BLAST hits are not transitive, unless alignments are overlapping”

  Seq1: AAAAABBBB
  Seq2: AAAAA
  Seq3:      BBBB

• Seq2 and Seq3 not necessarily homologous!

Example
• Fibrillarin-like protein
  – DNA: XM_293903, Protein: XP_293903
• How “far” can we go in tree of life using nucleotide v. protein searches?

Limit Search Space
• If you only want hits to a specific genome or domain, **much** faster to only search that species

A Related Note: Homology
• Based on inference that two sequences are ancestrally derived from same molecule
• If two sequences have high similarity, they may be inferred to be homologous
• It is WRONG to say two sequences or genes are 80% homologous (they either are related, or they are not)

Homology: Same Function?
• Even if two sequences are ancestrally derived from same molecule, they may or may not still have the same function
  – Orthologs: homologous genes created by speciation
    • Generally implies function remains the same
  – Paralogs: homologous genes created by a gene duplication event (in same species)
    • Implies function may have changed

Full-genome Comparisons
[Figure: dot plot of a full-genome comparison]
Here, each dot is a gene match, not a nucleotide match
From Zivanovic et al., NAR 30: 1902-10

Pair-wise Sequence Comparison
• Basis for relating biological information from a well-studied gene to a new sequence
• Many programs exist for pairwise comparison
• Some are fast database searching and get “good” alignments
  – One sequence v. many thousands:
    • BLAST or FASTA
• Some are much slower, but guarantee the “optimal alignment”
  – Smith-Waterman is the de facto standard

What is Optimal??
• How do we get an “optimal” alignment
• Optimal to who?
• Optimal based on scoring model:
  – Substitution scoring matrix
  – Insertion / deletion scoring (penalties)
• Caution: Just because it is optimal for a given scoring scheme, doesn’t mean it is biologically correct!!

Which is better?
Match +1, Mismatch –1, Gap -2

  G A T C      +1 -1 -1 +1
  |     |                      (Score = 0)
  G T G C

  OR

  G A T - C    +1 -2 +1 -2 +1
  |   |   |                    (Score = -1)
  G - T G C

Which is better?
Match +1, Mismatch –1, Gap -1

  G A T C      +1 -1 -1 +1
  |     |                      (Score = 0)
  G T G C

  OR

  G A T - C    +1 -1 +1 -1 +1
  |   |   |                    (Score = 1)
  G - T G C

Moral: Scoring Model Matters!!
• For DNA, model can be very simple:
  • +1 match, -1 mismatch
• However, not all mutations have equal likelihood:
  • Transition: A<–>G or C<–>T – more likely
  • Transversion: A<–>C or G<–>T – less likely

Protein Matrices, Same Idea
• Original: Dayhoff matrix aka PAM
• PAM = Percent accepted mutations
• Based on small number of correctly aligned proteins
• Simply count how often each amino acid is substituted for another
• Frequency of substitutions based on properties of amino acids relative to each other

[Figure: amino acids grouped by size (small, large) and polarity (non-polar, polar)]
• Closer two amino acids are, more similar in properties

Newer “Version” of Protein Matrices: BLOSUM
• By Henikoff & Henikoff (1992), based on a much larger group of aligned protein sequences in the Blocks database
• BLOSUM = Blocks substitution matrix
• Used most commonly today

Similarity v. Homology
• Similarity is strictly a measure based on a sequence alignment observation
  – Two sequences are 80% identical

  C T A G C G A C T T
  |   | | |   | | | |
  C G A G C C A C T T

In-class BLAST practice
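
The two “Which is better?” comparisons and the 80% identity example from the slides can be checked with a short script. This is a minimal sketch, not code from the course: the function names `score_alignment` and `percent_identity` are made up for illustration, gaps are assumed to be written as `-`, and identity is assumed to be measured over the full alignment length (other conventions exist). It only scores alignments that are already given; it does not search for the optimal one.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores for two equal-length aligned rows ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # insertion/deletion penalty
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


def percent_identity(top, bottom):
    """Percentage of alignment columns where both rows hold the same residue."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    return 100.0 * identical / len(top)


# The two candidate alignments of GATC and GTGC from the slides.
ungapped = ("GATC", "GTGC")
gapped = ("GAT-C", "G-TGC")

for gap in (-2, -1):
    print(f"gap={gap}: ungapped={score_alignment(*ungapped, gap=gap)}, "
          f"gapped={score_alignment(*gapped, gap=gap)}")
# gap=-2: ungapped=0, gapped=-1
# gap=-1: ungapped=0, gapped=1

print(percent_identity("CTAGCGACTT", "CGAGCCACTT"))  # 80.0
```

With the gap penalty at -2 the ungapped alignment wins (0 v. -1); at -1 the gapped alignment wins (1 v. 0). The sequences are unchanged, so the preferred alignment depends only on the scoring model.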
