
Pairwise Sequence Alignment - Introduction to Bioinformatics - Notes | CISC 636, Exams of Computer Science

Material Type: Exam; Professor: Liao; Class: Bioinformatics; Subject: Computer/Information Sciences; University: University of Delaware; Term: Spring 2008;


Uploaded on 09/02/2009

koofers-user-nto-1

CISC636, S08, Lec5, Liao

CISC 636 Intro to Bioinformatics (Spring 2008)
Pairwise sequence alignment
Needleman-Wunsch (global alignment)

Sequence Alignment
• Motivation
  – Sequence assembly: reconstructing long DNA sequences from overlapping sequence fragments
  – Annotation: assigning functions to newly discovered genes
• Raw genomic (DNA) sequences → coding sequences (CDS), candidates for genes → protein sequences → function
• Terminology: cDNA, RNA, mRNA
• Evolution: mutation → sequence diversity (versus homology) → (new) phenotype?
• Basis for annotation: sequence similarity → sequence homology → same function
  – Caveat: homology can only be inferred, not affirmed, since we cannot rewind to see how evolution actually happened.

Substitution Score Matrix
• Alignments are used to reveal homologous proteins/genes.
• Substitution scores are used to assess how good the alignment of a pair of residues is.
• Under the assumption that each mutation (i.e., deletion, insertion, and substitution) is independent, the total score of an alignment is the sum of the scores at each position.
• A substitution score matrix is a 20 x 20 matrix that gives the score for every pair of amino acids.
• Ways to derive a substitution score matrix:
  – Ad hoc
  – Physical/chemical properties of amino acids
  – Statistical

PAM matrices (Margaret Dayhoff, 1978)
• PAM: point accepted mutation, or percent accepted mutation
• A unit of measurement of evolutionary divergence between two amino acid sequences
• Also the name of the substitution matrices (scoring matrices) built on this unit
• 1 PAM = one accepted point-mutation event per one hundred amino acids

PAM (cont'd)
Caveat:
• If sequences s1 and s2 are x PAM divergent, this does not imply that they differ at x percent of their positions; the actual percent difference is equal to or less than x, because several accepted mutations can hit the same position.
Facts:
1. Even amino acid sequences that have diverged by 200 PAM units are expected to be identical in about 25% of their positions.
2. Sequences that are 250 PAM units diverged can generally be distinguished from a pair of random sequences.

[Figure: PAM 250 substitution matrix]

BLOSUM matrices [Steven and Jorja Henikoff]
- BLOSUM x is a 20 by 20 matrix. Its elements are defined like those of the PAM matrices, but the substitution frequencies are collected from sequences in the BLOCKS database that are less than x percent identical (x is generally between 50 and 80).
- By their construction, BLOSUM matrices are believed to detect distant homology more effectively.
- BLOSUM 62 has taken the place of PAM 250 as the default matrix used in database searches.

[Figure: BLOSUM 50 substitution matrix]

iv) Trace-back
To find the alignment itself, we must recover the path of choices (made when applying the formulae of ii) during the tabular computation) that led to the final value.
> A vertical move is a gap in the column sequence.
> A horizontal move is a gap in the row sequence.
> A diagonal move aligns two residues (a match or a mismatch).

Example: Align HEAGAWGHEE and PAWHEAE. Use BLOSUM 50 as the substitution matrix and d = -8 as the gap penalty.
         H    E    A    G    A    W    G    H    E    E
    0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P  -8   -2   -9  -17  -25  -33  -41  -49  -57  -65  -73
A -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Optimal global alignment (score 1):
HEAGAWGHE-E
--P-AW-HEAE

Time complexity: O(nm)
Space complexity: O(nm)
(for sequences of lengths n and m: the table has (n+1) x (m+1) cells, and each cell takes constant time to fill)

Big-O notation:
f(x) = O(g(x)) => f is bounded above by g (up to a constant factor)
f(x) = Ω(g(x)) => f is bounded below by g (up to a constant factor)
f(x) = Θ(g(x)) => f is bounded by g both above and below, within constant factors
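The fill and trace-back steps for the HEAGAWGHEE / PAWHEAE example can be sketched in Python as follows. This is a minimal sketch, not the course's own code. It makes several assumptions: a linear gap penalty, where each gap position costs d = 8 and so contributes -8 as in the table. It hard-codes only the BLOSUM 50 entries for the six residues in the example; the dictionary `_BLOSUM50_SUBSET` and the function names `blosum50`, `needleman_wunsch` and `column_sum` are hypothetical. On ties during trace-back it prefers a diagonal move, then a vertical one, then a horizontal one. A different tie-breaking rule can return a different alignment with the same optimal score. The two nested loops over the (n+1) x (m+1) table are where the O(nm) time and space bounds come from.

```python
# Hypothetical subset of BLOSUM 50: only the residues in the example
# (A, E, G, H, P, W). Each unordered pair is stored once.
_BLOSUM50_SUBSET = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2, ("A", "P"): -1, ("A", "W"): -3,
    ("E", "E"): 6, ("E", "G"): -3, ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3,
    ("G", "G"): 8, ("G", "H"): -2, ("G", "P"): -2, ("G", "W"): -3,
    ("H", "H"): 10, ("H", "P"): -2, ("H", "W"): -3,
    ("P", "P"): 10, ("P", "W"): -4,
    ("W", "W"): 15,
}


def blosum50(a: str, b: str) -> int:
    """Symmetric lookup; raises KeyError for residues outside the subset."""
    if (a, b) in _BLOSUM50_SUBSET:
        return _BLOSUM50_SUBSET[(a, b)]
    return _BLOSUM50_SUBSET[(b, a)]


def needleman_wunsch(x: str, y: str, score=blosum50, d: int = 8):
    """Global alignment of x (column sequence, across the top) with
    y (row sequence, down the side), linear gap penalty -d per gap."""
    n, m = len(y), len(x)
    # F[i][j] = best score for aligning y[:i] with x[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -i * d
    for j in range(1, m + 1):
        F[0][j] = -j * d
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(y[i - 1], x[j - 1]),  # diagonal: align a pair
                F[i - 1][j] - d,                               # vertical: gap in x
                F[i][j - 1] - d,                               # horizontal: gap in y
            )
    # Trace-back from the bottom-right cell to (0, 0).
    top, side = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(y[i - 1], x[j - 1]):
            top.append(x[j - 1])
            side.append(y[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            top.append("-")
            side.append(y[i - 1])
            i -= 1
        else:
            top.append(x[j - 1])
            side.append("-")
            j -= 1
    return F, "".join(reversed(top)), "".join(reversed(side))


def column_sum(a: str, b: str, score=blosum50, d: int = 8) -> int:
    """Re-score a finished alignment column by column (additivity assumption)."""
    return sum(-d if "-" in (p, q) else score(p, q) for p, q in zip(a, b))


F, top, side = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(F[-1][-1])               # 1
print(top)                     # HEAGAWGHE-E
print(side)                    # --P-AW-HEAE
print(column_sum(top, side))   # 1
```

Re-scoring the returned alignment column by column gives the same value as the bottom-right cell. This reflects the additivity assumption from the substitution-score slide: each trace-back move produces exactly one alignment column.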