Multiple Sequence Alignment: Position Specific Scoring Matrices and Psi-BLAST - Prof. Dren, Exams of Bioinformatics

A lecture note from a bioinformatics course on multiple sequence alignment (MSA), focusing on position specific scoring matrices (PSSMs) and PSI-BLAST. It covers the concept of MSA, its importance, and the use of PSSMs and PSI-BLAST for finding homologous sequences and creating new substitution matrices.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-bf8-1

9/17/07 BCB 444/544 F07 ISU Dobbs #11 - MSAs; PSSMs & Psi-BLAST

BCB 444/544 Lecture 12
Multiple Sequence Alignment (MSA)
PSSMs & Psi-BLAST
#12_Sept17

Required Reading (before lecture)
√ Mon Sept 17 - Lecture 12: Position Specific Scoring Matrices & PSI-BLAST
  • Chp 6 - pp 75-78 (but not HMMs)
Wed Sept 19 - Lecture 13 (not covered on Exam 1): Hidden Markov Models
  • Chp 6 - pp 79-84
  • Eddy: What is a hidden Markov Model? 2004 Nature Biotechnol 22:1315
    http://www.nature.com/nbt/journal/v22/n10/abs/nbt1004-1315.html
Fri Sept 21 - EXAM 1

Assignments & Announcements
Sun Sept 16 - Study Guide for Exam 1 was posted
Mon Sept 17 - Answers to HW#2 will be posted ~ Noon
Thu Sept 20 - Lab = Optional Review Session for Exam
Fri Sept 21 - Exam 1 - Will cover:
  • Lectures 2-12 (thru Mon Sept 17)
  • Labs 1-4
  • HW2
  • All assigned reading: Chps 2-6 (but not HMMs)
    Eddy: What is Dynamic Programming?

Chp 5 - Multiple Sequence Alignment
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 5 Multiple Sequence Alignment
  • Scoring Function
  • Exhaustive Algorithms
  • Heuristic Algorithms
  • Practical Issues

Multiple Sequence Alignments
Credits for slides: Caragea & Brown, 2007; Fernandez-Baca, Heber & Hunter

Overview
1. What is a multiple sequence alignment (MSA)?
2. Where/why do we need MSA?
3. What is a good MSA?
4. Algorithms to compute a MSA

Multiple Sequence Alignment
• Generalize pairwise alignment of sequences to include > 2 homologous sequences
• Analyzing more than 2 sequences gives us much more information:
  • Which amino acids are required? Correlated?
  • Evolutionary/phylogenetic relationships
• Similar to PSI-BLAST idea (not yet covered in lecture): use a set of homologous sequences to provide more "sensitivity"

Definition: MSA
Given a set of sequences, a multiple sequence alignment is an assignment of gap characters, such that
• resulting sequences have same length
• no column contains only gaps

ATTTG-      ATTTG       ATTT-G-
ATTTGC      ATTTGC      ATTT-GC
AT-TGC      ATT-GC      AT-T-GC
YES         NO          NO
            (unequal    (column 5 contains
            lengths)    only gaps)

Displaying MSAs: using CLUSTAL W
*  entirely conserved column
:  all residues have ~ same size AND hydropathy
.  all residues have ~ same size OR hydropathy

RED: AVFPMILW (small)
BLUE: DE (acidic, negative chg)
MAGENTA: RHK (basic, positive chg)
GREEN: STYHCNGQ (hydroxyl + amine + basic)

What is a Consensus Sequence?
A single sequence that represents the most common residue of each column in a MSA
Example:
FGGHL-GF
F-GHLPGF
FGGHP-FG
--------
FGGHL-GF   (consensus)

Steiner consensus sequence: Given sequences s_1, ..., s_k, find a sequence s* that maximizes Σ_i S(s*, s_i)

Applications of MSA
• Building phylogenetic trees
• Finding conserved patterns, e.g.:
  • Regulatory motifs (TF binding sites)
  • Splice sites
  • Protein domains
• Identifying and characterizing protein families
  • Find out which protein domains have same function
• Finding SNPs (single nucleotide polymorphisms) & mRNA isoforms (alternatively spliced forms)
• DNA fragment assembly (in genomic sequencing)

Application: Recover Phylogenetic Tree
[Figure: phylogenetic tree relating the sequences NYLS, NYLS, NFLS]
What was the series of events that led to the current species?

What Happens to Computational Complexity?
Given k sequences of length n:
• Space for matrix: O(n^k)
• Neighbors/cell: 2^k - 1
• Time to compute SP score: O(k^2)
• Overall runtime: O(k^2 2^k n^k)  Ouch!!!
[Figure: 3D]

Running Time of DP
What's so bad about those exponents? An example:
• Overall runtime: O(k^2 2^k n^k)

# sequences    running time
2              1 second
3              2 minutes
4              5 hours
5              3 weeks
6              9 years

Sequences: globins (≈ 150 aa)
But: There are fast heuristics.

Progressive Alignment
Heuristic procedure:
1. Align most similar sequences first
2. Add sequences progressively
Often: use guide tree to determine order of alignments
Examples: Star alignment, ClustalW
[Figure: multiple alignment by adding sequences 1, 2, 3, 4]

Guide Tree
Binary tree
• Leaves correspond to sequences
• Internal nodes represent alignments
• Root corresponds to final MSA
[Figure: guide tree. Leaves: ATC, ATG, TCG, TCC. Internal nodes: the alignments ATC/ATG and TCC/TCG. Root: the final MSA below]
ATC-
ATG-
-TCC
-TCG

Star Alignment - will skip for now, come back to this on Wed
Star alignment will NOT be covered on Exam 1

Chp 6 - Profiles & Hidden Markov Models
SECTION II SEQUENCE ALIGNMENT
Xiong: Chp 6 Profiles & HMMs
  • Position Specific Scoring Matrices (PSSMs)
  • PSI-BLAST
  • Profiles
  • Markov Model & Hidden Markov Model

PSI-BLAST
• Position Specific Iterated BLAST
• Intuition: substitution matrices should be specific to a particular site: penalize alanine→glycine more in a helix
• Basic idea:
  • Use BLAST with high stringency to get a set of closely related sequences
  • Align those sequences to create a new substitution matrix for each position
  • Then use that matrix (iteratively) to find additional sequences

Psi-BLAST
[Figure: cycle diagram with the elements Query, BLAST, Sequence database, Multiple alignment, PSSM]

PSI-BLAST pseudocode

    Convert query to PSSM
    do {
        BLAST database with PSSM
        Stop if no new homologs are found
        Add new homologs to PSSM
    }
    Print current set of homologs

PSSM = Position-specific scoring matrix
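The loop in the PSI-BLAST pseudocode can be illustrated with a small Python sketch. This is a toy under loud assumptions, not BLAST: the "search" is an ungapped, full-length comparison of equal-length sequences; the PSSM is a log-odds matrix with a uniform background and a pseudocount, rather than one seeded from BLOSUM62; and the names `pssm_from_sequences`, `score`, `toy_psi_blast`, `threshold`, and `max_iterations` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_sequences(seqs, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from equal-length, ungapped sequences.

    Assumption: uniform background frequencies and a flat pseudocount.
    """
    background = 1.0 / len(ALPHABET)
    total = len(seqs) + pseudocount * len(ALPHABET)
    pssm = []
    for j in range(len(seqs[0])):
        counts = Counter(s[j] for s in seqs)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in ALPHABET
        })
    return pssm


def score(pssm, seq):
    """Sum the per-position scores the PSSM assigns to seq."""
    return sum(column[res] for column, res in zip(pssm, seq))


def toy_psi_blast(query, database, threshold, max_iterations=10):
    """Iterate: search with the PSSM, add new hits, rebuild the PSSM, repeat."""
    homologs = [query]
    pssm = pssm_from_sequences(homologs)          # Convert query to PSSM
    for _ in range(max_iterations):               # do {
        hits = [                                  #   "BLAST" database with PSSM
            s for s in database
            if s not in homologs
            and len(s) == len(query)
            and score(pssm, s) >= threshold       #   user-defined threshold
        ]
        if not hits:                              #   Stop if no new homologs are found
            break
        homologs.extend(hits)                     #   Add new homologs to PSSM
        pssm = pssm_from_sequences(homologs)
    return homologs                               # current set of homologs


if __name__ == "__main__":
    print(toy_psi_blast("ACDE", ["ACDF", "ACGF", "WWWW"], threshold=2.5))
```

In this toy database the first pass accepts only ACDF. Once ACDF is folded into the PSSM, position 4 also rewards F, so ACGF clears the threshold on the second pass. That is the added sensitivity the iteration is meant to provide.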
PSI-BLAST pseudocode (continued)
Callout: a step in this loop requires a user-defined threshold.

Position-specific scoring matrix - PSSM
• A PSSM is an n by m matrix, where n is the size of the alphabet, and m is the length of the sequence.
• The entry at (i, j) is the score assigned by the PSSM to letter i at the jth position.

Position    1    2    3    4    5    6    7    8
A          -1   -2   -1    0   -1   -2    0   -2
R           5    0    5   -2    1   -3   -2    0
N           0    6    0    0    0   -3    0    1
D          -2    1   -2   -1    0   -3   -1   -1
C          -3   -3   -3   -3   -3   -2   -3   -3
Q           1    0    1   -2    5   -3   -2    0
E           0    0    0   -2    2   -3   -2    0
G          -2    0   -2    6   -2   -3    6   -2
H           0    1    0   -2    0   -1   -2    8
I          -3   -3   -3   -4   -3    0   -4   -3
L          -2   -3   -2   -4   -2    0   -4   -3
K           2    0    2   -2    1   -3   -2   -1
M          -1   -2   -1   -3    0    0   -3   -2
F          -3   -3   -3   -3   -3    6   -3   -1
P          -2   -2   -2   -2   -1   -4   -2   -2
S          -1    1   -1    0    0   -2    0   -1
T          -1    0   -1   -2   -1   -2   -2   -2
W          -3   -4   -3   -2   -2    1   -2   -2
Y          -2   -2   -2   -3   -1    3   -3    2
V          -3   -3   -3   -3   -2   -1   -3   -3

Example: "K" at position 3 gets a score of 2.

Position-specific scoring matrix
This PSSM assigns sequence NMFWAFGH a score of:
0 + (-2) + (-3) + (-2) + (-1) + 6 + 6 + 8 = 12

Position-specific scoring matrix
• What score does this PSSM assign to KRPGHFLA?
2 + 0 + (-2) + 6 + 0 + 6 + (-4) + (-2) = 6

Position-specific iterated BLAST
[Figure: cycle diagram with the elements Query, BLAST, Sequence database, Multiple alignment, PSSM; one step is marked "?"]
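The two worked scores on the slides above (NMFWAFGH and KRPGHFLA) can be checked mechanically. The sketch below transcribes the slide's 20 by 8 matrix with rows as residues and columns as positions 1 to 8; the names `PSSM_ROWS` and `pssm_score` are hypothetical, and the lookup is nothing more than the sum of one matrix entry per position.

```python
# Rows: residue; columns: positions 1..8 (values transcribed from the slide's PSSM).
PSSM_ROWS = {
    "A": [-1, -2, -1,  0, -1, -2,  0, -2],
    "R": [ 5,  0,  5, -2,  1, -3, -2,  0],
    "N": [ 0,  6,  0,  0,  0, -3,  0,  1],
    "D": [-2,  1, -2, -1,  0, -3, -1, -1],
    "C": [-3, -3, -3, -3, -3, -2, -3, -3],
    "Q": [ 1,  0,  1, -2,  5, -3, -2,  0],
    "E": [ 0,  0,  0, -2,  2, -3, -2,  0],
    "G": [-2,  0, -2,  6, -2, -3,  6, -2],
    "H": [ 0,  1,  0, -2,  0, -1, -2,  8],
    "I": [-3, -3, -3, -4, -3,  0, -4, -3],
    "L": [-2, -3, -2, -4, -2,  0, -4, -3],
    "K": [ 2,  0,  2, -2,  1, -3, -2, -1],
    "M": [-1, -2, -1, -3,  0,  0, -3, -2],
    "F": [-3, -3, -3, -3, -3,  6, -3, -1],
    "P": [-2, -2, -2, -2, -1, -4, -2, -2],
    "S": [-1,  1, -1,  0,  0, -2,  0, -1],
    "T": [-1,  0, -1, -2, -1, -2, -2, -2],
    "W": [-3, -4, -3, -2, -2,  1, -2, -2],
    "Y": [-2, -2, -2, -3, -1,  3, -3,  2],
    "V": [-3, -3, -3, -3, -2, -1, -3, -3],
}

PSSM_LENGTH = 8


def pssm_score(seq):
    """Return the score the PSSM assigns to seq (must have the PSSM's length)."""
    if len(seq) != PSSM_LENGTH:
        raise ValueError(f"expected a sequence of length {PSSM_LENGTH}")
    # Entry (i, j): score of letter i at position j (0-based index here).
    return sum(PSSM_ROWS[residue][j] for j, residue in enumerate(seq))


if __name__ == "__main__":
    print(PSSM_ROWS["K"][2])        # "K" at position 3 -> 2
    print(pssm_score("NMFWAFGH"))   # 12
    print(pssm_score("KRPGHFLA"))   # 6
```

Storing the matrix by residue keeps each row readable against the slide; a column-major list would work equally well for scoring.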
Creating a PSSM from 1 sequence
[Figure: the query RNRGQFGH (length L) is converted to a PSSM using the BLOSUM62 matrix. For each query residue (e.g. R), the corresponding column of the 20 by 20 BLOSUM62 matrix becomes that position's column in the 20 by L PSSM. The resulting PSSM is the same matrix shown on the preceding slides.]

Position-specific iterated BLAST
[Figure: cycle diagram with the elements Query, BLAST, Sequence database, Multiple alignment, PSSM; one step is marked "?"]
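The "Creating a PSSM from 1 sequence" step can be sketched as a column lookup: position j of the PSSM is the BLOSUM62 column for the j-th query residue. To stay within what the slides show, the sketch below includes only the six BLOSUM62 columns needed for the query RNRGQFGH (values as they appear in the slide's matrix), so it is a partial table rather than full BLOSUM62. The names `BLOSUM62_COLUMNS` and `pssm_from_query` are hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Partial BLOSUM62: one column per residue that occurs in the example query.
# Each list is ordered like AMINO_ACIDS (A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V).
BLOSUM62_COLUMNS = {
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
    "N": [-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "F": [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
}


def pssm_from_query(query):
    """Build a 20 by L PSSM (dict: residue -> list of L scores) from a single query.

    Raises KeyError for query residues missing from the partial table above.
    """
    pssm = {aa: [] for aa in AMINO_ACIDS}
    for query_residue in query:
        column = BLOSUM62_COLUMNS[query_residue]
        for aa, value in zip(AMINO_ACIDS, column):
            pssm[aa].append(value)
    return pssm


if __name__ == "__main__":
    pssm = pssm_from_query("RNRGQFGH")
    print(pssm["K"])  # row K across positions 1..8
    # Score of the query against its own PSSM: the sum of the BLOSUM62 diagonal entries.
    print(sum(pssm[res][j] for j, res in enumerate("RNRGQFGH")))
```

Because every column is copied straight from the substitution matrix, a PSSM built this way scores sequences exactly as an ungapped BLOSUM62 comparison against the query would; the position-specific information only appears once aligned homologs are added in later iterations.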