At the beginning, there were thoughts and observation…

MCB-221b lecture 11
Protein Modeling: “databases and software”
http://koehllab.genomecenter.ucdavis.edu/

From hypothesis-driven to exploratory data analysis:
• Data are used to formulate new hypotheses.
• Data are stored and disseminated via databases of information, allowing open access to the records held within them (bioinformatics).
• Designing novel algorithms and methods of analysis helps solve biological problems (computational biology).

"Is there a danger, in molecular biology, that the accumulation of data will get so far ahead of its assimilation into a conceptual framework that the data will eventually prove an encumbrance?" John Maddox, 1988

[Figure labels: genome DBs; protein structure DBs; sequencing; crystallization, NMR; linear aa; algorithms: classifications (CATH etc.), predictions (CASP), validations]

Protein Structure Classification
• Classification, a classical tool in biology: it is easier to think about a representative than to embrace the information of all individuals.
  - Aristotle: plants and animals
  - Linnaeus: binomial system
  - Darwin: systematic classification that reveals phylogeny
• Clustering
• Domain definition
• 3 major classifications:
  - SCOP
  - CATH
  - DDD
• Differences

Delineating protein domains: looking at secondary structure
Authors: Sowdhamini and Blundell, Protein Science, 4:506 (1995)
Definition of a domain: a cluster of secondary structures.
Method: clustering of the secondary structures in a protein.
“Distance” between secondary structures:

Delineating protein domains: a bottom-up procedure
Author: W.R. Taylor, Protein Engineering, 12, 203-216 (1999)
Idea: classical methods for defining protein domains start from a hypothesis (a definition of what a domain is) and check how well the data verify that hypothesis.
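The idea of a domain as "a cluster of secondary structures" can be sketched as agglomerative clustering of secondary-structure elements (SSEs). The slide's own distance formula between secondary structures was not recovered, so this sketch assumes one: the Euclidean distance between the Cα centroids of two SSEs, combined by average linkage, with merging stopped once the closest clusters are farther apart than a cutoff. The function names and the cutoff are hypothetical; this illustrates the concept and is not Sowdhamini and Blundell's published procedure.

```python
import math
from itertools import combinations

def centroid(points):
    """Mean (x, y, z) of a list of C-alpha coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def sse_distance(a, b):
    # ASSUMPTION: distance between two secondary-structure elements is the
    # Euclidean distance between their C-alpha centroids.
    return math.dist(centroid(a), centroid(b))

def cluster_sses(sses, cutoff):
    """Average-linkage clustering of SSEs; returns lists of SSE indices."""
    n = len(sses)
    d = {(i, j): sse_distance(sses[i], sses[j]) for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average of all pairwise SSE distances between the two clusters.
        total = sum(d[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:  # closest pair too far apart: stop merging
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return clusters
```

Each remaining cluster is a candidate domain; the choice of cutoff controls how finely the protein is split.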
Protein Structural Domains
Protein domain: various definitions exist:
1) Regions that display significant levels of sequence similarity
2) The minimal part of a gene that is capable of performing a function
3) A region of a protein with an experimentally assigned function
4) A region of a protein structure that recurs in different contexts and proteins
5) A compact, spatially distinct region of a protein

Why do proteins fold?
[Figure: unfolded state and folded state]
• The protein backbone is a linear chain.
• The chain is self-avoiding.
• The protein is closely packed.
• Amino acid preferences:
  - inside (hydrophobic) / outside (hydrophilic)
  - specific interactions
  - interactions with solvent
  - interactions with ions
  - concentration of proteins
[Figure labels: solvent; backbone + side chain]

Protein tertiary structure: packing
Ω: angle between the two axes
[Figures: helix-helix packing; sheet-sheet packing]
Sheet-sheet packing: two types, aligned (about 20 degrees between sheets) or orthogonal.
• Parallel sheets tend to be covered by helices on both sides.
• Anti-parallel sheets tend to have one side covered by another sheet: a “sandwich-type” structure.
• Because the periodicities of helices and strands are different, there are no regular packing patterns.
• Helices tend to be on both sides of parallel beta sheets.
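The slide defines Ω only as the angle between the two axes. A minimal sketch, assuming each helix is given as a list of Cα coordinates: the axis direction is approximated by averaging four consecutive Cα atoms (about one turn of an α-helix, 3.6 residues) at each end and taking the vector between the two averaged points, and Ω is the plain angle between the two axis vectors. The helper names are hypothetical, and the averaging trick is a rough approximation; a more careful packing-angle definition uses a signed dihedral about the line of closest approach, which this sketch does not compute.

```python
import math

def helix_axis(ca):
    """Approximate helix axis direction (first -> last) from C-alpha coordinates."""
    # ASSUMPTION: averaging 4 consecutive C-alphas (~one helical turn)
    # cancels most of the helical wobble around the axis.
    if len(ca) < 5:
        raise ValueError("need at least 5 C-alpha atoms")
    start = [sum(p[i] for p in ca[:4]) / 4 for i in range(3)]
    end = [sum(p[i] for p in ca[-4:]) / 4 for i in range(3)]
    return [e - s for s, e in zip(start, end)]

def interaxial_angle(u, v):
    """Unsigned angle in degrees (0..180) between two axis direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (nu * nv)))  # guard against rounding
    return math.degrees(math.acos(cos))
```

For two packed helices, one would call `interaxial_angle(helix_axis(ca1), helix_axis(ca2))`; values near 0 or 180 correspond to aligned axes and values near 90 to orthogonal ones.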
[Figure: helix-sheet packing]

Protein tertiary structure: architecture classes
Alpha folds:
  - lone helix (glucagon)
  - helix-turn-helix (RNA binding protein)
  - four-helix bundle (myohemerythrin; RNA binding protein dimer)
Beta folds:
  - beta sandwich (fatty-acid binding protein)
  - Greek key topology (5-13 strands)
  - jellyroll topology (a Greek key with an extra swirl)
  - beta propeller
  - beta helix

Classification of Protein Structure: CATH
CATH levels: C (class), A (architecture), T (topology), H (homologous superfamily).
[Figure labels: alpha; mixed alpha-beta; beta; sandwich; TIM barrel; other barrel; super roll; barrel]
http://www.cathdb.info/latest/index.html

Classification of Protein Structure: SCOP
SCOP is organized into 4 hierarchical layers:
(1) Classes: similar to CATH: all alpha, all beta, alpha/beta, alpha+beta, multi-domain proteins (alpha and beta), membrane and cell surface proteins, small proteins, coiled coils, low-resolution protein structures.
(2) Folds: major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections.
(3) Superfamily: probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable, are placed together in superfamilies.
(4) Family: clear evolutionary relationship. Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater.
http://scop.mrc-lmb.cam.ac.uk/scop/
http://scop.berkeley.edu/
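The family criterion relies on pairwise residue identity. A minimal sketch, assuming the two sequences are already aligned (gaps written as "-") and that identity is counted over positions where neither sequence has a gap; other conventions divide by the length of the shorter sequence instead, which gives different numbers. The function name is hypothetical, and SCOP's 30% figure is a general guide ("generally"), not a hard rule.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # ASSUMPTION: only columns with a residue in both sequences are counted.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)
```

For example, `percent_identity("ACDE-G", "ACDFHG")` counts five gap-free columns, four of them identical, and returns 80.0.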
SCOP: Structural Classification of Proteins, release 1.69
25973 PDB entries (1 Oct 2004), 70859 domains (excluding nucleic acids and theoretical models)
Class | Number of folds | Number of superfamilies | Number of families
All alpha proteins | 218 | 376 | 608
All beta proteins | 144 | 290 | 560
Alpha and beta proteins (a/b) | 136 | 222 | 629
Alpha and beta proteins (a+b) | 279 | 409 | 717
Multi-domain proteins | 46 | 46 | 61
Membrane and cell surface proteins | 47 | 88 | 99
Small proteins | 75 | 108 | 171
Total | 945 | 1539 | 2845
SCOP: Structural Classification of Proteins, release 1.71
27599 PDB entries (18 Jan 2005), 75930 domains (excluding nucleic acids and theoretical models)
Class | Number of folds | Number of superfamilies | Number of families
All alpha proteins | 226 | 392 | 645
All beta proteins | 149 | 300 | 594
Alpha and beta proteins (a/b) | 134 | 221 | 661
Alpha and beta proteins (a+b) | 286 | 424 | 753
Multi-domain proteins | 48 | 48 | 64
Membrane and cell surface proteins | 49 | 90 | 101
Small proteins | 79 | 114 | 186
Total | 971 | 1589 | 3004
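Counts like those in the tables come from tallying distinct nodes of the hierarchy per class. A minimal sketch, assuming hypothetical domain records of the form (domain_id, class, fold, superfamily, family); the full lineage is used as the key so that identically named superfamilies or families under different folds are not merged. The record layout and function names are illustrative, not SCOP's file format.

```python
from collections import defaultdict

def tally_by_class(records):
    """Count distinct folds, superfamilies and families per class."""
    folds = defaultdict(set)
    superfamilies = defaultdict(set)
    families = defaultdict(set)
    for _domain_id, cls, fold, superfamily, family in records:
        folds[cls].add(fold)
        superfamilies[cls].add((fold, superfamily))
        families[cls].add((fold, superfamily, family))
    return {cls: (len(folds[cls]), len(superfamilies[cls]), len(families[cls]))
            for cls in folds}

def totals(counts):
    """Column totals across classes, i.e. a 'Total' row."""
    return tuple(sum(row[i] for row in counts.values()) for i in range(3))

# Illustrative toy records (not real SCOP data).
records = [
    ("d1", "all-alpha", "f1", "sf1", "fam1"),
    ("d2", "all-alpha", "f1", "sf1", "fam1"),
    ("d3", "all-alpha", "f1", "sf1", "fam2"),
    ("d4", "all-beta", "f2", "sf2", "fam3"),
    ("d5", "all-beta", "f2", "sf3", "fam4"),
]
```

Because every domain belongs to exactly one class, summing the class rows gives a Total row.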
Protein Structure Comparison
A protein structure is a 3D shape: the goal is to design algorithms that find the optimal match between two shapes.
• Global versus local alignment
• Measuring protein shape similarity
• Protein structure superposition
• Protein structure alignment
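Superposition and shape similarity can be illustrated with the classic Kabsch approach to least-squares superposition, named here as a swapped-in standard technique. A minimal sketch, assuming two equal-length coordinate lists with a known one-to-one correspondence (for example Cα atoms of already-aligned residues): both sets are centered, and the minimal RMSD over proper rotations follows from the singular values σ1 ≥ σ2 ≥ σ3 of the 3×3 covariance matrix H as Σ|x|² + Σ|y|² − 2(σ1 + σ2 + dσ3), with d the sign of det(H). To stay within the standard library, the singular values are taken as square roots of the eigenvalues of HᵀH, computed with a small Jacobi eigenvalue routine. The function names are hypothetical, and finding the correspondence itself (structure alignment) is not attempted.

```python
import math

def _center(coords):
    """Translate (x, y, z) points so their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in coords]

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def _transpose(a):
    return [list(row) for row in zip(*a)]

def _det(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _sym_eigenvalues(a, sweeps=50):
    """Eigenvalues of a symmetric 3x3 matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-24:
            break
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if abs(a[p][q]) < 1e-300:
                continue
            theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
            t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
            c = 1.0 / math.sqrt(t * t + 1.0)
            s = t * c
            rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
            rot[p][p] = rot[q][q] = c
            rot[p][q], rot[q][p] = s, -s
            a = _matmul(_transpose(rot), _matmul(a, rot))  # zeroes a[p][q]
    return [a[i][i] for i in range(3)]

def superposition_rmsd(xs, ys):
    """Minimal RMSD between matched point sets after optimal rigid superposition."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists of at least 3 points")
    x, y = _center(xs), _center(ys)
    h = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    eig = _sym_eigenvalues(_matmul(_transpose(h), h))
    sig = sorted((math.sqrt(max(ev, 0.0)) for ev in eig), reverse=True)
    d = 1.0 if _det(h) >= 0 else -1.0  # proper rotations only, no mirror images
    gx = sum(v * v for p in x for v in p)
    gy = sum(v * v for p in y for v in p)
    msd = (gx + gy - 2.0 * (sig[0] + sig[1] + d * sig[2])) / len(x)
    return math.sqrt(max(msd, 0.0))  # clamp tiny negative rounding error
```

A rotated and translated copy of a structure gives an RMSD of essentially zero, while its mirror image does not, because reflections are excluded by the sign d.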
[Figure: global alignment]
[Figure: local alignment of a shared motif]

Protein Structure Prediction
• One popular model for protein folding assumes a sequence of events:
  – Hydrophobic collapse
  – Local interactions stabilize secondary structures
  – Secondary structures interact to form motifs
  – Motifs aggregate to form tertiary structure

Protein Structure Prediction
A physics-based approach:
- find the conformation of the protein corresponding to a thermodynamic minimum (free-energy minimum)
- minimizing the internal energy alone is not enough: the solvent must be included
- simulate folding… a very long process! Folding times are in the millisecond to second range, while folding simulations at best run 1 ns in one day, so simulating even 1 ms would take about 10^6 days (roughly 2,700 years).

The CASP experiment
• CASP = Critical Assessment of Structure Prediction
• First run in 1994 (Moult, Pedersen, Judson, Fidelis, Proteins, 23:2-5 (1995)); now runs regularly every second year (CASP6 was held last December).
1) Sequences of target proteins are made available to CASP participants in June-July of a CASP year. The structure of each target protein is known, but not yet released in the PDB, or even accessible.
2) CASP participants have between 2 weeks and 2 months over the summer of a CASP year to generate up to 5 models for each target they are interested in.
3) Model structures are assessed against the experimental structures.
4) CASP participants meet in December to discuss the results.

Homology Modeling: Practical guide
Approach 1: manually
(BLAST, then a range of steps you’d need to learn)
Approach 2: submit the target sequence to automatic servers
- Fully automatic:
  - 3D-Jigsaw: http://www.bmm.icnet.uk/servers/3djigsaw/
  - EsyPred3D: http://www.fundp.ac.be/urbm/bioinfo/esypred/
  - SwissModel: http://swissmodel.expasy.org//SWISS-MODEL.html
- Fold recognition:
  - PHYRE: http://www.sbg.bio.ic.ac.uk/~phyre/
- Useful sites:
  - Meta server: http://bioinfo.pl/Meta
  - PredictProtein: http://cubic.bioc.columbia.edu/predictprotein/

Small proteins can be predicted de novo
At least, about 50% are predicted at < 5 Å.
[Figure: a very good prediction, and a poor one caught in a local free-energy minimum]
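CASP assesses models against the experimental structures with distance-based scores. One widely used CASP measure is GDT_TS: the average, over cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms lying within the cutoff of their experimental positions. The official score searches many superpositions to maximize each percentage; this minimal sketch assumes the model has already been superposed on the target with one fixed superposition, so it can only underestimate the official value. The function names are hypothetical.

```python
import math

def fraction_within(model, target, cutoff):
    """Fraction of residues whose C-alpha lies within `cutoff` Å of its target position."""
    if len(model) != len(target) or not model:
        raise ValueError("need equal-length, non-empty coordinate lists")
    close = sum(1 for m, t in zip(model, target) if math.dist(m, t) <= cutoff)
    return close / len(model)

def gdt_ts_fixed(model, target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) for ONE fixed, already-applied superposition."""
    return 100.0 * sum(fraction_within(model, target, c) for c in cutoffs) / len(cutoffs)
```

With four residues deviating by 0.5, 1.5, 3.0 and 10.0 Å, the per-cutoff fractions are 1/4, 2/4, 3/4 and 3/4, giving a score of 56.25.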