5/24/2007 © David Bernick, 2007

Components of Protein Structures
BME 110: Computational Biology Tools
Protein Structures: Components and Analysis

Amino acids -- properties and symbols

Amino acid      3-letter  1-letter  Acid/base  Polarity
Alanine         Ala       A         Neutral    Non-polar
Cysteine        Cys       C         Neutral    Slightly polar
Aspartate       Asp       D         Acidic     Polar
Glutamate       Glu       E         Acidic     Polar
Phenylalanine   Phe       F         Neutral    Non-polar
Glycine         Gly       G         Neutral    Non-polar
Histidine       His       H         Basic      Polar
Isoleucine      Ile       I         Neutral    Non-polar
Lysine          Lys       K         Basic      Polar
Leucine         Leu       L         Neutral    Non-polar
Methionine      Met       M         Neutral    Non-polar
Asparagine      Asn       N         Neutral    Polar
Proline         Pro       P         Neutral    Non-polar
Glutamine       Gln       Q         Neutral    Polar
Arginine        Arg       R         Basic      Polar
Serine          Ser       S         Neutral    Polar
Threonine       Thr       T         Neutral    Polar
Valine          Val       V         Neutral    Non-polar
Tryptophan      Trp       W         Neutral    Slightly polar
Tyrosine        Tyr       Y         Neutral    Polar

The peptide bond
[Figure: http://www.codefun.com/Images/Genetic/tRNA/image004.jpg]

Peptides and the peptide bond
[Figure: peptide chain with the N-terminus and C-terminus labelled]

Protein Data Bank
www.pdb.org
• As of 5/23/2007, there are 43633 stored structures
• with 1054 unique folds (SCOP)

Structures
• Banner, D.W., Bloomer, A., Petsko, G.A., Phillips, D.C., Wilson, I.A. Atomic coordinates for triose phosphate isomerase from chicken muscle. Biochem. Biophys. Res. Commun.
v72 pp. 146-155, 1976
http://www.pdb.org/pdb/explore.do?structureId=1TIM

Type               Resolution [Å]  R-Value  R-Free  Space Group
X-RAY DIFFRACTION  2.50            n/a      n/a     P 21 21 21

PDB structure records (1TIM)

ATOM  1  N   ALA A  1   43.240  11.990  -6.915  1.00  0.00  1TIM 147
ATOM  2  CA  ALA A  1   43.888  10.862  -6.231  1.00  0.00  1TIM 148
ATOM  3  C   ALA A  1   44.791  11.378  -5.094  1.00  0.00  1TIM 149
ATOM  4  O   ALA A  1   44.633  10.992  -3.937  1.00  0.00  1TIM 150
ATOM  5  CB  ALA A  1   44.722  10.051  -7.240  1.00  0.00  1TIM 151
ATOM  6  N   PRO A  2   45.714  12.244  -5.497  1.00  0.00  1TIM 152
ATOM  7  CA  PRO A  2   46.689  12.815  -4.561  1.00  0.00  1TIM 153

Fields, left to right: record, atom, residue, coordinates (x, y, z)

Distance between the N and Cα atoms of ALA 1:

\[
\Delta_{C_\alpha\,\mathrm{ALA},\;N\,\mathrm{ALA}}
= \sqrt{(\Delta X)^2 + (\Delta Y)^2 + (\Delta Z)^2}
= \sqrt{(43.240 - 43.888)^2 + (11.990 - 10.862)^2 + (-6.915 + 6.231)^2}
\approx 1.4697
\]

BME110 CompBioTools DL Bernick and CA Rohl '07

Why Examine Protein Structures?
• Structure more conserved than sequence
• Similar folds often share similar function
• Remote similarities may only be detectable at structure level
• Interpreting experimental data
• Locating sites of interesting mutations
• Locating splice sites
• Designing experiments
• In silico mutagenesis

Structure Analysis
• Identify interesting sites on protein
• Measure distances, angles, etc.
• Examine surface properties (shape, charge)
• Compare two structures
  • Homologs
  • Mutants
  • With and without ligands

Comparing Protein Structures
• Defined alignment
  • Mutant-wildtype, model-native, two different conformations
  • Unique solution exists -- we know the true alignment
• Derived alignment
  • Unknown query
  • Known parent (assumed homolog)
  • Calculate a computationally 'optimal' alignment
  • Infer annotation from parent to query

Iterative Dynamic Programming
• Algorithm:
1. Make an initial guess for the superposition
2.
Calculate all pairwise CA-CA distances and generate a scoring matrix.
3. Find the optimal alignment according to this scoring matrix by dynamic programming.
4. Re-superimpose the structures using this alignment.
5. Repeat steps 2-4 until convergence.
• No guarantee of an optimal solution; the final result depends on the initial alignment selected.
• Structal (Subbiah et al., 1993, Curr. Biol. 3:141)

Structural Alignment
• Many methods other than dynamic programming are used.
• Most methods use some sort of heuristic to speed things up and make good initial guesses:
  • Sheba: sequence alignment
  • Mammoth: local structure alignment
  • VAST: aligns secondary structure element vectors
  • DALI: distance matrix alignment

Distance Matrix ALIgnment
• Matrix of all pairwise distances
• Characteristic patterns:
  • Runs along the main diagonal correspond to helices (i.e., local contacts)
  • Hairpins start on the main diagonal and run perpendicular to it
  • Parallel pairs run parallel to the main diagonal
  • Others are long-range contacts
• Converts the 3D alignment problem to a 2D problem.
• Find the best subset of rows and columns such that the distance matrices of the two proteins are optimally similar.
[Figure: distance matrix for Myoglobin]

Contact Map Comparison
[Figure: contact maps for Protein G and Myoglobin, with α-helix, β-hairpin, and parallel (//) strand patterns labelled]

Similarity Measures: RMSD
• RMSD = root mean square deviation

\[
\mathrm{RMSD} = \sqrt{\left\langle \lVert x_i^A - x_i^B \rVert^2 \right\rangle}
\]

1. Superimpose optimally
2. Pair up residues
3. Calculate RMSD
[Figure: paired points x1A to x5A in structure A and x1B to x5B in structure B]
• Sensitive to outliers
• Depends on the number of pairs compared
• A better measure is the significance of this RMSD for similar-sized matches

Z-scores & P-values
• Z-score: number of standard deviations above the mean:
  • ±1 sd ~68%
  • ±2 sd ~95%
• If we have a histogram, we can just count, or integrate a function fitted to the histogram.
• P-value
  • Probability of obtaining ≥ this score under the null model (normally distributed data -- “by chance”)
[Figure: histogram of scores for random matches, with the P-value region shown for a z-score of 1; axis marks at the mean (0 sd, z-score = 0), 1 sd (z-score = 1), 2 sd (z-score = 2), z-score = 3, and z-score = 4]
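
The RMSD, Z-score, and P-value definitions above can be sketched in a few lines of Python. This is a minimal illustration and not the course's own code. The function names (rmsd, z_score, p_value_upper, fraction_within) are hypothetical. The sketch assumes the two structures are already superimposed and their residues already paired, so it does not perform the superposition step. It also assumes a normally distributed null model and uses the population standard deviation of the background scores.

```python
import math
from statistics import mean, pstdev


def rmsd(coords_a, coords_b):
    """RMSD over paired 3D points that are assumed to be superimposed already."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # squared Euclidean distance between the paired points
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))


def z_score(score, background):
    """Number of standard deviations that score lies above the background mean."""
    sd = pstdev(background)  # population sd; an assumption of this sketch
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean(background)) / sd


def p_value_upper(z):
    """P(Z >= z) for a standard normal null model."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical paired coordinates, for illustration only.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print("RMSD:", rmsd(a, b))
    print("within 1 sd:", round(fraction_within(1), 4))
    print("within 2 sd:", round(fraction_within(2), 4))
    print("P-value for z = 1:", round(p_value_upper(1.0), 4))
```

Here fraction_within(1) and fraction_within(2) give the ±1 sd and ±2 sd coverage figures, and p_value_upper(1.0) corresponds to the tail area marked in the histogram for a z-score of 1. When a histogram of random-match scores is available, counting the fraction of scores at or above the observed one is the empirical alternative to the fitted normal used here.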