Docsity
Exploring Protein Structures: Visualization, Comparison, and Classification, Summaries of Bioinformatics

This summary covers the importance of protein structural visualization using atomic models built from PDB data files. It also covers protein structure comparison methods, such as superposition and combined approaches, and their application to structural classification, including the SCOP and CATH databases and their hierarchical levels.

Typology: Summaries

2020/2021

Uploaded on 08/08/2021

ahmed-mohammed-16 🇪🇬

CHAPTER 13
Protein Structure Visualization, Comparison, and Classification

A key feature of computer visualization programs is interactivity, which allows users to visually manipulate structural images through a graphical user interface. At the click of a mouse button, a user can move, rotate, and zoom an atomic model on a computer screen in real time, examine any portion of the structure in great detail, and draw it in various forms and colors.

PROTEIN STRUCTURAL VISUALIZATION

A Protein Data Bank (PDB) data file for a protein structure contains only the x, y, and z coordinates of atoms, so the most basic requirement for a visualization program is to build connectivity between atoms to produce a view of a molecule.

Molecular structure visualization forms:

1. A wire-frame diagram is a line drawing representing bonds between atoms. The wire frame is the simplest form of model representation and is useful for localizing positions of specific residues in a protein structure, or for displaying a skeletal form of a structure when the Cα atoms of each residue are connected.
2. Balls and sticks are solid spheres and rods, representing atoms and bonds, respectively. These diagrams can also be used to represent the backbone of a structure.
3. In a space-filling representation, each atom is drawn as a large solid sphere with a radius corresponding to its van der Waals radius.
4. Ribbon diagrams use cylinders or spiral ribbons to represent α-helices and broad, flat arrows to represent β-strands. This type of representation is very attractive in that it allows easy [...]

Figure 4: Simplified representation showing steps involved in the structure superposition of two protein molecules. (A) Two protein structures are positioned in different places in a three-dimensional space.
Equivalent positions are identified using a sequence-based alignment approach. (B) To superimpose the two structures, the first step is to move one structure (left) relative to the other (right) through lateral and vertical movement, which is called translation. (C) The left structure is then rotated relative to the reference structure until the relative distances between equivalent positions are minimal.

2. Intramolecular Method
- The intramolecular approach relies on structural internal statistics and therefore does not depend on sequence similarity between the proteins to be compared. In addition, this method does not generate a physical superposition of structures, but instead provides a quantitative evaluation of the structural similarity between corresponding residue pairs.
- The method works by generating a distance matrix between residues of the same protein. In comparing two protein structures, the distance matrices from the two structures are moved relative to each other to achieve maximum overlap.

3. Combined Method
- A recent development in structure comparison involves combining both inter- and intramolecular approaches.
- In the hybrid approach, corresponding residues can be identified using the intramolecular method. Subsequent structure superposition can be performed based on the residue equivalence relationships.
- In addition to using RMSD as a measure during alignment, additional structural properties such as secondary structure types, torsion angles, accessibility, and local hydrogen bonding environment can be used. Dynamic programming is often employed to maximize overlaps in both inter- and intramolecular comparisons.

PROTEIN STRUCTURE CLASSIFICATION

One of the applications of protein structure comparison is structural classification. The ability to compare protein structures allows classification of the structure data and identification of relationships among structures.
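To make the comparison measures above concrete, here is a minimal Python sketch of the intramolecular idea: because a distance matrix is built from distances within one structure, it does not change when the structure is moved, so two copies of the same shape can be recognized as identical without any superposition. The coordinates are hypothetical Cα positions and the function names are illustrative only. The sketch also assumes the residue equivalences are already known and both structures have the same length, which sidesteps the step of moving the matrices relative to each other.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residues of ONE structure."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def matrix_difference(dm1, dm2):
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(dm1)
    total = sum(abs(dm1[i][j] - dm2[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

def rmsd(coords1, coords2):
    """RMSD between equivalent positions, with NO superposition applied first."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords1, coords2))
    return math.sqrt(squared / len(coords1))

# Hypothetical C-alpha coordinates (in angstroms) for a three-residue fragment.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
# The same shape, translated to a different place in space.
structure_b = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in structure_a]

# Intramolecular comparison: essentially zero, since the internal geometry is identical.
dm_diff = matrix_difference(distance_matrix(structure_a), distance_matrix(structure_b))

# Naive intermolecular comparison: large, because the structures were never superimposed.
raw_rmsd = rmsd(structure_a, structure_b)

print(f"distance-matrix difference: {dm_diff:.3f}")
print(f"RMSD without superposition: {raw_rmsd:.3f}")
```

The contrast between the two numbers illustrates why the intermolecular method needs the translation and rotation steps before RMSD is meaningful, while the intramolecular method does not.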
The reason to develop a protein structure classification system is to establish hierarchical relationships among protein structures and to provide a comprehensive and evolutionary view of known structures. The two most popular classification schemes are SCOP and CATH, both of which contain a number of hierarchical levels.

1. SCOP

SCOP is a database for comparing and classifying protein structures. It is constructed almost entirely based on manual examination of protein structures. The proteins are grouped into hierarchies of classes, folds, superfamilies, and families.

- Families consist of proteins having high sequence identity (>30%). Thus, the proteins within a family clearly share close evolutionary relationships and normally have the same functionality. The protein structures at this level are also extremely similar.
- Superfamilies consist of families with similar structures but weak sequence similarity. It is believed that members of the same superfamily share a common ancestral origin, although the relationships between families are considered distant.
- Folds consist of superfamilies with a common core structure, which is determined manually. This level describes similar overall secondary structures with similar orientation and connectivity between them. Members within the same fold do not always have evolutionary relationships. Some of the shared core structure may be a result of analogy.
- Classes consist of folds with similar core structures. This is the highest level of the hierarchy, which distinguishes groups of proteins by secondary structure composition, such as all α, all β, α and β, and so on.

CHAPTER 17
Genome Mapping, Assembly, and Comparison

Genomics is the study of genomes. Genomic studies are characterized by simultaneous analysis of a large number of genes using automated data gathering tools. The topics of genomics range from genome mapping, sequencing, and functional genomic analysis to comparative genomic analysis.
Genomic study can be tentatively divided into structural genomics and functional genomics.
- Structural genomics refers to the initial phase of genome analysis, which includes construction of genetic and physical maps of a genome, identification of genes, annotation of gene features, and comparison of genome structures.
- Functional genomics refers to the analysis of global gene expression and gene functions in a genome.

GENOME MAPPING

The first step to understanding a genome structure is genome mapping, which is the process of identifying the relative locations of genes, mutations, or traits on a chromosome. There are three types of maps:
> linkage maps
> physical maps
> cytologic maps
These describe genomes at different levels of resolution. Their relations relative to the DNA sequence on a chromosome are illustrated in the figure below. More details of each type of genome map are discussed next.

Genetic linkage maps, also called genetic maps, identify the relative positions of genetic markers on a chromosome and are based on how frequently the markers are inherited together. The rationale behind genetic mapping is that the closer the two [...]

Physical maps are maps of the locations of identifiable landmarks on genomic DNA regardless of inheritance patterns. The distance between genetic markers is measured directly in kilobases (kb) or megabases (Mb). Because the distance is expressed in physical units, it is more accurate and reliable than the centiMorgans used in genetic maps.

Cytologic maps refer to banding patterns seen on stained chromosomes, which can be directly observed under a microscope. The observable light and dark bands are the visually distinct markers on a chromosome.

Figure: Overview of various genome maps (cytologic map, genetic map, physical map) relative to the genomic DNA sequence. The maps represent different levels of resolution to describe a genome using genetic markers.
Cytologic maps are obtained microscopically. Genetic maps (grey bar) are obtained through genetic crossing experiments in which chromosomal recombinations are analyzed. Physical maps are obtained from overlapping clones identified by hybridizing the clone fragments (grey bars) with common probes (grey asterisks).

GENOME SEQUENCING

There are two major strategies for whole genome sequencing:
1. The shotgun approach randomly sequences clones from both ends of cloned DNA. This approach generates a large number of sequenced DNA fragments. The number of random fragments has to be very large. Generally, the genome has to be redundantly sequenced in such a way that the overall length of the fragments covers the entire genome multiple times.
2. The hierarchical genome sequencing approach is similar to the shotgun approach, but on a smaller scale. The chromosomes are initially mapped using the physical mapping strategy. Longer fragments of genomic DNA (100 to 300 kb) are obtained and cloned into a high-capacity bacterial vector called a bacterial artificial chromosome (BAC). Based on the results of physical mapping, the locations and order of the BAC clones on a chromosome can be determined.

Advantages and disadvantages:
> The hierarchical approach is slower and more costly than the shotgun approach because it involves an initial clone-based physical mapping step.
> However, once the map is generated, assembly of the whole genome becomes relatively easy and less error prone.
> In contrast, the whole genome shotgun approach can produce a draft sequence very rapidly because it is based on the direct sequencing approach.
> Although the shotgun approach has been successfully employed in sequencing small microbial genomes, for a complex eukaryotic genome that contains high levels of repetitive sequences, such as the human genome, the full shotgun approach becomes less accurate and tends to leave more "holes" in the final assembled sequence than the hierarchical approach.
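The redundancy requirement of shotgun sequencing can be illustrated with a short calculation. The sketch below is a simplified model, not part of the source text: it assumes reads land uniformly at random on the genome and ignores repeats, so the fraction of bases left uncovered follows a Poisson approximation, e^(-c), where c is the coverage. The genome size, read length, and function names are hypothetical.

```python
import math

def coverage(num_reads, read_length, genome_size):
    """Average number of times each base is sequenced (c = N * L / G)."""
    return num_reads * read_length / genome_size

def reads_needed(target_coverage, read_length, genome_size):
    """Number of random reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)

def expected_uncovered_fraction(cov):
    """Poisson approximation: probability that a given base is hit by no read."""
    return math.exp(-cov)

# Hypothetical example: a 5 Mb microbial genome sequenced with 500-base reads.
genome_size = 5_000_000
read_length = 500

n_reads = reads_needed(8, read_length, genome_size)
cov = coverage(n_reads, read_length, genome_size)
gap_fraction = expected_uncovered_fraction(cov)

print(f"reads needed for 8x coverage: {n_reads}")
print(f"expected uncovered fraction:  {gap_fraction:.6f}")
print(f"expected uncovered bases:     {gap_fraction * genome_size:.0f}")
```

Even under these idealized assumptions some bases remain unsequenced at several-fold coverage. Repetitive sequences, which this model leaves out, make assembly harder still, which is consistent with the "holes" noted for complex eukaryotic genomes.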
Current genome sequencing of large organisms often uses a combination of both approaches.

Figure: Gene numbers estimated from several sequenced eukaryotic genomes (vertical axis: gene number).

CHAPTER EIGHTEEN
Functional Genomics

The field of genomics encompasses two main areas, structural genomics and functional genomics:
> Structural genomics deals with genome structures, with a focus on the study of genome mapping and assembly as well as genome annotation and comparison.
> Functional genomics is largely experiment based, with a focus on gene functions at the whole genome level using high throughput approaches.

The high throughput analysis of all expressed genes is also termed transcriptome analysis, which is the expression analysis of the full set of RNA molecules produced by a cell under a given set of conditions. Transcriptome analysis using ESTs, SAGE, and DNA microarrays forms the core of functional genomics and is key to understanding the interactions of genes and their regulation at the whole-genome level.

SEQUENCE-BASED APPROACHES

Expressed Sequence Tags (ESTs)

One of the high throughput approaches to genome-wide profiling of gene expression is sequencing expressed sequence tags (ESTs). ESTs are short sequences obtained from cDNA clones and serve as short identifiers of full-length genes. ESTs are typically 200 to 400 nucleotides in length and are obtained from either the 5′ end or the 3′ end of cDNA inserts. EST sequences are often of low quality because they are automatically generated without verification and thus contain high error rates. Despite these limitations, EST technology is still widely used because EST libraries can be easily generated from various cell lines, tissues, and organs, and at various developmental stages.
SAGE (Serial Analysis of Gene Expression)

SAGE is another high throughput, sequence-based approach for global gene expression profile analysis. Unlike EST sampling, SAGE is more quantitative in determining mRNA expression in a cell. In this method, short fragments of DNA (usually 15 base pairs [bp]) are excised from cDNA sequences and used as unique markers of the gene transcripts. This approach is much more efficient than EST analysis in that it uses a short nucleotide tag to define a gene transcript and allows sequencing of multiple tags in a single clone.

In a SAGE experiment, sequencing is the most costly and time-consuming step. It is difficult to know how many tags need to be sequenced to get good coverage of the entire transcriptome. Another obvious drawback of this approach is its sensitivity to sequencing errors, owing to the small size of the oligonucleotide tags used for transcript representation. One or two sequencing errors in the tag sequence can lead to ambiguous or erroneous tag identification. Another fundamental problem with SAGE is that a correctly sequenced SAGE tag may sometimes correspond to several genes or to no gene at all. To improve the sensitivity and specificity of SAGE detection, the lengths of the tags need to be increased.

MICROARRAY-BASED APPROACHES

The most commonly used global gene expression profiling method in current genomics research is the DNA microarray-based approach. A microarray (or gene chip) is a slide attached with a high-density array of immobilized DNA oligomers (sometimes cDNAs) representing the entire genome of the species under study.
A typical DNA microarray experiment involves a multistep procedure:
o fabrication of microarrays by fixing properly designed oligonucleotides representing specific genes;
o hybridization of cDNA populations onto the microarray;
o scanning hybridization signals and image analysis;
o transformation and normalization of data;
o analyzing data to identify differentially expressed genes as well as sets of genes that are coregulated.

k-Means Clustering. In contrast to hierarchical clustering algorithms, k-means clustering does not produce a dendrogram, but instead classifies data through a single-step partition. Thus, it is a divisive approach. In this method, data are partitioned into k clusters, where k is prespecified at the outset. The value of k is normally randomly set but can be adjusted if results are found to be unsatisfactory.

Figure: Example of k-means clustering using four partitions. Panels, left to right: make random assignments to k clusters (k = 4) and compute centroids (big dots); reassign points to nearest centroids and re-compute centroids, retaining the nearest points to centroids; reassign data points until distances of points to centroids are stable. Closeness of data points is indicated by resemblance of colors (see color plate section).

Self-Organizing Maps. Clustering by SOMs is in principle similar to the k-means method. This pattern recognition algorithm employs neural networks. It starts by defining a number of nodes. The data points are initially assigned to the nodes at random. The distances between the input data points and the centroids are calculated. The data points are successively adjusted among the nodes, and their distances to the centroids are recalculated. After many iterations, a stabilized clustering pattern is reached with the minimum distances of the data points to the centroids.

COMPARISON OF SAGE AND DNA MICROARRAYS

SAGE and DNA microarrays are both high throughput techniques that determine global mRNA expression levels.
A number of comparative studies have indicated that the gene expression measurements from these methods are largely consistent with each other. However, the two techniques have important differences.
> First, SAGE does not require prior knowledge of the transcript sequence, whereas DNA microarray experiments can only detect the genes spotted on the microarray. Because SAGE is able to measure all the mRNA expressed in a sample, it has the potential to allow discovery of new, as yet unknown gene transcripts.
> Second, SAGE measures "absolute" mRNA expression levels without arbitrary reference standards, whereas DNA microarrays indicate relative expression levels. Therefore, SAGE expression data are more comparable across experimental conditions and platforms. This makes public SAGE databases more informative by allowing comparison of data from reference conditions with various experimental treatments.
> Third, the PCR amplification step involved in the SAGE procedure means that it requires only a minute quantity of sample mRNA. This compares favorably with the much larger quantity of mRNA required for microarray experiments, which may be impossible to obtain under certain circumstances.
> Fourth, collecting a SAGE library is very labor intensive and expensive compared with carrying out a DNA microarray experiment. Therefore, SAGE is not suitable for rapid screening of cells, whereas microarray analysis is.
> Fifth, gene identification from SAGE data is more cumbersome because the mRNA tags have to be extracted, compiled, and identified computationally, whereas in DNA microarrays the identities of the probes are already known. In SAGE, comparison of gene expression profiles to discover differentially expressed genes and coexpressed genes is performed manually, whereas for microarrays there are a large number of software algorithms to automate the process.
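As one example of the automated clustering algorithms available for microarray data, the k-means procedure described earlier in this chapter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation. The expression profiles are made-up values, the function names are hypothetical, squared Euclidean distance is assumed as the closeness measure, and an empty cluster is simply re-seeded with a randomly chosen data point. The initial assignment is random by default, following the description above, but can be passed in explicitly to make a run reproducible.

```python
import random

def mean_profile(points):
    """Centroid of a group of points: the per-dimension average."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, initial_labels=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: assign every point to one of k clusters (randomly unless given).
    labels = list(initial_labels) if initial_labels else [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random data point.
            centroids.append(mean_profile(members) if members else list(rng.choice(points)))
        # Step 3: reassign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical expression profiles (two conditions per gene): three low, three high.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
          (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]

# A deliberately mixed starting assignment, fixed for reproducibility.
labels = k_means(points, k=2, initial_labels=[0, 1, 0, 1, 0, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Starting from the mixed assignment, one reassignment step is enough to separate the low-expression genes from the high-expression genes, and the second pass confirms that the clusters are stable.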
CHAPTER NINETEEN
Proteomics

Proteome refers to the entire set of expressed proteins in a cell. In other words, it is the full complement of translated products of a genome. Proteomics is simply the study of the proteome.

Understanding the proteome allows for:
> Characterisation of proteins
> Understanding protein interactions
> Identification of disease biomarkers

PROTEOMICS APPLICATIONS
> Proteome mining: identifying as many as possible of the proteins in a sample
> Protein-protein interactions (protein network mapping): determining how the proteins interact with each other in living systems
> Post-translational modifications: identifying how and where the proteins are modified
> Structural proteomics
> Functional proteomics

TYPES OF PROTEOMICS
> Expression proteomics
> Structural proteomics
> Functional proteomics