Docsity
Bioinformatics notes, Lecture notes of Bioinformatics

These bioinformatics notes explain in detail about all the biological databases and tools used in bioinformatics

Typology: Lecture notes

2020/2021

Available from 03/01/2023

knowledgebyprats

BIOINFORMATICS ASSIGNMENT

BIOLOGICAL DATABASES

In simple terms, a database is a systematic collection of data or information that is stored and accessed electronically from a computer system. A biological database is thus an organised collection of biological information that can be accessed, managed and updated easily. By content, there are different kinds of biological databases, such as nucleotide databases, gene databases, protein databases and metabolic pathway databases.

By the origin of their data, biological databases are of two types:
1) Primary database
2) Derivative or secondary database

A primary database is one in which experimental results are deposited directly. Its records are the original submissions by the experimentalists. Examples are GenBank and GEO. Secondary or derived databases contain the results of analysis of the primary databases. Examples are RefSeq and UniProt.

Features of Databases

The features of a database are:
1) A database should be easy to understand.
2) A database should be simple.
3) It should be easy to search and locate.
4) It should be annotated, but not excessively.
5) A database should have minimum redundancy, that is, data stored in a database should not exist in multiple locations.
6) It should be cross-referenced.

LITERATURE DATABASE

PubMed

PubMed is a literature database developed and maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), which is part of the National Institutes of Health (NIH). It mainly contains citations and abstracts of journal articles, most of them from MEDLINE, on topics such as life science, chemical science and bioinformatics. It also provides links to websites related to the search. All citations in MEDLINE are assigned MeSH terms and publication types from NLM's controlled vocabulary. The biggest disadvantage of PubMed is that it does not contain the full articles for most journals.
It may link a bibliographic record to the full text on the journal website. Whether the article is free to the public depends on the publisher and on the access option chosen for the article.

SEQUENCE DATABASES

GenBank

GenBank is a comprehensive, publicly available database of nucleotide sequences and their protein translations. These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. It is a primary database. It exchanges data daily with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ), which ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) Entrez retrieval system, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed.

SCOP

1) Family: Proteins are placed in this group on the basis of two criteria: first, all proteins that have residue identities of 30% and greater; second, proteins with lower sequence identities but whose functions and structures are very similar.
2) Superfamily: Families whose proteins have low sequence identities but whose structural and functional features suggest that they have a common evolutionary origin.
3) Fold: The superfamilies and families that have the same major secondary structures in the same arrangement and with the same topological connections. The structural similarities of proteins in the same fold category probably arise from the physics and chemistry of proteins favouring certain packing arrangements and chain topologies.
4) Class: The different folds have been grouped into classes.
Most of the folds are divided into four classes, as follows:
(a) all-α, those whose structure is essentially formed by α-helices;
(b) all-β, those whose structure is essentially formed by β-sheets;
(c) α/β, those with α-helices and β-strands that are largely interspersed;
(d) α+β, those in which α-helices and β-strands are largely segregated.

SCOP includes not only the proteins in the current version of the PDB, but also many proteins for which there are published descriptions but whose co-ordinates are not yet available. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database so far.

CATH

CATH stands for Class, Architecture, Topology, Homologous superfamily. It is a free protein structure classification database. It classifies proteins on the basis of:
# (C)lass: The three main classes are: mainly-α proteins, those whose structure is essentially formed by α-helices; mainly-β proteins, those whose structure is essentially formed by β-sheets; and α-β proteins, those with both α-helices and β-strands.
# (A)rchitecture: The architecture of a protein is the overall shape of the domain; it does not take the connectivity into account.
# (T)opology: The topology level groups structures with the same number, arrangement and connectivity of secondary structure elements.
# (H)omologous superfamily: Proteins with high structural similarity are kept at this level of the hierarchy, which suggests that they have evolved from a common ancestor.
# Sequence family: Proteins with sequence identity greater than 35% are kept in this category, which again suggests that they have evolved from a common ancestor.

One big disadvantage of CATH is that it classifies only the protein structures that are in the PDB.

CATH-Gene3D

As we know, CATH is a protein database that takes its structures from the PDB. Gene3D uses the protein structure information from CATH, and the structures are split into their constituent polypeptide chains where applicable.
Their protein domains are then identified and classified on the basis of the CATH hierarchy levels.

Uses of CATH: It shows how secondary structures are connected to each other and how proteins have evolved, helps in finding conserved sites, and helps in predicting the 3D structure of a protein.

KEGG (Kyoto Encyclopedia of Genes and Genomes)

KEGG is a biological database resource that provides information about genes and genomes, chemical reactions, biological systems, and diseases and drugs. It is a group of sixteen databases, which are categorized as systems information, genomic information, chemical information and health information. It also has the special feature of pathway maps, which help us understand how pathways, and the reactants and substrates within them, are interrelated. It is a secondary database; it collects data from GenBank and other sources. It is managed by the Kanehisa Laboratories in the Bioinformatics Center of Kyoto University and the Human Genome Center of the University of Tokyo.

Pathway Maps: KEGG pathway maps are manually drawn diagrams that show the interactions of different molecules. They mainly describe the metabolic pathways and how these are interconnected.

A notable disadvantage of the KEGG database is that it does not contain information about most organisms.

ADVANTAGES OF PROTEIN DATABASES:
1) Although DNA encodes the information necessary for life, it is the proteins that carry out the dynamic processes of life: maintenance, replication, defence and reproduction. The recognition system of the immune system also depends on protein interactions. Understanding all these mechanisms therefore requires an understanding of protein structures and sequences.
2) Protein structures, whether experimentally determined or predicted, help in drug screening and design.
3) Different protein databases help in comparing proteins and provide information about the relationships between proteins within a genome or across different species, and hence offer much more information than can be obtained by studying only one isolated protein.
4) Enzymes, the biological catalysts, are also proteins, and the knowledge gained from understanding them can help us synthesize new catalysts by mimicking or modifying enzymes to perform better in industrial use.
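
The comparison between proteins mentioned in point 3, and the identity thresholds quoted in these notes for SCOP families (30% and greater) and CATH sequence families (greater than 35%), can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the two sequences are taken to be already aligned (gaps written as "-"), identity is counted over the columns where neither sequence has a gap (other definitions use a different denominator), and the function names and toy sequences are hypothetical, not part of SCOP or CATH.

```python
# Thresholds as quoted in these notes (percent residue identity).
SCOP_FAMILY_IDENTITY = 30.0           # "30% and greater"
CATH_SEQUENCE_FAMILY_IDENTITY = 35.0  # "greater than 35%"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def grouping_hints(identity: float) -> list[str]:
    """Return which sequence-based criteria from the notes the identity meets."""
    hints = []
    if identity >= SCOP_FAMILY_IDENTITY:
        hints.append("SCOP family (first criterion)")
    if identity > CATH_SEQUENCE_FAMILY_IDENTITY:
        hints.append("CATH sequence family")
    return hints


if __name__ == "__main__":
    # Hypothetical toy alignment, not real database entries.
    seq_a = "MKTAYIAKQR"
    seq_b = "MKTAHIAK-R"
    identity = percent_identity(seq_a, seq_b)
    print(f"{identity:.1f}% identity:", grouping_hints(identity))
```

Sequence identity alone covers only the first SCOP family criterion; the second criterion (lower identity but very similar function and structure) and the higher levels of both hierarchies depend on structural comparison, which this sketch does not attempt.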



Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved