Genomics and Proteomics, Study notes of Genomics

The notes have been prepared by a thorough reading of the book and understanding of lectures.

Typology: Study notes

2019/2020

Available from 06/17/2022

Ipsi_ta07
BIOLOGICAL DATABASE

Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. The information they hold includes gene function, structure, localization (both cellular and chromosomal), and the clinical effects of mutations, as well as similarities between biological sequences and structures.

Biological databases are an important tool in helping scientists understand and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, assists in the development of medications, and helps uncover basic relationships among species in the history of life.

Biological knowledge is distributed among many different general and specialized databases, which sometimes makes it difficult to keep the information consistent. Biological databases cross-reference one another by accession number as one way of linking related knowledge together.

An important resource for finding biological databases is the special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available and categorizes many of the publicly available online databases related to biology and bioinformatics.

Biological data is highly complex and interrelated. Vast amounts of biological information need to be stored, organized, and indexed so that the information can be retrieved and used.
There are five major types of databases: nucleotide databases, protein databases, protein structure databases, metabolic pathway databases, and bibliographic databases.

Genome Browser:

NCBI (National Center for Biotechnology Information): NCBI is one of the leading online resources for biological sequence information. It is part of the National Library of Medicine (NLM), which in turn belongs to the National Institutes of Health (NIH) in the United States. As a national resource for molecular biology information, NCBI's mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, NCBI has been charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics. NCBI is linked to various other sequence databases so that it can answer sequence queries more efficiently. User queries and sequence information are handled through NCBI's search tool, called Entrez.

Home Page: NCBI has a simplified home page from which the user can navigate to different resources. The left pane of the home page has a site map followed by categories that help narrow down the search for the right sequence. On the right side is a list of popular resources, which is very useful for first-time users.

GenBank

The GenBank sequence database is an open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by NCBI as part of the International Nucleotide Sequence Database Collaboration (INSDC).
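
Since GenBank records are reached through Entrez, it may help to see what a programmatic Entrez query looks like. The sketch below only builds a search URL for NCBI's E-utilities ESearch endpoint and does not contact the server. The function name `build_esearch_url`, the example search term, and the choice of the `nuccore` (nucleotide) database are assumptions made for illustration and do not come from the notes.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="nuccore", retmax=5):
    """Return an ESearch URL for `term` in the Entrez database `db`.

    `build_esearch_url` is a hypothetical helper; `db`, `term` and
    `retmax` are standard ESearch query parameters.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Example (assumed) query: nucleotide records for E. coli.
url = build_esearch_url("Escherichia coli[Organism]")
print(url)
```

Opening a URL of this form in a browser should return a list of matching record identifiers, which is essentially what the Entrez search box does behind the scenes.
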
GenBank and its collaborators receive sequences produced in laboratories throughout the world from more than 100,000 distinct organisms. In the more than 20 years since its establishment, GenBank has become one of the most important and influential databases for research in almost all biological fields, and its data have been accessed and cited by millions of researchers around the world. GenBank continues to grow at an exponential rate, doubling roughly every 18 months.

Entrez: NCBI accepts queries and delivers data through a custom-made search engine called Entrez. The search box on the NCBI home page directs the user to Entrez. Entrez is internally connected to various biological databases, which increases the chance of retrieving the correct information.

BLAST: BLAST stands for Basic Local Alignment Search Tool. It is a tool used to find sequences that are similar, and therefore possibly homologous, to a query sequence. The BLAST program was developed by Stephen Altschul and colleagues at NCBI in 1990 and has since become one of the most popular programs for sequence analysis. BLAST compares the query sequence against the sequences in the database and returns a list of hits, usually ranked by score with the best match first. BLAST is available at http://blast.ncbi.nlm.nih.gov/

Variants of BLAST

• BLAST-N: compares a nucleotide sequence with nucleotide sequences
• BLAST-P: compares a protein sequence with protein sequences
• BLAST-X: compares a nucleotide sequence, translated into protein, against protein sequences
• tBLAST-N: compares a protein sequence against the translations of nucleotide sequences

To capture the underlying DNA sequence for the chromosomal position shown in the annotation display, click the DNA link in the navigation bar. The page that opens contains configuration options for the DNA output format.

BLAT

BLAT (BLAST-Like Alignment Tool) is a sequence alignment tool. It can align both DNA and protein sequences to the underlying genome.
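
The four BLAST variants listed above differ only in the type of the query and the type of the database being searched, so the choice can be written as a simple lookup. This is a minimal sketch; the function name `choose_blast_program` is hypothetical, and the lowercase program names are the ones NCBI uses for the variants the notes call BLAST-N, BLAST-P, BLAST-X and tBLAST-N.

```python
# (query type, database type) -> BLAST program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST variant for the given query and database types."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unsupported combination: {key}")
    return BLAST_PROGRAMS[key]


# A DNA query searched against a protein database calls for BLAST-X.
print(choose_blast_program("nucleotide", "protein"))
```
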
BLAT on DNA works by keeping an index of the entire genome in memory, which makes it very fast. BLAT on DNA sequence is designed to quickly find sequences of 95% or greater similarity with a length of 40 bases or more. Navigate to the BLAT tool by clicking the BLAT link in the top blue navigation bar. Configure the BLAT page by choosing the genome and assembly to which you would like to align your DNA or protein sequence, then prepare your sequence in FASTA format for submission.

FASTA is a very simple plain-text format for displaying nucleotide or protein sequences. Each record has one header line that begins with ">" and contains a description or name of the record, followed by one or more lines whose letters represent the DNA or protein sequence.

VISTA:

VISTA is a comprehensive suite of programs and databases for comparative analysis of genomic sequences. The VISTA family of tools is developed and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. This collaborative effort is supported by the Programs for Genomic Applications grant from the NHLBI/NIH and by the Office of Biological and Environmental Research, Office of Science, US Department of Energy.

Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes, and the VISTA family of tools was created to assist biologists in carrying out this task. The first VISTA server, at http://www-gsd.lbl.gov/vista/, was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations.
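
Returning briefly to the FASTA format described above for BLAT submissions, the record layout (a ">" header line followed by one or more sequence lines) can be illustrated with a small parser. This is only a sketch: the function name `parse_fasta` and the two example records are made up for illustration, and real-world files may need more careful handling.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example: one DNA record and one protein record.
example = """>seq1 hypothetical DNA record
ATGGCGTACG
TTAGC
>seq2 hypothetical protein record
MKVLA
"""

for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```
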
Currently the VISTA site includes multiple comparative genomics tools. It lets users browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes.

There are two ways of using VISTA: you can submit your own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. There are multiple VISTA servers, each allowing different types of searches.

• mVISTA can be used to align and compare your sequences to those of multiple other species.
• GenomeVISTA allows the comparison of sequences with whole-genome assemblies. It automatically finds the ortholog and produces the alignment and the VISTA plot. It also allows the alignment to be viewed together with pre-computed alignments of other species in the same interval.
• wgVISTA allows the alignment of sequences up to 10 Mb long (finished or draft), including microbial whole-genome assemblies.

VISTA TOOLS

The web page http://www-gsd.lbl.gov/vista/ serves as a portal for access to the suite of VISTA tools. One of them is VISTA Browser, which allows the user to view pre-computed whole-genome alignments of many species. There are three VISTA servers, GenomeVISTA, mVISTA and rVISTA, that allow the user to submit DNA sequences for analysis. For GenomeVISTA the user submits a single sequence (draft or finished), which is compared with publicly available completed whole-genome assemblies. mVISTA is the original program, designed for comparison of orthologous sequences of different species.

MODEL ORGANISMS' GENOMES AND DATABASES:

Most of our knowledge about the basic properties of metabolism, growth, and division in living cells is a result of studies on species described as "model organisms".
These species include the bacterium Escherichia coli, baker's yeast (Saccharomyces cerevisiae), the fruit fly (Drosophila melanogaster), the nematode worm (Caenorhabditis elegans), and the mouse (Mus musculus).

Model organism databases (MODs) host the genomic and functional information produced by organism-specific research projects and provide query and visualization tools to access these data. At every stage of the scientific process, MODs contribute to basic and applied research. By consulting MODs, researchers can easily find background information on large sets of genes, such as those involved in a biological process or implicated in a disease. MOD users can thus plan experiments efficiently, combine their data with existing knowledge, and construct novel hypotheses.

The genome of the bacterium Escherichia coli

Most prokaryotic cells contain their genetic material in the form of a large circular piece of double-stranded DNA, usually less than 5 Mb long. In addition, they may contain plasmids. The protein-coding regions of bacterial genomes do not contain introns. In many prokaryotic genomes the protein-coding regions are partially organized into operons: tandem genes transcribed into a single messenger RNA molecule under common transcriptional control. The typical prokaryotic genome contains only a relatively small amount of non-coding DNA (in comparison with eukaryotes), distributed throughout the sequence. In E. coli only ~11% of the DNA is non-coding.

E. coli strain K-12 has long been the workhorse of molecular biology. The genome of strain MG1655, published in 1997 by the group of F. Blattner at the University of Wisconsin, contains 4,639,221 bp in a single circular DNA molecule, with no plasmids. Approximately 89% of the sequence codes for proteins or structural RNAs. An inventory reveals:

• 4285 protein-coding genes
• 122 structural RNA genes
• non-coding repeat sequences

Genome database of E. coli:

The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Centre. The EcoGene12 release includes 4293 genes.
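
The genome figures quoted above for E. coli K-12 MG1655 can be combined into a quick back-of-the-envelope calculation of gene density. This is a rough sketch rather than a result from the notes: it uses only the numbers stated in the text (genome length, the approximate 89% coding fraction, and the gene counts), and the variable names are chosen here for illustration.

```python
# Figures taken from the text above.
GENOME_BP = 4_639_221      # genome length in base pairs
CODING_FRACTION = 0.89     # approximate share coding for protein or RNA
PROTEIN_GENES = 4285
RNA_GENES = 122

total_genes = PROTEIN_GENES + RNA_GENES
coding_bp = GENOME_BP * CODING_FRACTION

# Average coding length per gene, and number of genes per kilobase.
avg_coding_bp_per_gene = coding_bp / total_genes
genes_per_kb = total_genes / (GENOME_BP / 1000)

print(f"total genes: {total_genes}")
print(f"average coding length per gene: {avg_coding_bp_per_gene:.0f} bp")
print(f"gene density: {genes_per_kb:.2f} genes per kb")
```

Under these figures the average coding length comes to a little under 1 kb per gene, or close to one gene per kilobase, which is consistent with the small amount of non-coding DNA described for prokaryotic genomes.
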