8/19/2005 Su-Shing Chen, CISE

CAP 5510-2 BIOINFORMATICS
Su-Shing Chen, CISE

Building Local Genomic Databases
Genomic research integrates sequence data with knowledge of gene function.
A gene ontology represents that knowledge in local genomic databases.
Multiple organisms and gene products (e.g., proteins) with their functions.
NCBI Entrez database, with functions collected from other databases: local SEED database, SWISS-PROT, KEGG.

Key Gene Ontology Features
http://www.geneontology.org/doc/gene_ontology_discussion.html
Where is a gene expressed? Spatial problem: the organism's anatomy.
What is the subcellular localization of a gene product? Subcellular anatomy.
When is a gene expressed? Temporal problem: the organism's ontogeny.
What is the function of a gene product? Functional classification of gene products.

Key Gene Ontology Features (continued)
Of what larger process is the gene product's function a part? Process hierarchy.
By what process are a gene's activities controlled? Regulatory hierarchy.
Of what larger complex is this function a component? Parts list of multicomponent complexes.
What genes in species A have the function of gene X in species B? Functional classification of species A and B.

Gene Ontology Consortium
GO Consortium: SGD (Saccharomyces), FlyBase (Drosophila), MGD/GXD (Mouse), TAIR (Arabidopsis), Caenorhabditis elegans.
Goals:
1. To compile a comprehensive structured vocabulary of terms, synonyms, biological dimensions (DNA metabolism, molecular function, cell).
2. To describe biological objects using these terms.
3. To provide tools for querying and manipulating vocabularies.
4. To provide tools to assign GO terms to biological objects (sequence, annotation, microarray, protein binding experiments).

Ontology Structure & Standards
The ontologies are structured vocabularies in the form of directed acyclic graphs (DAGs) that represent a network of children and parents, linked by is-a or part-of relationships.
See http://www.geneontology.org/.

Database Management Systems
A DBMS is software for keeping computerized records about an enterprise and for querying the information in those records.
DBMS models: hierarchical, network, relational, and object-oriented.
SQL (Structured Query Language) is a database language.
Logical database design: entity-relationship and object-oriented modeling.
Physical database design: indexing, storage, organization.

A database is a set of named tables (relations)
Columns (attributes)
Rows (tuples)
A relational schema = the set of attributes of a table

Entity-Relationship for Gene Product
Entities: Gene-Product, Enzyme-Reaction, Metabolic Pathway, Reaction, Species, Genome, Linkage-Group, Map, Locus, Term
[Figure: schema browser screenshot (Java applet window). The left pane lists object classes, including AUDIT_TRAIL, CloneLibrary, Entity, Environment, GelPattern, GeneProduct, Journal, LinkageGroup, Locus and Map. The diagram pane shows OBJECT CLASS [EnzymeCatalyzedReaction] linked to Reaction, GeneProduct, Species and SpeciesGenome, with NextStep and PreviousStep links and attributes including Name VARCHAR(120), SubstrateSpecificity VARCHAR(255), OptimumpH FLOAT and a text comments field. Only local attributes are shown in the browser.]
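The relational view above (named tables, columns as attributes, rows as tuples) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the course schema: the table and column names are assumptions loosely modeled on the EnzymeCatalyzedReaction class in the figure, and the inserted row values are made up for the example.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical,
# loosely following the entities named on the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Species (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(120) NOT NULL
);
CREATE TABLE GeneProduct (
    id         INTEGER PRIMARY KEY,
    name       VARCHAR(120) NOT NULL,
    species_id INTEGER REFERENCES Species(id)
);
CREATE TABLE EnzymeCatalyzedReaction (
    id                    INTEGER PRIMARY KEY,
    gene_product_id       INTEGER REFERENCES GeneProduct(id),
    substrate_specificity VARCHAR(255),
    optimum_ph            FLOAT
);
""")

# Each INSERT adds one row (tuple) to a table (relation).
# The values are illustrative placeholders, not curated data.
conn.execute("INSERT INTO Species VALUES (1, 'Homo sapiens')")
conn.execute("INSERT INTO GeneProduct VALUES (1, 'Trypsin', 1)")
conn.execute(
    "INSERT INTO EnzymeCatalyzedReaction VALUES (1, 1, 'illustrative substrate', 8.0)"
)

# An SQL query that follows the relationships between the three entities.
rows = conn.execute("""
SELECT s.name, g.name, r.optimum_ph
FROM EnzymeCatalyzedReaction AS r
JOIN GeneProduct AS g ON g.id = r.gene_product_id
JOIN Species AS s ON s.id = g.species_id
""").fetchall()
print(rows)
```

Each entity in the E-R diagram becomes a table here, and each relationship becomes a foreign key that the query joins on.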
Generalization Hierarchies
Several types of entities with common attributes can be generalized into a higher-level entity type.
Conversely, an entity can be decomposed into lower-level entities.

Object Model - Biological Objects
Genomic Objects
Enzyme Objects
Sequence Objects
Structure Objects
Experiment Objects
Variation Objects
Mapping Objects
Citation - Literature + References
Registry - People + Organizations
External Links - Databases

Dynamic Model
Biochemical Processes
Metabolic Pathways
Signal Transduction Pathways
Neural Networks

DATA TYPES
An instance (object) of a class contains values for the class attributes stored in the database.
Text (clone name)
Number (insert size)
Restricted Value (DNA type)
List (people)
Table (complex related attributes)
Association (gene to gene product: protein)
Sequence
Pointer (other databases)

Object-oriented concepts
Object and object identity
Encapsulation
Message passing
Complex object
Object class/type
Inheritance
Polymorphism and run-time binding
Persistence

OBJECT
Private memory: data + operations
Public interface: operation specifications
Anything (physical object, abstract concept, event, function, process) can be modeled as an object.
Data: instance variables, attributes, slots.
Operations: methods, actions, behaviors.

CLASS
Object type declaration: class protein, with DATA (sequence, structure) and OPERATION (function)
Object class: container of object instances (Protein instances in the Protein Class)

MESSAGE PASSING
Source object A (sender), target object B (receiver)
Message = (objectB, methodX, parameter, return value)
Return message

COMPLEX OBJECT CLASS - Gene Product Class
Gene product: RNA, protein
Gene
Example: Trypsin, PRSS1

COMPLEX OBJECT CLASS - (Biological) Polymorphism Class
Polymorphism Object
Allele Set
Fragments (in kb)
Sizes detected in a polymorphism
Allele frequency
Population
Alleles
Detection method

Type: Genetic map, Physical map, Contig map, Transcript map, Radiation hybrid map, Cytogenetic map
Mapped Entity: Amplimer, Sequencing region, Bin, Syndromic region, Breakpoint, Syntenic region, Chromosome, Cell line, Chromosome reagent, Library, Clone, Contig, CpG Island, Cytogenetic marker, EST, Gene, Gene element, Regulatory region, Repeat

INHERITANCE
SUPERCLASS: Eukaryote (operations: exons, introns)
CLASS: Animal, Fungi, Plant (Plant operations: leaves)
SUBCLASS: Hominidae, Canidae
SUB-SUB-CLASS: Man (operations: chromosomeY), Woman, Dog, Wolf, Coyote
Inherited operations: exons, introns, chromosomeY (Man); exons, introns, leaves (Plant)

Advantages of Inheritance
Reuse of object type declaration.
Reuse of software implementations.
Modularization of complex problems.
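The inheritance hierarchy and the message-passing idea above can be sketched in Python. This is only an illustration: the class names follow the slide, while the method bodies, the `operations` helper and the `send` helper are assumptions introduced for the example.

```python
class Eukaryote:
    """Superclass: every subclass inherits exons() and introns()."""

    def exons(self):
        return f"{type(self).__name__}: exons"

    def introns(self):
        return f"{type(self).__name__}: introns"


class Animal(Eukaryote):
    pass


class Plant(Eukaryote):
    def leaves(self):
        return "Plant: leaves"


class Hominidae(Animal):
    pass


class Canidae(Animal):
    pass


class Man(Hominidae):
    def chromosome_y(self):
        return "Man: chromosomeY"


class Woman(Hominidae):
    pass


class Dog(Canidae):
    pass


def operations(cls):
    """Hypothetical helper: public operations of a class, inherited ones included."""
    return sorted(name for name in dir(cls) if not name.startswith("_"))


def send(target, method, *parameters):
    """Hypothetical helper: deliver a message (target, method, parameters)
    and hand back the return message."""
    return getattr(target, method)(*parameters)


print(operations(Man))        # operations declared on Man plus inherited ones
print(operations(Plant))
print(send(Dog(), "exons"))   # Dog answers using the method it inherits
```

Man declares only `chromosome_y`, yet it also responds to `exons` and `introns`, which is the reuse of type declarations and implementations that the slide lists as an advantage.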
POLYMORPHISM - MUTATION
Aggregation
Objects: nucleotide sequence, gene, clone, amplimer (PCR primer)
Relation: amplimers are contained in genes
Relation: amplimers are contained in clones
Relation: amplimers from clones overlap genes

Object-Oriented DBMS Architecture
Components: Class Libraries, Design Tools, Query Tools, API, Object Manager, Database Manager, Persistent Databases
Functions: query, transaction, schema management, concurrency control, type management, versioning, object caching
Functions: page management, object locking, disk access, logging, recovery, transaction commit

PHYLOGENETIC DATA
The next grouping is phylogenetic data:
[family/superfamily classification]
[species]+[tissue]+[cell type]+[localization in cell]+[state of maturity (embryo, juvenile, adult, unspecified)]
[genus]
[phylum]
[kingdom]
[cDNA sequence]
[aa sequence]
[bibliography for sequences]

MOLECULAR DYNAMICS
Molecular Dynamics: Expression, Degradation, Turnover

APPLICATIONS
The next grouping is for applications significance:
[human or veterinary health significance, if any known]
[bibliography for human or veterinary health significance]
[biotech significance, if any known]
[bibliography for biotech significance]
[agricultural significance, if any known]
[bibliography for agricultural significance]

APPLICATIONS
Applications: Health, Biotech, Agriculture

STRUCTURAL INFORMATION
The next set of entries is for structural information:
[experimentally determined structures]
[bibliography for experimentally determined structures]
[model-built structures]
[bibliography for model-built structures]
[partial structural information: CD spectra, solution NMR, cysteine scanning, antibody labelling, identification of glycosylation or phosphorylation sites, etc.]
[bibliography for partial structural information]

STRUCTURAL INFORMATION
Structural Information: Experimental Structures (with Bibliography), Model Structures (with Bibliography), Partial Structure Information (with Bibliography)

[Figure labels: Entity, Primitives, Reactors, CellComponents]
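The bracketed field groupings above (phylogenetic data, applications, structural information) can be sketched as one nested record type. This is a hypothetical layout, not the course's actual schema: all class and field names are assumptions derived from the bracketed labels, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values taken from the [state of maturity] field on the slide.
MATURITY_STATES = {"embryo", "juvenile", "adult", "unspecified"}


@dataclass
class PhylogeneticData:
    family: str
    genus: str
    species: str
    phylum: str
    kingdom: str
    tissue: Optional[str] = None
    cell_type: Optional[str] = None
    localization: Optional[str] = None
    maturity: str = "unspecified"
    cdna_sequence: Optional[str] = None
    aa_sequence: Optional[str] = None
    sequence_bibliography: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Treat maturity as a restricted value, as in the DATA TYPES slide.
        if self.maturity not in MATURITY_STATES:
            raise ValueError(f"unknown state of maturity: {self.maturity!r}")


@dataclass
class Applications:
    health: Optional[str] = None
    health_bibliography: List[str] = field(default_factory=list)
    biotech: Optional[str] = None
    biotech_bibliography: List[str] = field(default_factory=list)
    agriculture: Optional[str] = None
    agriculture_bibliography: List[str] = field(default_factory=list)


@dataclass
class StructuralInformation:
    experimental: List[str] = field(default_factory=list)
    model_built: List[str] = field(default_factory=list)
    partial: List[str] = field(default_factory=list)


@dataclass
class GeneProductRecord:
    name: str
    phylogeny: PhylogeneticData
    applications: Applications = field(default_factory=Applications)
    structure: StructuralInformation = field(default_factory=StructuralInformation)


# Placeholder record; the family name is invented for the example.
record = GeneProductRecord(
    name="example protein",
    phylogeny=PhylogeneticData(
        family="example family",
        genus="Escherichia",
        species="Escherichia coli",
        phylum="Proteobacteria",
        kingdom="Bacteria",
    ),
)
print(record.phylogeny.maturity)
```

Fields marked "if any known" on the slides default to empty here, so a record can be created before its applications or structures have been curated.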
Fall 2002 Homework 3 (due 11/7)
Use the NCBI Entrez structure database to retrieve structure coordinate data, where available, for your data set (2 bacteria and all BLAST annotations).
Create flat files of the structure data and visual data using Cn3D.

CAP 5510 Bacteria & Fungi Functional Database
[Diagram labels: Gene Sequence, RefSeq, Locus, D/E/G, Protein Sequence, CDS, Anno. G. Sequence, Anno. P. Sequence, BLAST, CDS, Functions, GO, Databases, Protein Structure, A. P. Structure]
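The GO step in a functional database relies on the ontology being a directed acyclic graph in which a term may have several parents through is-a or part-of links. A minimal sketch of that structure follows. The term fragment is a simplified toy example and its edges are illustrative, not copied from the Gene Ontology; the gene product names and both helper functions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# child term -> list of (relationship, parent term).
# Toy fragment for illustration only; edges are not taken from GO.
PARENTS: Dict[str, List[Tuple[str, str]]] = {
    "biological_process": [],
    "metabolic process": [("is_a", "biological_process")],
    "cell cycle": [("is_a", "biological_process")],
    "DNA metabolic process": [("is_a", "metabolic process")],
    # Two parents: this is what makes the structure a DAG rather than a tree.
    "DNA replication": [
        ("is_a", "DNA metabolic process"),
        ("part_of", "cell cycle"),
    ],
}

# Hypothetical annotations: gene product -> terms assigned directly to it.
ANNOTATIONS: Dict[str, List[str]] = {
    "geneX": ["DNA replication"],
    "geneY": ["cell cycle"],
}


def ancestors(term: str) -> Set[str]:
    """All terms reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotated_to(term: str) -> List[str]:
    """Gene products annotated to `term` directly or through a descendant term."""
    hits = []
    for product, terms in ANNOTATIONS.items():
        if any(t == term or term in ancestors(t) for t in terms):
            hits.append(product)
    return sorted(hits)


print(sorted(ancestors("DNA replication")))
print(annotated_to("cell cycle"))
print(annotated_to("metabolic process"))
```

Because annotations propagate upward through both relationship types in this sketch, a query on a general term also returns gene products annotated to its more specific descendants.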