Comparative Genomics in Bioinformatics: Distances, Phylogenies, and Rearrangements (study notes, Computer Science)

These notes cover various aspects of comparative genomics in the context of bioinformatics. Topics include genomic distances, gene-order phylogenies, hybridization, genome duplication, and comparative mapping. The tasks covered include explaining differences in gene orders, reconstructing ancestral gene orders, inferring ancestral hybrid genomes, and reconstructing ancestral pre-duplication genomes. The signed version of the distance problem, reciprocal translocation, and sorting by genome rearrangements are also discussed.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-kx0-1
CISC636, S08, Lec23, Liao
CISC 636 Intro to Bioinformatics (Spring 2008)

Comparative Genomics
• Comparative genomics is the analysis and comparison of genomes from different species. The purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genome.

What are the comparative genome sizes of humans and other organisms being studied?

organism                               estimated size        estimated gene number   average gene density        chromosome number
Homo sapiens (human)                   2900 million bases    ~30,000                 1 gene per 100,000 bases    46
Rattus norvegicus (rat)                2,750 million bases   ~30,000                 1 gene per 100,000 bases    42
Mus musculus (mouse)                   2400 million bases    ~30,000                 1 gene per 100,000 bases    40
Drosophila melanogaster (fruit fly)    180 million bases     13,600                  1 gene per 9,000 bases      8
Arabidopsis thaliana (plant)           125 million bases     25,500                  1 gene per 4000 bases       10
Caenorhabditis elegans (roundworm)     97 million bases      19,100                  1 gene per 5000 bases       12
Saccharomyces cerevisiae (yeast)       12 million bases      6300                    1 gene per 2000 bases       32
Escherichia coli (bacteria)            4.7 million bases     3200                    1 gene per 1400 bases       1
H. influenzae (bacteria)               1.8 million bases     1700                    1 gene per 1000 bases       1

*Information extracted from genome publication papers below.

Genomic Distances.
• Tasks: to explain differences in gene orders in two or more genomes in terms of a limited number of rearrangement processes.
• For single-chromosome genomes, this requires the calculation of an edit distance between two linear orders on the same set of objects, representing the ordering of homologous genes in two genomes.
• In the "signed" version of the problem, a plus or minus is associated with each gene, representing the direction of transcription.
• One edit operation consists of the inversion, or reversal, of any number of consecutive terms in the ordered set, which, in the case of signed orders, also reverses the polarity of each term within the scope of the inversion.
• The calculation of the distance for unsigned genomes with inversions only is NP-hard; for the signed problem it is of polynomial complexity.
• For multi-chromosome genomes, another important edit operation is reciprocal translocation, representing the exchange of terminal fragments between two chromosomes. Some formulations of the distance problem for translocation are of polynomial complexity (Hannenhalli-Pevzner algorithms).

Gene-order Phylogenies.
• Tasks: the reconstruction of ancestral gene orders. This is NP-hard, even with only three input genomes.
• The number of breakpoints is an alternative, easily computed genomic distance which, however, is also theoretically hard to generalize to the phylogenetic context.

Comparative Mapping.
• The simplest model of genomic divergence, deriving from a 1984 study by Nadeau and Taylor, assumes the spatial homogeneity of both breakpoint and gene distributions along the chromosomes.
• The main focus has been the severe underestimation of the number of segments in comparisons where there are relatively few genes common to the data sets for a pair of species.

Genome rearrangement
[Figure: gene orders on two chromosomes illustrating reciprocal translocation, inversion, and transposition.]
• Gates and Papadimitriou (1979): any permutation of n elements can be sorted by at most 5(n+1)/3 prefix reversals.

Profiling
• Treat all metabolic pathways independently.
• Impose an arbitrary order for enumerating the pathways for all organisms.
• Encode presence/absence as 1s and 0s.

Implication of profiling
• In a metabolic pathway profile, an organism is represented as a string of zeros and ones.
• The task of comparing genomes by their pathway profiles boils down to a comparison of strings.
• No alignment is needed, but a scoring scheme is.
• Profiles can be used from either an organism perspective or a pathway perspective (e.g. find patterns of correlation among pathways, perform pathway reconstruction).

Information-based weight approach
• To capture the correlations
• An intuitive way to include the hierarchical structure of pathways
  – correlation 1: p_i and p_j are siblings, or distantly related
  – correlation 2: frequency (probability) of mismatching at p_i
• Definitions
  – Master Tree: all known metabolic pathways are represented as a tree using WIT's categories
  – p-tree: derived from the master tree by dropping leaves that correspond to pathways absent from an organism

The Algorithm
1. Overlay two p-trees.
2. Score mismatches/matches between the two p-trees, and label the scores at the corresponding leaves on the master tree.
3. Propagate scores bottom-up as follows:
  – average the scores of siblings
  – multiply the averaged score by the inverse of the depth

Example
[Figure: two small trees with 0/1 leaf labels, scored bottom-up; the resulting scores are 0.5 and 0.9375.]
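The three steps of the algorithm above can be sketched in a few lines of Python. This is only one possible reading of the propagation rule, not the lecture's implementation, and it is not meant to reproduce the example scores. Several details are assumptions, because the notes do not specify them: the tiny `MASTER_TREE` and its pathway names are hypothetical, a leaf scores 1 when the two organisms agree on presence/absence and 0 otherwise, the root is taken to be at depth 1, and each internal node divides the average of its children's scores by its own depth.

```python
from statistics import mean

# Hypothetical master tree (assumption): internal nodes are dicts,
# leaves (pathways) are keys mapped to None.
MASTER_TREE = {
    "carbohydrate": {"glycolysis": None, "tca_cycle": None},
    "amino_acid": {"lysine_biosynthesis": None, "histidine_biosynthesis": None},
}


def leaves(tree):
    """Yield pathway names in a fixed (insertion) order."""
    for name, subtree in tree.items():
        if subtree is None:
            yield name
        else:
            yield from leaves(subtree)


def profile(tree, present):
    """Encode an organism as a list of 1s and 0s over the ordered pathways."""
    return [1 if leaf in present else 0 for leaf in leaves(tree)]


def tree_score(tree, org_a, org_b, depth=1):
    """Overlay two p-trees on the master tree and propagate scores bottom-up.

    Assumed rule: leaf score is 1.0 on agreement, 0.0 on mismatch; an internal
    node's score is the mean of its children's scores times 1/depth, with the
    root at depth 1.
    """
    child_scores = []
    for name, subtree in tree.items():
        if subtree is None:
            agree = (name in org_a) == (name in org_b)
            child_scores.append(1.0 if agree else 0.0)
        else:
            child_scores.append(tree_score(subtree, org_a, org_b, depth + 1))
    return mean(child_scores) / depth


# Two hypothetical organisms, given as sets of pathways they possess.
org_a = {"glycolysis", "tca_cycle", "lysine_biosynthesis"}
org_b = {"glycolysis", "lysine_biosynthesis", "histidine_biosynthesis"}

print(profile(MASTER_TREE, org_a))
print(profile(MASTER_TREE, org_b))
print(tree_score(MASTER_TREE, org_a, org_b))
```

Under these assumptions the score is not normalized to 1 (two identical organisms score 0.5 on this two-level tree), so it is only meaningful for ranking pairs scored against the same master tree. A different depth convention or leaf-scoring rule would change the numbers but not the overall bottom-up structure.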