Comparative Genomics in Bioinformatics: Distances, Phylogenies, and Rearrangements - Prof., Study notes of Computer Science

University of Delaware (UD), Computer Science

Prof. Li Liao

These notes cover various aspects of comparative genomics in the context of bioinformatics. Topics include genomic distances, gene-order phylogenies, hybridization, genome duplication, and comparative mapping. The tasks covered include explaining differences in gene orders, reconstructing ancestral gene orders, inferring ancestral hybrid genomes, and reconstructing ancestral pre-duplication genomes. Techniques such as the signed version of the problem, reciprocal translocation, and genome rearrangement sorting are also discussed.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-kx0-1 🇺🇸


CISC636, S08, Lec23, Liao
CISC 636 Intro to Bioinformatics (Spring 2008)

Comparative Genomics
• Comparative genomics is the analysis and comparison of genomes from different species. The purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genome.

What are the comparative genome sizes of humans and other organisms being studied?

organism                              estimated size       estimated gene number   average gene density        chromosome number
Homo sapiens (human)                  2900 million bases   ~30,000                 1 gene per 100,000 bases    46
Rattus norvegicus (rat)               2,750 million bases  ~30,000                 1 gene per 100,000 bases    42
Mus musculus (mouse)                  2400 million bases   ~30,000                 1 gene per 100,000 bases    40
Drosophila melanogaster (fruit fly)   180 million bases    13,600                  1 gene per 9,000 bases      8
Arabidopsis thaliana (plant)          125 million bases    25,500                  1 gene per 4000 bases       10
Caenorhabditis elegans (roundworm)    97 million bases     19,100                  1 gene per 5000 bases       12
Saccharomyces cerevisiae (yeast)      12 million bases     6300                    1 gene per 2000 bases       32
Escherichia coli (bacteria)           4.7 million bases    3200                    1 gene per 1400 bases       1
H. influenzae (bacteria)              1.8 million bases    1700                    1 gene per 1000 bases       1

*Information extracted from genome publication papers below.

Genomic Distances.
• Tasks: to explain differences in gene orders in two or more genomes in terms of a limited number of rearrangement processes.
• For single-chromosome genomes, this requires the calculation of an edit distance between two linear orders on the same set of objects, representing the ordering of homologous genes in two genomes.
• In the "signed" version of the problem, a plus or minus is associated with each gene, representing the direction of transcription.
One edit operation consists of the inversion, or reversal, of any number of consecutive terms in the ordered set, which, in the case of signed orders, also reverses the polarity of each term within the scope of the inversion.
• The calculation of the distance for unsigned genomes with inversions only is NP-hard; for the signed problem it is of polynomial complexity.
• For multi-chromosome genomes, another important edit operation is reciprocal translocation, representing the exchange of terminal fragments between two chromosomes. Some formulations of the distance problem for translocation are of polynomial complexity (Hannenhalli-Pevzner algorithms).

Gene-order Phylogenies.
• Tasks: the reconstruction of ancestral gene orders. This is NP-hard, even with only three input genomes. The number of breakpoints is an alternative, easily computed genomic distance which, however, is also theoretically hard to generalize to the phylogenetic context.

Comparative Mapping.
• The simplest model of genomic divergence, deriving from a 1984 study by Nadeau and Taylor, assumes the spatial homogeneity of both breakpoint and gene distributions along the chromosomes. The main focus has been the severe underestimation of the number of segments in comparisons where there are relatively few genes common to the data sets for a pair of species.

Genome rearrangement
[Figure: gene-order examples of reciprocal translocation between two chromosomes, inversion, and transposition]
• Gates and Papadimitriou (1979): any permutation can be sorted by at most 5(n+1)/3 prefix reversals.

Profiling
• Treat all metabolic pathways independently
• Impose an arbitrary order for enumerating the pathways for all organisms
• Encode presence/absence as 1s and 0s.

Implication of profiling
• In a metabolic pathway profile, an organism is represented as a string of zeros and ones.
• The task of comparing genomes by their pathway profiles boils down to a comparison of strings.
• No need for alignment, but a scoring scheme is needed.
• Profiles can be used from either an organism perspective or a pathway perspective (e.g. find patterns of correlation among pathways, perform pathway reconstruction).

Information-based weight approach
• To capture the correlations
• An intuitive way to include the hierarchical structure of pathways
  – correlation 1: pi and pj are siblings, or distantly related
  – correlation 2: frequency (probability) of mismatching at pi
• Definitions
  – Master Tree: all known metabolic pathways are represented as a tree using WIT's categories
  – p-tree: derived from the master tree by dropping leaves that correspond to pathways absent from an organism.

The Algorithm
1. Overlay two p-trees
2. Score mismatches/matches between two p-trees, label scores at the corresponding leaves on the master tree.
3. Propagate scores bottom-up as follows
   - average score from siblings
   - multiply the averaged score by inverse of the depth

Example
[Figure: two worked examples with leaf scores of 1 and 0; the resulting scores are 0.5 and 0.9375]
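
The overlay, score, and propagate steps of the algorithm can be sketched as a short recursion over the master tree. This is only one possible reading of the steps, not the method from the lecture: the notes do not fix the leaf scoring scheme or the depth convention, so the sketch assumes a leaf scores 1 when both organisms agree on presence/absence and 0 otherwise, and that the root has depth 1. The tree, the pathway names, and the function name `score_p_trees` are hypothetical, and the sketch does not attempt to reproduce the figures in the example.

```python
# Sketch only: the tree layout, pathway names, 1/0 leaf scores, and the
# "root has depth 1" convention are assumptions, not taken from the notes.

# Hypothetical master tree: internal nodes are dicts, leaves (pathways) are None.
MASTER = {
    "carbohydrate": {"glycolysis": None, "tca_cycle": None},
    "amino_acid": {"lysine_biosynthesis": None, "urea_cycle": None},
}


def score_p_trees(node, present_a, present_b, depth=1):
    """Overlay two p-trees on the master tree and propagate scores bottom-up.

    present_a / present_b are the sets of pathways each organism has,
    i.e. the leaves kept in its p-tree.
    """
    child_scores = []
    for name, subtree in node.items():
        if subtree is None:
            # Leaf: 1.0 if both organisms agree on presence/absence, else 0.0.
            agree = (name in present_a) == (name in present_b)
            child_scores.append(1.0 if agree else 0.0)
        else:
            child_scores.append(
                score_p_trees(subtree, present_a, present_b, depth + 1)
            )
    # Average the sibling scores, then multiply by the inverse of this node's depth.
    return (sum(child_scores) / len(child_scores)) / depth


# Two hypothetical organisms, each given as the set of pathways it has.
ORG_A = {"glycolysis", "tca_cycle", "lysine_biosynthesis"}
ORG_B = {"glycolysis", "tca_cycle", "urea_cycle"}

print(score_p_trees(MASTER, ORG_A, ORG_B))  # 0.25
print(score_p_trees(MASTER, ORG_A, ORG_A))  # 0.5
```

Under these assumptions two identical organisms score 0.5 rather than 1, because every internal level is discounted by its depth. Using the result as a 0-to-1 similarity would need an extra normalization step, which the notes do not describe.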
