Comparative Genomics - Lecture Slides | BIT 150, Study notes of Bioinformatics

Material Type: Notes; Class: Applied Bioinformatics; Subject: Biotechnology; University: University of California - Davis; Term: Unknown 1989;

Uploaded on 07/30/2009

koofers-user-yca-1

Comparative genomics
Chapter 15

Features that are investigated include:
1. Genome size variation
2. Base composition biases
3. Gene order
   1. Colinearity
   2. Identification of orthologs
   3. Functional predictions
4. Whole genome alignments
   1. Identify genes
   2. Improve gene annotation
   3. Identify regulatory regions (phylogenetic footprinting)
5. Evolution of genomes

Genome size
• There are huge differences in genome size among related organisms.
• More complex organisms do not necessarily have larger genomes.
• "C-value paradox"
• Repetitive elements are responsible for a large proportion of the differences.
• The amount of DNA affects cell size, the speed of the cell cycle, and other traits.

Composition bias
Effect of repetitive elements on GC content
• Grass genes are 51% GC; Arabidopsis genes are 44% GC.
• The overall GC content of wheat (46%) is 2% higher than that of Arabidopsis or rice.
• The GC content of barley and wheat is very similar to the GC content of the most abundant retroelements.

Different organisms show different rates of chromosome rearrangements
Figure: Comparative maps of the wheat genome described in terms of the rice genome (A) and the Aegilops umbellulata genome (B). (A) Data from Kurata et al. (1994), Van Deynze et al. (1995). (B) Data from Zhang et al. (1998).

Gene order
A. Wheat - rice: 50 million years
B. Wheat - Aegilops: 5 million years
• Human - mouse (80 million years): many rearrangements (180 blocks) but some large conserved regions.
• Human - chimpanzee (5 million years): almost completely colinear.
• Allows transfer of genetic mapping data between organisms.
• Allows functional analysis of human genes via precise deletion of their colinear regions in the mouse genome.

Homology search for the mouse genome
• Homology search of all genes in the mouse genome:
• 27% in other metazoans
• 29% in other eukaryotes
• 6% in other chordates
• 14% in other mammals
• Less than 1% rodent specific
Figure: pie chart with categories Eukaryote, Other, Rodent specific, Chordate, Metazoan, Mammal.

What have we learned?
Figure: Decay of synteny in intergenic regions. Axes: million years and C(t). Comparisons shown: wheat-barley, A-B-D wheat, A-Am.
Figure: orthologous regions of the Am genome (T. monococcum) and the A genome (durum), labels A to E, scale bar 20 kb.
• Small groups of genes are dispersed within a sea of repetitive elements.
• Rapid divergence in intergenic regions: comparison of orthologous A and Am VRN2 regions
  • 1.1 MYA divergence
  • 31% similarity
• The fast decay of % similarity also affects genes.
• Fast rate of gene deletions and of inactivation by retroelement insertions.
• Small phenotypic changes in polyploid wheat: a valuable source of diversity.
Dubcovsky and Dvorak 2007, Science 316: 1862-1866

Figure: Human vs. chimp, 7 MYA, 3% divergence.

Dotplots to discover duplications in rice
• A large genome duplication occurred approximately 70 million years ago, before the divergence of most grass subfamilies.
• Diploid grasses are actually ancient polyploids.
• The figure shows the arrangement of duplicated protein-encoding genes in rice in the order found in the current sequence assembly. Both X and Y axes represent 45,174 genes in their chromosomal order.
• Colors indicate same (red) or opposite (green) transcriptional orientations.
• Differential gene loss contributes to apparent incongruities in comparative grass genomics.
• Paterson et al. 2004 PNAS
• A similar result has been described for Arabidopsis, suggesting that this species is also an ancient polyploid.

Aligning genomic sequences
Evidence of genome duplication in Arabidopsis
• The Arabidopsis genome shows evidence of genome duplications.
• Compare blocks of genes that have related sequences
  – Blocks imply genome, rather than gene, duplication
  – Distribution of block ages points to multiple duplications
• Most plant species are recent or ancient polyploids.
Figure: duplicated blocks in Arabidopsis (panel labels 1 to 5).

Algorithms for aligning genomic sequences
• Finding orthologous regions between two genomes is nontrivial.
• Existing methods based on dynamic programming algorithms (e.g. Needleman and Wunsch / Smith and Waterman) or hashing (e.g. BLAST / FASTA) rapidly run out of memory.
• Even Megablast and BLAT cannot handle large genome comparisons.
Specific programs
• BLASTZ
• LAGAN
• AVID

Substitution matrices used by BLAST and BLASTZ
• The BLASTZ matrix is based on observed substitutions in aligned conserved regions of mouse and human.
• BLASTZ aligned mouse sequences to 40% of the human genome.
Figure: BLASTN and BLASTZ substitution matrices.

BLASTZ (http://zpicture.dcode.org/, see zpicture.pdf)
• Local alignment program. Aligns 2 Mb in < 1 min.
• Searches for stretches of 19 bp with 12 matches (1 transition OK).
• After the initial match, a gap-free extension is performed until the cumulative score reaches a threshold (3000).
• If the threshold is met, the region is realigned, now allowing gaps.
• Alignments with scores >5000 move to the next phase.
• Alignment scores are calculated using refined substitution matrices based on aligned human-mouse non-coding, non-repetitive regions (see the BLASTZ matrix above).
• Connects individual alignments separated by <50 kb.

LAGAN
Limited Area Global Alignment of Nucleotides
• LAGAN: global pair-wise and multiple alignment of finished sequences.
• Detects closely and distantly related sequences.
• If some of the sequences are in draft format, your query will be redirected to AVID.
• Multiple alignments are visualized by VISTA.
• This is the only alignment program available through the VISTA server that produces true multiple alignments.
• LAGAN performs better than BLASTZ for distantly related organisms.

AVID
AVID: global pair-wise alignment.
• Fast alignment of large sequences.
• Detects weak homologies.
• One of the sequences should be finished, but all others can be either finished or in draft format.
• For all finished sequences in the set, AVID generates all-against-all pair-wise alignments.
• Draft sequences are aligned to the finished sequence.
• View using VISTA.
Figure: AVID anchoring. Clean matches (first) and repeat matches; set of non-overlapping, non-crossing matches (red); recursion: repeat for each inter-anchor region from the previous step.

MULTI-LAGAN
http://lagan.stanford.edu/lagan_web/index.shtml

Barley view (tracks: rice, wheat)
Gene 1: 2563-6313, 6 exons
Gene 2: 30279-34393, 13 exons (missed; reverse)
Gene 3: 64887-73061, 18 exons
Gene 4a: 74857-80916, 18 exons
Gene 4b: 84895-88925, 18 exons

Rice view (tracks: wheat, barley)
Gene 1: 4548-7287, 6 exons
Gene 2: 8663-13534, 13 exons (missed in barley)
Gene 3: 14166-22601, 18 exons
Gene 4: 24136-29057, 18 exons

MULTI-LAGAN: similar to ClustalW
• It produces dynamic alignments and also PDFs of the alignments presented here.
• Any of the genomes can be used as the reference.
• In the alignment between rice, barley, and wheat it fails to detect the inverted Gene 2 in barley.
• It produces an evolutionary tree for the submitted sequences.
Figure: Different levels of conservation.

Precomputed genomic alignments
Major sources of precomputed whole-genome alignments
• Ensembl http://www.ensembl.org/index.html [human-chimp-mouse-rat-chicken-fugu-zebrafish-Drosophila-C.elegans]
• VISTA browser http://pipeline.lbl.gov/cgi-bin/gateway2 [human-chimp-mouse-rat-chicken] MULTI-LAGAN
• UCSC Genome Browser http://genome.ucsc.edu/ [human-chimp-mouse-rat-chicken-fugu] BLASTZ
Figure: VISTA and UCSC browser views with tracks Chimp, Mouse, Rat, Chicken, Rep. Masker, and Conservation.
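
Worked example: a conservation profile from a pairwise alignment
The conservation tracks in the browsers listed above summarize how similar aligned sequences are along a genomic region. The following is a minimal sketch of that idea, assuming a simple sliding-window percent identity over two rows of a gapped alignment. The function names (window_identity, conserved_windows), the window and step sizes, and the 70% cutoff are illustrative assumptions, not the scoring that VISTA or the UCSC Genome Browser actually uses.

```python
from typing import List, Tuple


def window_identity(aligned_a: str, aligned_b: str,
                    window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """Percent identity in sliding windows over a gapped pairwise alignment.

    Both inputs are rows of the same alignment (equal length, '-' for gaps).
    Returns (window_start_column, percent_identity) pairs.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    # 1 where both rows carry the same base; 0 for mismatches and gap columns.
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
    ]
    if not matches:
        return []

    results = []
    # If the alignment is shorter than one window, score it as a single window.
    last_start = max(len(matches) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = matches[start:start + window]
        results.append((start, 100.0 * sum(chunk) / len(chunk)))
    return results


def conserved_windows(profile: List[Tuple[int, float]],
                      threshold: float = 70.0) -> List[int]:
    """Start columns of windows whose identity meets the (assumed) cutoff."""
    return [start for start, identity in profile if identity >= threshold]
```

For example, window_identity("ACGTACGTACGT", "ACGTACGAAC-T", window=4, step=4) scores three windows; the first is identical, while the second contains a mismatch and the third a gap, so only the first passes an 80% cutoff in conserved_windows. Counting gap columns as non-matching is a design choice made here for simplicity; other tools may treat gaps differently.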