CMSC858P: Algorithms for Biosequence Analysis
Lecture 1
Instructor: Mihai Pop
TuTh: 2-3:15pm, CSIC 3118

INTRODUCTIONS
• Instructor: Mihai Pop (mpop at umiacs.umd.edu)
  Office hours: Wednesdays 11-12, AVW 3223
• You
• Class webpage: http://www.cbcb.umd.edu/confcour/CMSC858P.shtml

Grading & workload
• Homework (10%)
  • Goal: 5-10 assignments
    – simple
    – small programming assignments
    – “discovery” exercises (find something in public databases or using public software)
• Programming projects (15% + 15%)
  – Project 1 – assigned by instructor (suffix tree)
  – Project 2 – chosen by student
• In-class midterm (25%) & final (35%)
• Late policy: 1 day late – 10 points off; 2 days late – 20 points off; 3 days late – 0 points

Academic Honesty
• No cheating on homeworks/projects/exams
• No making up data/results
• No copying of other people’s code
• You can work together on homeworks/projects, but WRITE THE ANSWER BY YOURSELF
“I pledge on my honor that I have not given or received any unauthorized assistance on this examination.”
http://www.studenthonorcouncil.umd.edu/code.html

Advice: how to do well in the class
• Start early on assignments – at least read the assignment after class
• Ask questions – during class, exams, office hours, or by email (I’m available by email most of the time)
• Be inquisitive – follow up on topics discussed in class: Google, Wikipedia
• Be social – get to know some biologists; learn what they do and what they are interested in
• Get to know your colleagues

Central dogma
[Figure: The Central Dogma of Molecular Biology. Replication (DNA duplicates); transcription (RNA synthesis) in the nucleus; translation (protein synthesis) in the cytoplasm: DNA → RNA → Protein. Image: Phage CRO repressor on DNA, Andrew Coulson & Roger Sayle with RasMol, University of Edinburgh. Source: http://www.accessexcellence.org/RC/VL/GG/central.html]
Genes, transcription, translation
• DNA → RNA: thymine is replaced by uracil (T → U)
• The transcribed segments are called genes
• AUG – start codon (also codes for the amino acid methionine)
• UAA, UAG, UGA – stop codons
• Genes are read in sets of 3 nucleotides (codons) during translation – 4^3 = 64 possible combinations
• Each codon codes either for one of the 20 amino acids (the building blocks of proteins) or for a stop signal
  ACCGUACCAUGUUA...AUAGGCUGAGCA

Amino-acid translation table (rows: first letter; columns: second letter; N = any of U, C, A, G)
First | U | C | A | G
U | UUU, UUC Phe; UUA, UUG Leu | UCN Ser | UAU, UAC Tyr; UAA, UAG Stop | UGU, UGC Cys; UGA Stop; UGG Trp
C | CUN Leu | CCN Pro | CAU, CAC His; CAA, CAG Gln | CGN Arg
A | AUU, AUC, AUA Ile; AUG Met | ACN Thr | AAU, AAC Asn; AAA, AAG Lys | AGU, AGC Ser; AGA, AGG Arg
G | GUN Val | GCN Ala | GAU, GAC Asp; GAA, GAG Glu | GGN Gly
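Transcription and translation, as described on the genes slide, can be sketched in Python. This is a simplified illustration, not a model of a real ribosome: it assumes translation begins at the first AUG and ends at the first in-frame stop codon, and the names `transcribe`, `translate`, and `CODON_TABLE`, as well as the toy input sequence, are hypothetical. The 64-entry table encodes the standard genetic code, enumerating codons in U, C, A, G order.

```python
from itertools import product

# Standard genetic code: one-letter amino acid per codon, codons enumerated in
# U, C, A, G order for the first, second and third letter ('*' marks a stop codon).
BASES = "UCAG"
AA_ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "F": "Phe", "L": "Leu", "S": "Ser", "Y": "Tyr", "C": "Cys", "W": "Trp",
    "P": "Pro", "H": "His", "Q": "Gln", "R": "Arg", "I": "Ile", "M": "Met",
    "T": "Thr", "N": "Asn", "K": "Lys", "V": "Val", "A": "Ala", "D": "Asp",
    "E": "Glu", "G": "Gly", "*": "Stop",
}
# 4^3 = 64 codons, each mapped to a three-letter amino acid name or "Stop".
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AA_ONE_LETTER)
}


def transcribe(dna: str) -> str:
    """DNA -> RNA: thymine (T) is replaced by uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until the first in-frame stop codon (simplified)."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    # Hypothetical toy gene, not taken from the lecture.
    print(translate(transcribe("GGATGTTTGGCTAAGC")))  # ['Met', 'Phe', 'Gly']
```

Encoding the table as one 64-character string keeps it compact, while the three-letter names match the table on the slide.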
Translation – complications
[Figure: a pre-mRNA (5’ UTR, Exon, Intron, Exon, Intron, Exon, 3’ UTR) is processed into the mRNA, from which the introns have been removed]
Alternative splicing examples
(a) Alternative selection of promoters (e.g., myosin primary transcript)
(b) Alternative selection of cleavage/polyadenylation sites (e.g., tropomyosin transcript)
(c) Intron retaining mode (e.g., transposase primary transcript)
(d) Exon cassette mode (e.g., troponin primary transcript)
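Splicing and the exon cassette mode (d) can be illustrated by treating exons as coordinate intervals on the pre-mRNA. This is a toy Python sketch under stated assumptions: the sequence, the 0-based half-open exon coordinates, and the helper names `splice` and `cassette_isoforms` are hypothetical, and real splice-site recognition is not modeled.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, half-open [start, end)) in order; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def cassette_isoforms(pre_mrna: str, exons: list[tuple[int, int]],
                      cassette: int) -> tuple[str, str]:
    """Exon cassette mode: one isoform includes exon `cassette`, the other skips it."""
    included = splice(pre_mrna, exons)
    skipped = splice(pre_mrna, [ex for i, ex in enumerate(exons) if i != cassette])
    return included, skipped


# Hypothetical toy pre-mRNA: exons in upper case, introns in lower case.
PRE_MRNA = "AUGAAA" + "guaag" + "CCC" + "guaag" + "GGGUAA"
EXONS = [(0, 6), (11, 14), (19, 25)]

if __name__ == "__main__":
    # ('AUGAAACCCGGGUAA', 'AUGAAAGGGUAA')
    print(cassette_isoforms(PRE_MRNA, EXONS, cassette=1))
```

In the same representation, intron retention (c) would amount to listing an intron's interval as part of an exon.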
RECAP
• DNA is a string formed with the letters A, C, T, G (called nucleotides or bases)
• DNA is double-stranded – this allows replication: transfer of the genetic “code” from parents to offspring
• DNA is naturally oriented from 5’ to 3’, and the two strands are anti-parallel
• If you know the sequence of one strand, you can obtain the sequence of the other by reverse-complementation
  5’ AGACCTAGTGCACGGCTACTACC 3’
  3’ CCATCATCGGCACGTGATCCAGA 5’   Reverse
  5’ GGTAGTAGCCGTGCACTAGGTCT 3’   Complement

Polymerase chain reaction (PCR)
1. Denature
2. Anneal (attach primer)
3. Extend
4. Repeat

How does PCR work?
• Cycle 1, start: 1 double-stranded molecule
• Cycle 1, denature: 2 single-stranded molecules
• Cycle 1, anneal: 2 single-stranded molecules with primers attached
• Cycle 1, extend: 2 double-stranded molecules, each with one “long” (L) strand and one “short” (S) strand (terminated at a primer)
• Cycle 2, start: 2 double-stranded molecules: L+S, L+S
• Cycle 2, denature: 2 x L strands, 2 x S strands
• Cycle 2, anneal: all strands with primers attached
• Cycle 2, extend: 2 double-stranded molecules L+S, L+S and 2 double-stranded molecules S+SS, S+SS
SS – strand terminated at both ends with a primer

Quantitative PCR
• Measure the number of PCR cycles needed to reach a certain DNA concentration – this number depends on the initial number of molecules
• Used in diagnostics: e.g., is this a random anthrax spore from the environment, or lots of spores from an attack?
http://www.dxsgenotyping.com/technology_main.htm

The future of sequencing
• Roche/454 Life Sci. – approx. 60-100 Mbp, 250 bp reads / 4 hr
• Illumina/Solexa – approx. 1-2 Gbp, 30-40 bp reads / 3 day run
• Applied Biosystems/SOLiD – approx. 1 Gbp, 25-35 bp reads
• Helicos – single molecule sequencing, ~1 Gbp/hour, 30-40 bp reads
Not yet available:
• nanopore sequencing

The future of sequencing
• Single molecule sequencing – current technology requires many copies of the DNA being sequenced, i.e., DNA amplification
• Massively parallel sequencing – 100k sequencing reactions occurring at the same time

[Figures: Sequencing by synthesis (http://www.genetics.ucla.edu/sequencing/pyro.php); Micro-fluidics (http://www.usgenomics.com)]

How they work
• Amplify DNA
  – Roche/454 – emulsion PCR on beads (water droplets in oil)
  – Illumina/Solexa – PCR on surface
  – ABI SOLiD – emulsion PCR
• Sequence
  – Roche/454 – pyrosequencing
  – Illumina/Solexa – reversible terminators
  – ABI SOLiD – sequencing by ligation, two-color encoding
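Reverse-complementation from the recap, and the PCR and quantitative PCR bookkeeping, can be checked with a short Python sketch. It assumes perfect efficiency (every strand is copied in every cycle), which real reactions do not achieve, and the function names are hypothetical. The strand counts follow the L/S/SS labels from the PCR walkthrough: each L strand templates a new S strand, while S and SS strands template new SS strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the other strand, written 5' to 3': complement each base, then reverse."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def pcr_strand_counts(cycles: int) -> dict[str, int]:
    """Idealized PCR: count L, S and SS strands after `cycles` rounds."""
    counts = {"L": 2, "S": 0, "SS": 0}  # start: one double-stranded molecule
    for _ in range(cycles):
        new_s = counts["L"]                  # each L strand templates a new S strand
        new_ss = counts["S"] + counts["SS"]  # S and SS strands template new SS strands
        counts["S"] += new_s
        counts["SS"] += new_ss
    return counts


def cycles_to_threshold(initial: int, threshold: int) -> int:
    """qPCR intuition: cycles of perfect doubling until `threshold` molecules are reached."""
    if initial <= 0:
        raise ValueError("initial must be positive")
    cycles, amount = 0, initial
    while amount < threshold:
        amount *= 2
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(reverse_complement("AGACCTAGTGCACGGCTACTACC"))  # GGTAGTAGCCGTGCACTAGGTCT
    print(pcr_strand_counts(2))  # {'L': 2, 'S': 4, 'SS': 2}
    print(cycles_to_threshold(1000, 1_000_000), cycles_to_threshold(10, 1_000_000))  # 10 17
```

Under these assumptions the total number of strands doubles every cycle (2^(n+1) strands after n cycles), and a sample starting with 10 molecules needs 7 more cycles than one starting with 1000 to reach the same threshold. That cycle difference is the quantity qPCR uses to infer the initial amount.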