Lecture 1: Introduction to Bioinformatics
May 30, 2007

Intro/Overview
Biological concepts
• Instructor: Ameet Soni (soni@cs.wisc.edu)
  – 3rd year graduate student, Computer Sciences
  – Advisor: Jude Shavlik
  – Research: Machine learning and protein structure prediction
• Office: 6740 Medical Sciences Center
• Course Webpage: http://www.cs.wisc.edu/~soni/IBS07/

Overview
• Molecular Biology 101
• Sequence Alignment
• Phylogenetic Tree Inference
• Gene Expression Analysis
• Protein Structure Prediction

Today
• Molecular Biology 101
  – DNA
  – RNA
  – Proteins
  – Central Dogma
• Sequence Alignment
• Phylogenetic Tree Inference
• Gene Expression Analysis
• Protein Structure Prediction

Genes
• genes are the basic units of heredity
• a gene is a sequence of bases that carries the information required for constructing a particular protein (more precisely, a polypeptide)
• such a gene is said to encode a protein
• the human genome comprises roughly 20,000 to 25,000 protein-coding genes (earlier estimates were ~35,000)

The “Central Dogma” of Molecular Biology
DNA → (transcription) → RNA → (translation) → PROTEIN

Deoxyribonucleic Acid (DNA)
[Figure: DNA structure; image from the DOE Human Genome Program, http://www.ornl.gov/hgmis]

DNA – The “Blueprint”
• genetic instructions for guiding development and function of living organisms
• polymer: large molecule consisting of similar units
• DNA consists of two complementary polymers bound together

DNA as helix
• DNA molecules usually consist of two strands arranged in the famous double helix

Watson-Crick Base Pairs
• in double-stranded DNA
  – A always bonds to T
  – C always bonds to G

Double Stranded DNA
[Figure: double-stranded DNA]
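The Watson-Crick pairing rule (A with T, C with G) means that one strand determines the other. A minimal Python sketch of that idea follows; the function names and the six-base example sequence are illustrative assumptions, not part of the lecture.

```python
# Watson-Crick pairing rules from the slide: A <-> T, C <-> G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of `strand`, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "ATGCCG"  # hypothetical example sequence, written 5'->3'
    print(complement(top))          # TACGGC (partner strand, written 3'->5')
    print(reverse_complement(top))  # CGGCAT (partner strand, written 5'->3')
```

Because the two strands run in opposite directions, the partner strand is usually reported as the reverse complement rather than the plain complement.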
Genomes
• the term genome refers to the complete complement of DNA for a given species
• the human genome is organized into 23 pairs of chromosomes (46 in total)
• every cell (except sex cells and mature red blood cells) contains the complete genome of an organism

The “Central Dogma” of Molecular Biology
DNA → (transcription) → RNA → (translation) → PROTEIN

RNA – the “Messenger”
• RNA is like DNA except:
  – the backbone is different
  – it is usually single stranded
  – the base uracil (U) replaces thymine (T)
• a strand of RNA can be thought of as a string composed of the four letters: A, C, G, U
• many functions: mRNA, rRNA, tRNA, etc.

Steps in Transcription
• RNA is synthesized 5’→3’ (the template DNA strand is read 3’→5’)
• Initiation
  – RNA polymerase docks on the promoter region
• Elongation
  – builds the chain (multiple simultaneously possible)
• Termination
  – a certain sequence creates a hairpin loop

Transcription
[Figure: transcription of a double-stranded DNA segment into mRNA]
DNA, top strand (coding strand, sense strand): ATGCCGTTAGACCGTTAGCGGACCTGA
DNA, bottom strand (template strand, antisense strand): TACGGCAATCTGGCAATCGCCTGGACT
mRNA: AUGCCGUUAGACCGUUAGCGGACCUGAC

DNA-RNA Base Pairs
A=U
C≡G
[Figure: example of DNA-RNA base pairing]
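The strand labels and DNA-RNA pairing rules above can be sketched in Python: the mRNA matches the coding strand with U in place of T, and it is built by pairing against the template strand. The function names are hypothetical, and the first few bases of the top strand are garbled in the extracted figure, so they are reconstructed here from the bottom strand (an assumption).

```python
# DNA-to-RNA pairing: a template A pairs with U in the RNA; C pairs with G.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_coding(coding: str) -> str:
    """mRNA has the same sequence as the coding (sense) strand, with U replacing T."""
    return coding.upper().replace("T", "U")


def transcribe_from_template(template_3to5: str) -> str:
    """Pair each template base with its RNA partner.

    The template is given 3'->5' (left to right), as the bottom strand is
    drawn in the figure, so the result reads 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


if __name__ == "__main__":
    top = "ATGCCGTTAGACCGTTAGCGGACCTGA"     # coding strand (start reconstructed)
    bottom = "TACGGCAATCTGGCAATCGCCTGGACT"  # template strand, as in the figure
    print(transcribe_from_coding(top))
    print(transcribe_from_template(bottom))  # same mRNA either way
```

Both routes give the same mRNA, which is why the coding strand is also called the sense strand.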
The “Central Dogma” of Molecular Biology
DNA → (transcription) → RNA → (translation) → PROTEIN

Proteins – the “Workers”
• proteins are large molecules constructed of one polypeptide or a complex of polypeptides
• a polypeptide is a polymer composed of units called amino acids
  – 20 types
• a polypeptide can be thought of as a string composed from a 20-character alphabet

Amino Acids
Name            3-letter   1-letter
Alanine         Ala        A
Arginine        Arg        R
Aspartic Acid   Asp        D
Asparagine      Asn        N
Cysteine        Cys        C
Glutamic Acid   Glu        E
Glutamine       Gln        Q
Glycine         Gly        G
Histidine       His        H
Isoleucine      Ile        I
Leucine         Leu        L
Lysine          Lys        K
Methionine      Met        M
Phenylalanine   Phe        F
Proline         Pro        P
Serine          Ser        S
Threonine       Thr        T
Tryptophan      Trp        W
Tyrosine        Tyr        Y
Valine          Val        V

Ribbon Model: Hemoglobin
[Figure: ribbon model of hemoglobin]

Protein Functions
• structural support
• storage of amino acids
• transport of other substances
• coordination of an organism’s activities
• response of cell to chemical stimuli
• movement
• protection against disease
• selective acceleration of chemical reactions

The “Central Dogma” of Molecular Biology
DNA → (transcription) → RNA → (translation) → PROTEIN

Codons and Reading Frames
[Figure: an mRNA sequence read as consecutive three-base codons, Codon 1 through Codon 7]
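Since codons are read three bases at a time, the same RNA string can be split in three different ways depending on where reading starts. The sketch below illustrates the three reading frames; the function name and the 12-base example sequence are assumptions made for illustration.

```python
def codons(rna: str, frame: int = 0) -> list[str]:
    """Split `rna` into consecutive three-base codons.

    `frame` is the starting offset (0, 1, or 2). A trailing group of
    fewer than three bases is dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


if __name__ == "__main__":
    rna = "AUGCCGUUAGAC"  # hypothetical example sequence
    for frame in range(3):
        print(frame, codons(rna, frame))
    # 0 ['AUG', 'CCG', 'UUA', 'GAC']
    # 1 ['UGC', 'CGU', 'UAG']
    # 2 ['GCC', 'GUU', 'AGA']
```

Shifting the frame by one base changes every codon, which is why the choice of reading frame matters for the protein that is produced.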
The Genetic Code
[Figure: the genetic code table, indexed by the first letter and second letter of each codon]
Translation
[Figure: translation, showing the growing polypeptide]
• this process repeats until reaching a stop codon
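Translation as described above (read codons one at a time until a stop codon) can be sketched in Python with the standard genetic code. This is a simplified illustration, not the lecture's own code: it starts at the first AUG it finds and ignores everything else the cell does, and the names `CODON_TABLE` and `translate` are assumptions.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order for
# the first, second, and third codon letters; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the sequence.

    Returns one-letter amino acid codes; returns "" if there is no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("AUGCCGUUAGACCGUUAGCGGACCUGA"))  # MPLDR
```

In the example, the codons AUG, CCG, UUA, GAC, CGU give Met, Pro, Leu, Asp, Arg, and the sixth codon UAG is a stop codon, so the remaining bases are not translated.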
RNA Splicing
[Figure: RNA splicing; labels: chromosomal DNA, transcription (RNA synthesis), protein product]
[Figure: the same DNA region in two people, differing at a single base]
Person 1: AGA GAT AAT TGT... (Arg Asp Asn Cys...)
Person 2: AAA GAT AAT TGT...
image from the DOE Human Genome Program
http://www.ornl.gov/hgmis
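The comparison in the figure above (the same region in two people) amounts to finding the positions where two equal-length sequences differ. A small Python sketch follows, using the two codon strings that survive in the figure text; the function name is an assumption.

```python
def differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_a, base_b) for every mismatch.

    Positions are 0-based; both sequences must have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


if __name__ == "__main__":
    person1 = "AGAGATAATTGT"  # AGA GAT AAT TGT, from the figure
    person2 = "AAAGATAATTGT"  # AAA GAT AAT TGT, from the figure
    print(differences(person1, person2))  # [(1, 'G', 'A')]
```

Here the two sequences differ only at the second base of the first codon (AGA versus AAA).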
Post-translation
• the central dogma is a simplification: in reality, most transcripts and polypeptides (particularly in eukaryotes) have to be modified in several ways
  – mRNA receives a poly-A tail and a methylated cap
  – chaperones help fold the protein
• polypeptides also combine to form proteins

Overview of the E. coli Metabolic Pathway Map
[Figure: E. coli metabolic pathway map; image from the KEGG database]

Gene Regulation Example: the lac Operon
[Figure: the lac operon]
• LacZ, LacY, LacA: these proteins metabolize lactose
• lacI: this protein regulates the transcription of LacZ, LacY, LacA
• lactose is absent ⇒ the protein encoded by lacI represses transcription of the lac operon

More Impressive
[Figure: Growth of the International Nucleotide Sequence Database Collaboration; y-axis: number of base pairs; base pairs contributed by GenBank, EMBL, and DDBJ]
But Wait, There’s More…
• > 300 other publicly available databases pertaining to molecular biology (see pointer to Nucleic Acids Research directory on course home page)
• GenBank
  – > 65 million sequence entries
  – > 120 billion bases
• SWISS-PROT
  – > 230 thousand protein sequence entries
  – > 85 million amino acids
• Protein Data Bank
  – 37,269 protein (and related) structures
* all numbers current as of about 9/06

More Data: Gene Expression
[Figure from Spellman et al., Molecular Biology of the Cell, 9:3273-3297, 1998]
• this figure depicts one yeast gene-expression data set
  – each row represents a gene
  – each column represents a measurement of gene expression at some time point
  – red indicates that a gene is being expressed more than some baseline; green means less

Other Data Types
• Mass spectrometry: measures proteins, metabolites
• Protein-protein interactions
• ChIP-chip: identifies where a protein binds DNA
• Single nucleotide polymorphisms (SNPs)
• High-throughput molecule screening

Other Data Types (continued)
• Auxotrophic growth experiments
• 2D gels: dimensions are charge and molecular weight
• RNAi: a quicker way to knock down (silence) genes
• X-ray crystallography and NMR (nuclear magnetic resonance) for protein structures

Bioinformatics Revisited
Representation/storage/retrieval/analysis of biological data concerning
  – sequences (DNA, protein)
  – structures (protein)
  – functions (protein, sequence signals)
  – activity levels (mRNA, protein, small molecules)
  – networks of interactions (metabolic pathways, regulatory pathways, signaling pathways)
of/among biomolecules
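The gene-expression figure described above (rows are genes, columns are time points, red for above baseline, green for below) can be mimicked with a small Python sketch. Everything here is an illustrative assumption: the gene names, the measurements, the baseline of 1.0, the use of a log2 ratio, and the "black" label for no change are not taken from the Spellman et al. data set.

```python
import math


def color(expression: float, baseline: float) -> str:
    """Label one measurement by the sign of its log2 ratio to the baseline."""
    log_ratio = math.log2(expression / baseline)
    if log_ratio > 0:
        return "red"    # expressed more than the baseline
    if log_ratio < 0:
        return "green"  # expressed less than the baseline
    return "black"      # assumption: no change is shown as black


def heat_map(data: dict[str, list[float]], baseline: float) -> dict[str, list[str]]:
    """Map each gene (row) to a list of colors, one per time point (column)."""
    return {gene: [color(x, baseline) for x in row] for gene, row in data.items()}


if __name__ == "__main__":
    # Hypothetical measurements for two genes at three time points.
    data = {"geneA": [1.0, 2.0, 4.0], "geneB": [1.0, 0.5, 0.25]}
    for gene, colors in heat_map(data, baseline=1.0).items():
        print(gene, colors)
    # geneA ['black', 'red', 'red']
    # geneB ['black', 'green', 'green']
```

A log ratio is used in this sketch so that a twofold increase and a twofold decrease sit at equal distances from zero.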