CS174 Bioinformatics
Instructor: Xiaohui Xie
University of California, Irvine

Today’s Goals
• Course information
• Challenges in bioinformatics/computational biology
• Brief intro to molecular biology
• Python tutorial

References
• Recommended textbooks:
– N.C. Jones and P.A. Pevzner. An Introduction to Bioinformatics Algorithms
– R. Durbin, S. Eddy, A. Krogh and G. Mitchison. Biological Sequence Analysis
• Course website: http://www.ics.uci.edu/~xhx/courses/CS174/ (lectures, references and problem sets are posted there)

Why bioinformatics?
Bioinformatics = Biology + Information
[Slide: a long stretch of raw DNA sequence (thousands of bases), beginning AGATTTCGATTATCCTTATAGTTCATACATGCATGCTTCAACTACTTAATAAATGATTGTATGATAATG...]
The human genome is roughly 400,000 times longer than the sequence shown here.

Genome as a computer program
Genome | Computer program
Redundant | Precise
Gene | Data
Regulatory code | Method
Modularity | Class
Virus | Virus
Histone code | Encapsulation
Stochastic | Deterministic

[Figure: Human Genome Project timeline]
• DOE forms Joint Genome Institute
• Incorporation of 30,000 genes into human genome map
• New five-year plan for the HGP in the U.S. published
• NCHGR becomes NHGRI (National Human Genome Research Institute)
• E. coli genome sequenced
• RIKEN Genomic Sciences Center (Japan) established
• Roundworm (C. elegans) genome sequenced
• Genoscope (French National Genome Sequencing Center) founded
• Chinese National Human Genome Centers (in Beijing and Shanghai) established
• Full-scale human sequencing begins
• Sequence of first human chromosome (chromosome 22)
• Draft version of human genome sequence completed
• President Clinton and Prime Minister Blair support free access to genome information
• Executive order bans genetic discrimination in U.S. federal workplace
• Draft version of human genome sequence published
• 10,000 full-length human cDNAs sequenced (Mammalian Gene Collection)
• Draft version of mouse genome sequence completed and published
• Draft version of rat genome sequence completed
• HGP ends with all goals achieved
• Draft version of rice genome sequence completed and published
Four Aspects
• Biology – What’s the underlying problem?
• Algorithm – How can the problem be solved efficiently?
• Learning – How can we model biological systems and learn from observed data?
• Statistics – How can we distinguish true phenomena from artifacts?
[Figure: from organism (human) to gene]
A human body is made up of trillions of cells. Each cell nucleus contains an identical complement of chromosomes. Each chromosome is one long DNA double-helix molecule, and genes are functional regions of this DNA.
Different Life Forms Share a Common Genetic Framework
[Figure © 1998 Garland Publishing]
Genomes
• The term genome refers to the complete complement of DNA for a given species.
• The human genome consists of 46 chromosomes:
– Male: 22 pairs of autosomes + XY
– Female: 22 pairs of autosomes + XX
• Every cell (except sex cells, which carry a single set of chromosomes, and mature red blood cells, which have no nucleus) contains the complete genome of an organism.
[Figure: Human Genome (Male): 22 pairs of autosomes + sex chromosomes (XY)]
[Figure: Human Genome (Female): 22 pairs of autosomes + sex chromosomes (XX)]

RNA
• RNA is like DNA except:
– its backbone is slightly different (the sugar is ribose instead of deoxyribose)
– it is usually single-stranded
– the base uracil (U) is used in place of thymine (T)
• A strand of RNA can be thought of as a string composed of the four letters A, C, G, U.

The Genetic Code
• Each codon is three bases long, giving 4^3 = 64 combinations, which encode the 20 amino acids plus stop codons.

Proteins
• Proteins are molecules composed of one or more polypeptides.
• A polypeptide is a polymer composed of amino acids.
• Cells build their proteins from 20 different amino acids.
• A polypeptide can be thought of as a string composed from a 20-character alphabet.
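The "molecules as strings" view of RNA and proteins can be made concrete with a short Python sketch. It is a minimal illustration, not course-provided code, and rests on stated assumptions: the input is a clean DNA coding (sense) strand written 5'→3' using only A, C, G, T; transcription is modeled as replacing T with U; and translation starts at the first base without searching for a start codon. The names `transcribe`, `translate` and `CODON_TABLE` are hypothetical helpers. The 64-character string is the standard genetic code, with `*` marking stop codons.

```python
from itertools import product

# Standard genetic code as a compact string. Codons are enumerated with the
# bases in the order U, C, A, G at each position (first base varies slowest);
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Hypothetical helper: RNA copy of a DNA coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Hypothetical helper: translate codon by codon from position 0,
    stopping at the first stop codon. Assumes only A, C, G, U appear."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))                        # prints 64
    print(len(set(CODON_TABLE.values()) - {"*"}))  # prints 20
    rna = transcribe("ATGGCCATTGTAATGGGCCGCTGA")
    print(rna)                                     # prints AUGGCCAUUGUAAUGGGCCGCUGA
    print(translate(rna))                          # prints MAIVMGR
```

The two counts reproduce the slide's "64 combinations, 20 amino acids", and the protein comes out as a string over the 20-letter amino-acid alphabet. For real work, Biopython's `Seq.translate` provides a fuller implementation that also supports alternative genetic codes.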