06/24/09 Q'BIC Bioinformatics

BSC 4934: Q'BIC Capstone Workshop
  Giri Narasimhan
  ECS 254A; Phone: x3748
  giri@cis.fiu.edu
  http://www.cis.fiu.edu/~giri/teach/BSC4934_Su09.html
  24 June through 8 July, 2009

Overview of Course
  Sequence Alignment; Multiple Sequence Alignment
  Sequence Analysis
  Sequencing and Mapping
  Phylogenetic Analysis
  Gene prediction techniques
  Pattern discovery techniques
  Protein structure alignment and analysis
  Genomics, Functional Genomics, Proteomics
  Gene Expression Data Analysis
  RNA Secondary structure
  RNA interference and small RNA
  Ribozymes and Riboswitches
  Databases & Software Packages
  Statistics for Bioinformatics
  Computational Learning & Predictive Methods
  Biomedical Image Analysis
  Emerging Biotechnologies

Evaluation
  Homework Assignments (35%)
  Exam (35%)
  Semester Project (25%)
  Class Participation (5%)

Course Homepage
  http://www.cis.fiu.edu/~giri/teach/BSC4934_Su09.html
  Lecture notes, required reading material, homework, announcements, etc.

Introduction
  1. What is Bioinformatics?
     Analysis of biological data with computing & statistical tools.
  2. The different aspects of Informatics?
     Data Management (Database Technology, Internet Programming)
     Analysis/Interpretation of Data (Data Mining, Modeling, Statistical Tools)
     Development of Algorithms/Data Structures
     Visualization and Interface Design (HCI, Graphics)
  3. How to assist biological research?
     propose new models or correlations based on data from experiments
     verify a proposed model using known data
     propose new experiments based on model or analysis
     use predicted information to narrow down search in a biological investigation

Overall Goals
  Gene
  Protein
  Structure
  Function
  DNA Sequence
  Gene Regulatory Networks
  Molecular Interaction and Reaction Networks
  PPI Networks
  Metabolic Pathways

Genome Sizes
  Organism          Size      Date   Est. # genes
  HIV type 1        9.2 Kb    1997   9
  H. influenzae     1.8 Mb    1995   1,740
  M. genitalium     0.58 Mb   1998   525
  E. coli           4.7 Mb    1997   4,000
  S. cerevisiae     12.1 Mb   1996   6,034
  C. elegans        97 Mb     1998   19,099
  A. thaliana       100 Mb    2000   25,000
  D. melanogaster   180 Mb    2000   13,061
  M. musculus       3 Gb      2002   ~30,000
  H. sapiens        3 Gb      2001   32,000+

Short Homework
  Find the organism with the largest genome known!
  How many chromosomes does it have?
  Do you think a larger genome implies a “more evolved” organism or a “less evolved” organism?

Caenorhabditis elegans
  Entire genome – 1998; 8 year effort
  1st animal; 2nd eukaryote (after yeast)
  Nematode (phylum)
  Easy to experiment with; Easily observable
  97 million bases; 20,000 genes; 12,000 with known function; 6 Chromosomes; GC content 36%
  959 cells; 302-cell nervous system
  36% of proteins common with human
  15 Kb mitochondrial genome
  Results in ACeDB
  25% of genes in operons
  Important for HGP: technology, software, scale/efficiency
  182 genes with alternative splice variants
  universe-review.ca www.ucl.ac.uk

Drosophila Eyeless vs. Human Aniridia
  Query: 57  HSGVNQLGGVFVGGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 116
             HSGVNQLGGVFV GRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG
  Sbjct: 5   HSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRYYETG 64

  Query: 117 SIRPRAIGGSKPRVATAEVVSKISQYKRECPSIFAWEIRDRLLQENVCTNDNIPSVSSIN 176
             SIRPRAIGGSKPRVAT EVVSKI+QYKRECPSIFAWEIRDRLL E VCTNDNIPSVSSIN
  Sbjct: 65  SIRPRAIGGSKPRVATPEVVSKIAQYKRECPSIFAWEIRDRLLSEGVCTNDNIPSVSSIN 124

  Query: 177 RVLRNLAAQKEQ 188
             RVLRNLA++K+Q
  Sbjct: 125 RVLRNLASEKQQ 136

  Query: 417 TEDDQARLILKRKLQRNRTSFTNDQIDSLEKEFERTHYPDVFARERLAGKIGLPEARIQV 476
             +++ Q RL LKRKLQRNRTSFT +QI++LEKEFERTHYPDVFARERLA KI LPEARIQV
  Sbjct: 197 SDEAQMRLQLKRKLQRNRTSFTQEQIEALEKEFERTHYPDVFARERLAAKIDLPEARIQV 256

  Query: 477 WFSNRRAKWRREEKLRNQRR 496
             WFSNRRAKWRREEKLRNQRR
  Sbjct: 257 WFSNRRAKWRREEKLRNQRR 276

  E-Value = 2e-31

Motif Detection in Protein Sequences
  MTDKMQSLALAPVGNLDSYIRAANAWPMLSADEERALAEKLHYHGDLEAA
  KTLILSHLRFVVHIARNYAGYGLPQADLIQEGNIGLMKAVRRFNPEVGVR
  LVSFAVHWIKAEIHEYVLRNWRIVKVATTKAQRKLFFNLRKTKQRLGWFN
  QDEVEMVARELGVTSKDVREMESRMAAQDMTFDLSSDDDSDSQPMAPVLY
  LQDKSSNFADGIEDDNWEEQAANRLTDAMQGLDERSQDIIRARWLDEDNK
  STLQELADRYGVSAERVRQLEKNAMKKLRAAIEA

Patterns in Protein Structures

SIDS
  18,000 Amish people in Pennsylvania
  Mostly intermarried due to religious doctrine
  Rare recessive diseases occurred with high frequencies.
  SIDS: 3000 deaths/year (US); 21 deaths (Amish community)
  Many research centers failed to identify the cause
  Collaboration between Affymetrix, TGen & Clinic for Special Children solved the problem in 2 months
  Studied 10,000 SNPs using microarray technology
  Their experiments showed that all the sick infants had two mutant copies of a specific gene, and their parents were carriers of the mutant gene.
  Conclusion: Disease caused by 2 abnormal copies of TSPYL gene
  Identified genes expressed in key organs (brainstem, testes)
  http://www.affymetrix.com/community/wayahead/modern_miracle.affx

Molecular Biology Background

2 star molecular players
  DNA
  Protein

The building blocks of DNA & RNA
  Fig 1.1, Zvelebil/Baum

DNA double helix structure
  Fig 1.3, Zvelebil/Baum

RNA molecule
  Fig 1.5, Zvelebil/Baum
  [Figure, panels (A) and (B): RNA molecule; labels: hinge, P3-P9, P4-P6]

Central Dogma
  DNA acts as a template to replicate itself.
  DNA is transcribed into RNA.
  RNA is translated into Protein.
  [Diagram labels: DNA, RNA, Protein; Replication, Transcription, Translation]

Central Dogma
  Fig 1.6, Zvelebil/Baum
  [Figure labels: DNA replication (DNA); RNA synthesis (transcription), RNA; protein synthesis (translation), PROTEIN; amino acids]

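The two information-transfer steps named on the Central Dogma slides, transcription and translation, can be illustrated with a minimal Python sketch. The function names, the example sequence, and the deliberately partial codon table are illustrative assumptions, not part of the course material; a real analysis would use the full standard genetic code.

```python
# Minimal sketch of transcription and translation.
# CODON_TABLE is a small, hypothetical subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "M",                          # start codon (methionine)
    "GCU": "A", "GCC": "A",              # alanine
    "UGG": "W",                          # tryptophan
    "AAA": "K",                          # lysine
    "GAU": "D",                          # aspartate
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "X" marks codons that are missing from this partial table.
        amino = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


dna = "ATGGCTTGGAAAGATTAA"   # made-up coding strand for illustration
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAGAUUAA
print(protein)  # MAWKD
```

The sketch works from the coding strand, so transcription reduces to replacing T with U; starting from the template strand would require taking the reverse complement first.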
DNA Replication
  Fig 1.4, Zvelebil/Baum
  [Figure labels: parent DNA double helix; strand A; strand B; template strand A; new strand B; template strand B; new strand A]

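The replication figure's idea that each parental strand serves as a template for a new partner strand can be sketched in a few lines of Python. The function name and the example sequence are hypothetical, and the sketch assumes plain Watson-Crick pairing (A with T, G with C) with both strands written 5'-to-3'.

```python
# Sketch of template-directed strand synthesis under Watson-Crick pairing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template: str) -> str:
    """Return the strand synthesized on a template, written 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


strand_a = "ATGGCTTG"             # made-up parental strand A
strand_b = new_strand(strand_a)   # its partner in the parent helix

# After replication each daughter helix keeps one parental strand
# and gains one newly synthesized strand.
daughter_1 = (strand_a, new_strand(strand_a))
daughter_2 = (strand_b, new_strand(strand_b))
print(daughter_1)
print(daughter_2)
```

Synthesizing on strand B reproduces the sequence of strand A, which is why both daughter helices carry the same information as the parent.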
Chromosomes
  The chromosomal locations of several genes believed to be associated with the human BRCA1 gene implicated in breast cancer are highlighted.

Human Chr 22
  Symbol    Position   Description
  ABCD1P4   22q11      ATP-binding cassette, sub-family D (ALD)
  SNAP29    22q11.21   synaptosomal-associated protein
  • • •

DNA Molecule
Figure 1.1: E. coli Ala RNA

Genes
  [Figure: DNA with labeled regions GeneA, GeneB, GeneC, GeneD, GeneE]

  [Figure labels: Nucleotides; Protein; Amino Acids]

The Central Dogma of Molecular Biology
  [Figure labels: Replication (DNA duplicates); Transcription (RNA synthesis), RNA, mRNA; Information; nuclear envelope; cytoplasm; Translation (Protein synthesis), Protein]

Transcription
  Fig 1.7, Zvelebil/Baum
  [Figure, panels (A) and (B); labels: DNA double helix; RNA polymerase; DNA rewinding; coding strand; noncoding strand; direction of transcription; active site; newly synthesized RNA transcript; short region of DNA/RNA helix]

DNA Transcription
  RNA synthesis and processing
  [Figure labels: Chromosomal DNA; exon; intron 1; intron 2; Transcription (RNA synthesis); Nuclear RNA; exon2; Messenger RNA]

Transcription Initiation
  [Figure, panels (A) to (E); labels: start of transcription; general transcription factors including TFIIB, TFIIF, TFIIE and other factors; RNA polymerase II; ATP]

Transcription
  [Figure stages: DNA HELIX OPENING; INITIATION OF RNA CHAIN BY JOINING OF FIRST TWO RIBONUCLEOSIDE TRIPHOSPHATES; RNA CHAIN ELONGATION IN 5'-to-3' DIRECTION BY ADDITION OF RIBONUCLEOSIDE TRIPHOSPHATES; TERMINATION AND RELEASE OF POLYMERASE AND COMPLETED RNA CHAIN]
  [Figure labels: RNA polymerase; promoter; start site for transcription; stop signal for RNA polymerase; DNA double helix; continuous RNA strand displacement and DNA helix re-formation; short region of DNA/RNA helix]
  Figure 6-2 The synthesis of an RNA molecule by RNA polymerase. The enzyme binds to the promoter sequence on the DNA and begins its synthesis at a start site within the promoter. It completes its synthesis at a stop (termination) signal, whereupon both the polymerase and its completed RNA chain are released. During RNA chain elongation, polymerization rates average about 30 nucleotides per second at 37°C. Therefore, an RNA chain of 5000 nucleotides takes about 3 minutes to complete.
  From Chapter 6: Basic Genetic Mechanisms, p. 224

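The arithmetic behind the caption's closing estimate can be checked with a short Python sketch. It takes the elongation rate to be about 30 nucleotides per second, a value consistent with the stated 5000 nucleotides in roughly 3 minutes; the function name and default parameter are illustrative assumptions.

```python
def transcription_time_seconds(length_nt: int, rate_nt_per_s: float = 30.0) -> float:
    """Time to synthesize an RNA chain at a constant elongation rate."""
    return length_nt / rate_nt_per_s


seconds = transcription_time_seconds(5000)
minutes = seconds / 60
print(f"{seconds:.0f} s, about {minutes:.1f} min")  # 167 s, about 2.8 min
```

The result, a little under 3 minutes, matches the caption's rounded figure. The estimate assumes a constant rate and ignores initiation and termination time.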
Protein Synthesis: Incorporation of amino acid into protein
  [Figure]