Bioinformatics Course Material: CAP5510 Introduction to Bioinformatics - Prof. Giri Narasimhan (Study Guides, Projects, Research, Computer Science)

Various resources for a university course on bioinformatics, specifically the CAP5510 course taught by Giri Narasimhan. The resources cover topics such as protein structure, energy terms, gene expression, microarray technology, clustering, and support vector machines. The document also mentions various research papers and databases related to these topics.

Typology: Study Guides, Projects, Research

CAP 5510: Introduction to Bioinformatics
Giri Narasimhan
ECS 254; Phone: x3748
giri@cis.fiu.edu
www.cis.fiu.edu/~giri/teach/BioinfS07.html

Structure Prediction Flowchart
http://www.russell.embl-heidelberg.de/gtsp/flowchart2.html

Prokaryotic Gene Characteristics
[Figure 9.6: The promoter and open reading frame of the E. coli lexA gene. The annotated sequence shows matches to approximate consensus binding sites for the LexA repressor (CTGNNNNNNNNNNCAG), the -35 and -10 promoter regions relative to the start of the mRNA (TTGACA and TATAAT), the ribosomal binding site on the mRNA (GGAGG), and the open reading frame (ATG...TAA). Only the second two of the predicted LexA binding sites actually bind the repressor.]

Gene Expression
The process of transcription and/or translation of a gene is called gene expression. Every cell of an organism has the same genetic material, but different genes are expressed at different times. The pattern of gene expression in a cell is indicative of its state.

Hybridization
If two complementary strands of DNA or mRNA are brought together under the right experimental conditions, they will hybridize. A hybridizes to B ⇒ A is reverse complementary to B, or A is reverse complementary to a subsequence of B. Whether A hybridizes to B can be verified experimentally by labeling A or B with a radioactive or fluorescent tag, followed by excitation by laser.
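The hybridization rule above reduces to a string test: compute the reverse complement of A and look for it in B. Below is a minimal sketch assuming perfect Watson-Crick pairing and ignoring experimental conditions; the function names and example sequences are illustrative, not taken from the course material.

```python
# Minimal sketch of the hybridization test described above: A hybridizes to B
# if the reverse complement of A occurs within B (perfect matches only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def hybridizes(a, b):
    """True if the reverse complement of a appears as a substring of b."""
    return reverse_complement(a) in b.upper()

if __name__ == "__main__":
    probe = "ACGTT"                    # reverse complement is AACGT
    target = "GGGAACGTCCC"             # contains AACGT
    print(hybridizes(probe, target))   # True
```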
Microarray Data
[Figure: a gene expression matrix; an expression level is recorded for Gene1, Gene2, Gene3, ... in each experiment.]

Gene Chips
A gene g is represented on the chip by a set of probes: Probe 1, Probe 2, ..., Probe N.

What's on the slide?
Shining a laser light at a GeneChip® array causes the tagged DNA fragments that hybridized to glow.

DNA Chips & Images
[Figures: scanned DNA chip images.]

2-color DNA microarray
[Figure: a two-color spotted microarray image.]
http://www.arabidopsis.org/info/2010_projects/comp_proj/AFGC/RevisedAFGC/Friday/

Study effect of treatment over time
Sample; Treated Sample (t1) in Expt 1; Treated Sample (t2) in Expt 2; Treated Sample (t3) in Expt 3; ...; Treated Sample (tn) in Expt n.

Sources of Variations & Errors
- Variations in cells/individuals.
- Variations in mRNA extraction, isolation, introduction of dye, variation in dye incorporation, dye interference.
- Variations in probe concentration, probe amounts, substrate surface characteristics.
- Variations in hybridization conditions and kinetics.
- Variations in optical measurements, spot misalignments, discretization effects, noise due to scanner lens and laser irregularities.
- Cross-hybridization of sequences with high sequence identity.
- Limit of a factor of 2 in the precision of results.
Hence the need to normalize the data.
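The slides call for normalization without fixing a method. As one illustration only, the sketch below median-centers per-array log2 ratios for a two-color experiment; the data and function name are made up, and real pipelines typically do considerably more (e.g., intensity-dependent normalization).

```python
# Minimal sketch of one simple normalization for two-color arrays, assuming a
# genes x arrays matrix of Cy5/Cy3 intensity ratios: take log2 ratios and
# subtract each array's median so arrays become comparable. Illustrative only.
import numpy as np

def median_center_log_ratios(ratios):
    """ratios: 2-D array (genes x arrays) of positive intensity ratios."""
    log_ratios = np.log2(ratios)
    medians = np.median(log_ratios, axis=0)   # one median per array
    return log_ratios - medians               # broadcast subtraction per array

if __name__ == "__main__":
    toy = np.array([[2.0, 0.5, 1.1],
                    [4.0, 1.0, 0.9],
                    [0.5, 2.0, 1.0]])
    print(median_center_log_ratios(toy))
```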
Hierarchical Clustering: Example
[Figure: points in the plane are merged step by step into nested clusters, summarized by a dendrogram.]

Hierarchical Clustering [Johnson, SC, 1967]
Given n points in R^d, compute the distance between every pair of points.
While (not done):
  pick the closest pair of points s_i and s_j and make them part of the same cluster;
  replace the pair by an average of the two, s_ij.
Try the applet at: http://www.cs.mcgill.ca/~papou/#applet
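A minimal sketch of the agglomerative procedure above: repeatedly merge the closest pair and replace it by an average. Representing each cluster by a size-weighted centroid is one reading of "an average of the two"; the toy points and names are illustrative.

```python
# Minimal sketch of agglomerative clustering: merge the closest pair of
# clusters and replace the pair by their (size-weighted) average, until one
# cluster remains. Returns the merge history.
import numpy as np

def hierarchical_cluster(points):
    clusters = {i: (np.asarray(p, dtype=float), 1) for i, p in enumerate(points)}
    merges, next_id = [], len(points)
    while len(clusters) > 1:
        ids = list(clusters)
        # closest pair of cluster representatives
        best = min(((a, b) for i, a in enumerate(ids) for b in ids[i + 1:]),
                   key=lambda ab: np.linalg.norm(clusters[ab[0]][0] - clusters[ab[1]][0]))
        (ca, na), (cb, nb) = clusters.pop(best[0]), clusters.pop(best[1])
        clusters[next_id] = ((na * ca + nb * cb) / (na + nb), na + nb)
        merges.append((best[0], best[1], next_id))
        next_id += 1
    return merges

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(hierarchical_cluster(pts))   # merges the two nearby points first
```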
Clustering of gene expressions
Represent each gene as a vector or a point in d-space, where d is the number of arrays or experiments being analyzed.
[Figure: clustered gene expression data, from Eisen MB, et al., PNAS 1998 95(25):14863-8.]

Clustering Random vs. Biological Data
[Figure: expression profiles of random data versus biological data; one panel shows expression profiles for respiration genes.]

K-Means Clustering: Example
(Example from Andrew Moore's tutorial on Clustering.)
1. Start K-means (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each datapoint finds out which center it is closest to (thus each center "owns" a set of datapoints).
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. ...Repeat until terminated!

K-Means Clustering [McQueen '67]
Start with randomly chosen cluster centers.
Repeat:
  assign points to give the greatest increase in score;
  recompute cluster centers;
  reassign points;
until (no changes).
Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html
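A minimal sketch of the K-means loop just described: guess k centers, assign each point to its closest center, move each center to the centroid of the points it owns, and repeat until the assignment stops changing. Initialization, the stopping rule, and all names are illustrative assumptions.

```python
# Minimal sketch of K-means: assign points to nearest centers, move centers to
# the centroids of the points they own, repeat until assignments stop changing.
import numpy as np

def kmeans(points, k, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    assignment = None
    for _ in range(max_iter):
        # each datapoint finds the center it is closest to
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break                                  # terminated: no changes
        assignment = new_assignment
        for j in range(k):                         # each center jumps to its centroid
            owned = points[assignment == j]
            if len(owned):
                centers[j] = owned.mean(axis=0)
    return centers, assignment

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
    centers, labels = kmeans(data, k=2)
    print(centers)
```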
Comparisons
Hierarchical clustering: the number of clusters is not preset; yields a complete hierarchy of clusters; not very robust, not very efficient.
K-Means: needs a definition of a mean (what about categorical data?); more efficient and often finds the optimum clustering.

Functionally related genes behave similarly across experiments.

SOM Algorithm
Select the SOM architecture, and initialize the weight vectors and other parameters.
While (stopping condition not satisfied) do
  for each input point x:
    the winning node q has the weight vector closest to x;
    update the weight vector of q and its neighbors;
    reduce the neighborhood size and learning rate.

SOM Algorithm Details
Distance between x and weight vector $w_i$: $\|x - w_i\|$
Winning node: $q(x) = \arg\min_i \|x - w_i\|$
Weight update function (for neighbors): $w_i(k+1) = w_i(k) + \mu(k, x, i)\,[x(k) - w_i(k)]$
Learning rate: $\mu(k, x, i) = \eta_0(k)\,\exp\!\left(-\frac{\|r_i - r_{q(x)}\|^2}{2\sigma^2}\right)$

World Bank Statistics
Data: World Bank statistics of countries in 1992, with 39 indicators considered, e.g., health, nutrition, educational services, etc. The complex joint effect of these factors can be visualized by organizing the countries using the self-organizing map.

World Poverty Map
[Figure: SOM-organized world poverty map.]

[Screenshots: microarray analysis software showing 6214 substances (155 selected) on 7 microarrays; PCA with 6 components on log-transformed ratios of medians (635/532), and a 4 x 4 SOM with Euclidean distance on the log ratios.]
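A minimal sketch of the SOM update rule above on a small rectangular grid: for each input, find the winning node, then pull it and its grid neighbors toward the input with a Gaussian neighborhood whose width and learning rate shrink over time. The grid size, decay schedules, and names are illustrative assumptions.

```python
# Minimal sketch of a self-organizing map: w_i <- w_i + mu(k, x, i) (x - w_i),
# where mu is a learning rate times a Gaussian in grid distance from the winner.
import numpy as np

def train_som(data, grid_shape=(4, 4), epochs=20, eta0=0.5, sigma0=1.5, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for k in range(epochs):
        eta = eta0 * (1.0 - k / epochs)             # decaying learning rate
        sigma = sigma0 * (1.0 - k / epochs) + 1e-3  # shrinking neighborhood
        for x in data:
            q = np.argmin(np.linalg.norm(weights - x, axis=1))   # winning node
            grid_dist2 = np.sum((coords - coords[q]) ** 2, axis=1)
            mu = eta * np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += mu[:, None] * (x - weights)
    return weights.reshape(rows, cols, -1)

if __name__ == "__main__":
    data = np.random.default_rng(1).random((100, 3))
    print(train_som(data).shape)   # (4, 4, 3)
```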
Neural Networks
[Figure: a single neuron: the input X, weighted by the synaptic weights W, is summed, a bias θ is added, and the result passes through an activation function f(·) to give the output y.]

Learning NN
[Figure: an adaptive algorithm adjusts the weights W so that the network output for input X matches the desired response; the error between output and desired response drives the updates.]

Types of NNs
Recurrent NN, feed-forward NN, layered NN.
Other issues: hidden layers are possible; different activation functions are possible.

Learning Problems
[Figure: points labeled A and B in the plane, illustrating easy and hard class boundaries.]

SVM - Binary Classification
Partition the feature space with a surface. The surface is implied by a subset of the training points (vectors) near it; these vectors are referred to as Support Vectors. Efficient with high-dimensional data; solid statistical theory; subsumes several other methods.

Learning Problems
Binary classification, multi-class classification, regression.

[Table 5: Results for the perceptron on all data sets, averaged over 5 shufflings of the data, since this algorithm is sensitive to the order in which it receives the data points. The first column is the dataset used and the second is the number of features in the dataset. For the ovarian and colon datasets, the number of normal tissues misclassified (FP) and the number of tumor tissues misclassified (FN) is reported. For the AML/ALL training dataset, the number of AML samples misclassified (FP) and the number of ALL patients misclassified (FN) is reported. For the AML treatment dataset, the number of unsuccessfully treated patients misclassified (FP) and the number of successfully treated patients misclassified (FN) is reported. The last two columns report the best score obtained by the SVM on that dataset. The numeric entries are illegible in this copy.]

[Table 1: Error rates for ovarian cancer tissue experiments. For each setting, consisting of a kernel and a diagonal factor (DF), each tissue was classified. Reported are the number of normal tissues misclassified (FP), tumor tissues misclassified (FN), tumor tissues classified correctly (TP), and normal tissues classified correctly (TN). The numeric entries are illegible in this copy.]

[Figure 1: SVM classification margins for ovarian tissues. When classifying, the SVM calculates a margin, which is the distance of an example from the decision boundary it has learned; a negative value indicates an incorrect classification. The graph shows the margin of each tissue sample, calculated using Equation (10); the most negative point is HWBC3.]

SVM - General Principles
SVMs perform binary classification by partitioning the feature space with a surface implied by a subset of the training points (vectors) near the separating surface. These vectors are referred to as Support Vectors. Efficient with high-dimensional data; solid statistical theory; subsumes several other methods.

SVM Example (Radial Basis Function)
[Figure: nonlinear decision boundary produced by an SVM with a radial basis function kernel.]

Classification of (Separable) 2-D data
[Figure: separable +1/-1 points in the plane, illustrating the margin of a point and the margin of a point set.]

Classification using the Separator
Separator: w·x + b = 0, with w·x_i + b > 0 on one side and w·x_j + b < 0 on the other.

Perceptron Algorithm (Dual)
Given a separable training set S:
  a = 0; b = 0; R = max_i ||x_i||
  repeat
    for i = 1 to N:
      if y_i (Σ_j a_j y_j x_i·x_j + b) ≤ 0 then
        a_i = a_i + 1
        b = b + y_i R^2
      endif
  until no mistakes are made within the loop
  return (a, b)

Non-linear Separators
Main idea: map into feature space.
[Figure 2: The idea of SV machines: map the training data nonlinearly into a higher-dimensional feature space via Φ, and construct a separating hyperplane with maximum margin there. This yields a nonlinear decision boundary in input space. By the use of a kernel function, it is possible to compute the separating hyperplane without explicitly carrying out the map into the feature space.]

Perceptron Algorithm (Dual) with kernels
Replace each dot product x_i·x_j by K(x_i, x_j) = Φ(x_i)·Φ(x_j); the algorithm is otherwise unchanged. (A code sketch of this kernelized dual perceptron follows the tables below.)

Different Kernel Functions
Polynomial kernel: $\kappa(X, Y) = (X \cdot Y)^d$
Radial basis kernel: $\kappa(X, Y) = \exp\!\left(-\frac{\|X - Y\|^2}{2\sigma^2}\right)$
Sigmoid kernel: $\kappa(X, Y) = \tanh(\omega (X \cdot Y) + \theta)$

SVM Ingredients
Support Vectors; a mapping from input space to feature space; a dot product (the kernel function).

[Tables 2 and 3: Comparison of error rates for various classification methods; classes are as described in Table 1 (tricarboxylic-acid cycle, respiration, ribosome, proteasome, histone, helix-turn-helix). The methods are the radial basis function SVM, the SVMs using the scaled dot-product kernel raised to the first, second and third power, Parzen windows, Fisher's linear discriminant, and the two decision tree learners, C4.5 and MOC1. The columns are the false positive, false negative, true positive and true negative rates summed over three cross-validation splits, followed by the cost, which is the number of false positives plus twice the number of false negatives; these five columns appear twice, first using the threshold learned from the training set and then using the threshold that minimizes the cost on the test set. Threshold optimization is not possible for the decision tree methods, since they do not produce ranked results. The numeric entries are illegible in this copy.]

[Table 4: Comparison of SVM performance using various kernels. For each of the MYGD classifications, SVMs were trained using four different kernel functions on five different random three-fold splits of the data, training on two-thirds and testing on the remaining third. The first column contains the class, as described in Table 1; the second contains the kernel function, as described in Table 2. The next five columns contain the threshold-optimized cost (the number of false positives plus twice the number of false negatives) for each of the five random three-fold splits, and the final column is the total cost across all five splits. The numeric entries are illegible in this copy.]

[Table 6: Consistently misclassified genes. The table lists all 25 genes that are consistently misclassified by SVMs trained using the MYGD classifications listed in Table 1, grouped by family (TCA cycle, respiration, ribosome, proteasome, histone). Two types of errors are included: a false positive (FP) occurs when the SVM includes the gene in the given class but the MYGD classification does not; a false negative (FN) occurs when the SVM does not include the gene in the given class but the MYGD classification does. The gene-by-gene rows are only partly legible in this copy.]
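A minimal sketch of the dual perceptron above with a pluggable kernel K(x_i, x_j): with the plain dot product it is the linear version, and swapping in the radial basis kernel gives a nonlinear separator without computing Φ explicitly. The toy data, kernel parameter, and epoch cap are illustrative assumptions, and this is the perceptron from the slides, not a full maximum-margin SVM.

```python
# Minimal sketch of the kernelized dual perceptron from the slides:
# if y_i (sum_j a_j y_j K(x_i, x_j) + b) <= 0, set a_i += 1 and b += y_i R^2.
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def dual_perceptron(X, y, kernel=np.dot, max_epochs=100):
    """X: (N, d) array; y: labels in {-1, +1}. Returns (alpha, b)."""
    N = len(X)
    alpha, b = np.zeros(N), 0.0
    R = max(np.linalg.norm(x) for x in X)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    for _ in range(max_epochs):
        mistakes = False
        for i in range(N):
            if y[i] * (np.sum(alpha * y * K[i]) + b) <= 0:
                alpha[i] += 1
                b += y[i] * R ** 2
                mistakes = True
        if not mistakes:            # no mistakes made within the loop
            break
    return alpha, b

if __name__ == "__main__":
    X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.5]])
    y = np.array([1, 1, -1, -1])
    alpha, b = dual_perceptron(X, y, kernel=rbf_kernel)
    print(alpha, b)
```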
Genomics (Cont'd)
Gene Expression:
- Microarray experiments & analysis
- Probe design (CODEHOP)
- Array image analysis (CrazyQuant)
- Identifying genes with significant changes (SAM)
- Clustering

Proteomics
Study of all proteins in a genome, or comparison of whole genomes.
- Whole genome annotation & functional proteomics
- Whole genome comparison
- Protein expression: 2D gel electrophoresis

2D Gel Electrophoresis
[Figure: two-dimensional protein gel.]

Gene Networks & Pathways
Genes and proteins act in concert and therefore form a complex network of dependencies.

Pathway Example from KEGG
[Figure: KEGG map 00312, beta-lactam resistance in Staphylococcus aureus, showing beta-lactamase, MecI, and penicillin-binding protein.]
[Figure: KEGG methionine metabolism map for Pseudomonas aeruginosa, linking homoserine, cystathionine, homocysteine, S-adenosyl-L-methionine, L-methionine and related metabolites, with links to glycine, serine and threonine metabolism.]
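The dependency-network view above can be made concrete with a small directed-graph sketch; the class, method names, and gene labels below are hypothetical illustrations, not taken from the KEGG maps.

```python
# Minimal sketch of a gene/protein dependency network as a directed graph
# stored as an adjacency list; node names are hypothetical placeholders.
from collections import defaultdict

class PathwayGraph:
    def __init__(self):
        self.edges = defaultdict(set)

    def add_interaction(self, regulator, target):
        """Record that 'regulator' acts on 'target'."""
        self.edges[regulator].add(target)

    def downstream(self, gene, seen=None):
        """All genes reachable from 'gene' by following dependencies."""
        seen = set() if seen is None else seen
        for target in self.edges[gene]:
            if target not in seen:
                seen.add(target)
                self.downstream(target, seen)
        return seen

if __name__ == "__main__":
    g = PathwayGraph()
    g.add_interaction("geneA", "geneB")   # hypothetical names
    g.add_interaction("geneB", "geneC")
    print(g.downstream("geneA"))          # {'geneB', 'geneC'}
```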