
Proteomics Lecture 38: Microarray Analysis and Clustering Algorithms - Prof. Drena Leigh D, Lab Reports of Bioinformatics

A portion of the lecture notes for Lecture 38 (Proteomics) of BCB 444/544 at Iowa State University (ISU), Fall 2007. The notes cover microarray analysis, pattern recognition, hierarchical clustering, k-means clustering, KNN classification, and proteomics methods (2D gels and mass spectrometry). The document also includes questions and answers, clustering metrics, and a comparison of hierarchical clustering and k-means.

Typology: Lab Reports

Pre 2010

Uploaded on 09/02/2009

koofers-user-kbl

#38 - Proteomics 11/28/07, BCB 444/544 Fall 07, ISU, Dobbs

BCB 444/544 Lecture 38
Review: Microarrays
Proteomics
#38_Nov28
Thanks to Doina Caragea, KSU

Required Reading (before lecture)
√ Mon Nov 26 - Lecture 37: Clustering & Classification Algorithms
  • Chp 18 Functional Genomics
Wed Nov 28 - Lecture 38: Proteomics & Protein Interactions
  • Chp 19 Proteomics
Thurs Nov 29 - Lab 12: R Statistical Computing & Graphics (Garrett Dancik) http://www.r-project.org/
Fri Nov 30 - Lecture 39 (Last Lecture!): Systems Biology (& a bit of Metabolomics & Synthetic Biology)

Assignments & Announcements
Mon Nov 26 - HW#6 Due (5 PM Mon Nov 26 or ASAP)
Mon Dec 3 - BCB 544 Project Reports Due (NO CLASS that day!!)
ALL BCB 444 & 544 students are REQUIRED to attend ALL project presentations next week!!!
Tentative Schedule:
  Wed Dec 5: #1: Xiong & Devin (~20’); #2: Tonia (10-15’)
  Fri Dec 7: #3: Kendra & Drew (~20’); #4: Addie (10-15’)
Thurs Dec 6 - Optional Review Session for Final Exam
Mon Dec 10 - BCB 444/544 Final Exam (9:45 - 11:45 AM). Will include:
  • 40 pts In Class: New material (since Exam 2)
  • 20 pts In Class: Comprehensive
  • 40 pts In Lab: Practical (Comprehensive)

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html
Nov 29 Thurs - Baker Center Seminar, 2:10 in Howe Hall Auditorium
  • Greg Voth, Univ. of Utah
  • Multiscale Challenge for Biomolecular Systems: A Systematic Approach
Nov 29 Thurs - BBMB Seminar, 4:10 in 1414 MBB
  • Sue Gibson, Univ. of Minnesota
  • How do soluble sugar levels help regulate plant development, carbon partitioning and gene expression?
Nov 30 Fri - BCB Faculty Seminar, 2:10 in 102 ScI
  • Shashi Gadia, ComS, ISU
  • Harnessing the Potential of XML
Nov 30 Fri - GDCB Seminar, 4:10 in 1414 MBB
  • John Abrams, Univ. Texas Southwestern Medical Center
  • Dying Like Flies: Programmed & Unprogrammed Cell Death

Chp 18 – Functional Genomics
SECTION V GENOMICS & PROTEOMICS
Xiong: Chp 18 Functional Genomics
  • Sequence-based Approaches
  • Microarray-based Approaches
  • Comparison of SAGE & DNA Microarrays

Gene Expression Analysis

Pattern Recognition in Microarray Analysis
• Clustering (unsupervised learning)
  • Uses primary data to group measurements, with no information from other sources
• Classification (supervised learning)
  • Uses known groups of interest (from other sources) to learn features associated with these groups in the primary data, and creates rules for assigning data to the groups of interest

Microarray Analysis - Questions & Answers
• How do hierarchical clustering algorithms work?
• How do we measure the distance between two clusters? (similarity criteria)
  • Single link
  • Complete link
  • Average link
• What are "good clusters"?
  • A big difference between INTRA-cluster distance and INTER-cluster distance, i.e., INTRA-cluster distance is minimized while INTER-cluster distance is maximized
• What are the pros & cons of:
  • Hierarchical vs K-means clustering
  • Clustering vs Classification

Clustering Metrics
• A key issue in clustering is determining which similarity/distance metric to use
• Often, the choice of metric has a bigger effect on the results than the clustering algorithm actually used!
• When choosing the metric, we should take into account our assumptions about the data and the goal of the clustering

How to Determine Distances?
Intra-cluster distance
• Min/Max/Avg of the distances between
  - All pairs of points in the cluster, OR
  - The centroid and all points in the cluster
Inter-cluster distance
• Single link
  • distance between the two most similar (closest) members
• Complete link
  • distance between the two most dissimilar (farthest) members
• Average link
  • average distance over all pairs of members
• Centroid distance
  • distance between the two cluster centroids
What is the centroid? The "average" of all points of X. The centroid of a finite set of points can be computed as the arithmetic mean of each coordinate of the points. (Wikipedia)

INTRA- vs INTER-Cluster Distances
[Figure: example clusterings labeled "Good!" and "Bad!"]

Methods for Clustering (Unsupervised Learning)
• Hierarchical Clustering
• K-Means
• Self Organizing Maps
  • (in lab, won’t discuss in lecture)
• …many others…

K-Nearest Neighbor (KNN)
• Idea: Use the k closest neighbors to label new data points (e.g., for k = 4)

Basic KNN Algorithm
INPUT:
• Set of data with labels (training data)
• K
• Set of data needing labels
• Distance metric
1. For each unlabeled data point, compute its distance to all labeled data
2. Sort the distances and determine the closest K neighbors (smallest distances)
3. Use majority voting to predict the label of the unlabeled data point

Variations on KNN
• Can easily classify into multiple classes
• Weighted KNN - can weight the votes of nearby training samples based on their distance from the unknown sample
• Can set a threshold, p, for the # of votes needed to win
  (If there is no winner, then either return a NULL result or set a default winner)

Compare in Graphical Representation
[Figure: the same data shown as Clustering vs Classification; for classification, external labels are applied: RED group & BLUE group]

Tradeoffs for Clustering vs Classification
• Clustering is not biased by previous knowledge, but it therefore needs a stronger signal to discover clusters
• Classification uses previous knowledge, so it can detect a weaker signal, but it may be biased by WRONG previous knowledge

Chp 19 – Proteomics
SECTION V GENOMICS & PROTEOMICS
Xiong: Chp 19 Proteomics
  • Technology of Protein Expression Analysis
  • Post-translational Modification
  • Protein Sorting
  • Protein-Protein Interactions

ISU Proteomics Resources & Researchers
Facilities:
  • Proteomics Facility (Carver Co-lab) http://www.plantgenomics.iastate.edu/proteomics/
  • Protein Facility (MBB) http://www.protein.iastate.edu/
Experiments:
  • Plant: Rodermel, Wise, Voytas
  • Animal: Greenlee, perhaps others soon?
Computational Analysis: Honavar, Wise, Dobbs

Proteomics: What do all those proteins do??
[Figure: Biological processes for yeast proteins. Copyright © 2006 A. Malcolm Campbell]

Proteome Analysis: "Traditionally" using Two-dimensional (2D) gels
Copyright © 2006 A. Malcolm Campbell
1st D: Isoelectric focusing (IEF) in a pH gradient: proteins migrate to their isoelectric points & stop moving
2nd D: SDS-PAGE (SDS detergent, polyacrylamide gel electrophoresis): proteins migrate according to molecular weight

Proteins identified on 2D gels (IEF/SDS-PAGE)
Direct protein microsequencing by Edman degradation
  -- done at facilities (here at ISU)
  -- typically need 5 picomoles
  -- often get 10 to 20 amino acids of sequence
Protein mass analysis by MALDI-TOF
  -- Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight mass spectrometry
  -- done at facilities (here at ISU)
  -- can often detect post-translational modifications (such as phosphorylated Ser, Thr, Tyr)
Page 250-1

Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
  • Visualize hundreds to thousands of proteins
  • Improved identification of protein spots
Disadvantages:
  • Limited number of samples can be processed
  • Mostly abundant proteins visualized
  • Technically difficult
Page 251, Jonathan Pevsner

Tandem Mass Spectrometry (MS/MS) to Identify Proteins
Copyright © 2006 A. Malcolm Campbell
Figure 8.19 Tandem mass spectrometry for protein identification
a) ESI (electrospray ionization) creates ionized proteins, represented by colored shapes with positive charges. Each shape represents many copies of identical proteins.
b) Ionized proteins are separated based on their mass-to-charge ratio (m/z) and sent one at a time into the activation chamber. Separation and selection take place in the first of the two MS devices. The solid purple protein has been selected for analysis; the other three are temporarily stored for later analysis.
c) The group of m/z-selected ionized proteins enters a collision cell that is filled with inert argon gas. Gas molecules collide with the proteins, causing them to break into two peptide pieces (labeled b and y).
d) The ionized peptide pieces are sent into the second MS device, which again measures the m/z ratio. A computer compares the spectrum of peptide pieces to a database of ideal spectra to identify the original group of identical proteins.

MS data: Protein identification through peptide fragment identification & separation
Copyright © 2006 A. Malcolm Campbell
Figure 8.20 When a group of identical proteins is broken into peptide pieces, more than one pair of b and y peptides will be formed.
a) One protein sequence and its calculated mass on top, with the b peptides/masses (gray) and the y peptides/masses (purple) below.
b) An experimentally determined mass/charge spectrum from the peptide in panel a). Some peaks are higher than others, which means that some b/y peptide pieces were more abundant than others. The spectrum is used to determine each peptide's amino acid sequence and protein identity.

Databases of 2D Gel Information
http://ca.expasy.org/ch2d/2d-index.html
Jonathan Pevsner
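The Clustering Metrics slide claims that the choice of metric can matter more than the algorithm. The following is a minimal Python sketch of why. It assumes expression profiles are plain lists of floats; the three gene profiles are hypothetical and not taken from the lecture. It compares Euclidean distance with one common correlation-based distance, 1 minus the Pearson correlation.

```python
import math

def euclidean(x, y):
    """Straight-line distance: sensitive to absolute expression levels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: sensitive only to the shape of the profile.
    Undefined (division by zero) if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

# Hypothetical expression profiles of three genes across four conditions
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, 10x the level
gene_c = [2.0, 1.0, 2.0, 1.0]      # similar level to gene_a, no trend

for name, other in [("gene_b", gene_b), ("gene_c", gene_c)]:
    print(name, round(euclidean(gene_a, other), 2),
          round(pearson_distance(gene_a, other), 2))
```

Under Euclidean distance, gene_a is nearer to gene_c (about 3.46 versus about 49.3). Under the correlation distance, it is nearer to gene_b (essentially 0 versus about 1.45). The same data would therefore be grouped differently depending on the metric.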
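The distance slide defines single, complete, average and centroid linkage, and the Q&A slide asks how hierarchical clustering works. Below is a minimal sketch under a few assumptions:

- Distance between points (given as tuples) is Euclidean.
- The clustering is the common bottom-up (agglomerative) variant.
- The function names are illustrative only.

The naive pairwise search is also far slower than library implementations. `math.dist` requires Python 3.8+.

```python
import math
from itertools import product

def centroid(cluster):
    """Arithmetic mean of each coordinate of the points in the cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def single_link(c1, c2):
    """Distance between the two most similar (closest) members."""
    return min(math.dist(p, q) for p, q in product(c1, c2))

def complete_link(c1, c2):
    """Distance between the two most dissimilar (farthest) members."""
    return max(math.dist(p, q) for p, q in product(c1, c2))

def average_link(c1, c2):
    """Average distance over all cross-cluster pairs."""
    total = sum(math.dist(p, q) for p, q in product(c1, c2))
    return total / (len(c1) * len(c2))

def centroid_link(c1, c2):
    """Distance between the two cluster centroids."""
    return math.dist(centroid(c1), centroid(c2))

def agglomerative(points, k, linkage=single_link):
    """Naive agglomerative clustering: start with singleton clusters and
    repeatedly merge the closest pair until only k clusters remain."""
    clusters = [[p] for p in points]
    merges = []  # (cluster_a, cluster_b, distance), i.e. dendrogram data
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # new merged list
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, k=3, linkage=complete_link)
print(clusters)
```

Passing `single_link`, `average_link` or `centroid_link` instead of `complete_link` can change which clusters merge first. The recorded `merges`, with their distances, are what a dendrogram plots.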
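The "good clusters" criterion (small INTRA-cluster distance, large INTER-cluster distance) can be turned into a single number. One standard choice is the Dunn index, swapped in here as an illustration rather than taken from the lecture. It divides the smallest single-link distance between clusters by the largest cluster diameter (the maximum intra-cluster pairwise distance). Euclidean distance is assumed, and the example clusterings are made up.

```python
import math
from itertools import combinations, product

def diameter(cluster):
    """Intra-cluster distance: max distance over all pairs in the cluster."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)),
               default=0.0)

def separation(c1, c2):
    """Inter-cluster distance: single link (closest cross-cluster pair)."""
    return min(math.dist(p, q) for p, q in product(c1, c2))

def dunn_index(clusters):
    """Smallest inter-cluster separation / largest intra-cluster diameter.
    Higher is better. Undefined if every cluster is a single point."""
    inter = min(separation(a, b) for a, b in combinations(clusters, 2))
    intra = max(diameter(c) for c in clusters)
    return inter / intra

good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]   # tight and far apart
bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]    # same points, split badly
print(dunn_index(good), dunn_index(bad))
```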
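The methods slide lists K-Means, but its slides are not part of this preview. As a hedged illustration, here is a sketch of the standard Lloyd's algorithm. It assumes Euclidean distance and, for determinism, uses the first k points as the initial centroids. Practical implementations usually use random or k-means++ initialization with several restarts.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points,
    until no assignment changes. Initial centroids are simply the first k
    points (a simplifying assumption for this sketch)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, assignment

pts = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 2), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(centroids, labels)
```

Unlike hierarchical clustering, k must be fixed in advance, and the result can depend on the starting centroids. In exchange, each iteration needs only n·k distance computations.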
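The three steps of the Basic KNN Algorithm map directly onto code. The minimal sketch below rests on three assumptions:

- Training data is a list of (point, label) pairs.
- Points are numeric tuples.
- Euclidean distance (`math.dist`) is the default metric.

RED and BLUE echo the labels on the graphical comparison slide, but the coordinates are made up.

```python
import math
from collections import Counter

def knn_predict(training, query, k, distance=math.dist):
    """Basic KNN. training: list of (point, label) pairs; query: a point."""
    # 1. Compute the distance from the query to every labeled point
    dists = [(distance(query, point), label) for point, label in training]
    # 2. Sort by distance and keep the K closest neighbors
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]
    # 3. Majority vote; ties go to the label that appears first among the
    #    sorted neighbors (Counter keeps insertion order for equal counts)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((1, 1), "RED"), ((1, 2), "RED"), ((2, 1), "RED"),
            ((6, 6), "BLUE"), ((6, 7), "BLUE"), ((7, 6), "BLUE")]
print(knn_predict(training, (2, 2), k=3))
print(knn_predict(training, (5, 5), k=3))
```

Because `distance` is a parameter, a correlation-based metric can be passed in instead. Multi-class problems need no change, since the vote simply counts whatever labels appear.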
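The Variations on KNN slide mentions distance-weighted votes and a vote threshold p. The slide specifies neither detail, so this hedged sketch makes two assumptions:

- Each vote is weighted by 1/distance.
- p is the winner's minimum fraction of the total vote weight.

`None` stands in for the slide's NULL result, and the training points are made up.

```python
import math
from collections import defaultdict

def weighted_knn_predict(training, query, k, p=0.0, default=None):
    """Weighted KNN sketch: each of the k nearest neighbors votes with
    weight 1/distance (an assumed weighting). The winner must hold at
    least fraction p of the total weight, else `default` is returned."""
    neighbors = sorted(
        ((math.dist(query, point), label) for point, label in training),
        key=lambda pair: pair[0],
    )[:k]
    weights = defaultdict(float)
    for d, label in neighbors:
        if d == 0:
            return label  # query coincides with a training sample
        weights[label] += 1.0 / d
    winner = max(weights, key=weights.get)
    share = weights[winner] / sum(weights.values())
    return winner if share >= p else default

training = [((1, 1), "RED"), ((1, 2), "RED"), ((2, 1), "RED"),
            ((6, 6), "BLUE"), ((6, 7), "BLUE"), ((7, 6), "BLUE")]
print(weighted_knn_predict(training, (4, 4), k=6, p=0.5))
print(weighted_knn_predict(training, (4, 4), k=6, p=0.9))
```

With all six neighbors, the midpoint (4, 4) leans BLUE with only about 53% of the weight. It is labeled BLUE at p = 0.5 but gets no label at p = 0.9.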
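The 2D gel slide separates proteins by isoelectric point (1st D) and molecular weight (2nd D). The rough sketch below predicts both from sequence, using bisection on the Henderson-Hasselbalch net charge to find the pI. It assumes:

- Approximate textbook pKa values. Published tables differ, so the predicted pI is only an estimate.
- Standard average residue masses.
- No post-translational modifications.

```python
# Approximate pKa values (an assumption: tables differ, and real pKa values
# shift with neighboring residues and modifications)
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}

# Average residue masses in daltons (amino acid minus one water)
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
AVG_WATER = 18.01528

def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    pos_groups = [("Nterm", 1)] + [(aa, seq.count(aa)) for aa in "KRH"]
    neg_groups = [("Cterm", 1)] + [(aa, seq.count(aa)) for aa in "DECY"]
    pos = sum(n / (1 + 10 ** (ph - PKA_POSITIVE[g])) for g, n in pos_groups)
    neg = sum(n / (1 + 10 ** (PKA_NEGATIVE[g] - ph)) for g, n in neg_groups)
    return pos - neg

def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH in [0, 14]. Net charge falls as pH rises; the pI is
    where it crosses zero (where the protein stops moving during IEF)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2

def average_mass(seq):
    """Approximate molecular weight (Da), which governs SDS-PAGE migration."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + AVG_WATER

for s in ["G", "KKKK", "DDDDE"]:
    print(s, round(isoelectric_point(s), 2), round(average_mass(s), 2))
```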
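Figure 8.20 describes how cleavage between residues yields complementary b (N-terminal) and y (C-terminal) pieces. The minimal sketch below computes singly charged b and y ion m/z values for a peptide. It assumes unmodified residues, standard monoisotopic residue masses, and charge 1+. The peptide "PEPTIDE" is only a placeholder sequence.

```python
# Monoisotopic residue masses (Da) for unmodified residues
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(peptide):
    """Singly charged b and y ion m/z values.
    b_i = first i residues (N-terminal piece) + a proton
    y_i = last i residues (C-terminal piece) + water + a proton
    Cleaving after residue i gives the complementary pair b_i and y_(n-i)."""
    masses = [MONO_RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions

def precursor_mz(peptide, charge=1):
    """m/z of the intact peptide carrying `charge` extra protons."""
    neutral = sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge

b, y = fragment_ions("PEPTIDE")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
print(round(precursor_mz("PEPTIDE"), 3))
```

Each b_i and y_(n-i) pair sums to the intact peptide's [M+H]+ plus one proton. This relationship can help pair up complementary peaks in a spectrum like the one in panel b).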