Docsity
Classification Performance Metrics and Techniques, Slides of Biomaterials

An overview of various classification performance metrics and techniques, including sensitivity, specificity, PPV, NPV, accuracy, validation, ROC curves, LDA, and logistic regression. It discusses the importance of threshold selection and the use of confusion matrices, and covers the concepts of true error rate, the holdout method, and K-fold cross-validation.

Typology: Slides

2018/2019

Uploaded on 03/13/2019

lectura-lectura

Outline:
• Sensitivity and specificity
• PPV and NPV
• Accuracy
• Validation
• ROC curves
• LDA
• Logistic regression
• KNN

World health report 2013

Sensitivity

The two main terms used to describe how well a test performs are sensitivity and specificity.

Sensitivity – how often a test turns positive for people who have the disease (are positive).

HIV+ and the ELISA test shows positive → true positive
HIV+ and the ELISA test shows negative → false negative

The ELISA test for HIV antibodies has a sensitivity of 99.7%, hence only 3 out of 1000 HIV-positive persons will not show a positive result.

In the PSA example used on the following slides:
FP = false positive = 1
FN = false negative = 2
TP = true positive = 5
TN = true negative = 6

[Figure: scatter plot of the data with the four regions labelled TP, FP, FN and TN.]

Sensitivity = TP / (TP + FN) = 5 / (5 + 2) ≈ 71%

Exercise

Let's say you have the following data. Suggest an appropriate threshold level for PSA that separates healthy individuals from prostate cancer patients.
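The sensitivity calculation from the counts above can be sketched in a few lines of Python (the helper name `sensitivity` is ours, not from the slides):

```python
# Sensitivity (true positive rate) from the slide's example counts: TP = 5, FN = 2.
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly positive cases that the test flags as positive."""
    return tp / (tp + fn)

sens = sensitivity(5, 2)             # counts from the PSA example above
print(f"Sensitivity = {sens:.0%}")   # 5/7 ≈ 71%
```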
Exercise – answer

In this setting, sensitivity is more important than specificity.

[Figure: PSA (µg/L) values for the two groups with the chosen threshold marked.]

Confusion matrix

A confusion matrix is a contingency table that shows the performance of a classification model (TP = true positive, TN = true negative, FP = false positive, FN = false negative):

                     Predicted class
                     Cancer   Healthy
Actual    Cancer       TP       FN
class     Healthy      FP       TN

For the PSA example:

                     Predicted class
                     Cancer   Healthy
Actual    Cancer        5        2
class     Healthy       1        6

Specificity (true negative rate)

True negative rate – how often the test shows a negative result for the ones who are healthy.

Specificity = TN / (TN + FP) = 6 / (6 + 1) ≈ 86%

Positive predictive value (PPV)

PPV – proportion of individuals with a positive test who actually have the disease.

PPV = TP / (TP + FP) = 5 / (5 + 1) ≈ 83%

This is the probability that you have cancer given a positive test (note the difference from sensitivity, which is the probability that the test shows positive if you have cancer). We assume that the sample reflects the true prevalence.

Negative predictive value (NPV)

NPV – proportion of individuals with a negative test who actually are healthy.

NPV = TN / (TN + FN) = 6 / (6 + 2) = 75%

This is the probability that you are healthy given a negative test (note the difference from specificity, which is the probability that the test shows negative if you are healthy). We assume that the sample reflects the true prevalence.

Example of PPV

Let's assume that 1000 individuals take the test. Then 135 + 71 = 206 will show a positive test. The fraction of these who actually have cancer is 71 / 206 = 0.34 → PPV = 34%.

Validation

Using a cutoff level of 2.3 results in a sensitivity of 71% and a specificity of 86%. This cutoff (2.3 µg/L) was set to maximize the accuracy on this particular data set.
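As a minimal sketch, all of these metrics can be computed together from the four counts (the function name and dictionary layout are ours, not from the slides; accuracy is included for completeness):

```python
# All metrics from the slide's 2x2 confusion matrix (TP=5, FN=2, FP=1, TN=6).
def confusion_metrics(tp, fn, fp, tn):
    return {
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN)
        "specificity": tn / (tn + fp),   # TN / (TN + FP)
        "ppv":         tp / (tp + fp),   # TP / (TP + FP)
        "npv":         tn / (tn + fn),   # TN / (TN + FN)
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
    }

m = confusion_metrics(tp=5, fn=2, fp=1, tn=6)
for name, value in m.items():
    print(f"{name}: {value:.0%}")
```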
What if we collect 14 new subjects – would a cutoff of 2.3 be optimal for that data set?

Validation

Validation techniques are used:
• To estimate the performance of the classifier on "new" data
• For model selection – which model performs best on new data

Performance (accuracy, sensitivity etc.) is usually measured by the "true" error rate based on new data, which can be seen as the error rate for the population of interest.

The holdout method

If we do not have a big data set, we will have too few data points to check sensitivity and specificity. Example: we have 20 data points → use 20% as a test set → only 4 data points are used to determine sensitivity and specificity. Using only 4 data points could, just by chance, result in 100% or 0% sensitivity even though the "true" sensitivity is 50%.

Cross-validation

When we do not have a large data set, we can use cross-validation to predict the accuracy of our model. Cross-validation gives us insight into how the model would perform on an unknown data set. A common technique is K-fold cross-validation, where the data is split into K equal samples.

K-fold cross-validation

Data is divided into K equal samples, for example 4-fold cross-validation:

[Figure: the data split into four folds; each fold is used once as the test set while the remaining three folds form the training set.]

1. Fit the model and find an appropriate cutoff based on the training data.
2. Use the test data to estimate accuracy, sensitivity etc.
3. Repeat for all K folds.
4. Average (combine) the performance results.
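The four steps above can be sketched as follows, assuming a simple one-dimensional threshold classifier and made-up `(value, diseased)` pairs — the data and helper names here are purely illustrative, not from the slides:

```python
# Minimal 4-fold cross-validation for a one-dimensional threshold classifier.
import random

def fit_cutoff(train):
    """Pick the cutoff (among observed values) that maximizes training accuracy."""
    best_cut, best_acc = None, -1.0
    for cand, _ in train:
        acc = sum((x >= cand) == y for x, y in train) / len(train)
        if acc > best_acc:
            best_cut, best_acc = cand, acc
    return best_cut

def k_fold_accuracy(data, k=4, seed=0):
    data = data[:]                        # avoid mutating the caller's list
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        cut = fit_cutoff(train)           # step 1: fit on the training folds
        accs.append(sum((x >= cut) == y for x, y in test) / len(test))  # step 2
    return sum(accs) / k                  # steps 3-4: repeat and average

# (value, has_disease) pairs -- purely illustrative
data = [(0.5, False), (1.0, False), (1.4, False), (1.8, False), (2.0, False),
        (2.1, True), (2.5, True), (2.8, True), (3.0, True), (3.5, True),
        (1.9, True), (2.2, False)]
print(f"4-fold CV accuracy: {k_fold_accuracy(data):.0%}")
```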
Generating a ROC curve

Consider the following example. Each cutoff value gives one point on the ROC curve (sensitivity plotted against specificity):

Cutoff   Sensitivity   Specificity
 0.80       100%            0%
 0.85       100%           14%
 1.05       100%           28%
 2.05        86%           86%
 2.30        71%           86%
 2.60        71%          100%

[Figure: for each cutoff, the dot plot of the two groups with the threshold line, and the corresponding point added to the ROC curve of sensitivity vs. specificity.]

Exercise

Based on the data below, draw a ROC curve and calculate the area under the curve. Do you think that the area is significantly different from 0.5?
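Tracing the curve amounts to recomputing sensitivity and specificity at each candidate cutoff. A sketch with illustrative PSA values (not the slide's actual data) might look like:

```python
# Sweep the cutoff over all observed values; each cutoff yields one ROC point.
def roc_points(values, labels):
    """One (cutoff, sensitivity, specificity) point per candidate cutoff."""
    points = []
    for cut in sorted(set(values)):
        tp = sum(v >= cut and y for v, y in zip(values, labels))
        fn = sum(v < cut and y for v, y in zip(values, labels))
        tn = sum(v < cut and not y for v, y in zip(values, labels))
        fp = sum(v >= cut and not y for v, y in zip(values, labels))
        points.append((cut, tp / (tp + fn), tn / (tn + fp)))
    return points

psa    = [0.8, 0.9, 1.1, 1.9, 2.1, 2.2, 2.4, 2.6, 2.9, 3.4]   # illustrative
cancer = [False, False, False, False, False, True, True, True, True, True]
for cut, sens, spec in roc_points(psa, cancer):
    print(f"cutoff {cut:.2f}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

At the lowest cutoff everything is flagged positive (100% sensitivity, 0% specificity), and specificity rises as the cutoff increases, just as in the slide sequence.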
Exercise – solution. The three non-trivial cutoffs give:
• Sensitivity = 4/5 = 80%, Specificity = 5/5 = 100%
• Sensitivity = 4/5 = 80%, Specificity = 4/5 = 80%
• Sensitivity = 5/5 = 100%, Specificity = 4/5 = 80%

Area = 1 − 0.2 × 0.2 = 0.96

ROC curves

What we want is a high specificity and a high sensitivity, for example 90% sensitivity and 90% specificity. With 50% sensitivity and 50% specificity, the classifier is no better than chance.

Is the ROC curve significantly different from the reference line?

H0: The area under the ROC curve is equal to 0.5.

[Figure: two ROC curves – one well above the diagonal reference line, one close to it.]

LDA

LDA combines variables by maximizing the separation between the groups:

D = 0.1 · CRP + 0.7 · Temp

[Figure: box plots of CRP, Temp and the discriminant score D for the Virus and Bacteria groups – the separation is largest for D.]

LDA – cutoff

A reasonable cutoff can be set to the mean of the means of the discriminant scores of the two groups:

T_cutoff = (D̄_Virus + D̄_Bacteria) / 2 = (28.87 + 32.29) / 2 = 30.58

LDA – predict

For example, let's say that a patient has a body temperature of 40 °C and a CRP concentration of 70 mg/L. Does the patient have a bacterial or viral infection?

D = 0.10 · CRP + 0.70 · Temp = 0.10 · 70 + 0.70 · 40 = 35

Since 35 > T_cutoff = 30.58, the patient is classified as having a bacterial infection.

Logistic regression

Logistic regression is a mathematical modeling approach that can be used to describe the relationship between a dichotomous (YES/NO) dependent variable and independent variables.
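The LDA rule on these slides condenses to a few lines of code (function names are ours; the coefficients, group means and cutoff are taken from the slides):

```python
# Discriminant score and cutoff as given on the slides.
def discriminant(crp, temp):
    return 0.1 * crp + 0.7 * temp        # D = 0.1*CRP + 0.7*Temp

CUTOFF = (28.87 + 32.29) / 2             # mean of the two group means = 30.58

def classify(crp, temp):
    # The Bacteria group has the higher mean discriminant score on the slide,
    # so scores above the cutoff are classified as bacterial infection.
    return "Bacteria" if discriminant(crp, temp) > CUTOFF else "Virus"

# Slide example: CRP = 70 mg/L, Temp = 40 degrees C
print(f"D = {discriminant(70, 40):.1f}")   # D = 35.0
print(classify(70, 40))                     # Bacteria
```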
[Figure: linear regression of cholesterol on BMI vs. logistic regression of disease (YES/NO) on BMI.]

Linear regression: y(x) = a + b·x
Logistic regression: logit(x) = a + b·x

Logistic regression

Logistic regression is commonly used to predict the risk (or odds ratio) of getting a disease based on some explanatory variables:

Cancer [YES/NO] = SMOKING + BMI + AGE

However, in this course we will only use logistic regression as a classifier.

Logistic regression vs. LDA

Linear discriminant analysis relies on two major assumptions:
• the independent variables must have a multivariate normal distribution,
• the variance–covariance matrix of the independent variables must be homogeneous among the population groups.

Logistic regression does not require any specific assumptions.

Binary logistic regression can only have 2 groups, whereas LDA can have several groups.

Logistic regression involves model building and can generate p-values that tell whether an explanatory variable contributes significantly to the prediction.

Example

Calculate the probability of disease in each BMI interval and fit a curve:

1/5 = 20%, 2/5 = 40%, 3/5 = 60%, 4/5 = 80%

[Figure: probability of disease vs. BMI, rising from 0 towards 1 between BMI 20 and 50.]

For example, we see that there is an 80% risk (probability) of getting the disease if an individual has a BMI between 45 and 55.

The fitted curve is the logistic function:

p(x) = 1 / (1 + e^(−(a + b·x)))

Logistic regression

Fit the function: a = −5.754, b = 2.747:

p = 1 / (1 + e^(−(−5.75 + 2.75 · PSA)))

Logistic regression – predict

Example: PSA = 2 µg/L:

p = 1 / (1 + e^(−(−5.75 + 2.75 · 2))) ≈ 0.43

With a cutoff of 50%: the probability that the patient has prostate cancer is 43% → classify as healthy.

Logistic regression – ROC curve
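The prediction step can be sketched using the fitted coefficients from the slide (function names are ours; with the full-precision coefficients the probability comes out at about 0.44, which the slide rounds to 0.43):

```python
# Logistic prediction p(x) = 1 / (1 + e^-(a + b*x)) with the slide's coefficients.
import math

A, B = -5.754, 2.747          # fitted coefficients from the slide

def p_cancer(psa):
    """Predicted probability of prostate cancer for a given PSA (ug/L)."""
    return 1.0 / (1.0 + math.exp(-(A + B * psa)))

def classify(psa, cutoff=0.5):
    return "cancer" if p_cancer(psa) >= cutoff else "healthy"

print(f"p(PSA = 2) = {p_cancer(2):.2f}")
print(classify(2))            # healthy: probability is below the 50% cutoff
```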
Coordinates of the curve (SPSS output; test result variable: predicted probability):

Cutoff        Sensitivity   1 − Specificity
0.0000000        1.000          1.000
0.0319626        1.000          0.857
0.0575184        1.000          0.714
0.1039871        1.000          0.571
0.1667372        1.000          0.429
0.2284887        0.857          0.429
0.3439260        0.857          0.286
0.4693766        0.857          0.143
0.6280978        0.714          0.143
0.7966036        0.714          0.000
0.8572791        0.571          0.000
0.8876535        0.429          0.000
0.9371522        0.286          0.000
0.9819278        0.143          0.000
1.0000000        0.000          0.000

[Figure: the corresponding ROC curve, sensitivity vs. 1 − specificity.]

Example from the Iris data set: classify only Versicolor and Virginica.

[Table: SPSS "Model if Term Removed" output – for each variable, the model log-likelihood, the change in −2 log-likelihood, df and significance. In step 1, removing sepal length gives the smallest and non-significant change (sig. ≈ 0.24); in step 2 the remaining variables (sepal width, petal length, petal width) all contribute significantly.]

Removing sepal length results in a lower AIC value compared to the full model.

K-nearest neighbor (KNN)

• KNN is a non-parametric method that can be used for classification. Since it is non-parametric, it does not require normally distributed variables and is robust against outliers.
• The method does not fit any model to the data.
• It involves calculating the distances from the data points with known group membership to the new observations with unknown membership.
• The majority of the k nearest neighbors decides the class of the new observations.

KNN algorithm

1. Determine the Euclidean distance between the new observation and all data points in the training set.
2.
Sort the distances (smallest first):

      CRP    Temp   Group          D
7    42.0   37.6   Bacteria    3.124100
1    40.0   36.0   Virus       4.000000
11   45.7   38.6   Bacteria    5.869412
8    31.1   42.2   Bacteria    9.167879
9    50.0   38.5   Bacteria   10.111874
3    30.0   36.5   Virus      10.594810
4    21.4   39.4   Virus      18.609675
…

KNN algorithm

3. Select the k closest neighbors, for example k = 5.
4. Determine the class of the new observation based on the group majority of the k nearest neighbors → 4 Bacteria, 1 Virus → Bacteria.

Note the different scales of the axes.

KNN performance

In contrast to LDA and logistic regression, KNN does not provide a threshold that separates the groups. Instead, we can calculate performance similarly to what is done in leave-one-out cross-validation (LOOCV):

1. Select one data point and check its k closest neighbors.
2. Predict its class based on a majority vote.
3. Check whether the prediction is correct or not.
4. Repeat steps 1–3 for all data points.
5. Calculate the performance.
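The four algorithm steps can be sketched as follows. The training rows come from the slide's distance table; the query point (CRP = 40, Temp = 40) is our reconstruction — it reproduces the slide's printed distances (e.g. 3.1241 to the first row), so it appears to be the slide's example:

```python
# KNN by hand: Euclidean distances, sort, take the k nearest, majority vote.
import math
from collections import Counter

train = [  # (CRP, Temp, group) rows from the slide's table
    (42.0, 37.6, "Bacteria"),
    (40.0, 36.0, "Virus"),
    (45.7, 38.6, "Bacteria"),
    (31.1, 42.2, "Bacteria"),
    (50.0, 38.5, "Bacteria"),
    (30.0, 36.5, "Virus"),
    (21.4, 39.4, "Virus"),
]

def knn_classify(crp, temp, k=5):
    # Steps 1-2: distances to all training points, sorted ascending
    dists = sorted((math.hypot(crp - c, temp - t), group) for c, t, group in train)
    # Steps 3-4: majority vote among the k nearest neighbors
    votes = Counter(group for _, group in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_classify(40.0, 40.0))   # Bacteria (4 of the 5 nearest are Bacteria)
```

Note that KNN is sensitive to the scales of the variables (as the slide's axis remark hints); in practice the variables are often standardized before computing distances.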