Classification Performance Metrics and Techniques
* Sensitivity and specificity
* PPV and NPV
* Accuracy
* Validation
* ROC curves
* LDA
* Logistic regression
* KNN
World Health Report 2013

Sensitivity

The two main terms used to describe how well a test performs are sensitivity and specificity.

Sensitivity - how often a test turns positive for people who have the disease:

HIV+ and the ELISA test shows positive -> true positive
HIV+ and the ELISA test shows negative -> false negative

The ELISA test for HIV antibodies has a sensitivity of 99.7%. Hence only 3 out of 1000 HIV-positive persons will not show a positive result.
[Figure: test results for diseased and healthy individuals with a decision threshold, annotated with the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) regions. In this example TP = 5, FN = 2, FP = 1, TN = 6.]
Exercise
Let's say you have got the following data. Suggest an appropriate threshold
level for PSA that separates healthy individuals from prostate cancer patients.
Sensitivity is more important than specificity.

[Figure: PSA values (µg/L) for healthy individuals and prostate cancer patients.]
Confusion matrix

A confusion matrix is a contingency table that shows the performance of a classification model (TP = true positive, TN = true negative, FP = false positive, FN = false negative):

                     Predicted: Cancer    Predicted: Healthy
Actual: Cancer       TP                   FN
Actual: Healthy      FP                   TN

With the example data:

                     Predicted: Cancer    Predicted: Healthy
Actual: Cancer       TP = 5               FN = 2
Actual: Healthy      FP = 1               TN = 6

Specificity (true negative rate)

Specificity - how often the test shows a negative result for those who are healthy:

Specificity = TN / (TN + FP) = 6 / (6 + 1) = 86%

Positive predictive value (PPV)

PPV - the proportion of individuals with a positive test who actually have the disease:

PPV = TP / (TP + FP) = 5 / (5 + 1) = 83%

This is the probability that you have cancer given a positive test (note the difference from sensitivity, which is the probability that the test shows positive if you have cancer). We assume that the sample reflects the true prevalence.

Negative predictive value (NPV)

NPV - the proportion of individuals with a negative test who actually are healthy:

NPV = TN / (TN + FN) = 6 / (6 + 2) = 75%

This is the probability that you are healthy given a negative test (note the difference from specificity, which is the probability that the test shows negative if you are healthy). Again we assume that the sample reflects the true prevalence.

Example of PPV

Let's assume that we have 1000 individuals who take the test. Of these, 135 + 71 = 206 will show a positive test. The fraction of these who actually have cancer is 71/206 = 0.34 -> PPV = 34%.

Validation

Using a cutoff level of 2.3 µg/L results in a sensitivity of 71% and a specificity of 86%. This cutoff was set to maximize the accuracy for this particular data set. If we collect 14 new subjects, would a cutoff of 2.3 be optimal for that data set?
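The metrics above can be computed directly from the confusion-matrix counts. A minimal sketch in plain Python using the example values (TP = 5, FN = 2, FP = 1, TN = 6):

```python
# Performance metrics from the example confusion matrix.
TP, FN, FP, TN = 5, 2, 1, 6

sensitivity = TP / (TP + FN)  # positive rate among the diseased
specificity = TN / (TN + FP)  # negative rate among the healthy
ppv = TP / (TP + FP)          # probability of disease given a positive test
npv = TN / (TN + FN)          # probability of health given a negative test
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(f"Sensitivity: {sensitivity:.0%}")  # 71%
print(f"Specificity: {specificity:.0%}")  # 86%
print(f"PPV: {ppv:.0%}")                  # 83%
print(f"NPV: {npv:.0%}")                  # 75%
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence in the sample.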
Validation

Validation techniques are used:
• to estimate the performance of the classifier on "new" data,
• for model selection - to decide which model performs best on new data.

Performance (accuracy, sensitivity, etc.) is usually measured by the "true" error rate based on new data, which can be seen as the error rate for the population of interest.

The holdout method

If we do not have a big data set, we will have too few data points to check sensitivity and specificity. For example, if we have 20 data points and use 20% as a test set, only 4 data points are used to determine sensitivity and specificity. Using only 4 data points could, just by chance, result in 100% or 0% sensitivity even though the "true" sensitivity is 50%.

Cross-validation

When we do not have a large data set, we can use cross-validation to predict the accuracy of our model. Cross-validation gives us an insight into how the model would perform on an unknown data set. A common technique is K-fold cross-validation, where the data is split into K equal samples.

K-fold cross-validation

The data is divided into K equal samples (for example, 4-fold cross-validation uses four train/test splits):

1. Fit the model and find an appropriate cutoff based on the training data.
2. Use the test data to estimate accuracy, sensitivity, etc.
3. Repeat for all K folds.
4. Average (combine) the performance results.
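The K-fold procedure above can be sketched in plain Python. This is a minimal sketch: the `data`/`labels` arrays and the cutoff-based classifier are hypothetical stand-ins for the PSA example, not part of the original material.

```python
# A minimal sketch of K-fold cross-validation for a cutoff classifier.
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal folds."""
    folds, start = [], 0
    fold_size, remainder = divmod(n, k)
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, labels, k=4):
    """Estimate accuracy of a cutoff classifier with K-fold CV."""
    accuracies = []
    for test_idx in k_fold_indices(len(data), k):
        train_idx = [i for i in range(len(data)) if i not in test_idx]
        # 1. Fit: choose the cutoff that maximizes accuracy on training data
        cutoff = max((data[i] for i in train_idx),
                     key=lambda c: sum((data[j] >= c) == labels[j]
                                       for j in train_idx))
        # 2. Test: estimate accuracy on the held-out fold
        correct = sum((data[j] >= cutoff) == labels[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    # 3.-4. Repeat for all K folds and average the results
    return sum(accuracies) / k

# Hypothetical PSA-like data: 4 healthy (False) and 4 cancer (True) subjects
data = [1.0, 1.2, 1.4, 1.6, 2.5, 2.7, 2.9, 3.1]
labels = [False, False, False, False, True, True, True, True]
print(cross_validate(data, labels, k=4))
```

Because each fold's cutoff is fitted without the held-out points, the averaged accuracy is an estimate of performance on new data rather than on the training data.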
Generating a ROC curve

Consider the following example. A ROC curve is generated by varying the cutoff and plotting sensitivity against specificity. For the PSA data, the figures step through the following cutoffs:

Cutoff    Sensitivity    Specificity
0.80      100 %          0 %
0.85      100 %          14 %
1.05      100 %          28 %
2.05      86 %           86 %
2.30      71 %           86 %
2.60      71 %           100 %

[Figures: for each cutoff, the PSA values of cancer patients and healthy individuals with the cutoff line, and the corresponding point on the ROC curve (sensitivity vs. specificity, x-axis running from 1.0 to 0.0).]
Exercise

Based on the data below, draw a ROC curve and calculate the area under the curve. Do you think that the area is significantly different from 0.5?

Sensitivity = 4/5 = 80%, Specificity = 5/5 = 100%
Sensitivity = 4/5 = 80%, Specificity = 4/5 = 80%
Sensitivity = 5/5 = 100%, Specificity = 4/5 = 80%

Area = 1 - 0.2 · 0.2 = 0.96

ROC curves

What we want is both a high specificity and a high sensitivity, for example 90% sensitivity and 90% specificity. A classifier with 50% sensitivity and 50% specificity is no better than chance.
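The area in the exercise can be checked with the trapezoid rule. A minimal sketch; the `(1 - specificity, sensitivity)` points below are taken from the sensitivities and specificities listed above, with the (0, 0) and (1, 1) endpoints added:

```python
# Trapezoidal area under a ROC curve.
def auc(points):
    """Area under the curve given (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid for each segment
    return area

# (1 - specificity, sensitivity) points from the exercise
roc_points = [(0.0, 0.0), (0.0, 0.8), (0.2, 0.8), (0.2, 1.0), (1.0, 1.0)]
print(auc(roc_points))  # ≈ 0.96
```

The diagonal reference line, (0, 0) to (1, 1), gives an area of exactly 0.5, the "no better than chance" baseline mentioned above.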
Is the ROC curve significantly different from the reference line?

[Figures: two ROC curves (sensitivity vs. specificity), one close to the diagonal reference line and one clearly above it.]

H0: the area under the ROC curve is equal to 0.5.

LDA
LDA combines variables by maximizing the separation
between the groups
[Figures: CRP and body temperature for the virus and bacteria groups, and the distributions of the combined discriminant score D for the two groups.]

D = 0.1 · CRP + 0.7 · Temp
LDA - cutoff

A reasonable cutoff can be set to the mean of the means of the discriminant scores of the two groups:

D = 0.1 · CRP + 0.7 · Temp

T_cutoff = (mean(D_Virus) + mean(D_Bacteria)) / 2 = (28.87 + 32.29) / 2 = 30.58

LDA - predict

For example, let's say that a patient has a body temperature of 40 °C and a CRP concentration of 70 mg/L. Does the patient have a bacterial or viral infection?

D = 0.10 · CRP + 0.70 · Temp = 0.10 · 70 + 0.70 · 40 = 35

Since 35 > 30.58 (and the bacteria group has the higher mean score), the patient is classified as having a bacterial infection.

Logistic regression

Logistic regression is a mathematical modeling approach that can be used to describe the relationship between a dichotomous (YES/NO) dependent variable and independent variables.

Linear regression (e.g. cholesterol vs. BMI): y(x) = a + bx
Logistic regression (e.g. disease YES/NO vs. BMI): logit(x) = a + bx

Logistic regression is commonly used to predict the risk (or odds ratio) of getting a disease based on some explanatory variables:

Cancer [YES/NO] = SMOKING + BMI + AGE

However, in this course we will only use logistic regression as a classifier.

Logistic regression vs. LDA

Linear discriminant analysis relies on two major assumptions:
• the independent variables must have a multivariate normal distribution,
• the variance-covariance matrix of the independent variables must be homogeneous across the population groups.

Logistic regression does not require any specific assumptions. Binary logistic regression can only have 2 groups, whereas LDA can have several groups. Logistic regression involves model building and can generate p-values that tell whether an explanatory variable contributes significantly to the prediction.

Example

Calculate the probability of disease in each BMI interval and fit a curve: 1/5 = 20%, 2/5 = 40%, 3/5 = 60%, 4/5 = 80%.

For example, we see that there is an 80% risk (probability) of getting the disease if an individual has a BMI between 45 and 55.
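The LDA prediction step above can be sketched in plain Python. The coefficients and cutoff are the values quoted in the text; `classify` is a hypothetical helper name:

```python
# LDA classification sketch: D = 0.10*CRP + 0.70*Temp, cutoff 30.58.
CUTOFF = 30.58  # mean of the two group means (28.87 and 32.29)

def discriminant(crp, temp):
    """Discriminant score D for a patient."""
    return 0.10 * crp + 0.70 * temp

def classify(crp, temp):
    # The bacteria group has the higher mean discriminant score (32.29)
    return "Bacteria" if discriminant(crp, temp) > CUTOFF else "Virus"

print(classify(70, 40))  # Bacteria (D = 35 > 30.58)
```

The direction of the rule matters: because the bacteria group's mean score is the higher one, scores above the cutoff are assigned to bacteria.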
Logistic regression

The logistic function is

p(x) = 1 / (1 + e^(-(a + b·x)))

Fitting the function to the PSA data gives a = -5.754 and b = 2.747:

p = 1 / (1 + e^(-(-5.75 + 2.75 · PSA)))

Logistic regression - predict

Example: PSA = 2 µg/L

p = 1 / (1 + e^(-(a + b·x))) = 1 / (1 + e^(-(-5.75 + 2.75 · 2))) ≈ 0.43

Cutoff = 50%. The probability that the patient has prostate cancer is 43% -> classify as healthy.
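The prediction step above can be sketched as follows. This is a minimal sketch using the fitted coefficients quoted in the text; `classify` is a hypothetical helper applying the 50% cutoff (with the unrounded coefficients the probability comes out at about 0.44 rather than the rounded 0.43):

```python
import math

# Logistic-regression prediction with the coefficients from the text.
A, B = -5.754, 2.747

def probability(psa):
    """p(x) = 1 / (1 + exp(-(a + b*x)))"""
    return 1.0 / (1.0 + math.exp(-(A + B * psa)))

def classify(psa, cutoff=0.5):
    return "Cancer" if probability(psa) >= cutoff else "Healthy"

print(classify(2.0))  # Healthy (p ≈ 0.44 < 0.5)
```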
Logistic regression - ROC curve

Coordinates of the curve (test result variable: predicted probability):

Positive if ≥    Sensitivity    1 - Specificity
0.0000000        1.000          1.000
0.0319626        1.000          0.857
0.0575184        1.000          0.714
0.1039871        1.000          0.571
0.1667372        1.000          0.429
0.2284887        0.857          0.429
0.3439260        0.857          0.286
0.4693766        0.857          0.143
0.6280978        0.714          0.143
0.7966036        0.714          0.000
0.8572791        0.571          0.000
0.8876535        0.429          0.000
0.9371522        0.286          0.000
0.9819278        0.143          0.000
1.0000000        0.000          0.000

[Figures: predicted probability vs. PSA for the healthy and cancer groups, and the resulting ROC curve (sensitivity vs. 1 - specificity).]
Example from the Iris data set: classify only Versicolor and Virginica.

[Figure: petal width vs. petal length for Versicolor and Virginica.]

Model if Term Removed:

                              Model Log     Change in -2      df    Sig. of the
                              Likelihood    Log Likelihood          Change
Step 1   Sepal length (cm)    -6.633        1.367             1     .242
         Sepal width (cm)     -7.746        9.594             1     .058
         Petal length (cm)    -12.951       14.003            1     .000
         Petal width (cm)     -11.886       11.873            1     .001
Step 2   Sepal width (cm)     -10.282       7.298             1     .007
         Petal length (cm)    -13.700       14.133            1     .000
         Petal width (cm)     -15.756       18.246            1     .000

Removing sepal length results in a lower AIC value compared to the full model.
K-nearest neighbor (KNN)

• KNN is a non-parametric method that can be used for classification. Since it is non-parametric, it does not require normally distributed variables and is robust against outliers.
• The method does not fit any model to the data.
• It involves calculating the distances from the data points with known group membership to the new observations with unknown membership.
• The majority of the k nearest neighbors decides the class of the new observation.

KNN algorithm

1. Determine the Euclidean distance between the new observation and all data points in the training set.
2. Sort the distances:

     CRP     Temp    Group       D
7    42.0    37.6    Bacteria    3.124100
1    40.0    36.0    Virus       4.000000
11   45.7    38.6    Bacteria    5.869412
8    31.1    42.2    Bacteria    9.167879
9    50.0    38.5    Bacteria    10.111874
3    30.0    36.5    Virus       10.594810
4    21.4    39.4    Virus       18.609675
…

3. Select the k closest neighbors, for example k = 5.
4. Determine the class of the new observation by group majority among the k nearest neighbors -> 4 Bacteria, 1 Virus -> Bacteria.

Note the different scales of the axes.

KNN performance

In contrast to LDA and logistic regression, KNN does not provide a threshold that separates the groups. Instead we can calculate performance similarly to what is done in LOOCV:

1. Select one data point and find its k closest neighbors.
2. Predict its class based on a majority vote.
3. Check whether the prediction is correct.
4. Repeat steps 1-3 for all data points.
5. Calculate the performance.
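The algorithm above can be sketched in plain Python. The training points are those listed in the table; the new observation's coordinates are not stated explicitly in the text, but (CRP = 40.0, Temp = 40.0) reproduces the listed distances, so it is used here. `knn_predict` is a hypothetical helper name:

```python
import math
from collections import Counter

# Training points (CRP, Temp, Group) from the worked example.
train = [
    (42.0, 37.6, "Bacteria"),
    (40.0, 36.0, "Virus"),
    (45.7, 38.6, "Bacteria"),
    (31.1, 42.2, "Bacteria"),
    (50.0, 38.5, "Bacteria"),
    (30.0, 36.5, "Virus"),
    (21.4, 39.4, "Virus"),
]

def knn_predict(crp, temp, k=5):
    # 1.-2. Compute and sort the Euclidean distances to all training points
    dists = sorted((math.hypot(crp - c, temp - t), group)
                   for c, t, group in train)
    # 3.-4. Majority vote among the k nearest neighbors
    votes = Counter(group for _, group in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict(40.0, 40.0))  # Bacteria (4 of the 5 nearest neighbors)
```

Note that because KNN uses raw Euclidean distances, variables on very different scales (as noted above for the axes) can dominate the distance; in practice the variables are often standardized first.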