Data Mining: Predictive Modeling and Evaluation
Prof. Jennifer L. Neville
Data Mining CS57300 / STAT 59800-024
Purdue University
February 19, 2009

Predictive modeling: evaluation

Score functions
• Zero-one loss
• Accuracy
• Sensitivity/specificity
• Precision/Recall/F1
• Absolute loss
• Squared loss
• Root mean-squared error
• Likelihood/conditional likelihood
• Area under the ROC curve

Simple measures on tables
• Confusion matrix (rows: predicted class; columns: actual class):

                Actual +    Actual –
  Predicted +      TP          FP
  Predicted –      FN          TN

• True positive rate (TPR) = TP/(TP+FN)
• False positive rate (FPR) = FP/(FP+TN)
• Recall = TP/(TP+FN) = TPR
• Precision = TP/(TP+FP)
• Specificity = TN/(FP+TN)
• Sensitivity = TPR

Cost-sensitive models
• Define a score function based on a cost matrix
• If ỹ is the predicted class and y is the true class, then we need to define a matrix of costs C(ỹ, y)
• C(ỹ, y) reflects the severity of classifying an instance with true class y into class ỹ

Bias/variance tradeoff
[Figure: expected MSE as a function of the size of the parameter space — small parameter spaces give high bias and low variance; large parameter spaces give low bias and high variance]

Ensemble methods
• Motivation
  • It is too difficult to construct a single model that optimizes performance (why?)
• Approach
  • Construct many models on different versions of the training set and combine them during prediction
  • Goal: reduce bias and/or variance

General idea
[Figure: the training data is altered to produce several versions; the learning algorithm is applied to each version to produce models M1, M2, M3, M4, which are aggregated into a combined model M*]

Bagging
• Bootstrap aggregating
• Main assumption
  • Combining many unstable predictors in an ensemble produces a stable predictor
  • Unstable predictor: small changes in the training data produce large changes in the model (e.g., trees)
• Model space: non-parametric; can model any function if an appropriate base model is used

Bagging (cont)
• Given a training data set D = {(x1, y1), ..., (xN, yN)}
• For m = 1:M
  • Obtain a bootstrap sample Dm by drawing N instances with replacement from D
  • Learn model Mm from Dm
• To classify test instance t, apply all models to t and take the majority vote
• Models have roughly uncorrelated errors due to differences in their training sets (each bootstrap sample contains ~63.2% of the distinct instances in D, since 1 − 1/e ≈ 0.632)

Boosting
• Main assumption
  • Combining many weak (but stable) predictors in an ensemble produces a strong predictor
  • Weak predictor: only weakly predicts the correct class of instances (e.g., tree stumps, 1-R)
• Model space: non-parametric; can model any function if an appropriate base model is used

Overfitting (cont)
[Figure: empirical overfitting results (Oates & Jensen 1999)]

Oversearching
[Figure: training-set and test-set accuracy by search method — moving from heuristic toward exhaustive search raises training-set accuracy but can lower test-set accuracy (Quinlan and Cameron-Jones 1995; Murthy and Salzberg 1995)]

Attribute selection errors
[Figure: example data with attribute A1 (few possible values) and attribute A2 (many possible values) — attributes with many possible values inflate training-set accuracy without a corresponding gain in test-set accuracy (Quinlan 1998; Liu and White 1994)]

Evaluation functions are estimators
• Evaluation functions are functions f(m, D) on models (m) and data samples (D)
• Samples vary in their “representativeness”: f(m, D1) = x1 ≠ x2 = f(m, D2)
• Each score x is an estimate of some population parameter
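The sampling variability just described can be made concrete with a short sketch. A fixed, already-learned classifier is scored with zero-one accuracy on two different samples from the same population, giving two different estimates of one population parameter. The model, thresholds, and data below are all invented for illustration.

```python
import random

# Illustrative only: a fixed "model" (a threshold rule), scored on two
# samples D1 and D2 from the same population, yields two different
# estimates x1 and x2 of the same population accuracy.
def model(x):
    return int(x > 0.5)  # a fixed, already-learned classifier

def accuracy(clf, sample):
    # Evaluation function f(m, D): fraction of instances m labels correctly.
    return sum(clf(x) == y for x, y in sample) / len(sample)

def draw_sample(rng, n=50):
    # Population: x ~ Uniform(0, 1), true label 1 iff x > 0.4, so the
    # model's threshold at 0.5 misclassifies x in (0.4, 0.5].
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.4)) for x in xs]

rng = random.Random(7)
x1 = accuracy(model, draw_sample(rng))  # f(m, D1)
x2 = accuracy(model, draw_sample(rng))  # f(m, D2)
print(x1, x2)  # two noisy sample estimates of the population accuracy (0.9)
```

Each run of `draw_sample` plays the role of one sample D; repeating the scoring over many samples would trace out the sampling distribution discussed next.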
[Figure: two samples D1 and D2 yield two different scores, x1 and x2]

[Figure: from a population, all possible samples yield a sampling distribution of derived statistic values (e.g., 4.32, 3.59, 7.44, 2.06, 5.19, 4.27, ...)]

How do we use statistical inference?
• Parameter estimates — what is the accuracy of m?
  • Evaluate accuracy on many samples; empirically estimate the sampling distribution
  • Use the distribution mean as the estimate of the population parameter
• Hypothesis tests — does m perform better than chance?
  • Evaluate accuracy on a sample
  • Compare it to the sampling distribution under the null hypothesis (H0); assess the probability that the accuracy would be achieved by “chance”
  • Example: for an observed score b, compute pH0(X ≥ b) = 0.027

Multiple comparison procedures
• Generate multiple items
  • Generate n models
• Estimate scores
  • Using the training set and an evaluation function, calculate a score for each model
• Select the max-scoring item
  • Select the model with the maximum score
• The sampling distribution of Xmax is different from the sampling distribution of Xi

Explaining Pathologies

Incorrect hypothesis tests
• Under H0, there is a non-zero probability that any model’s score xi will exceed some critical value xcrit
• The probability that the maximum of n scores (xmax) will exceed xcrit is uniformly equal or higher:
  p(Xmax > xcrit | H0) ≥ p(Xi > xcrit | H0)

Overfitting
• Many components are available to use in a given model
• Algorithms select the component with the maximum score
• The correct sampling distribution depends on the number of components evaluated
• Most learning algorithms do not adjust for the number of components

Biased parameter estimates
• Sample scores are routinely used as estimates of population parameters
• Any individual score xi is often an unbiased estimator of the population score, but xmax is almost always a biased estimator

Oversearching
• Two or more search spaces contain different numbers of models
• Maximum scores in each space are biased to differing degrees
• Most algorithms directly compare scores
• Attribute selection errors can be explained in an analogous way

Adjusting for multiple comparisons
• Remove bias by testing on withheld data
  • New data (e.g., Oates & Jensen 1999)
  • Cross-validation (e.g., Weiss and Kulikowski 1991)
• Estimate the sampling distribution accurately
  • Randomization tests (e.g., Jensen 1992)
• Adjust the probability calculation
  • Bonferroni adjustment (e.g., Jensen & Schmill 1997)
• Alter the evaluation function to incorporate a complexity penalty
  • MDL, BIC, etc.
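A small simulation can tie these ideas together: it exhibits the incorrect-hypothesis-test pathology (the maximum of n null scores exceeds an unadjusted critical value far more often than a single score does) and then applies a Bonferroni-style correction. This is a sketch under invented assumptions — the null model (each "model" guesses on 100 test cases), the critical values, and the repetition counts are all chosen for illustration.

```python
import random

# Under H0 every model guesses, so each accuracy score is
# Binomial(100, 0.5) / 100.
def null_score(rng, trials=100):
    return sum(rng.random() < 0.5 for _ in range(trials)) / trials

def exceed_rate(n_models, crit, reps, rng):
    # Fraction of repetitions in which the max of n null scores exceeds crit.
    hits = 0
    for _ in range(reps):
        if max(null_score(rng) for _ in range(n_models)) > crit:
            hits += 1
    return hits / reps

rng = random.Random(3)
crit_single = 0.58  # roughly the alpha = 0.05 critical value for one score
p_one = exceed_rate(1, crit_single, 1000, rng)   # close to alpha
p_max = exceed_rate(10, crit_single, 1000, rng)  # inflated: ~1-(1-0.05)^10

# Bonferroni adjustment: with n = 10 comparisons, test each at alpha/n,
# which here means raising the critical value to roughly the 0.005 cutoff.
crit_bonf = 0.63
p_fwe = exceed_rate(10, crit_bonf, 1000, rng)    # family-wise rate back near/below alpha
```

The comparison of `p_one` and `p_max` is exactly the inequality p(Xmax > xcrit | H0) ≥ p(Xi > xcrit | H0) from the earlier slide; the adjusted run shows one way the "adjust the probability calculation" bullet can restore a valid test.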