Predictive Modeling in Data Mining: Components and Approaches
Prof. Jennifer L. Neville
Data Mining CS57300 / STAT 59800-024
Purdue University
February 5, 2009

Predictive modeling: representation

Data mining components
• Task specification: Prediction
• Data representation: Homogeneous IID data
• Knowledge representation
• Learning technique
• Inference technique

Descriptive vs. predictive modeling
• Descriptive models summarize the data
  • Provide insights into the domain
  • Focus on modeling joint distribution $P(X)$
  • May be used for classification, but not primary goal
• Predictive models predict the value of one variable of interest given known values of other variables
  • Focus on modeling conditional distribution $P(Y \mid X)$ or decision boundary for $Y$

Classification
• In its simplest form, a classification model defines a decision boundary and labels for each side of the boundary
• Input: $x = \{x_1, x_2, \ldots, x_n\}$ is a set of attributes
• Function $f$ assigns a label $y$ to input $x$, where $y$ is a discrete variable with a finite number of values
[Figure: axes X1 and X2, boundary labeled h]

Classification output
• Different classification tasks can require different kinds of output
• Class labels
  • Crisp class boundaries only
• Ranking
  • Allows for exploration of many potential class boundaries
• Probabilities
  • Allows for more refined reasoning about sets of instances
• Each requires progressively more accurate models (e.g., a poor probability estimator can still produce an accurate ranking)

Discriminative classification
• Model the decision boundary directly
• Direct mapping from inputs $x$ to class label $y$
• No attempt to model probability distributions
• May seek a discriminant function $f(x; \theta)$ that maximizes a measure of separation between classes
• Examples: perceptrons, nearest neighbor methods, support vector machines, decision trees

Probabilistic classification
• Model the underlying probability distributions
  • Posterior class: $p(y \mid x)$
  • Class-conditional and class prior: $p(x \mid y)$ and $p(y)$
• Indirect mapping from inputs $x$ to class label $y$ through posterior class distribution $p(y \mid x)$
• Examples: Naive Bayes classifier, logistic regression, probability estimation trees

Knowledge representation
• Model: high-level global description of dataset
• Choose model family
  • “All models are wrong, some models are useful” G. Box and N. Draper (1987)
• Estimate model parameters and possibly model structure from training data

Perceptron
\[
f(x) = \begin{cases} 1 & \text{if } \sum_j w_j x_j > 0 \\ 0 & \text{if } \sum_j w_j x_j \le 0 \end{cases}
\]

Naive Bayes classifier
\[
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} = \frac{\prod_i p(x_i \mid y)\; p(y)}{\sum_j p(x \mid y_j)\, p(y_j)}
\]
[Figure: graphical model with nodes Y, X1, X2, ..., Xn]

Parametric vs. non-parametric
• Parametric
  • Particular functional form is assumed (e.g., Binomial)
  • Number of parameters is fixed in advance
  • Examples: Naive Bayes, perceptron
• Non-parametric
  • Few assumptions are made about the functional form
  • Model structure is determined from data
  • Examples: classification tree, nearest neighbor

Next class
• Reading: Chapter 7 PDM
• Topic: Predictive modeling: learning
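
Sketch: perceptron decision rule

The perceptron slide above defines f(x) as a threshold on the weighted sum of the attributes. A minimal Python sketch of that rule might look like the following. The function name `perceptron_predict` and the example weights are illustrative assumptions, not part of the slides, and the sketch covers prediction only (learning the weights is the topic of the next class).

```python
from typing import Sequence


def perceptron_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if sum_j w_j * x_j > 0, otherwise 0 (the rule on the slide)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    # Weighted sum of the attributes.
    activation = sum(wj * xj for wj, xj in zip(w, x))
    # Strict inequality: a sum of exactly 0 is assigned to class 0.
    return 1 if activation > 0 else 0


# Hypothetical weights and inputs, chosen only for illustration.
example_w = [1.0, -1.0]
label_a = perceptron_predict(example_w, [2.0, 1.0])  # weighted sum = 1.0
label_b = perceptron_predict(example_w, [1.0, 1.0])  # weighted sum = 0.0
```

Because the output is only a class label, this is a discriminative classifier in the sense used earlier: it gives a crisp boundary and no probabilities.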
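
Sketch: Naive Bayes posterior

The Naive Bayes slide above computes p(y|x) from the class prior p(y) and the per-attribute conditionals p(x_i|y), normalized over all classes. The sketch below works through that arithmetic for discrete attributes. It assumes the probability tables are already known; the function name `naive_bayes_posterior`, the table layout, and all numbers are hypothetical and not taken from the slides.

```python
from math import prod
from typing import Dict, Hashable, List, Sequence


def naive_bayes_posterior(
    x: Sequence[Hashable],
    prior: Dict[str, float],
    cond: Dict[str, List[Dict[Hashable, float]]],
) -> Dict[str, float]:
    """Return p(y | x) for every class y.

    prior[y] is p(y); cond[y][i][v] is p(x_i = v | y).
    """
    # Numerator for each class: prod_i p(x_i | y) * p(y).
    joint = {
        y: prior[y] * prod(cond[y][i][xi] for i, xi in enumerate(x))
        for y in prior
    }
    # Denominator: p(x) = sum_j p(x | y_j) p(y_j).
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("x has zero probability under every class")
    return {y: joint[y] / evidence for y in joint}


# Made-up tables for two classes and two binary attributes.
example_prior = {"a": 0.5, "b": 0.5}
example_cond = {
    "a": [{0: 0.8, 1: 0.2}, {0: 0.5, 1: 0.5}],
    "b": [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],
}
posterior = naive_bayes_posterior((1, 0), example_prior, example_cond)
```

Unlike the perceptron, the result is a full posterior distribution, so the same model can supply class labels, rankings, or probabilities, depending on which kind of output the task needs.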