Predictive Modeling in Data Mining: Components and Approaches - Prof. Jennifer L. Neville, Study notes of Data Analysis & Statistical Methods

This document from Purdue University (CS57300 / STAT 59800-024) discusses predictive modeling in data mining. It lists the components of data mining: task specification, data representation, knowledge representation, learning technique, and inference technique. It also differentiates between descriptive and predictive modeling, provides examples of predictive modeling approaches such as classification and regression, and touches upon knowledge representation and modeling approaches.

Typology: Study notes

Pre 2010

Uploaded on 07/31/2009

koofers-user-bq5
Data Mining
CS57300 / STAT 59800-024
Purdue University
February 5, 2009

Predictive modeling: representation

Data mining components
• Task specification: Prediction
• Data representation: Homogeneous IID data
• Knowledge representation
• Learning technique
• Inference technique

Descriptive vs. predictive modeling
• Descriptive models summarize the data
  • Provide insights into the domain
  • Focus on modeling joint distribution P(X)
  • May be used for classification, but not primary goal
• Predictive models predict the value of one variable of interest given known values of other variables
  • Focus on modeling conditional distribution P(Y | X) or decision boundary for Y

Classification
• In its simplest form, a classification model defines a decision boundary and labels for each side of the boundary
• Input: x = {x_1, x_2, ..., x_n} is a set of attributes; function f assigns a label y to input x, where y is a discrete variable with a finite number of values
[Figure: axes X1 and X2, label h]

Classification output
• Different classification tasks can require different kinds of output
  • Class labels: crisp class boundaries only
  • Ranking: allows for exploration of many potential class boundaries
  • Probabilities: allows for more refined reasoning about sets of instances
• Each requires progressively more accurate models (e.g., a poor probability estimator can still produce an accurate ranking)

Discriminative classification
• Model the decision boundary directly
• Direct mapping from inputs x to class label y
• No attempt to model probability distributions
• May seek a discriminant function f(x; \theta) that maximizes measure of separation between classes
• Examples: perceptrons, nearest neighbor methods, support vector machines, decision trees

Probabilistic classification
• Model the underlying probability distributions
  • Posterior class: p(y|x)
  • Class-conditional and class prior: p(x|y) and p(y)
• Indirect mapping from inputs x to class label y through posterior class distribution p(y|x)
• Examples: Naive Bayes classifier, logistic regression, probability estimation trees

Knowledge representation
• Model: high-level global description of dataset
• Choose model family
  • “All models are wrong, some models are useful” G. Box and N. Draper (1987)
• Estimate model parameters and possibly model structure from training data

Perceptron
\[
f(x) =
\begin{cases}
1 & \text{if } \sum_j w_j x_j > 0 \\
0 & \text{if } \sum_j w_j x_j \le 0
\end{cases}
\]

Naive Bayes classifier
\[
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} = \frac{\prod_i p(x_i \mid y)\, p(y)}{\sum_j p(x \mid y_j)\, p(y_j)}
\]
[Figure: nodes Y, X1, X2, ..., Xn]

Parametric vs. non-parametric
• Parametric
  • Particular functional form is assumed (e.g., Binomial)
  • Number of parameters is fixed in advance
  • Examples: Naive Bayes, perceptron
• Non-parametric
  • Few assumptions are made about the functional form
  • Model structure is determined from data
  • Examples: classification tree, nearest neighbor

Next class
• Reading: Chapter 7 PDM
• Topic: Predictive modeling: learning
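
The perceptron rule from the slides (output 1 when the weighted sum of the inputs is positive, 0 otherwise) can be sketched in a few lines of Python. This is only an illustration of the decision rule, not code from the lecture: the function name `perceptron_predict` and the example weights are assumptions, and no bias term or learning step is included because the slide shows neither.

```python
from typing import Sequence


def perceptron_predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Apply the threshold rule: 1 if sum_j w_j * x_j > 0, else 0.

    `weights` and `x` are hypothetical inputs chosen for illustration.
    """
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    # Weighted sum of the attributes.
    activation = sum(w * xi for w, xi in zip(weights, x))
    # A sum of exactly zero falls on the "0" side, matching the <= 0 case.
    return 1 if activation > 0 else 0


if __name__ == "__main__":
    w = [1.0, -1.0]  # assumed weights, not taken from the slides
    print(perceptron_predict(w, [2.0, 1.0]))
    print(perceptron_predict(w, [1.0, 1.0]))
```

The weights here define a linear decision boundary, which is why the slides list the perceptron as both a discriminative and a parametric model.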
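
The Naive Bayes posterior from the slides can be illustrated the same way: multiply the class prior by the per-attribute conditionals, then normalize over all classes. The sketch below assumes discrete attributes and that the probabilities are already known; the function name `naive_bayes_posterior`, the table layout, and the class names and numbers in the example are all hypothetical. Estimating those probabilities from data is a learning question that these slides do not cover.

```python
from typing import Dict, Hashable, List, Sequence


def naive_bayes_posterior(
    priors: Dict[str, float],
    likelihoods: Dict[str, List[Dict[Hashable, float]]],
    x: Sequence[Hashable],
) -> Dict[str, float]:
    """Return p(y | x) for every class y.

    priors[y] is p(y); likelihoods[y][i][v] is p(x_i = v | y).
    Both tables are assumed inputs for this illustration.
    """
    joint = {}
    for label, prior in priors.items():
        p = prior
        # Conditional independence: p(x | y) = prod_i p(x_i | y).
        for i, value in enumerate(x):
            p *= likelihoods[label][i].get(value, 0.0)
        joint[label] = p
    # Evidence p(x) = sum_j p(x | y_j) p(y_j).
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("x has zero probability under every class")
    return {label: p / evidence for label, p in joint.items()}


if __name__ == "__main__":
    # Made-up probabilities for two classes and two binary attributes.
    priors = {"a": 0.5, "b": 0.5}
    likelihoods = {
        "a": [{1: 0.8, 0: 0.2}, {1: 0.5, 0: 0.5}],
        "b": [{1: 0.2, 0: 0.8}, {1: 0.5, 0: 0.5}],
    }
    print(naive_bayes_posterior(priors, likelihoods, (1, 0)))
```

Because the function returns the full posterior rather than a single label, the same output can serve as class labels (take the maximum), a ranking, or probabilities, the three kinds of classification output the slides distinguish.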