Predictive Modeling in Data Mining: Components and Approaches - Prof. Jennifer L. Neville, Study notes of Data Analysis & Statistical Methods

Purdue University Data Analysis & Statistical Methods

Prof. Jennifer L. Neville

This document from Purdue University (CS57300 / STAT 59800-024) discusses predictive modeling in data mining. It lists the components of data mining: task specification, data representation, knowledge representation, learning technique, and inference technique. It also differentiates between descriptive and predictive modeling, gives classification and regression as examples of predictive modeling approaches, and touches on knowledge representation and modeling approaches.

Typology: Study notes

Pre 2010

Uploaded on 07/31/2009



Partial preview of the text

Data Mining
CS57300 / STAT 59800-024
Purdue University
February 5, 2009

Predictive modeling: representation

Data mining components
• Task specification: Prediction
• Data representation: Homogeneous IID data
• Knowledge representation
• Learning technique
• Inference technique

Descriptive vs. predictive modeling
• Descriptive models summarize the data
  • Provide insights into the domain
  • Focus on modeling joint distribution P(X)
  • May be used for classification, but not primary goal
• Predictive models predict the value of one variable of interest given known values of other variables
  • Focus on modeling conditional distribution P(Y | X) or decision boundary for Y

Classification
• In its simplest form, a classification model defines a decision boundary and labels for each side of the boundary
• Input: x = {x1, x2, ..., xn} is a set of attributes, function f assigns a label y to input x, where y is a discrete variable with a finite number of values
[Figure: decision boundary h plotted against axes X1 and X2]

Classification output
• Different classification tasks can require different kinds of output
  • Class labels: Crisp class boundaries only
  • Ranking: Allows for exploration of many potential class boundaries
  • Probabilities: Allows for more refined reasoning about sets of instances
• Each requires progressively more accurate models (e.g., a poor probability estimator can still produce an accurate ranking)

Discriminative classification
• Model the decision boundary directly
• Direct mapping from inputs x to class label y
• No attempt to model probability distributions
• May seek a discriminant function f(x; θ) that maximizes measure of separation between classes
• Examples:
  • Perceptrons, nearest neighbor methods, support vector machines, decision trees

Probabilistic classification
• Model the underlying probability distributions
  • Posterior class: p(y|x)
  • Class-conditional and class prior: p(x|y) and p(y)
• Indirect mapping from inputs x to class label y through posterior class distribution p(y|x)
• Examples:
  • Naive Bayes classifier, logistic regression, probability estimation trees

Knowledge representation
• Model: high-level global description of dataset
• Choose model family
  • “All models are wrong, some models are useful” G. Box and N. Draper (1987)
• Estimate model parameters and possibly model structure from training data

Perceptron
\[
f(x) =
\begin{cases}
1 & \text{if } \sum_j w_j x_j > 0 \\
0 & \text{if } \sum_j w_j x_j \le 0
\end{cases}
\]

Naive Bayes classifier
\[
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} = \frac{\prod_i p(x_i \mid y)\, p(y)}{\sum_j p(x \mid y_j)\, p(y_j)}
\]
[Figure: graphical model with class node Y and attribute nodes X1, X2, ..., Xn]

Parametric vs. non-parametric
• Parametric
  • Particular functional form is assumed (e.g., Binomial)
  • Number of parameters is fixed in advance
  • Examples: Naive Bayes, perceptron
• Non-parametric
  • Few assumptions are made about the functional form
  • Model structure is determined from data
  • Examples: classification tree, nearest neighbor

Next class
• Reading: Chapter 7 PDM
• Topic:
  • Predictive modeling: learning
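The perceptron decision rule from the slides (output 1 when the weighted sum of the inputs is positive, 0 otherwise) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name perceptron_predict and the example weights are hypothetical and do not come from the lecture, and the first input is fixed at 1 so that its weight acts as a bias term.

```python
from typing import Sequence


def perceptron_predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 if sum_j w_j * x_j > 0, otherwise 0."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


# Hypothetical weights chosen for illustration only.
# The leading input is a constant 1.0, so weights[0] plays the role of a bias.
weights = [-1.5, 1.0, 1.0]

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    label = perceptron_predict(weights, [1.0, x1, x2])
    print((x1, x2), label)
```

With these particular weights, only the input (1, 1) gives a positive weighted sum, so it is the only point labeled 1. The model is parametric in the sense used in the slides: the number of weights is fixed in advance by the number of inputs.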

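The Naive Bayes posterior from the slides, p(y|x) = Π_i p(x_i|y) p(y) / Σ_j p(x|y_j) p(y_j), can be evaluated directly once the class priors and per-attribute conditional probabilities are known. The sketch below assumes those probabilities are already given as lookup tables; the function name, the table layout, and all numbers are hypothetical and chosen only to illustrate the formula, not taken from the lecture. Estimating the tables from training data is a separate step that is not shown.

```python
from math import prod
from typing import Dict, Hashable, List, Sequence


def naive_bayes_posterior(
    x: Sequence[Hashable],
    priors: Dict[str, float],
    likelihoods: Dict[str, List[Dict[Hashable, float]]],
) -> Dict[str, float]:
    """Return p(y | x) for every class y.

    priors[y] is p(y); likelihoods[y][i][v] is p(x_i = v | y).
    Attributes are assumed conditionally independent given the class.
    """
    # Numerator for each class: prod_i p(x_i | y) * p(y)
    joint = {
        y: priors[y] * prod(likelihoods[y][i][v] for i, v in enumerate(x))
        for y in priors
    }
    # Denominator p(x) = sum_j p(x | y_j) p(y_j); assumed non-zero here.
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}


# Hypothetical two-class problem with two binary attributes.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{True: 0.8, False: 0.2}, {True: 0.3, False: 0.7}],
    "ham": [{True: 0.1, False: 0.9}, {True: 0.5, False: 0.5}],
}

posterior = naive_bayes_posterior((True, False), priors, likelihoods)
print(posterior)
```

Because the function returns probabilities, the same output supports all three kinds of classification output listed in the slides: a class label (take the most probable class), a ranking (sort instances by posterior), or the probabilities themselves.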