Classification and Prediction - Data Warehousing and Data Mining - Book Summary Part 04 - Computer Science (Databases lecture notes)

Classification

Type: Lecture notes
Academic year: 2011/2012
Uploaded on 12/03/2012 by millyg

Chapter 2: Classification & Prediction

2.1 Basic Concepts of Classification and Prediction
  2.1.1 Definition
  2.1.2 Classification vs. Prediction
  2.1.3 Classification Steps
  2.1.4 Issues of Classification and Prediction
2.2 Decision Tree Induction
  2.2.1 The Algorithm
  2.2.2 Attribute Selection Measures
  2.2.3 Tree Pruning
  2.2.4 Scalability and Decision Tree Induction
2.3 Bayes Classification Methods
2.4 Rule-Based Classification
2.5 Lazy Learners
2.6 Prediction
2.7 How to Evaluate and Improve Classification

2.1.1 Definition

Classification is also called supervised learning.

Supervision: the training data (observations, measurements, etc.) are used to learn a classifier. The training data are labeled data.

Training data:

  Age  Income  Class label
  27   28K     Budget-Spender
  35   36K     Big-Spender
  65   45K     Budget-Spender

New (unlabeled) data are classified using the model learned from the training data. For example, the unlabeled tuple (Age 29, Income 25K) is passed to the classifier, which returns the class label Budget-Spender (with confidence 0.8).

Principle:
- Construct models (functions) based on some training examples
- Describe and distinguish classes or concepts for future prediction
- Predict some unknown class labels

2.1.3 Classification Steps (2/2)

Step 2: Model Usage

Before using the model, we first need to test its accuracy.

Measuring model accuracy: to measure the accuracy of a model we need test data. Test data are similar in structure to the training data (labeled data).

How to test? The known label of each test sample is compared with the classification produced by the model. For example, given the test data

  Age  Income  Class label
  25   30K     Budget-Spender
  40   50K     Big-Spender

the tuple (25, 30K) is fed to the classifier, which returns Budget-Spender, matching the known label.

The accuracy rate is the percentage of test set samples that are correctly classified by the model.

Important: the test data should be independent of the training set, otherwise over-fitting will occur.

Using the model: if the accuracy is acceptable, use the model to classify data tuples whose class labels are not known.

Model Construction

Training data:

  Age  Income  Class label
  27   28K     Budget-Spender
  35   36K     Big-Spender
  38   28K     Budget-Spender
  65   45K     Budget-Spender
  20   18K     Budget-Spender
  75   40K     Budget-Spender
  28   50K     Big-Spender
  40   60K     Big-Spender
  60   65K     Big-Spender

The classification algorithm learns the following classifier from the training data:
- If age < 30 and income < 30K, then Budget-Spender
- If age < 30 and income > 30K, then Big-Spender
- If 30 < age < 60 and income > 30K, then Big-Spender
- If 30 < age < 60 and income < 30K, then Budget-Spender
- If age > 60, then Budget-Spender

Model Usage

1) Test the classifier on the test data and measure its accuracy:

  Age  Income  Class label
  27   28K     Budget-Spender
  25   36K     Big-Spender
  70   45K     Budget-Spender
  40   35K     Big-Spender

2) If the accuracy is acceptable, classify unlabeled data. The unlabeled tuples below are classified as follows:

  Age  Income  Assigned class label
  18   28K     Budget-Spender
  37   40K     Big-Spender
  60   45K     Budget-Spender
  40   36K     Big-Spender

Summary of section 2.1.1
- Classification predicts class labels
- Numeric prediction models continuous-valued functions
- Two steps of classification: 1) training, 2) testing and using
- Data cleaning and evaluation are the main issues of classification and prediction
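To make the two steps concrete, here is a minimal Python sketch (not part of the original notes) that hard-codes the rules listed under Model Construction as a classify function, measures the accuracy rate on the labeled test tuples, and then labels the unseen tuples. The function name, the 0.75 acceptance threshold and the income-in-thousands convention are my own assumptions.

# Minimal sketch of the two classification steps: test the model, then use it.
# The rules below are the ones listed in the Model Construction example;
# income is expressed in thousands (28 means 28K).

def classify(age, income):
    """Rule-based classifier learned from the training data."""
    if age < 30:
        return "Budget-Spender" if income < 30 else "Big-Spender"
    if age < 60:
        return "Big-Spender" if income > 30 else "Budget-Spender"
    return "Budget-Spender"          # age >= 60

# Step 2a: measure accuracy on labeled test data (independent of the training set).
test_data = [(27, 28, "Budget-Spender"), (25, 36, "Big-Spender"),
             (70, 45, "Budget-Spender"), (40, 35, "Big-Spender")]
correct = sum(classify(a, i) == label for a, i, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy rate: {accuracy:.0%}")

# Step 2b: if the accuracy is acceptable, classify unlabeled tuples.
if accuracy >= 0.75:                 # the acceptance threshold is an arbitrary choice
    for age, income in [(18, 28), (37, 40), (60, 45), (40, 36)]:
        print(age, income, classify(age, income))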
2.2 Decision Tree Induction

Decision tree induction is the learning of decision trees from class-labeled training tuples.

A decision tree is a flowchart-like tree structure:
- Internal nodes (non-leaf nodes) denote a test on an attribute
- Branches represent outcomes of the tests
- Leaf nodes (terminal nodes) hold class labels
- The root node is the topmost node

Example: a decision tree indicating whether a customer is likely to purchase a computer. The root tests age?. For youth the tree tests student? (no leads to class no, yes leads to class yes); for middle-aged the leaf is yes; for senior the tree tests credit-rating? (fair leads to class no, excellent leads to class yes). Class label yes means the customer is likely to buy a computer; class label no means the customer is unlikely to buy a computer.

2.2.1 The Algorithm

Principle:
- Basic algorithm (adopted by ID3, C4.5 and CART): a greedy algorithm
- The tree is constructed in a top-down, recursive, divide-and-conquer manner

Iterations:
- At the start, all the training tuples are at the root
- Tuples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Stopping conditions:
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
- There are no samples left

Example

Training data:

  RID  age          student  credit-rating  Class: buys_computer
  1    youth        yes      fair           yes
  2    youth        yes      fair           yes
  3    youth        yes      fair           no
  4    youth        no       fair           no
  5    middle-aged  no       excellent      yes
  6    senior       yes      fair           no
  7    senior       yes      excellent      yes

Step 1: the root tests age?, which partitions the tuples into three branches:
- youth: RIDs 1 (yes), 2 (yes), 3 (no), 4 (no)
- middle-aged: RID 5 (yes)
- senior: RIDs 6 (no), 7 (yes)

Step 2: all tuples in the middle-aged branch belong to the same class, so that branch becomes the leaf yes.

Step 3: the youth branch is split on student?. RID 4 (student = no) gives the leaf no; RIDs 1, 2 and 3 (student = yes) are labeled yes by majority voting, since they share the same age, student and credit-rating values and cannot be partitioned further.

Step 4: the senior branch is split on credit-rating?: the fair branch holds RID 6 (class no) and the excellent branch holds RID 7 (class yes), giving the leaves no and yes respectively.

The resulting tree is the purchase-a-computer tree described above.
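The greedy top-down divide-and-conquer procedure and its three stopping conditions can be sketched as follows. This is an illustrative skeleton of my own, not code from the notes: tuples are assumed to be dictionaries with a "class" key, and choose_attribute stands in for an attribute selection measure such as the information gain introduced below.

from collections import Counter

def majority_class(tuples):
    """Most common class label among the tuples (used for majority voting)."""
    return Counter(t["class"] for t in tuples).most_common(1)[0][0]

def build_tree(tuples, attributes, choose_attribute, parent_majority=None):
    """Top-down recursive divide-and-conquer decision tree induction."""
    # Stopping condition: there are no samples left.
    if not tuples:
        return parent_majority
    classes = {t["class"] for t in tuples}
    # Stopping condition: all samples belong to the same class.
    if len(classes) == 1:
        return classes.pop()
    # Stopping condition: no remaining attributes -> majority voting.
    if not attributes:
        return majority_class(tuples)
    # Greedy step: pick the best attribute according to the supplied measure.
    best = choose_attribute(tuples, attributes)
    node = {"test": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in {t[best] for t in tuples}:
        subset = [t for t in tuples if t[best] == value]
        node["branches"][value] = build_tree(subset, remaining, choose_attribute,
                                             majority_class(tuples))
    return node

With the seven training tuples above encoded as dictionaries, calling build_tree(tuples, ["age", "student", "credit-rating"], choose_attribute) with a suitable selection measure returns a nested dictionary representing the induced tree.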
Before Describing Information Gain: Entropy & Bits

You are watching a set of independent random samples of X. X has 4 possible values:
P(X=A) = P(X=B) = P(X=C) = P(X=D) = 1/4
You get a string of symbols, e.g. ACBABBCDADDC...
To transmit the data over a binary link you can encode each symbol with 2 bits (A=00, B=01, C=10, D=11), so you need 2 bits per symbol.

(This and the next examples are from
http://www.cs.cmu.edu/~guestrin/Class/10701-S06/Handouts/recitations/recitation-decision_trees-adaboost-02-09-2006.ppt)

Fewer bits, example 1: now someone tells you the probabilities are not equal:
P(X=A) = 1/2, P(X=B) = 1/4, P(X=C) = 1/8, P(X=D) = 1/8
Now it is possible to find a coding that uses only 1.75 bits on average. How? E.g., Huffman coding.

Fewer bits, example 2: suppose there are three equally likely values:
P(X=A) = 1/3, P(X=B) = 1/3, P(X=C) = 1/3
Naive coding (A=00, B=01, C=10) uses 2 bits per symbol. Can you find a coding that uses 1.6 bits per symbol? In theory it can be done with 1.58496 bits.

1st approach: Information Gain

Notation: D is the current partition; N is the node that holds the tuples of partition D.

Select the attribute with the highest information gain (based on the work by Shannon on information theory). This attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness or "impurity" in these partitions. The information gain approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple tree is found.

Step 1: compute the expected information (entropy) of D, Info(D)

The expected information needed to classify a tuple in D is given by:

  Info(D) = -sum_{i=1}^{m} p_i log2(p_i)

where m is the number of classes and p_i is the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D| (the proportion of tuples of each class). A log function to base 2 is used because the information is encoded in bits.

Info(D) is the average amount of information needed to identify the class label of a tuple in D. It is also known as entropy.

Info(D): Example

  RID  age          income  student  credit-rating  class: buys_computer
  1    youth        high    no       fair           no
  2    youth        high    no       excellent      no
  3    middle-aged  high    no       fair           yes
  4    senior       medium  no       fair           yes
  5    senior       low     yes      fair           yes
  6    senior       low     yes      excellent      no
  7    middle-aged  low     yes      excellent      yes
  8    youth        medium  no       fair           no
  9    youth        low     yes      fair           yes
  10   senior       medium  yes      fair           yes
  11   youth        medium  yes      excellent      yes
  12   middle-aged  medium  no       excellent      yes
  13   middle-aged  high    yes      fair           yes
  14   senior       medium  no       excellent      no

Here m = 2 (the number of classes), N = 14 (the number of tuples), 9 tuples are in class yes and 5 tuples are in class no. The entropy Info(D) of the current partition D is:

  Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits

The Information Gain Approach, step by step:

Step 1: compute the expected information (entropy) Info(D) of the current partition.

Step 2: compute Info_A(D), the amount of information we would still need to arrive at an exact classification after partitioning D using attribute A:

  Info_A(D) = sum_{j=1}^{v} (|D_j| / |D|) x Info(D_j)

where D_1, ..., D_v are the partitions produced by the v values of A.

Step 3: compute the information gain obtained by branching on A:

  Gain(A) = Info(D) - Info_A(D)

Information gain is the expected reduction in the information requirements caused by knowing the value of A. The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.
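As a quick check of Step 1, the short Python sketch below (my own, not from the notes) computes the bits-per-symbol values of the coding examples and the Info(D) = 0.940 bits of the 14-tuple partition from the same entropy formula.

from math import log2

def entropy(probabilities):
    """Expected information in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

# Coding examples: equally likely symbols need 2 bits, skewed ones fewer.
print(entropy([1/4, 1/4, 1/4, 1/4]))   # 2.0 bits per symbol
print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75 bits per symbol
print(entropy([1/3, 1/3, 1/3]))        # ~1.58496 bits per symbol

# Info(D) for the 14-tuple partition: 9 tuples of class yes, 5 of class no.
info_D = entropy([9/14, 5/14])
print(round(info_D, 3))                # 0.940 bits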
Info_age(D): Example

Using the 14-tuple table above:

1) Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits

2) Partitioning on age gives youth (2 yes, 3 no), middle-aged (4 yes, 0 no) and senior (3 yes, 2 no), so

   Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694 bits

3) Gain(age) = Info(D) - Info_age(D) = 0.246

Similarly, Gain(income) = 0.029, Gain(student) = 0.151 and Gain(credit_rating) = 0.048.

Attribute age has the highest gain, so it is chosen as the splitting attribute.

Note on Continuous-Valued Attributes

Let attribute A be a continuous-valued attribute. We must determine the best split point for A:
- Sort the values of A in increasing order
- Typically, the midpoint between each pair of adjacent values is considered as a possible split point: (a_i + a_{i+1}) / 2 is the midpoint between the values a_i and a_{i+1}
- The point with the minimum expected information requirement for A is selected as the split point for A

Split: D1 is the set of tuples in D satisfying A <= split-point, and D2 is the set of tuples in D satisfying A > split-point.
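The following Python sketch (again my own, reusing the entropy helper from the previous snippet) reproduces Info_age(D) and Gain(age) for the 14-tuple table and lists the candidate mid-point splits for a continuous attribute; the numeric values used for the continuous attribute are invented for illustration.

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def info(counts):
    """I(c1, ..., cm): expected information for a partition with the given class counts."""
    n = sum(counts)
    return entropy([c / n for c in counts])

# Class counts (yes, no) per value of age in the 14-tuple table:
partitions = {"youth": (2, 3), "middle-aged": (4, 0), "senior": (3, 2)}
N = 14

info_D = info((9, 5))                                              # 0.940 bits
info_age = sum(sum(c) / N * info(c) for c in partitions.values())  # 0.694 bits
gain_age = info_D - info_age                                       # about 0.246
print(round(info_age, 3), round(gain_age, 3))

# Continuous-valued attribute: candidate split points are the midpoints
# between adjacent sorted values (these values are made up).
values = sorted([21, 25, 30, 42, 58])
split_points = [(a + b) / 2 for a, b in zip(values, values[1:])]
print(split_points)

For each candidate split point, Info_A(D) would be computed over the two resulting subsets D1 (A <= split-point) and D2 (A > split-point), and the point giving the minimum expected information requirement would be kept.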