Parameter Estimation, Hypothesis Testing - Principles of Data Mining | CMSC 828G

Material Type: Notes; Professor: Getoor; Class: ADV TOPC INFO PROC; Subject: Computer Science; University: University of Maryland

CMSC 828G Principles of Data Mining
Lecture #8

• Today's Reading:
  – HMS, chapters 4, 5
• Today's Lecture:
  – Parameter estimation, cont.
  – Hypothesis testing
  – Data mining: the component view
• Upcoming Due Dates:
  – H1 due now

Statistical Inference

• Statistical inference: inferring properties of an unknown distribution from data generated by that distribution.
• Estimate the parameters of the model from the data.
• Use the likelihood function:

  Model → Data: probability
  Data → Model: statistical inference

Parameter Estimation

• Maximum likelihood estimation, cont.
• Bayesian estimation

Likelihood Function

• Let D = {x(1), …, x(n)} be independently sampled from the same distribution p(x|θ): 'independent and identically distributed', iid.
• The likelihood function L(θ | x(1), …, x(n)) captures the probability of the data as a function of θ:

  L(θ, D) = L(θ | x(1), …, x(n)) = p(x(1), …, x(n) | θ) = ∏_{i=1}^n p(x(i) | θ)

Likelihood

• L(X, θ) for a parameter θ and data X is the relative probability of getting the data under the different possible parameters θ.
• We must remember that it is not a probability distribution.
• For a probability distribution, θ is fixed and you consider all possible values of the random variable X.
• For the likelihood function, X = x is fixed and you consider how the probability of getting this x changes as you change θ.

Bayesian Estimation

• Frequentist approach: the parameters of a population are fixed but unknown, and the data is a random sample from that population.
• Bayesian approach: the data is known, and the parameters are the random variables. θ has a distribution of possible values, and the observed data provide evidence for different values.

Bayesian Estimation, cont.

• Before observing the data, we have a distribution over possible values for θ, called the prior distribution p(θ).
• By analyzing the data, we can update our beliefs to take the observed data into account. This yields a distribution over possible values for θ given D, called the posterior distribution p(θ|D).
• We use Bayes' theorem to obtain the posterior:

  p(θ|D) = p(D|θ) p(θ) / p(D)

Bayesian Estimation, cont.

• The posterior gives a distribution over parameter values. If we would like a single value (a point estimate, as in the MLE case), we can use the same principle: choose the value of θ that maximizes the posterior.
• This is called the maximum a posteriori (MAP) method:

  θ̂_MAP = argmax_θ p(θ|D)

• MLE is a special case of MAP with a 'flat' prior.

Example

• Beta prior: p(θ) ∝ θ^(α−1) (1−θ)^(β−1)
• Likelihood for the Binomial model (r successes in n trials): L(θ|D) = θ^r (1−θ)^(n−r)
• Posterior:

  p(θ|D) ∝ p(D|θ) p(θ) = θ^r (1−θ)^(n−r) · θ^(α−1) (1−θ)^(β−1) = θ^(r+α−1) (1−θ)^(n−r+β−1)

• The Beta prior acts like a Binomial sample with α − 1 prior successes and β − 1 prior failures; we can think of the Beta as having an equivalent sample size of α + β − 2.
• The posterior is a Beta distribution with parameters r + α and n − r + β.
• Beta is the CONJUGATE PRIOR for the BINOMIAL.
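To make the Beta-Binomial example concrete, here is a minimal Python sketch (not part of the original notes): the coin-flip data set and the prior hyperparameters α = β = 2 are made up for illustration. It evaluates the likelihood on a grid, computes the MLE r/n, and reads the MAP estimate off the Beta posterior.

```python
import numpy as np

# Illustrative iid Bernoulli(theta) sample: 1 = success, 0 = failure.
data = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n, r = len(data), int(data.sum())

# Likelihood for the Binomial model: L(theta | D) = theta^r * (1 - theta)^(n - r).
def likelihood(theta):
    return theta**r * (1 - theta)**(n - r)

# MLE: the maximizer of the likelihood; for this model it is r/n.
theta_mle = r / n
grid = np.linspace(0.01, 0.99, 99)
assert abs(grid[np.argmax(likelihood(grid))] - theta_mle) < 0.01

# Beta(alpha, beta) prior; by conjugacy the posterior is Beta(r + alpha, n - r + beta).
alpha, beta = 2.0, 2.0                       # illustrative hyperparameters
post_a, post_b = r + alpha, n - r + beta

# MAP estimate: the mode of the Beta posterior (valid when post_a, post_b > 1).
theta_map = (post_a - 1) / (post_a + post_b - 2)

print(f"MLE = {theta_mle:.3f}, MAP = {theta_map:.3f}")   # MLE = 0.700, MAP = 0.667
```

With a flat prior (α = β = 1) the posterior is proportional to the likelihood, so the MAP estimate coincides with the MLE, as the slide notes.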
Bayesian Approach to Prediction

• A fully Bayesian approach is characterized by maintaining a distribution over models.
• In order to make a prediction about a new data point x(n+1) not in our training data set D, we average over all possible values of θ, weighted by the posterior probability p(θ|D):

  p(x(n+1) | D) = ∫ p(x(n+1), θ | D) dθ = ∫ p(x(n+1) | θ) p(θ | D) dθ

• Of course, computationally this is much more challenging than the maximum likelihood approach… MCMC.

Classical Hypothesis Testing

• Two hypotheses: H0 (the null hypothesis) and H1 (the alternative hypothesis).
• The outcome of a hypothesis test is 'reject H0' or 'do not reject H0'.
• Example:
  – H0: there is no difference in taste between Coke and Pepsi, against H1: there is a difference.
• The hypotheses are often statements about population parameters such as the expected value and variance:
  – For example, H0 might be that the expected height of ten-year-old boys in the Scottish population is not different from that of ten-year-old girls.
• A hypothesis might also be a statement about the distributional form of a characteristic of interest:
  – For example, the height of ten-year-old boys is normally distributed within the Scottish population.

Z-test Example

Testing for Independence

• Two nominal variables X = x1, …, xr and Y = y1, …, ys.
• H0: X and Y are independent.
• Let n(xi)/n be the proportion of observations with X = xi, and similarly n(yj)/n for Y = yj.
• If X and Y are independent, then we expect the count n(xi, yj) to be about n(xi) n(yj) / n.
• Number the possible combinations 1, …, t.
• Let Ek denote the expected number of occurrences of combination k and Ok the observed number of occurrences:

  X² = Σ_{k=1}^t (Ok − Ek)² / Ek

• If H0 is true, X² follows a chi-squared distribution, with (r − 1)(s − 1) degrees of freedom for an r × s table.

Example

• A = age of mother and B = birth weight:

           ≤ 2500g   > 2500g   Total
  ≤ 20          10        40      50
  > 20          15       135     150
  Total         25       175     200

• Rejection rule: reject if X² > 3.841, with df = (2−1)(2−1) = 1.
• Expected counts:

  E00 = 50·25/200 = 6.25      E01 = 50·175/200 = 43.75
  E10 = 150·25/200 = 18.75    E11 = 150·175/200 = 131.25

• Test statistic:

  X² = (10−6.25)²/6.25 + (40−43.75)²/43.75 + (15−18.75)²/18.75 + (135−131.25)²/131.25 = 3.428

• Since 3.428 < 3.841, we do not reject H0.
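The worked example above can be verified with a short Python sketch (again not from the original notes; SciPy is assumed only for looking up the critical value):

```python
import numpy as np
from scipy.stats import chi2

# Observed counts from the slide: rows = mother's age (<= 20, > 20),
# columns = birth weight (<= 2500g, > 2500g).
observed = np.array([[10, 40],
                     [15, 135]])

n = observed.sum()                       # 200
row_totals = observed.sum(axis=1)        # [50, 150]
col_totals = observed.sum(axis=0)        # [25, 175]

# Expected counts under independence: E_ij = n(x_i) * n(y_j) / n.
expected = np.outer(row_totals, col_totals) / n

x2 = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)   # (2-1)(2-1) = 1
critical = chi2.ppf(0.95, df)            # 3.841 at the 5% level

print(f"X^2 = {x2:.3f}, critical value = {critical:.3f}")
# X^2 ≈ 3.43 < 3.841, so H0 (independence) is not rejected.
```

For a ready-made version, scipy.stats.chi2_contingency(observed, correction=False) returns the same statistic; note that by default it applies Yates' continuity correction to 2×2 tables.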
Component View

• Data mining algorithm components:
  – task
  – structure or model
  – score function
  – search or optimization method
  – data management technique

Task

• visualization, classification, clustering, regression, anomaly detection

Structure

• The functional form of the model or pattern we are fitting to the data.
• E.g., linear regression model, hierarchical clustering model, neural network, Bayesian network, association rule.

Data Management Technique

• The methods used for storing, indexing, and retrieving data.
• Many statistical and machine learning algorithms do not specify any data management technique, essentially assuming that the data set fits in main memory, so random access to any data point is free.
• However, for massive data sets that exceed the capacity of main memory, accessing data is orders of magnitude slower, and the physical location of the data and the manner in which it is accessed can be critical to algorithm efficiency.

3 Algorithms

                              CART                             Backpropagation                  Apriori
  Task                        classification and regression    regression                       rule pattern discovery
  Structure                   decision tree                    neural network                   association rules
  Score Function              cross-validated loss function    squared error                    support/accuracy
  Search Method               greedy search over structure     gradient descent on parameters   breadth-first with pruning
  Data Management Technique   unspecified                      unspecified                      linear scans

Next Time

• Reading:
  – HMS, chapters 5, 6 (cont.)
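A hypothetical sketch of how the component view in the table above could be encoded as data; the MiningAlgorithm class and its field names are invented for illustration, while the values come from the table.

```python
from dataclasses import dataclass

@dataclass
class MiningAlgorithm:
    """One column of the component-view table: a hypothetical container type."""
    task: str
    structure: str
    score_function: str
    search_method: str
    data_management: str

# The three algorithms from the table, one instance per column.
ALGORITHMS = {
    "CART": MiningAlgorithm(
        task="classification and regression",
        structure="decision tree",
        score_function="cross-validated loss function",
        search_method="greedy search over structure",
        data_management="unspecified",
    ),
    "Backpropagation": MiningAlgorithm(
        task="regression",
        structure="neural network",
        score_function="squared error",
        search_method="gradient descent on parameters",
        data_management="unspecified",
    ),
    "Apriori": MiningAlgorithm(
        task="rule pattern discovery",
        structure="association rules",
        score_function="support/accuracy",
        search_method="breadth-first with pruning",
        data_management="linear scans",
    ),
}

for name, algo in ALGORITHMS.items():
    print(f"{name}: {algo.task} via {algo.search_method}")
```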