Maximum Likelihood Estimation and Hypothesis Testing for a Binomial Proportion (Statistics lecture notes)

Topics: Binomial Distribution, Hypothesis Testing, Probability Theory, Maximum Likelihood Estimation, Statistical Inference

An explanation of maximum likelihood estimation and hypothesis testing for a binomial proportion, using the example of a random sample of college graduates. It covers the binomial distribution, the likelihood function, and the Wald, score, and likelihood-ratio statistics for testing hypotheses about the population proportion, together with chi-square tests for multinomial proportions.

What you will learn

  • How is the Wald statistic calculated for a binomial proportion?
  • What is the maximum likelihood estimator for a binomial proportion?
  • What is the likelihood-ratio statistic and how is it used in hypothesis testing for a binomial proportion?

Handout 4: Inference for a binomial

In Handout 2 we discussed how to obtain an approximate confidence interval for a binomial proportion $\pi$. If $Y$ is the number of college graduates in a random sample of size $n$, then the maximum likelihood estimator (MLE) of $\pi$, the population proportion of college graduates, is given by the sample proportion $p = Y/n$. If both $n\pi$ and $n(1-\pi)$ are large, an approximate 95% confidence interval for $\pi$ is given by

$$p \pm 1.96\,SE(p), \qquad SE(p) = \sqrt{p(1-p)/n}.$$

Maximum likelihood estimation. Recall that the pmf (probability mass function) of a binomial distribution is

$$f(y;\pi) = \binom{n}{y}\pi^{y}(1-\pi)^{n-y}.$$

Suppose we have observed $y = 71$ college graduates in a random sample of $n = 200$. Then the likelihood function is

$$l(\pi) = f(71;\pi) = \binom{200}{71}\pi^{71}(1-\pi)^{200-71},$$

i.e., the likelihood function $l(\pi)$ is a function of $\pi$, treating $y$ as fixed. We may plot $l(\pi)$ against $\pi$ and find the value of $\pi$ at which $l(\pi)$ attains its maximum. If $\hat{\pi}$ is the value of $\pi$ at which $l(\pi)$ attains its maximum, then $\hat{\pi}$ is called the maximum likelihood estimator of $\pi$. For $n = 200$ and $y = 71$, the maximum of $l(\pi)$ (and of $\log l(\pi)$) is attained at $\hat{\pi} = y/n = 71/200 = 0.3550$.

[Figure: plots of $l(\pi)$ and $\log l(\pi)$ against $\pi$ for $n = 200$, $y = 71$; both peak at $\hat{\pi} = 0.355$.]

Instead of plotting $l(\pi)$ against $\pi$, we can use calculus to obtain the value of $\pi$ at which $l(\pi)$, or equivalently $\log l(\pi)$, attains its maximum: we differentiate $\log l(\pi)$ with respect to $\pi$, equate the derivative to zero, and solve for $\pi$.

[Technical note:

$$\log l(\pi) = \log\!\left(\frac{n!}{y!\,(n-y)!}\right) + y\log(\pi) + (n-y)\log(1-\pi),$$

$$\frac{d}{d\pi}\log l(\pi) = \frac{y}{\pi} - \frac{n-y}{1-\pi}, \qquad 0 = \frac{d}{d\pi}\log l(\pi) \;\Longrightarrow\; \hat{\pi} = y/n.]$$

Hypothesis testing (Wald and score statistics). Suppose we wish to test whether the percentage of college graduates is 30% or higher. In statistical jargon we state this as: test $H_0: \pi = 0.3$ against $H_1: \pi > 0.3$ at some given level of significance, say $\alpha = 0.05$. The test statistic is

$$z = \frac{p - 0.3}{SE},$$

where $SE$ is an estimate of $\sqrt{\mathrm{Var}(p)} = \sqrt{\pi(1-\pi)/n}$. If we estimate $SE$ by $\sqrt{p(1-p)/n}$, the corresponding $z$-statistic is called the Wald statistic. If $SE$ is estimated at the null (i.e., with $\pi = 0.3$), the corresponding $z$-statistic is called the score statistic.

Suppose that in a random sample of size $n = 20$ we have 8 college graduates, and we want to test $H_0: \pi = 0.3$ vs $H_1: \pi > 0.3$ using the score statistic, with the p-value calculated from the normal approximation. (When $n$ is of moderate size, the accuracy of this approximation can be improved by the 'continuity correction' discussed below.) The estimated value of $\pi$ and the $z$-statistic are

$$\hat{\pi} = 8/20 = 0.4, \qquad z = \frac{\hat{\pi} - 0.3}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.1}{\sqrt{(0.3)(1-0.3)/20}} = 0.9759.$$

The p-value can be calculated as the area to the right of 0.9759 under the standard normal curve, which equals 0.1646.

Now let us look at the original definition of the p-value: the probability of getting 8 or more graduates if the null were true (i.e., when $\pi = 0.3$). Thus the p-value, by exact calculation using binomial probabilities, is

$$P(Y \ge 8) = 1 - P(Y \le 7) = 0.2277.$$

This value can be obtained with the R command 1-pbinom(7,size=20,prob=0.3); note that pbinom(7,size=20,prob=0.3) gives $P(Y \le 7)$.
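To illustrate, here is a minimal base-R sketch (an added example, not part of the original handout) that reproduces the MLE computation numerically: optimize() maximizes the log-likelihood over the unit interval, and the closed-form answer $y/n$ serves as a check. The function and variable names are chosen here for illustration.

    # Log-likelihood of a binomial proportion, for y successes in n trials
    loglik <- function(pi, y = 71, n = 200) {
      lchoose(n, y) + y * log(pi) + (n - y) * log(1 - pi)
    }

    # Numerical maximization over (0, 1)
    fit <- optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)
    fit$maximum                      # about 0.3550, matching y/n = 71/200

    # Plots of l(pi) and log l(pi), as in the figure above
    curve(exp(loglik(x)), from = 0, to = 1, xlab = "pi", ylab = "l(pi)")
    curve(loglik(x), from = 0, to = 1, xlab = "pi", ylab = "log l(pi)")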
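The score-test example can be verified the same way. This sketch (again an added illustration; the handout itself only gives the pbinom command) computes the score statistic, its normal-approximation p-value, and the exact binomial p-value:

    n <- 20; y <- 8; pi0 <- 0.3

    # Score statistic: SE evaluated at the null value pi0
    z <- (y/n - pi0) / sqrt(pi0 * (1 - pi0) / n)
    z                                        # 0.9759

    1 - pnorm(z)                             # normal-approximation p-value: 0.1646
    1 - pbinom(y - 1, size = n, prob = pi0)  # exact p-value P(Y >= 8): 0.2277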
The Central Limit Theorem tells us that under $H_0$ the random variable $Y$, the number of college graduates, is approximately normally distributed with mean $n\pi_0 = (20)(0.3) = 6$ and $SD = \sqrt{n\pi_0(1-\pi_0)} = \sqrt{(20)(0.3)(1-0.3)} = 2.0494$. Without continuity correction, we approximate $P(Y \ge 8)$ by the area to the right of $(8-6)/2.0494 = 0.9759$ under the standard normal curve, and this area is 0.1646. Thus the approximate p-value using the normal approximation without continuity correction is 0.1646. If we use the continuity correction, then $P(Y \ge 8)$ is approximated by the area under the standard normal curve to the right of $(7.5-6)/2.0494 = 0.7319$, and this area is 0.2321. With the continuity correction, the p-value is approximately 0.2321. Note that the p-value computed with the continuity correction is much closer to the exact value (0.2277) than the normal approximation without it.

Some useful R commands.

  • If $Y \sim \mathrm{binomial}(20, 0.4)$, then (a) $P(Y = 6)$ is obtained by the R command dbinom(6,size=20,prob=0.4), and (b) $P(Y \le 6)$ by pbinom(6,size=20,prob=0.4).
  • If $Y \sim \mathrm{Poisson}(4)$, then (a) $P(Y = 6)$ is obtained by dpois(6,lambda=4), and (b) $P(Y \le 6)$ by ppois(6,lambda=4).
  • If $Y \sim N(3, 2^2)$, then (a) $P(Y \le 6)$ is obtained by pnorm(6,mean=3,sd=2), and (b) the 0.9 quantile by qnorm(0.9,mean=3,sd=2). [The 0.9 quantile of the distribution of $Y$ is the value $y$ such that $0.9 = P(Y \le y)$.]

A note on the normal approximation for the binomial. For large $n$, the binomial distribution may be approximated by the normal distribution. When the sample size is moderate, this approximation can be improved by using what is known as the 'continuity correction'. Let $Y \sim \mathrm{binomial}(16, 0.6)$. Then

$$E(Y) = n\pi = (16)(0.6) = 9.6, \qquad \mathrm{Var}(Y) = n\pi(1-\pi) = (16)(0.6)(0.4) = 3.84, \qquad SD(Y) = \sqrt{3.84} = 1.9596.$$

Suppose that we want to calculate $P(Y \le 8)$. The exact value is

$$P(Y \le 8) = \sum_{y=0}^{8} \binom{16}{y}(0.6)^{y}(0.4)^{16-y} = 0.284.$$

If we use the normal approximation without continuity correction, then $P(Y \le 8)$ is approximately the area under the normal curve to the left of $(8 - 9.6)/1.9596 = -0.8165$; using the normal table, $P(Y \le 8) \approx 0.207$, which is not very accurate. The continuity correction instead approximates $P(Y \le 8)$ by the area under the normal curve to the left of $(8.5 - 9.6)/1.9596 = -0.5613$, which is about 0.287, close to the exact value.

Suppose that we want to calculate $P(Y = 8)$ using the normal approximation. First note that $P(Y = 8) = 0.142$ by the binomial formula. We may approximate this probability by the area under the normal curve between $(7.5 - 9.6)/1.9596 = -1.0717$ and $(8.5 - 9.6)/1.9596 = -0.5613$, and this area is about 0.145. Once again, the approximation is close to the true value.
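These continuity-correction comparisons are easy to reproduce. The following base-R sketch (an added illustration, not from the handout) uses the same $\mathrm{binomial}(16, 0.6)$ example; the variable names are illustrative:

    n <- 16; p <- 0.6
    mu <- n * p                          # 9.6
    s  <- sqrt(n * p * (1 - p))          # 1.9596

    pbinom(8, size = n, prob = p)        # exact P(Y <= 8): 0.284
    pnorm((8   - mu) / s)                # no continuity correction: 0.207
    pnorm((8.5 - mu) / s)                # with continuity correction: 0.287

    dbinom(8, size = n, prob = p)        # exact P(Y = 8): 0.142
    pnorm((8.5 - mu) / s) - pnorm((7.5 - mu) / s)   # approximation: 0.145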
Hypothesis testing for the multinomial. A recruiting company believes that the highest educational levels of the adults in a certain large city are: 40% college graduates, 40% high school graduates, and 20% none. In order to verify this claim, a sample of $n = 200$ adults is taken; let $n_1$, $n_2$, and $n_3$ be the sample counts of college graduates, high school graduates, and no degree, respectively. Then $(n_1, n_2, n_3) \sim \mathrm{multinomial}(n; \pi_1, \pi_2, \pi_3)$, where $\pi_1$, $\pi_2$, and $\pi_3$ are the true population proportions.

Based on the data we would like to test $H_0: \pi_1 = \pi_{10},\ \pi_2 = \pi_{20},\ \pi_3 = \pi_{30}$ against $H_1$: at least one $\pi_j \neq \pi_{j0}$, where $\pi_{10} = 0.4$, $\pi_{20} = 0.4$, and $\pi_{30} = 0.2$. In general, if the multinomial has $c$ categories, we may want to test $H_0: \pi_j = \pi_{j0}$, $j = 1, \ldots, c$, against $H_1$: at least one $\pi_j \neq \pi_{j0}$, where the values $\pi_{j0}$ are prespecified. We know that $E(n_j) = n\pi_j$. If $H_0$ were true, then $E(n_j) = n\pi_{j0}$ is the expected count (frequency) under $H_0$; denote this expected frequency by $\mu_j$. Since $\hat{\pi}_j = n_j/n$ estimates $\pi_j$, any reasonable hypothesis testing method should compare the $\hat{\pi}_j$'s to the $\pi_{j0}$'s, or equivalently the $n_j$'s to the $\mu_j$'s. Two well-known test statistics are

$$X^2 = \sum_{j=1}^{c} \frac{(n_j - \mu_j)^2}{\mu_j} \quad \text{[Pearson's chi-square statistic]},$$

$$G^2 = 2\sum_{j} n_j \log(n_j/\mu_j) \quad \text{[likelihood-ratio (LR) statistic]}.$$

Under $H_0$, both $X^2$ and $G^2$ are approximately distributed as $\chi^2_{c-1}$ (chi-square with $c-1$ df) provided $\mu_j = n\pi_{j0}$ is large for all $j$. With either statistic, we reject $H_0$ if the value of the statistic is larger than the cutoff value obtained from the chi-square table.

Example. We have the following counts based on a random sample of $n = 200$ adults in a large city: 93 (college graduates), 75 (high school graduates), 32 (none). Let $\pi_{10} = 0.4$, $\pi_{20} = 0.4$, $\pi_{30} = 0.2$. We would like to test $H_0: \pi_j = \pi_{j0}$, $j = 1, \ldots, c$, against $H_1$: at least one $\pi_j \neq \pi_{j0}$, at level $\alpha = 0.05$; in this example $c = 3$. The expected counts under $H_0$ are $\mu_1 = n\pi_{10} = (200)(0.4) = 80$, $\mu_2 = n\pi_{20} = (200)(0.4) = 80$, and $\mu_3 = n\pi_{30} = (200)(0.2) = 40$. Thus we have

$$X^2 = \frac{(93-80)^2}{80} + \frac{(75-80)^2}{80} + \frac{(32-40)^2}{40} = 4.0250,$$

$$G^2 = 2\left[93\log(93/80) + 75\log(75/80) + 32\log(32/40)\right] = 4.0446.$$

The area to the right of 5.9915 under the chi-square curve with $c-1 = 2$ df is 0.05 (the R command qchisq(0.95,2) yields 5.9915). Both $X^2$ and $G^2$ are smaller than 5.9915, so the null hypothesis cannot be rejected by either test. If we use Pearson's chi-square, the p-value is the area to the right of 4.0250 under the chi-square curve with 2 df, i.e., p-value $= P(\chi^2_2 \ge 4.0250) = 0.1337$ (using the R command 1-pchisq(4.0250,2)).

Remark 2. (a) Both Pearson's chi-square test and the LR test assume that $\mu_j = n\pi_{j0}$ is large for all $j$; a rule of thumb is that $\mu_j$ should be 5 or larger for all $j$. (b) If the null hypothesis were true (or if the true $\pi_j$'s are close to the $\pi_{j0}$'s), then the values of $X^2$ and $G^2$ are usually not all that different. However, if the true $\pi_j$'s are quite different from the $\pi_{j0}$'s, then the values of $X^2$ and $G^2$ may be quite different.
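The example can be checked in R. This sketch (an added illustration, not part of the handout) computes both statistics directly, and chisq.test() gives the built-in version of Pearson's test:

    counts <- c(93, 75, 32)
    p0 <- c(0.4, 0.4, 0.2)
    mu <- sum(counts) * p0                     # expected counts: 80, 80, 40

    # Pearson chi-square statistic and its p-value
    X2 <- sum((counts - mu)^2 / mu)            # 4.0250
    1 - pchisq(X2, df = length(counts) - 1)    # about 0.134

    # Likelihood-ratio statistic and its p-value
    G2 <- 2 * sum(counts * log(counts / mu))   # 4.0446
    1 - pchisq(G2, df = length(counts) - 1)    # about 0.132

    # Built-in Pearson test
    chisq.test(counts, p = p0)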