Inference about a population proportion
BPS chapter 20
© 2006 W.H. Freeman and Company

Objectives (BPS chapter 20)
Inference for a population proportion
  The sample proportion p̂
  The sampling distribution of p̂
  Large sample confidence interval for p
  Accurate confidence intervals for p
  Choosing the sample size
  Significance tests for a proportion

Sampling distribution of p̂
The sampling distribution of p̂ is never exactly normal. But as the sample size increases, the sampling distribution of p̂ becomes approximately normal.

Implication for estimating proportions
The mean and standard deviation (width) of the sampling distribution are both completely determined by p and n. Thus, we have only one population parameter to estimate, p.

$\hat{p} \approx N\left(p, \sqrt{p(1-p)/n}\right)$

Therefore, inference for proportions can rely directly on the normal distribution (unlike inference for means, which requires the use of a t distribution with a specific number of degrees of freedom).

Conditions for inference on p
Assumptions:
1. We regard our data as a simple random sample (SRS) from the population. As usual, this is the most important condition.
2. The sample size n is large enough that the sampling distribution of p̂ is close to normal.
How large a sample size is enough? Different inference procedures require different answers (we'll see what to do practically).

Upper tail probability P   0.25    0.20    0.15    0.10    0.05    0.025   0.02    0.01
z*                         0.674   0.841   1.036   1.282   1.645   1.960   2.054   2.326
Confidence level C         50%     60%     70%     80%     90%     95%     96%     98%

Let's calculate a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms."

What is the sample proportion p̂?
$\hat{p} = \frac{23}{440} \approx 0.052$

What is the sampling distribution for the proportion of arthritis patients with adverse symptoms for samples of 440?

$\hat{p} \approx N\left(p, \sqrt{p(1-p)/n}\right)$

For a 90% confidence level, z* = 1.645. Using the large sample method, we calculate a margin of error m:

$m = z^* \sqrt{\hat{p}(1-\hat{p})/n} = 1.645 \sqrt{0.052(1-0.052)/440} \approx 1.645 \times 0.0106 \approx 0.017$

90% CI for p: p̂ ± m, or 0.052 ± 0.017

With 90% confidence, between about 3.5% and 6.9% of arthritis patients taking this pain medication experience some adverse symptoms.

Because we have to use an estimate of p to compute the margin of error, confidence intervals for a population proportion are not very accurate.

$m = z^* \sqrt{\hat{p}(1-\hat{p})/n}$

Specifically, the actual confidence level is usually less than the confidence level you asked for in choosing z*. The shortfall is not a fixed amount, because it depends on p. Use with caution!

"Plus four" confidence interval for p
A simple adjustment produces more accurate confidence intervals. We act as if we had four additional observations, two being successes and two being failures. Thus, the new sample size is n + 4 and the count of successes is X + 2.

The "plus four" estimate of p is:

$\tilde{p} = \frac{\text{count of successes} + 2}{\text{count of all observations} + 4}$

And an approximate level C confidence interval is:

$CI: \tilde{p} \pm m, \text{ with } m = z^* SE = z^* \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$

Use this method when C is at least 90% and sample size is at least 10.

What sample size would we need in order to achieve a margin of error no more than 0.01 (1%) for a 90% confidence interval for the population proportion of arthritis patients who suffer some "adverse symptoms?"

We could use 0.5 for our guessed p*. However, since the drug has been approved for sale over the counter, we can safely assume that no more than 10% of patients should suffer "adverse symptoms" (a better guess than 50%).

For a 90% confidence level, z* = 1.645.

$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*) = \left(\frac{1.645}{0.01}\right)^2 (0.1)(0.9) \approx 2435.4$

To obtain a margin of error of no more than 1%, we round up: we would need a sample size n of at least 2436 arthritis patients.

Significance test for p
The sampling distribution for p̂ is approximately normal for large sample sizes, and its shape depends solely on p and n. Thus, we can easily test the null hypothesis:

H0: p = p0 (a given value we are testing)

If H0 is true, the sampling distribution is known, and the test statistic is:

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

The likelihood of our sample proportion given the null hypothesis depends on how far from p0 our p̂ is in units of standard deviation.

This is valid when both expected counts (expected successes np0 and expected failures n(1 − p0)) are each 10 or larger.

[Figure: normal sampling distribution of p̂ under H0, centered at p0 with standard deviation $\sqrt{p_0(1-p_0)/n}$, with the observed p̂ marked.]

P-values and one- or two-sided hypotheses: reminder
And as always, if the P-value is smaller than the chosen significance level α, then the difference is statistically significant and we reject H0.
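
The procedures in this chapter are short enough to write out directly. The following is a minimal Python sketch using only the standard library, not a reference implementation. The function names are illustrative assumptions, z* is passed in from the table rather than computed, and the null value p0 = 0.10 in the usage section is a hypothetical choice made only to show the test statistic at work.

```python
from math import ceil, sqrt
from statistics import NormalDist


def large_sample_ci(successes, n, z_star):
    """Large-sample interval: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    m = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_ci(successes, n, z_star):
    """'Plus four' interval: add two successes and two failures first."""
    p_tilde = (successes + 2) / (n + 4)
    m = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - m, p_tilde + m


def sample_size(margin, p_guess, z_star):
    """Smallest n with margin of error <= margin, given a guessed p*."""
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


def z_test(successes, n, p0):
    """z statistic and two-sided P-value for H0: p = p0.

    Intended for use when n * p0 and n * (1 - p0) are both at least 10.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Arthritis example from the slides: 23 of 440 patients, 90% confidence.
    print("large-sample CI:", large_sample_ci(23, 440, 1.645))
    print("plus-four CI:   ", plus_four_ci(23, 440, 1.645))
    print("sample size:    ", sample_size(0.01, 0.10, 1.645))
    # Hypothetical null value p0 = 0.10 (an assumption, not from the slides).
    print("z, two-sided P: ", z_test(23, 440, 0.10))
```

Passing z* as an argument keeps the results in step with the table values used in the hand calculations. For a one-sided alternative, the P-value would be the single tail area beyond z rather than twice the tail area.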