Review

If X_1, ..., X_n have mean µ and SD σ:

E(X̄) = µ   no matter what
SD(X̄) = σ/√n   if the X's are independent

If X_1, ..., X_n are iid normal(mean = µ, SD = σ), then X̄ ∼ normal(mean = µ, SD = σ/√n).

If X_1, ..., X_n are iid with mean µ and SD σ and the sample size, n, is large, then approximately X̄ ∼ normal(mean = µ, SD = σ/√n).

A discrepancy

Caution: Sometimes the order in which the book covers material is a bit odd. (The authors would probably think I'm odd.) But sometimes it is just wrong. (Or perhaps they are making some simplifications to ease learning.)

A case in point: Let X_1, ..., X_n be random draws from a population with mean µ and SD σ, and X̄ the sample average.

Quantity   Book                                      Karl
σ          population SD                             population SD
s          SD of the data                            our estimate of σ
σ/√n       SD of the sampling distribution of X̄      SD(X̄) aka SE(X̄)
s/√n       Standard error of the mean                our estimate of SE(X̄)

Confidence intervals

Suppose we measure the log10 cytokine response in 100 male mice of a certain strain, and find that the sample average (x̄) is 3.52 and the sample SD (s) is 1.61.

Our estimate of the SE of the sample mean is 1.61/√100 = 0.161.

A 95% confidence interval for the population mean (µ) is 3.52 ± (2 × 0.16) = 3.52 ± 0.32 = (3.20, 3.84).

What does this mean? What is the chance that (3.20, 3.84) contains µ?

Suppose that X_1, ..., X_n are iid normal(mean = µ, SD = σ). Suppose that we actually know σ.

Then X̄ ∼ normal(mean = µ, SD = σ/√n), where σ is known but µ is not.

How close is X̄ to µ?

$$\Pr\left( \frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} \le 1.96 \right) = 95\%$$

$$\Pr\left( -1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}} \right) = 95\%$$

$$\Pr\left( \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \right) = 95\%$$

[Figure with labels µ and σ/√n]

But we don't know the SD

Use of X̄ ± 1.96 σ/√n as a 95% confidence interval for µ requires knowledge of σ.
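The 95% coverage of X̄ ± 1.96 σ/√n when σ is known can be checked by simulation. The R sketch below is only an illustration: the values of `mu`, `sigma`, `n`, and `n_sim` are hypothetical choices, not numbers from the notes. It also redoes the arithmetic of the 100-mouse example, using the rough multiplier 2 as in the text.

```r
# Arithmetic from the 100-mouse example (numbers taken from the notes).
xbar_obs <- 3.52
s_obs <- 1.61
n_obs <- 100
se_hat <- s_obs / sqrt(n_obs)              # estimated SE of the sample mean
ci_rough <- xbar_obs + c(-2, 2) * se_hat   # rough 95% interval, multiplier 2
print(round(ci_rough, 2))

# Coverage of the known-sigma interval, by simulation.
# mu, sigma, n, and n_sim are hypothetical values chosen for illustration.
set.seed(1)
mu <- 3.5
sigma <- 1.6
n <- 10
n_sim <- 10000

# Each column of `draws` is one simulated sample of size n.
draws <- matrix(rnorm(n * n_sim, mean = mu, sd = sigma), nrow = n)
xbar <- colMeans(draws)

# The half-width uses the TRUE sigma, which is assumed known here.
half_width <- qnorm(0.975) * sigma / sqrt(n)
covered <- (xbar - half_width <= mu) & (mu <= xbar + half_width)

coverage <- mean(covered)
print(coverage)   # should land near 0.95
```

Each simulated interval either contains µ or it does not; the 95% refers to the long-run fraction of intervals that do.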
That X̄ ± 1.96 σ/√n is a 95% confidence interval for µ is a result of the following:

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \text{normal}(0, 1)$$

What if we don't know σ?

We plug in the sample SD (s), but then we need to widen the intervals to account for the uncertainty in s.

[Figure: 500 BAD confidence intervals for µ (σ unknown)]

[Figure: 500 confidence intervals for µ (σ unknown)]

The Student t distribution

If X_1, X_2, ..., X_n are iid normal(mean = µ, SD = σ),

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(\text{df} = n - 1)$$

Discovered by William Gosset ("Student"), who worked for Guinness.

In R, use the functions pt(), qt(), and dt().

e.g., qt(0.975, 9) returns 2.26 (cf. 1.96)
      pt(1.96, 9) - pt(-1.96, 9) returns 0.918 (cf. 0.95)

[Figure: densities of the t distribution with df = 2, 4, and 14, compared with the normal]

The t interval

If X_1, ..., X_n are iid normal(mean = µ, SD = σ),

$$\bar{X} \pm t(\alpha/2,\, n - 1)\,\frac{s}{\sqrt{n}}$$

is a 1 − α confidence interval for µ.

t(α/2, n − 1) is the 1 − α/2 quantile of the t distribution with n − 1 "degrees of freedom."

[Figure: t density with upper-tail area α/2 to the right of t(α/2, n − 1)]

In R: qt(0.975, 9) for the case n = 10, α = 5%.

Example 1

Suppose we have measured the log10 cytokine response of 10 mice, and obtained the following numbers:

Data: 0.2 1.3 1.4 2.3 4.2 4.7 4.7 5.1 5.9 7.0

x̄ = 3.68   s = 2.24   n = 10
qt(0.975, 9) = 2.26

95% confidence interval for µ (the population mean):

3.68 ± 2.26 × 2.24 / √10 ≈ 3.68 ± 1.60 = (2.1, 5.3)

[Figure: dot plot of the data, with the 95% CI and s marked]
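The calculation in Example 1 can be reproduced in R. The sketch below uses the ten data values from the example and cross-checks the hand-built interval against base R's `t.test()`. The second part is a small simulation, under hypothetical values of `mu`, `sigma`, and `n_sim` that are not from the notes, comparing the coverage of the plug-in interval X̄ ± 1.96 s/√n with that of the t interval when n = 10.

```r
# Example 1 data, copied from the notes.
x <- c(0.2, 1.3, 1.4, 2.3, 4.2, 4.7, 4.7, 5.1, 5.9, 7.0)
n <- length(x)
xbar <- mean(x)
s <- sd(x)

alpha <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 1)   # the 1 - alpha/2 quantile of t(df = n - 1)
ci <- xbar + c(-1, 1) * t_crit * s / sqrt(n)

print(round(c(xbar = xbar, s = s, t_crit = t_crit), 2))   # 3.68, 2.24, 2.26
print(round(ci, 1))                                       # 2.1, 5.3

# Cross-check against the built-in one-sample t procedure.
ci_builtin <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)

# Coverage comparison by simulation.
# mu, sigma, and n_sim are hypothetical values chosen for illustration.
set.seed(1)
mu <- 3.5
sigma <- 1.6
n_sim <- 10000
draws <- matrix(rnorm(n * n_sim, mean = mu, sd = sigma), nrow = n)
xbars <- colMeans(draws)
ses <- apply(draws, 2, sd) / sqrt(n)      # estimated SE for each simulated sample

# Fraction of intervals xbar +/- crit * se that contain mu.
cover <- function(crit) mean(abs(xbars - mu) <= crit * ses)
coverage_z <- cover(qnorm(0.975))   # plug-in s with the normal multiplier 1.96
coverage_t <- cover(t_crit)         # plug-in s with the t multiplier
print(c(z = coverage_z, t = coverage_t))
```

With n = 10, the plug-in interval with multiplier 1.96 should cover µ only about 92% of the time, in line with pt(1.96, 9) - pt(-1.96, 9) = 0.918, while the t interval should cover close to 95%.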