Confidence Intervals for Population Means: Formula, Calculation, and Interpretation

Review

[Figure: the population distribution (mean $\mu$, SD $\sigma$) and the distribution of $\bar X$ (mean $\mu$, SD $\sigma/\sqrt{n}$), with the corresponding standardized curves marked at $z_{\alpha/2}$ and $t_{\alpha/2}$]

$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \qquad\qquad t = \frac{\bar X - \mu}{s/\sqrt{n}}$$

$X_1, X_2, \dots, X_n$ independent normal($\mu$, $\sigma$).

95% confidence interval for $\mu$:

$$\bar X \pm t \, \frac{s}{\sqrt{n}}$$

where $t$ = 97.5 percentile of the t distribution with $n - 1$ d.f.

Example

Suppose we have measured the tumor mass in 20 mice and obtained the following data:

34.9 28.5 34.3 38.4 29.6 28.2 25.3 . . . 32.1

x̄ = 30.7, s = 6.06, n = 20

qt(0.975, 19) = 2.09

95% confidence interval for $\mu$ (the population mean):

30.7 ± 2.09 × 6.06 / √20 ≈ 30.7 ± 2.84 = (27.9, 33.5)

[Figure: dot plot of the 20 measurements (axis from 20 to 40), with the 95% CI and s marked]

What is a confidence interval?

A 95% confidence interval is the result of a procedure that, 95% of the time, produces an interval containing the population parameter.

In advance, there is a 95% chance that the confidence interval you obtain will contain the parameter of interest.

After the fact, your particular 95% CI either contains the parameter or it doesn't; we're not allowed to talk about chance anymore.

[Figure: 200 confidence intervals for $\mu$]

What's the deal?

Why this wacky confidence interval business?

We can talk about Pr(data | $\mu$). But we can't talk about Pr($\mu$ | data).

Actually, a portion of modern (and even rather non-modern) statistics, called Bayesian statistics (remember Bayes's rule?), concerns inferential statements like Pr($\mu$ | data). But this is beyond the scope of the current course.

CI for difference between means

$$\frac{(\bar X - \bar Y) - (\mu_A - \mu_B)}{\widehat{\mathrm{SD}}(\bar X - \bar Y)} \sim t(\mathrm{df} = n + m - 2)$$

The procedure:

1. Calculate $(\bar X - \bar Y)$.
2. Calculate $\widehat{\mathrm{SD}}(\bar X - \bar Y)$.
3. Find the 97.5 percentile of the t distribution with $n + m - 2$ d.f.; call it $t$.
4. Calculate the interval: $(\bar X - \bar Y) \pm t \cdot \widehat{\mathrm{SD}}(\bar X - \bar Y)$.
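The four steps above can be sketched in Python. This is only an illustration, not part of the original notes: the function name `pooled_t_interval` is hypothetical, the estimated SD of the difference is computed with the pooled-SD formula used in the worked examples of these notes, and the t quantile is passed in as a number (the notes obtain it with `qt`) because the Python standard library has no t-distribution quantile function. The inputs are the rounded summary statistics from the first Strain A / Strain B example.

```python
import math

def pooled_t_interval(mean_x, sd_x, n, mean_y, sd_y, m, t_quantile):
    """Confidence interval for mu_A - mu_B from summary statistics.

    Assumes both groups share one population SD (pooled estimate).
    t_quantile is the 97.5 percentile of t with n + m - 2 d.f. for a 95% CI.
    """
    # Step 1: difference between the sample means
    diff = mean_x - mean_y
    # Step 2: pooled SD estimate, then the estimated SD of the difference
    pooled_sd = math.sqrt((sd_x**2 * (n - 1) + sd_y**2 * (m - 1)) / (n + m - 2))
    sd_diff = pooled_sd * math.sqrt(1 / n + 1 / m)
    # Steps 3 and 4: the quantile is supplied by the caller; build the interval
    half_width = t_quantile * sd_diff
    return diff - half_width, diff + half_width

# Summary statistics and quantile taken from the notes (qt(0.975, 15) is about 2.13)
lower, upper = pooled_t_interval(3.04, 0.214, 9, 3.44, 0.250, 8, t_quantile=2.13)
print(f"95% CI for mu_A - mu_B: ({lower:.2f}, {upper:.2f})")
```

With these rounded inputs the result agrees, to two decimals, with the interval (−0.64, −0.16) reported in the notes.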
Example

Strain A: 2.67 2.86 2.87 3.04 3.09 3.09 3.13 3.27 3.35
n = 9, $\bar X$ ≈ 3.04, $s_A$ ≈ 0.214

Strain B: 3.78 3.06 3.64 3.31 3.31 3.51 3.22 3.67
m = 8, $\bar Y$ ≈ 3.44, $s_B$ ≈ 0.250

$$\hat\sigma_{\text{pooled}} = \sqrt{\frac{s_A^2 (n - 1) + s_B^2 (m - 1)}{n + m - 2}} = \dots \approx 0.231$$

$$\widehat{\mathrm{SD}}(\bar X - \bar Y) = \hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}} = \dots \approx 0.112$$

97.5 percentile of t(df = 15) ≈ 2.13

Example

95% confidence interval:

(3.04 − 3.44) ± 2.13 · 0.112 ≈ −0.40 ± 0.24 = (−0.64, −0.16).

[Figure: dot plots of the data for strains A and B (axis from 2.8 to 4.0), and the confidence interval for $\mu_A - \mu_B$ (axis from −1.0 to 0.2)]

Example

Strain A: n = 10
sample mean: $\bar X$ = 55.22
sample SD: $s_A$ = 7.64
t value = qt(0.975, 9) = 2.26

95% CI for $\mu_A$: 55.22 ± 2.26 × 7.64 / √10 = 55.2 ± 5.5 = (49.8, 60.7)

Strain B: m = 16
sample mean: $\bar Y$ = 68.2
sample SD: $s_B$ = 18.1
t value = qt(0.975, 15) = 2.13

95% CI for $\mu_B$: 68.2 ± 2.13 × 18.1 / √16 = 68.2 ± 9.7 = (58.6, 77.9)

[Figure: dot plots of the data for strains A and B (axis from 40 to 100)]

Example

$$\hat\sigma_{\text{pooled}} = \sqrt{\frac{(7.64)^2 \times (10 - 1) + (18.1)^2 \times (16 - 1)}{10 + 16 - 2}} = 15.1$$

$$\widehat{\mathrm{SD}}(\bar X - \bar Y) = \hat\sigma_{\text{pooled}} \times \sqrt{\frac{1}{n} + \frac{1}{m}} = 15.1 \times \sqrt{\frac{1}{10} + \frac{1}{16}} = 6.08$$

t value: qt(0.975, 10+16-2) = 2.06

95% confidence interval for $\mu_A - \mu_B$:

(55.2 − 68.2) ± 2.06 × 6.08 = −13.0 ± 12.6 = (−25.6, −0.5)

[Figure: dot plots of the data for strains A and B (axis from 40 to 100), with the previous and new CIs for $\mu_A - \mu_B$ (axis from −30 to 30)]

Degrees of freedom

One sample of size n:

$$X_1, X_2, \dots, X_n \;\longrightarrow\; \frac{\bar X - \mu}{s/\sqrt{n}} \sim t(\mathrm{df} = n - 1)$$

Two samples, of size n and m:

$$X_1, X_2, \dots, X_n \text{ and } Y_1, Y_2, \dots, Y_m \;\longrightarrow\; \frac{(\bar X - \bar Y) - (\mu_A - \mu_B)}{\hat\sigma_{\text{pooled}} \sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t(\mathrm{df} = n + m - 2)$$

What are these "degrees of freedom"?

Degrees of freedom

The degrees of freedom concern our estimate of the population SD.

We use the residuals $(X_1 - \bar X), (X_2 - \bar X), \dots, (X_n - \bar X)$ to estimate $\sigma$.

But we really only have $n - 1$ independent data points ("degrees of freedom"), since $\sum (X_i - \bar X) = 0$.

In the two-sample case, we use $(X_1 - \bar X), (X_2 - \bar X), \dots, (X_n - \bar X)$ and $(Y_1 - \bar Y), \dots, (Y_m - \bar Y)$ to estimate $\sigma$.

But $\sum (X_i - \bar X) = 0$ and $\sum (Y_i - \bar Y) = 0$, and so we really have just $n + m - 2$ independent data points.

Confidence interval for population SD

Suppose we observe $X_1, X_2, \dots, X_n$ iid normal($\mu$, $\sigma$).

Suppose we wish to create a 95% CI for the population SD, $\sigma$.

Our estimate of $\sigma$ is, of course, the sample SD, $s$.

The sampling distribution of $s$ is such that

$$\frac{(n - 1) s^2}{\sigma^2} \sim \chi^2(\mathrm{df} = n - 1)$$

[Figure: $\chi^2$ densities for df = 4, df = 9, and df = 19 (axis from 0 to 30)]

Choose L and U such that

$$\Pr\left( L \le \frac{(n - 1) s^2}{\sigma^2} \le U \right) = 95\%.$$

[Figure: $\chi^2$ density with the cutoffs L and U marked]

$$\Longrightarrow \Pr\left( \frac{1}{U} \le \frac{\sigma^2}{(n - 1) s^2} \le \frac{1}{L} \right) = 95\%$$

$$\Longrightarrow \Pr\left( \frac{(n - 1) s^2}{U} \le \sigma^2 \le \frac{(n - 1) s^2}{L} \right) = 95\%$$

$$\Longrightarrow \Pr\left( s \sqrt{\frac{n - 1}{U}} \le \sigma \le s \sqrt{\frac{n - 1}{L}} \right) = 95\%$$

$$\Longrightarrow \left( s \sqrt{\frac{n - 1}{U}},\; s \sqrt{\frac{n - 1}{L}} \right) \text{ is a 95\% CI for } \sigma.$$

Example

Strain A: n = 10; sample SD: $s_A$ = 7.64
L = qchisq(0.025, 9) = 2.70
U = qchisq(0.975, 9) = 19.0

95% CI for $\sigma_A$: (7.64 × √(9/19.0), 7.64 × √(9/2.70)) = (7.64 × 0.688, 7.64 × 1.83) = (5.3, 14.0)

Strain B: n = 16; sample SD: $s_B$ = 18.1
L = qchisq(0.025, 15) = 6.26
U = qchisq(0.975, 15) = 27.5

95% CI for $\sigma_B$: (18.1 × √(15/27.5), 18.1 × √(15/6.26)) = (18.1 × 0.739, 18.1 × 1.55) = (13.4, 28.1)
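The interval for $\sigma$ derived above can be checked numerically with a short Python sketch. It is an illustration under stated assumptions, not part of the original notes: the function name `sd_interval` is hypothetical, and the chi-square quantiles L and U are passed in as numbers (the notes obtain them with `qchisq`) because the Python standard library has no chi-square quantile function. The inputs are the Strain A values from the example.

```python
import math

def sd_interval(s, n, chi2_lower, chi2_upper):
    """CI for the population SD from the sample SD s and sample size n.

    chi2_lower (L) and chi2_upper (U) are the 2.5 and 97.5 percentiles of
    the chi-square distribution with n - 1 d.f. for a 95% interval.
    """
    df = n - 1
    # The larger quantile U gives the lower endpoint, and L the upper one
    return s * math.sqrt(df / chi2_upper), s * math.sqrt(df / chi2_lower)

# Strain A values from the notes: s = 7.64, n = 10, L = 2.70, U = 19.0
lo_a, hi_a = sd_interval(7.64, 10, chi2_lower=2.70, chi2_upper=19.0)
print(f"95% CI for sigma_A: ({lo_a:.1f}, {hi_a:.1f})")
```

The notes round the factors √(9/19.0) and √(9/2.70) before multiplying, so the printed endpoints may differ from the slide values in the last digit. Note also that the interval is not symmetric around s, which follows from the skewness of the chi-square distribution.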