Hypothesis Testing and Power Analysis for Comparing Means

Hypothesis tests and confidence intervals

The 95% confidence interval for $\mu$ is the set of values, $\mu_0$, such that the null hypothesis $H_0: \mu = \mu_0$ would not be rejected (by a two-sided test with $\alpha = 5\%$).

The 95% CI for $\mu$ is the set of plausible values of $\mu$. If a value of $\mu$ is plausible, then as a null hypothesis, it would not be rejected.

For example: 9.98 9.87 10.05 10.08 9.99 9.90 (assumed iid normal($\mu$, $\sigma$)).

$\bar{X} = 9.98$; $s = 0.082$; $n = 6$

qt(0.975, 5) = 2.57

95% CI for $\mu$ = $9.98 \pm 2.57 \cdot 0.082 / \sqrt{6} = 9.98 \pm 0.086 = (9.89, 10.06)$

Sample size calculations

$n = \dfrac{\text{\$ available}}{\text{\$ per sample}}$

Too little data → A total waste
Too much data → A partial waste

Power

$X_1, \dots, X_n$ iid normal($\mu_A$, $\sigma_A$)
$Y_1, \dots, Y_m$ iid normal($\mu_B$, $\sigma_B$)

Test $H_0: \mu_A = \mu_B$ vs $H_a: \mu_A \neq \mu_B$ at $\alpha = 0.05$.

Test statistic: $T = \dfrac{\bar{X} - \bar{Y}}{\widehat{SD}(\bar{X} - \bar{Y})}$

Critical value: $C$ such that $\Pr(|T| > C \mid \mu_A = \mu_B) = \alpha$.

Power: $\Pr(|T| > C \mid \mu_A \neq \mu_B)$

[Figure: power, shown on an axis marked $-C$, $0$, $C$, and $\Delta$.]

Power depends on...
• The design of your experiment
• What test you're doing
• Chosen significance level, $\alpha$
• Sample size
• True difference, $\mu_A - \mu_B$
• Population SDs, $\sigma_A$ and $\sigma_B$

Choice of sample size

We mostly influence power via $n$ and $m$.

Power is greatest when $\dfrac{\sigma_A^2}{n} + \dfrac{\sigma_B^2}{m}$ is as small as possible.

Suppose the total sample size $N = n + m$ is fixed.

$\dfrac{\sigma_A^2}{n} + \dfrac{\sigma_B^2}{m}$ is minimized when $n = \dfrac{\sigma_A}{\sigma_A + \sigma_B} N$ and $m = \dfrac{\sigma_B}{\sigma_A + \sigma_B} N$.

For example:
If $\sigma_A = \sigma_B$, we should choose $n = m$.
If $\sigma_A = 2\sigma_B$, we should choose $n = 2m$.
(e.g., if $\sigma_A = 4$ and $\sigma_B = 2$, we might use $n = 20$ and $m = 10$)

Calculating the sample size

Suppose we seek 80% power to detect a particular value of $\mu_A - \mu_B = \Delta$, in the case that $\sigma_A$ and $\sigma_B$ are known.

(For convenience here, let's pretend that $\sigma_A = \sigma_B = \sigma$ and that we plan to have equal sample sizes, $n = m$, for the two groups.)

$\text{Power} \approx \Pr\left( Z > C - \dfrac{\Delta}{\sqrt{\dfrac{\sigma_A^2}{n} + \dfrac{\sigma_B^2}{m}}} \right) = \Pr\left( Z > 1.96 - \dfrac{\Delta \sqrt{n}}{\sigma \sqrt{2}} \right)$

→ Find $n$ such that $\Pr\left( Z > 1.96 - \dfrac{\Delta \sqrt{n}}{\sigma \sqrt{2}} \right) = 80\%$.
Thus $1.96 - \dfrac{\Delta \sqrt{n}}{\sigma \sqrt{2}}$ = qnorm(0.2) = $-0.842$.

$\Longrightarrow \sqrt{n} = \dfrac{\sigma}{\Delta} \, [1.96 - (-0.842)] \, \sqrt{2}$

$\Longrightarrow n = 15.7 \times \left( \dfrac{\sigma}{\Delta} \right)^2$ (per group)

Equal but unknown population SDs

$X_1, \dots, X_n$ iid normal($\mu_A$, $\sigma$)
$Y_1, \dots, Y_m$ iid normal($\mu_B$, $\sigma$)

Test $H_0: \mu_A = \mu_B$ vs $H_a: \mu_A \neq \mu_B$ at $\alpha = 0.05$.

$\hat{\sigma}_p = \sqrt{\dfrac{s_A^2 (n - 1) + s_B^2 (m - 1)}{n + m - 2}}$

$\widehat{SD}(\bar{X} - \bar{Y}) = \hat{\sigma}_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$

Test statistic: $T = \dfrac{\bar{X} - \bar{Y}}{\widehat{SD}(\bar{X} - \bar{Y})}$

In the case $\mu_A = \mu_B$, $T$ follows a t distribution with $n + m - 2$ d.f.

Critical value: C = qt(0.975, n+m-2)

Power: equal but unknown pop'n SDs

$\text{Power} = \Pr\left( \dfrac{|\bar{X} - \bar{Y}|}{\hat{\sigma}_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} > C \right)$

In the case $\mu_A - \mu_B = \Delta$, the statistic $\dfrac{\bar{X} - \bar{Y}}{\hat{\sigma}_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$ follows a non-central t distribution.

This distribution has two parameters:
• degrees of freedom (as before)
• the non-centrality parameter, $\dfrac{\Delta}{\sigma \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$

    C <- qt(0.975, n + m - 2)            # two-sided critical value
    se <- sigma * sqrt( 1/n + 1/m )      # SD of the difference in sample means
    power <- 1 - pt(C, n+m-2, ncp=delta/se) + pt(-C, n+m-2, ncp=delta/se)   # Pr(|T| > C)

[Figure: Power curves. Power (0 to 100%) versus $\Delta$ (from $-2\sigma$ to $2\sigma$) for n = 5, n = 10, and n = 20, with known SDs and unknown SDs.]

A built-in function: power.t.test()

Calculate power (or determine the sample size) for the t-test when:
• Sample sizes equal
• Population SDs equal

Arguments:
• n = sample size
• delta = $\Delta = \mu_2 - \mu_1$
• sd = $\sigma$ = population SD
• sig.level = $\alpha$ = significance level
• power = the power
• type = type of data (two-sample, one-sample, paired)
• alternative = two-sided or one-sided test

[Figure: Power (0 to 100%) versus $\Delta$ (0.0 to 2.5).]

Determining sample size

The things you need to know:
• Structure of the experiment
• Method for analysis
• Chosen significance level, $\alpha$ (usually 5%)
• Desired power (usually 80%)
• Variability in the measurements
  – If necessary, perform a pilot study, or use data from prior experiments or publications
• The smallest meaningful effect

Reducing sample size

• Reduce the number of treatment groups being compared.
• Find a more precise measurement (e.g., average survival time rather than proportion dead).
• Decrease the variability in the measurements.
  – Make subjects more homogeneous.
  – Use stratification.
  – Control for other variables (e.g., weight).
  – Average multiple measurements on each subject.

Tests to compare two means

1. Assume $\sigma_1 \equiv \sigma_2$
   (a) Calculate pooled estimate of population SD
   (b) $\widehat{SE} = \hat{\sigma}_{pooled} \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}$
   (c) Compare to t(df = n + m – 2)
   In R: t.test with var.equal=TRUE

2. Allow $\sigma_1 \neq \sigma_2$
   (a) $\widehat{SE} = \sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{m}}$
   (b) Compare to t with df from nasty formula (the Welch–Satterthwaite approximation).
   In R: t.test with var.equal=FALSE (the default)

Estimated type I error rates

$X_1, \dots, X_4$ iid normal($\mu$, $\sigma$)
$Y_1, \dots, Y_4$ iid normal($\mu$, $\sigma \times \tau$)
10,000 simulations

In each table, rows give the result of the test that assumes $\sigma_1 \equiv \sigma_2$ and columns give the result of the test that allows $\sigma_1 \neq \sigma_2$. Entries are proportions of simulations. FTR = fail to reject.

$\tau = 1$
                 FTR H0    Reject H0    Total
    FTR H0        0.948      0.000      0.948
    Reject H0     0.009      0.043      0.052
    Total         0.957      0.043

$\tau = 1.5$
                 FTR H0    Reject H0    Total
    FTR H0        0.944      0.000      0.944
    Reject H0     0.009      0.047      0.056
    Total         0.953      0.047

$\tau = 2$
                 FTR H0    Reject H0    Total
    FTR H0        0.940      0.000      0.940
    Reject H0     0.012      0.048      0.060
    Total         0.952      0.048

$\tau = 4$
                 FTR H0    Reject H0    Total
    FTR H0        0.924      0.000      0.924
    Reject H0     0.023      0.054      0.076
    Total         0.946      0.054

Estimated power

$X_1, \dots, X_4$ iid normal($\mu$, $\sigma$)
$Y_1, \dots, Y_4$ iid normal($\mu + 2$, $\sigma \times \tau$)
10,000 simulations

Rows and columns as in the type I error tables: rows assume $\sigma_1 \equiv \sigma_2$, columns allow $\sigma_1 \neq \sigma_2$.

$\tau = 1$
                 FTR H0    Reject H0    Total
    FTR H0        0.344      0.000      0.344
    Reject H0     0.046      0.611      0.656
    Total         0.389      0.611

$\tau = 1.5$
                 FTR H0    Reject H0    Total
    FTR H0        0.532      0.000      0.532
    Reject H0     0.057      0.411      0.468
    Total         0.589      0.411

$\tau = 2$
                 FTR H0    Reject H0    Total
    FTR H0        0.658      0.000      0.658
    Reject H0     0.060      0.282      0.342
    Total         0.718      0.282

$\tau = 4$
                 FTR H0    Reject H0    Total
    FTR H0        0.836      0.000      0.836
    Reject H0     0.047      0.117      0.164
    Total         0.883      0.117
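
Simulation results of the kind tabulated above can be approximated with a short script. The following is a minimal sketch, not the code behind the slides: the function name simulate_rejections, the seed, and the values $\mu = 0$ and $\sigma = 1$ are assumptions made here for illustration, since the slides do not state them. It runs both versions of t.test on each simulated data set and cross-tabulates the decisions.

```r
# Sketch: cross-tabulate the decisions of the pooled and Welch t-tests.
# Assumptions (not from the slides): mu = 0, sigma = 1, the seed, and the
# helper name simulate_rejections.
simulate_rejections <- function(n_sim = 10000, n = 4, shift = 0, tau = 1,
                                alpha = 0.05) {
  pooled <- logical(n_sim)
  welch <- logical(n_sim)
  for (i in seq_len(n_sim)) {
    x <- rnorm(n, mean = 0, sd = 1)        # X ~ normal(mu, sigma)
    y <- rnorm(n, mean = shift, sd = tau)  # Y ~ normal(mu + shift, sigma * tau)
    pooled[i] <- t.test(x, y, var.equal = TRUE)$p.value < alpha
    welch[i] <- t.test(x, y, var.equal = FALSE)$p.value < alpha
  }
  labels <- c("FTR H0", "Reject H0")
  # Rows: test assuming equal SDs; columns: test allowing unequal SDs
  table(pooled = factor(pooled, levels = c(FALSE, TRUE), labels = labels),
        welch = factor(welch, levels = c(FALSE, TRUE), labels = labels)) / n_sim
}

set.seed(1)
type1_tab <- simulate_rejections(shift = 0, tau = 1)  # H0 true: type I error
power_tab <- simulate_rejections(shift = 2, tau = 1)  # means differ by 2: power

print(round(type1_tab, 3))
print(round(power_tab, 3))
```

Row sums give the rejection rates of the equal-SD test and column sums those of the unequal-SD test; the estimates vary with the seed. With equal group sizes the two tests share the same t statistic and differ only in their degrees of freedom, so the unequal-SD test cannot reject unless the equal-SD test does, which is why the top-right cell is 0.000 in every table.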