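PSTAT 120C: Two sample tests
April 9, 2009

Two sample tests

• Is there a difference between two populations?
• Two populations:
  - Male and Female.
  - White and Minority.
  - Smokers and non-Smokers.
  - Control and Treatment.
  - New and old.

Difference tests

• 2 independent samples with different means
• Paired data

Normal distributions with different means

$$X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2), \qquad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2)$$

$$H_0: \mu_X = \mu_Y$$
$$H_A: \mu_X \neq \mu_Y \ \text{(two sided)}, \qquad \mu_X > \mu_Y \ \text{or} \ \mu_X < \mu_Y \ \text{(one sided)}$$

These are equivalent to the hypotheses

$$H_0: \mu_X - \mu_Y = 0$$
$$H_A: \mu_X - \mu_Y \neq 0 \ \text{(two sided)}, \qquad \mu_X - \mu_Y < 0 \ \text{or} \ \mu_X - \mu_Y > 0 \ \text{(one sided)}$$

The test statistic is based on $\bar{x} - \bar{y}$.

Distributions

$$\bar{x} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right), \qquad \bar{y} \sim N\left(\mu_Y, \frac{\sigma_Y^2}{m}\right)$$

$$\bar{x} - \bar{y} \sim N\left(\mu_X - \mu_Y, \ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right)$$

$$\frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} \sim N\left(\frac{\mu_X - \mu_Y}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}}, \ 1\right)$$

This is a standard normal under the null hypothesis. If $\sigma_X = \sigma_Y = \sigma$ then the statistic is

$$\frac{\bar{x} - \bar{y}}{\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}}$$

A t distribution

Under the assumption that $\sigma_X = \sigma_Y = \sigma$, estimate $\sigma^2$ using

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2}{\sigma^2} \sim \chi^2_{n+m-2}$$

because the $y$ and $x$ are independent. Dividing the normal statistic by the square root of this chi-square variable over its degrees of freedom gives

$$\frac{(\bar{x} - \bar{y}) \Big/ \left(\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}\right)}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2}{\sigma^2 (n+m-2)}}}$$

in which the unknown $\sigma$ cancels. So let

$$s_{pool}^2 = \frac{1}{n+m-2} \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{j=1}^{m} (y_j - \bar{y})^2 \right]$$

then the t statistic

$$t = \frac{\bar{x} - \bar{y}}{s_{pool} \sqrt{\frac{1}{m} + \frac{1}{n}}}$$

has a t distribution with $n+m-2$ degrees of freedom under $H_0$.

Example

See Example 10.7 on pages 500–501 in the seventh edition (pages 472–473 in the sixth edition).

... with the two terms that makes this

$$\sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2 + \frac{m^2 n (\bar{x} - \bar{y})^2}{(m+n)^2} + \frac{m n^2 (\bar{x} - \bar{y})^2}{(m+n)^2}$$

where the last two terms together equal

$$\frac{(\bar{x} - \bar{y})^2}{\frac{1}{n} + \frac{1}{m}}.$$

Therefore

$$\frac{1}{\Lambda} = 1 + \frac{(\bar{x} - \bar{y})^2}{\left( \sum (x_i - \bar{x})^2 + \sum (y_j - \bar{y})^2 \right) \left( \frac{1}{n} + \frac{1}{m} \right)} = 1 + \frac{t^2}{n+m-2},$$

which increases with $t^2$.

Paired Sample

Diet Example

• I have a new fancy diet and I ask 25 subjects to try it out.
• $X_1, \dots, X_{25}$ are their weights before they go on the diet.
• $Y_1, \dots, Y_{25}$ are their weights after.
• Did they lose weight?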
• $H_0: \mu_x = \mu_y$ versus $H_a: \mu_x > \mu_y$
• Here the $X_i$ is not independent of the $Y_i$.

Suppose that the data is a series of pairs $(X_i, Y_i)$ with means $\mu_x$ and $\mu_y$. Then the differences $D_i = X_i - Y_i$ are independent normals with mean $\mu_x - \mu_y$. The statistic is

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \qquad \text{for} \qquad s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2}, \qquad n = 25.$$

• Just like the one sample t test. Treat the $D_i$ as a whole new data set.
• No worries about different variances because it all comes out in the wash.

Matched Pairs

• Twin Studies
• Split-plot designs
• Before - after
• Shoes

Moon Illusion

The data are the perceived increase in the size of the moon at the horizon. Each person gave a measure of their perception of the size of the moon at elevation and with eyes elevated.

$H_0$: Illusion is the same at each elevation, $\mu_d = 0$.
$H_a$: Illusion is greater when the eyes are elevated, $\mu_d > 0$.

In fact, $\bar{d} = 0.0035$ with s.e. $s/\sqrt{10} = 0.0133$, which gives an insignificant $t = 0.26$. The critical value for this t with 9 degrees of freedom is 1.833 (one-sided test at the 5% level).

Deciding between paired and independent samples

• If the two samples are of different sizes then they cannot be paired.
• If each observation in one sample is positively correlated (both big at the same time) with exactly one observation in the other sample, then the samples are paired.
• Two measurements on the same subject (pre and post tests, etc.) are always paired.

Benefit of pairs

• Simple one sample test
• $\bar{d} = \bar{x} - \bar{y}$
• Variance of $\bar{d}$:

$$\mathrm{Var}(\bar{x} - \bar{y}) = \mathrm{Var}(\bar{x}) + \mathrm{Var}(\bar{y}) - 2\,\mathrm{Cov}(\bar{x}, \bar{y}) = \frac{1}{n} \left( \mathrm{Var}(X) + \mathrm{Var}(Y) - 2\,\mathrm{Cov}(X, Y) \right)$$

• Variance estimation is easier (no issues of different variances)

The difference between pairs and independent samples

If $X_i$ and $Y_i$ are independent random variables, then

$$\bar{x} - \bar{y} = \bar{d} \sim N\left(\mu_x - \mu_y, \ \frac{\sigma_x^2 + \sigma_y^2}{n}\right)$$

$$s_p^2 = \frac{1}{2n-2} \sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 \right]$$

$$s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - y_i - \bar{x} + \bar{y})^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left[ (x_i - \bar{x})^2 + (y_i - \bar{y})^2 - 2 (x_i - \bar{x})(y_i - \bar{y}) \right] = 2 s_p^2 - \frac{2}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Thus, for independent data, $s_p^2$ is nearly half of $s_d^2$.
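The identity $s_d^2 = 2 s_p^2 - \frac{2}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y})$ can be checked numerically. The sketch below is only an illustration: the data values are made up for the check and do not come from the lecture, and the variable names are arbitrary.

```python
import math

# Hypothetical paired measurements (illustrative numbers, not lecture data).
x = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
y = [9.9, 11.0, 10.1, 11.4, 10.2, 11.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the sample means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Pooled variance with equal sample sizes: 2n - 2 degrees of freedom.
s2_p = (sxx + syy) / (2 * n - 2)

# Variance of the differences d_i = x_i - y_i, computed directly.
d = [xi - yi for xi, yi in zip(x, y)]
d_bar = sum(d) / n
s2_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)

# The same quantity through the identity 2*s_p^2 - 2/(n-1) * cross-products.
s2_d_identity = 2 * s2_p - 2 * sxy / (n - 1)

print(s2_d, s2_d_identity)
```

The two printed values agree up to floating-point rounding. When the cross-product term is close to zero, as it should be for independent samples, $s_d^2$ is close to $2 s_p^2$.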
$$t_p = \frac{\bar{d}}{s_p \sqrt{2/n}}, \qquad df = 2(n-1)$$

$$t_d = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad df = n-1$$

$$t_d = \frac{\bar{d}}{\sqrt{\dfrac{2 s_p^2}{n} - \dfrac{2}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \Big/ n}}$$

The quantity

$$\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

is an estimate of $\mathrm{Cov}(X, Y)$ and should be nearly 0 for independent data.

Matched pairs

• Pairwise test is often simpler and more powerful.
• Thus, the design is often better if we can construct pairs.
• Have the same subjects go through both protocols.
• Match up subjects according to some criteria.
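To see how the two statistics can differ on the same data, here is a minimal sketch that computes the pooled statistic $t_p$ and the paired statistic $t_d$ from the formulas above. The function names `pooled_t` and `paired_t` and the before/after weights are assumptions made for illustration; they are not part of the lecture.

```python
import math


def pooled_t(x, y):
    """Pooled two-sample t statistic and its degrees of freedom (n + m - 2)."""
    n, m = len(x), len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / m
    # Combined sum of squares about each sample's own mean.
    ss = sum((xi - x_bar) ** 2 for xi in x) + sum((yj - y_bar) ** 2 for yj in y)
    s_pool = math.sqrt(ss / (n + m - 2))
    t = (x_bar - y_bar) / (s_pool * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2


def paired_t(x, y):
    """Paired t statistic on the differences d_i = x_i - y_i, with df = n - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical before/after weights for 8 subjects (made-up numbers).
before = [80, 95, 70, 110, 88, 76, 102, 91]
after = [78, 94, 69, 107, 86, 75, 99, 90]

t_p, df_p = pooled_t(before, after)
t_d, df_d = paired_t(before, after)
print(f"pooled: t = {t_p:.3f}, df = {df_p}")
print(f"paired: t = {t_d:.3f}, df = {df_d}")
```

In this made-up data the before and after weights are strongly positively correlated, so the subject-to-subject variation cancels in the differences and $t_d$ comes out much larger than $t_p$, even though the paired test has half the degrees of freedom.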