Recall: Hypothesis testing on proportions

Let $p$ be the true proportion of times that an event occurs in a population. Suppose we would like to test

$$H_0: p = p_0 \quad \text{against} \quad H_A: p \neq p_0.$$

We collect a sample of size $n$ from the population of interest. Under the null, the estimate of the standard error of $\hat{p}$ takes the form

$$\sqrt{\frac{p_0(1 - p_0)}{n}}.$$

The appropriate test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}.$$

If the null is true ($p = p_0$), this statistic is approximately standard normal for $n$ large (we defined how large $n$ needs to be last time).

An approximate $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\left( \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right).$$

Suppose we would like to test

$$H_0: p_1 = p_2 \quad \text{against} \quad H_A: p_1 \neq p_2.$$

We collect a sample of size $n_1$ from the first population and a sample of size $n_2$ from the second population. Under the null, the estimate of the standard error of the difference $\hat{p}_1 - \hat{p}_2$ takes the form

$$\sqrt{\hat{p}(1 - \hat{p}) \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}, \quad \text{where} \quad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}.$$

The appropriate test statistic is

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p}) \left[ \dfrac{1}{n_1} + \dfrac{1}{n_2} \right]}},$$

where $p_1 - p_2 = 0$ under the null. If $n_1$ and $n_2$ are large, this statistic is approximately standard normal.

A few lectures ago, we considered the effectiveness of bike helmets in preventing head injury. In particular, we considered two random samples: one of size 147 from a population of people who wear helmets and the other of size 646 from a population of people who do not wear helmets. We recorded that 17 of the 147 suffered a serious head injury and 218 of the 646 suffered a serious head injury. We wanted to know if the proportion of serious head injuries was the same in the two populations.
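As a minimal sketch, the pooled two-sample statistic can be evaluated for the helmet counts using only the Python standard library. The function name `two_proportion_z_test` is an assumption made for illustration, and the two-sided p-value relies on the identity $P(|Z| \geq |z|) = \operatorname{erfc}(|z|/\sqrt{2})$ for a standard normal $Z$.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2 (hypothetical helper, not from the lecture)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under the null:
    # (n1*p1_hat + n2*p2_hat) / (n1 + n2) = (x1 + x2) / (n1 + n2).
    p_pool = (x1 + x2) / (n1 + n2)
    # Estimated standard error of p1_hat - p2_hat under the null.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: P(Z <= -|z|) + P(Z >= |z|) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Helmet data: 17 of 147 helmet wearers and 218 of 646 non-wearers were injured.
z, p_value = two_proportion_z_test(17, 147, 218, 646)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3g}")
```

Because the sketch does not round the intermediate proportions, its p-value can differ slightly from one computed from a z value rounded to one decimal.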
Recall the evaluated test statistic was

$$z = \frac{0.116 - 0.337}{\sqrt{0.296(1 - 0.296) \left[ \dfrac{1}{147} + \dfrac{1}{646} \right]}} = -5.3$$

with

$$\text{p-value} = P(Z \leq -5.3) + P(Z \geq 5.3) = 5.8 \cdot 10^{-8} + 5.8 \cdot 10^{-8} = 1.16 \cdot 10^{-7}.$$

The null was rejected at significance level $\alpha = 0.01$.

Another way to approach the same question is to consider a random sample of 793 bike riders and classify the riders using two questions:

1. Do you wear a helmet?
2. Have you suffered a serious head injury?

                   Wearing Helmet
Head Injury      (Y)      (N)    total
+ (Y)             17      218      235
- (N)            130      428      558
total            147      646      793

What would we expect this table to look like if the null was true?

                   Wearing Helmet
Head Injury      (Y)      (N)    total
+ (Y)             NA       NA      235
- (N)             NA       NA      558
total            147      646      793

If the null was true, then the proportion of people suffering head injuries would be the same in the two populations (those who wear helmets and those who do not wear helmets). The proportion of people suffering head injuries is $235/793 = 0.296$, and the proportion of people not suffering head injuries is $558/793 = 0.704$.

As a result, if the null is true, then for the 147 people wearing helmets, we would expect 29.6 % (43.6) of them to suffer a head injury and 70.4 % (103.4) of them to be free of head injury. Similar reasoning applies to the 646 people not wearing helmets. So, if the null is true, we'd expect the table to look like the one below:

                   Wearing Helmet
Head Injury      (Y)      (N)    total
+ (Y)           43.6    191.4      235
- (N)          103.4    454.6      558
total            147      646      793

How far is this expected table from the observed table?

                   Wearing Helmet
Head Injury      (Y)      (N)    total
+ (Y)             17      218      235
- (N)            130      428      558
total            147      646      793

You could think about summing the squared differences between the four cells in the two tables:

$$(17 - 43.6)^2 + (218 - 191.4)^2 + (130 - 103.4)^2 + (428 - 454.6)^2.$$
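The expected counts depend only on the table margins: each one equals (row total × column total) / n, which is the same as applying the overall injury proportions to each helmet group. A minimal sketch in Python, with variable names chosen purely for illustration:

```python
# Observed 2 x 2 table: rows are head injury (yes, no),
# columns are wearing a helmet (yes, no).
observed = [[17, 218],
            [130, 428]]

row_totals = [sum(row) for row in observed]        # [235, 558]
col_totals = [sum(col) for col in zip(*observed)]  # [147, 646]
n = sum(row_totals)                                # 793

# Expected count in each cell if injury and helmet use are unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

The margins of the expected table match those of the observed table by construction, so only the interior cells differ.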
Under the null,

$$X^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$$

is approximately chi-square ($\chi^2$) distributed with $(2 - 1) \cdot (2 - 1) = 1$ degree of freedom, where $O_i$ and $E_i$ are the observed and expected counts in cell $i$.

For this example, the value of the test statistic is

$$x^2 = \frac{(17 - 43.6)^2}{43.6} + \frac{(218 - 191.4)^2}{191.4} + \frac{(130 - 103.4)^2}{103.4} + \frac{(428 - 454.6)^2}{454.6} = 28.32$$

with

$$\text{p-value} = P(\chi^2_1 \geq 28.32) = 1.028 \cdot 10^{-7}.$$

The null is rejected and we conclude that there is an association between helmet wearing and suffering a serious head injury.

Note: Since the statistic is built from discrete counts but its distribution is approximated by a continuous one, a continuity correction can be applied, which may improve the approximation slightly. Yates proposed such a correction. Under the null,

$$X^2 = \sum_{i=1}^{4} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

is approximately chi-square ($\chi^2$) distributed with $(2 - 1) \cdot (2 - 1) = 1$ degree of freedom.

In the example above, the value of this corrected statistic is 27.27 and the p-value is $1.769 \cdot 10^{-7}$. In practice, you will often see this correction applied to 2 x 2 tables.
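Both versions of the statistic can be sketched in a few lines of Python using only the standard library. The helper name `chi_square_2x2` is an assumption made for illustration. The p-value uses the fact that a $\chi^2_1$ variable is the square of a standard normal, so $P(\chi^2_1 \geq x) = \operatorname{erfc}(\sqrt{x/2})$; this shortcut holds only for one degree of freedom.

```python
import math

def chi_square_2x2(observed, yates=False):
    """Chi-square statistic and p-value for a 2 x 2 table (illustrative helper)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n                    # expected count under the null
            diff = abs(observed[i][j] - e)
            if yates:
                diff -= 0.5                  # continuity correction as stated above
            stat += diff ** 2 / e
    # Upper tail of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: head injury (yes, no); columns: wearing a helmet (yes, no).
observed = [[17, 218],
            [130, 428]]

chi2, p = chi_square_2x2(observed)
chi2_yates, p_yates = chi_square_2x2(observed, yates=True)
print(f"uncorrected: X^2 = {chi2:.2f}, p-value = {p:.3g}")
print(f"Yates:       X^2 = {chi2_yates:.2f}, p-value = {p_yates:.3g}")
```

Because the sketch keeps the expected counts unrounded, its statistics can differ from the quoted 28.32 and 27.27 by a few hundredths, and the p-values shift accordingly.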