
Comparing Means & Proportions: Independent vs. Dependent Samples, Exams of Statistics

An explanation of how to compare population means and proportions using independent and dependent samples. It includes formulas for calculating confidence intervals and test statistics, as well as examples using real data. It also covers hypothesis testing and the relationship between confidence intervals and significance tests.

Typology: Exams

Pre 2010

Uploaded on 09/17/2009

koofers-user-9s2
Partial preview of the text

Chapter 7 Comparing Two Populations

In this chapter we compare the means or proportions of two populations, using a (simple) random sample from each of the populations. The samples can be either independent of each other or dependent on each other.

Definition: Two random samples are said to be independent samples if the selection of a unit from one population has no effect on the selection or non-selection of another unit from the second population. Otherwise the samples are said to be dependent samples.

Independent samples are more common than dependent samples. However, in some applications the selection of a unit from one of the populations determines the selection of a unit from the second population. In such applications the units come in pairs, one from each population, so we have a random sample of pairs. These pairs either occur naturally (e.g., twin studies, studies of married couples, observations on the same person under two different conditions, etc.) or are created by the experimenter, who matches the units on as many characteristics as possible, except the one characteristic of interest to the researcher. Thus, we sometimes create dependence by matching. These are called matched samples.

The general formula and the steps you have seen in Chapters 5 and 6 apply in this chapter as well, with some minor differences.

7.1 Comparing two population means using independent samples

Notation: Suppose we have a population of X's with mean $\mu_X$ and standard deviation $\sigma_X$, and a population of Y's with mean $\mu_Y$ and standard deviation $\sigma_Y$. We are interested in the difference of the population means, $\mu_X - \mu_Y$. We select a simple random sample of size $n_X$ from the population of X's and calculate $\bar{X}$ and $S_X$.
Similarly, we select an independent simple random sample of size $n_Y$ from the population of Y's and calculate $\bar{Y}$ and $S_Y$.

A Confidence Interval for $\mu_X - \mu_Y$ Using Independent Samples

As in Chapter 5, the general formula for the CI is

CI = (Estimate ± ME), where ME = (table value) × (SE of the estimate).

Since the parameter of interest is $\mu_X - \mu_Y$, a point estimate of the difference of population means is $\bar{X} - \bar{Y}$. Note the changes in SE and ME:

$$ME = t \times S_{pooled}\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}} = 1.96 \times 16.357 \times 0.01958 = 0.6276$$

Therefore a 95% CI for $\mu_X - \mu_Y$ is

$$CI = (18.1 - 32.6) \pm 0.6276 = -14.5 \pm 0.6276 = (-15.1276,\ -13.8724)$$

Interpretation of the CI

We may write the interpretation in the usual way and say "We are 95% confident that the difference between the mean numbers of hours per week of housework that males and females perform is a point in the interval (-15.1, -13.9)."

However, the above CI tells us more: Since both ends of the CI are negative, and the CI is for the difference between the mean for men and the mean for women, we can state with 95% confidence that men do less housework than women on average, the difference between the population means being between 13.9 and 15.1 hours per week.

Significance Tests for $\mu_X - \mu_Y$ With Independent Samples

We have the same 6 steps as before: 1. Assumptions 2. Hypotheses 3. Test Statistic 4. P-Value 5. Decision 6. Conclusion. Let's look at these in detail.

1. Assumptions:
a) Independent simple random samples from two populations (explanatory variable)
b) Quantitative (response) variable
c) Large samples ($n_X \ge 20$, $n_Y \ge 20$)

2. Hypotheses: One of the following pairs:
$H_0: \mu_X - \mu_Y = 0$ vs. $H_a: \mu_X - \mu_Y \ne 0$, or
$H_0: \mu_X - \mu_Y = 0$ vs. $H_a: \mu_X - \mu_Y > 0$, or
$H_0: \mu_X - \mu_Y = 0$ vs. $H_a: \mu_X - \mu_Y < 0$.

3.
Test Statistic: When $\sigma_X \ne \sigma_Y$ the test statistic is

$$T = \frac{(\bar{X} - \bar{Y}) - 0}{SE(\bar{X} - \bar{Y})} = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_X^2}{n_X} + \dfrac{S_Y^2}{n_Y}}} \sim t_{(df)}$$

with df = smaller of $(n_X - 1)$ and $(n_Y - 1)$, or

when $\sigma_X = \sigma_Y$ the test statistic is

$$T = \frac{(\bar{X} - \bar{Y}) - 0}{SE(\bar{X} - \bar{Y})} = \frac{\bar{X} - \bar{Y}}{S_{pooled}\sqrt{\dfrac{1}{n_X} + \dfrac{1}{n_Y}}} \sim t_{(df)}$$

with df = $n_X + n_Y - 2$ and

$$S_{pooled} = \sqrt{\frac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2}}$$

4. P-Value = $2 \times P(T \ge |T_{cal}|) = 2 \times P(T \ge |-45.3|) = 2 \times 0 = 0$ (Why?)

5. Decision: Reject $H_0$ (very strong rejection), since the p-value is smaller than any reasonable value of $\alpha$.

6. Conclusion: The observed data give very strong evidence that the mean time spent on housework by men is significantly different from the mean time spent by women (p-value is almost 0).

Relation between CI and Significance Test

We have rejected the null hypothesis of equal population means using the significance test. Can we get the same result from the CI? We have found that a 95% CI for $\mu_M - \mu_F$ is $(-15.088,\ -13.912)$. Since zero is not in this CI, we reject $H_0$ at the 5% level of significance (remember that $H_0: \mu_M - \mu_F = 0$), so we reach the same decision, and hence the same conclusion, by both methods.

Points to ponder:
1. Do we always reach the same decision by the two methods?
2. Do we lose something by using the CI instead of the significance test?

7.2 Comparing two population proportions using large independent samples

Suppose we are given two independent random samples of sizes $n_1$ and $n_2$ from two populations in which the proportions of population units that have a characteristic of interest are $\pi_1$ and $\pi_2$, respectively. We are interested in making inferences about the difference of population proportions, i.e., inferences about $\pi_1 - \pi_2$.
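As a numerical companion to the two test statistics for $\mu_X - \mu_Y$ given in Section 7.1, the following is a minimal sketch that computes T and its df from summary statistics. The function name `two_sample_t`, its parameters, and the input numbers are hypothetical and chosen only for illustration; they are not data from these notes.

```python
import math

def two_sample_t(xbar, s_x, n_x, ybar, s_y, n_y, equal_sd=False):
    """Return (T, df) for Ho: mu_X - mu_Y = 0 from summary statistics."""
    diff = xbar - ybar
    if equal_sd:
        # Pooled standard deviation, used when sigma_X = sigma_Y is assumed.
        df = n_x + n_y - 2
        s_pooled = math.sqrt(((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df)
        se = s_pooled * math.sqrt(1 / n_x + 1 / n_y)
    else:
        # Separate variances; df = smaller of (n_X - 1) and (n_Y - 1).
        df = min(n_x - 1, n_y - 1)
        se = math.sqrt(s_x**2 / n_x + s_y**2 / n_y)
    return diff / se, df

# Hypothetical summary statistics (illustration only).
t_pooled, df_pooled = two_sample_t(10.0, 2.0, 25, 9.0, 2.0, 25, equal_sd=True)
t_unpooled, df_unpooled = two_sample_t(10.0, 2.0, 25, 9.0, 2.0, 25)
print(t_pooled, df_pooled)
print(t_unpooled, df_unpooled)
```

With equal sample sizes and equal sample standard deviations, as in this made-up input, the two versions give the same T and differ only in df, which is why the choice matters mainly when the sample sizes or spreads differ.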
A CI for $\pi_1 - \pi_2$ using large independent samples

The general formula we have seen before applies here with slight changes:

CI = (Estimate ± ME), where ME = (table value) × (SE of the estimate).

In this section, since the parameter of interest is $\pi_1 - \pi_2$, its point estimate is $p_1 - p_2$, where $p_1$ and $p_2$ are the sample proportions. The SE of the estimate is estimated by

$$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Hence a large-sample CI for $\pi_1 - \pi_2$ is $(p_1 - p_2) \pm ME$, where

$$ME = z \times SE(p_1 - p_2) = z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Example: Problem 10 on page 236

For a random sample of 1600 Canadians taken in January, 880 people indicate approval of the prime minister's performance. A similar poll a month later of a separate random sample of 1600 Canadians has a favorable rating by 944 people. Let $\pi_1$ = true population proportion in January of all Canadians who approve of the prime minister's performance and $\pi_2$ = true population proportion in February.

a) Calculate point estimates of the true population proportions. Calculate the difference and interpret.

Given $n_1 = 1600$, $X_1 = 880$, $n_2 = 1600$, $X_2 = 944$

4. P-Value:
If $H_a: \pi_1 - \pi_2 \ne 0$ then p-value = $2 \times P(Z \ge |Z_{cal}|)$
If $H_a: \pi_1 - \pi_2 > 0$ then p-value = $P(Z \ge Z_{cal})$
If $H_a: \pi_1 - \pi_2 < 0$ then p-value = $P(Z \le Z_{cal})$

5. Decision: Same as before.

6. Conclusion: Same as before.

Example (Problem 10 continued)

Test the claim that the prime minister's approval rating has increased from January to February.

Here we are asked to test $\pi_2 > \pi_1$, or equivalently to test $H_0: \pi_2 - \pi_1 = 0$ vs. $H_a: \pi_2 - \pi_1 > 0$.

Are the assumptions satisfied?

Test statistic:

$$Z = \frac{p_2 - p_1}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0, 1)$$

P-Value: We have already calculated $p_1 = 0.55$ and $p_2 = 0.59$. We need the pooled sample estimate $\hat{p}$ to find the calculated value of Z. Let's first find it:

$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{880 + 944}{1600 + 1600} = 0.57$$

Hence,

$$Z_{cal} = \frac{0.59 - 0.55}{\sqrt{0.57(1 - 0.57)\left(\dfrac{1}{1600} + \dfrac{1}{1600}\right)}} = \frac{0.04}{0.0175} \approx 2.29$$

So, p-value = $P(Z \ge Z_{cal}) = P(Z > 2.29) \approx 0.011$

Decision: Reject $H_0$ at the 5% level of significance.

Conclusion: The observed data give strong evidence of a significant increase in the approval rate of the prime minister from January to February (p ≈ 0.011).

The case for small samples: Comparing two (or more) population proportions with small samples will be covered in Chapter 8. This will include Fisher's exact test for proportions (page 224) as well as contingency tables and conditional probabilities (page 220). So, skip these sections for the time being.

7.4 Comparing Dependent Samples

Dependent samples (also called matched samples) are samples that are selected as pairs. These pairs either occur naturally or are created by the researcher in such a way that the characteristics of the two units in each pair are as much alike as possible, except with respect to the variable of interest to the researcher. Hence for each unit in sample 1 there is a matching unit in sample 2. Dependent samples are especially useful in controlling the effect of all (or as many as possible) of the factors other than the one we are interested in.

We are interested in the difference in the variable of interest ($D_i$) between the two units in each pair:

$D_i$ = (observation on the ith unit in sample 2) - (observation on the ith unit in sample 1) = $X_i - Y_i$ for $i = 1, 2, \dots, n$

Thus, we have one sample, of size n, of differences $D_1, D_2, \dots, D_n$. These differences are a random sample from a (hypothetical) population of differences that has mean $\mu_D$ (= $\mu_2 - \mu_1$), estimated by $\bar{D}$, and standard deviation $\sigma_D$, estimated by $S_D$. Then, a CI for $\mu_D$ is

$$\bar{D} \pm t\,\frac{S_D}{\sqrt{n}}$$
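The CI for the mean difference can be sketched in a few lines. This is an illustrative sketch only: the function name `paired_ci`, the before/after measurements, and the choice of a 95% level are assumptions made for the example, and the t table value is passed in by hand (2.776 is the standard 95% table value for df = n - 1 = 4).

```python
import math
import statistics

def paired_ci(sample_1, sample_2, t_value):
    """CI for mu_D from matched pairs; t_value is the table value (df = n - 1)."""
    # D_i = observation in sample 2 minus the matching observation in sample 1.
    diffs = [b - a for a, b in zip(sample_1, sample_2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    me = t_value * s_d / math.sqrt(n)
    return d_bar - me, d_bar + me

# Hypothetical measurements on the same 5 units under two conditions.
before = [10, 12, 9, 11, 13]
after = [12, 15, 10, 14, 14]

# 2.776 is the t table value for 95% confidence with df = 4.
low, high = paired_ci(before, after, t_value=2.776)
print(round(low, 3), round(high, 3))
```

Because the pairs are reduced to a single sample of differences first, the calculation is the same one-sample interval seen in earlier chapters, applied to the $D_i$ rather than to the raw observations.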