Hypothesis Testing Using z- and t-tests

B. Weaver (27-May-2011)

In hypothesis testing, one attempts to answer the following question: If the null hypothesis is assumed to be true, what is the probability of obtaining the observed result, or any more extreme result that is favourable to the alternative hypothesis?¹ In order to tackle this question, at least in the context of z- and t-tests, one must first understand two important concepts: 1) sampling distributions of statistics, and 2) the central limit theorem.

¹ That probability is called a p-value. It is really a conditional probability--it is conditional on the null hypothesis being true.

Sampling Distributions

Imagine drawing (with replacement) all possible samples of size n from a population, and for each sample, calculating a statistic--e.g., the sample mean. The frequency distribution of those sample means would be the sampling distribution of the mean (for samples of size n drawn from that particular population).

Normally, one thinks of sampling from relatively large populations, but the concept of a sampling distribution can be illustrated with a small population. Suppose, for example, that our population consisted of the following 5 scores: 2, 3, 4, 5, and 6. The population mean = 4, and the population standard deviation (dividing by N) = 1.414.

If we drew (with replacement) all possible samples of n=2 from this population, we would end up with the 25 samples shown in Table 1.

Table 1: All possible samples of n=2 from a population of 5 scores.

    Sample #   First Score   Second Score   Sample Mean
        1           2              2            2
        2           2              3            2.5
        3           2              4            3
        4           2              5            3.5
        5           2              6            4
        6           3              2            2.5
        7           3              3            3
        8           3              4            3.5
        9           3              5            4
       10           3              6            4.5
       11           4              2            3
       12           4              3            3.5
       13           4              4            4
       14           4              5            4.5
       15           4              6            5
       16           5              2            3.5
       17           5              3            4
       18           5              4            4.5
       19           5              5            5
       20           5              6            5.5
       21           6              2            4
       22           6              3            4.5
       23           6              4            5
       24           6              5            5.5
       25           6              6            6

    Mean of the sample means = 4.000
    SD of the sample means = 1.000 (SD calculated with division by N)

The 25 sample means from Table 1 are plotted below in Figure 1 (a histogram). This distribution of sample means is called the sampling distribution of the mean for samples of n=2 from the population of interest (i.e., our population of 5 scores).

Figure 1: Sampling distribution of the mean for samples of n=2 from a population of N=5. [Histogram of the 25 sample means, titled "Distribution of Sample Means" (or "Sampling Distribution of the Mean"); SPSS legend: Std. Dev = 1.02, Mean = 4.00, N = 25.00.]

I suspect the first thing you noticed about this figure is that it is peaked in the middle and symmetrical about the mean. This is an important characteristic of sampling distributions, and we will return to it in a moment. You may have also noticed that the standard deviation reported in the figure legend is 1.02, whereas I reported SD = 1.000 in Table 1. Why the discrepancy? Because I used the population SD formula (with division by N) to compute SD = 1.000 in Table 1, but SPSS used the sample SD formula (with division by n-1) when computing the SD it plotted alongside the histogram. The population SD is the correct one to use in this case, because I have the entire population of 25 samples in hand.

The Central Limit Theorem (CLT)

If I were a mathematical statistician, I would now proceed to work through derivations, proving the following statements:
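Whatever form those derivations take, the two numbers reported under Table 1 can be checked by brute force: the mean of the 25 sample means equals the population mean (4.000), and their SD (dividing by N) equals σ/√n = 1.414/√2 = 1.000. The short Python sketch below reproduces Table 1 and both numbers; it is an illustrative aside, not part of the original handout (which uses SPSS throughout), and assumes numpy is available.

import itertools
import numpy as np

population = np.array([2, 3, 4, 5, 6])

# All 25 ordered samples of size n=2, drawn with replacement (Table 1).
samples = list(itertools.product(population, repeat=2))
sample_means = np.array([np.mean(s) for s in samples])

print(len(samples))                          # 25 samples
print(sample_means.mean())                   # mean of the sample means: 4.0
print(sample_means.std(ddof=0))              # SD with division by N: 1.0
print(population.std(ddof=0) / np.sqrt(2))   # sigma / sqrt(n): also 1.0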
…positive skew in the population, the distribution of sample means is near enough to normal for the normal approximation to be useful.

What is the rule of 30 about, then? In the olden days, textbook authors often did make a distinction between small-sample and large-sample versions of t-tests. The small- and large-sample versions did not differ at all in terms of how t was calculated. Rather, they differed in how (and where) one obtained the critical value to which the computed t-value was compared. For the small-sample test, one used the critical value of t, from a table of critical t-values. For the large-sample test, one used the critical value of z, obtained from a table of the standard normal distribution. The dividing line between small and large samples was usually n = 30 (or sometimes 20).

Why was this done? Remember that in that era, data analysts did not have access to desktop computers and statistics packages that computed exact p-values. Therefore, they had to compute the test statistic and compare it to a critical value, which they looked up in a table. Tables of critical values can take up a lot of room, so compromises were made where possible. In this particular case, most authors and statisticians agreed that for n ≥ 30, the critical value of z (from the standard normal distribution) was close enough to the critical value of t that it could be used as an approximation. The following figure illustrates this by plotting critical values of t with alpha = .05 (2-tailed) as a function of sample size. Notice that when n ≥ 30 (or even 20), the critical values of t are very close to 1.96, the critical value of z.

Nowadays, we typically use statistical software to perform t-tests, and so we get a p-value computed using the appropriate t-distribution, regardless of the sample size. Therefore the distinction between small- and large-sample t-tests is no longer relevant, and it has disappeared from most modern textbooks.

The sampling distribution of the mean and z-scores

When you first encountered z-scores, you were undoubtedly using them in the context of a raw score distribution. In that case, you calculated the z-score corresponding to some value of X as follows:

    z = \frac{X - \mu_X}{\sigma_X} = \frac{X - \mu}{\sigma}        (1.4)

And, if the distribution of X was normal, or at least approximately normal, you could then take that z-score and refer it to a table of the standard normal distribution to figure out the proportion of scores higher than X, or lower than X, etc.

Because of what we learned from the central limit theorem, we are now in a position to compute a z-score for a sample mean as follows:

    z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}        (1.5)

This is the same formula, but with X̄ in place of X, and σ_X̄ in place of σ_X. And, if the sampling distribution of X̄ is normal, or at least approximately normal, we may then refer this value of z to the standard normal distribution, just as we did when we were using raw scores. (This is where the CLT comes in, because it tells us the conditions under which the sampling distribution of X̄ is approximately normal.)
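Equation (1.5) is straightforward to compute directly. As a rough sketch (not in the original handout, which works from tables and SPSS), the following Python function uses scipy's standard normal distribution to turn a sample mean into a z-score and a one-tailed p-value; the numbers in the usage line are the WISC figures and the sample mean of 108 used in the IQ example that follows.

from scipy import stats

def z_for_sample_mean(xbar, mu, sigma, n):
    """z-score for a sample mean, per equation (1.5)."""
    se = sigma / n ** 0.5            # standard error of the mean
    z = (xbar - mu) / se
    p_one_tailed = stats.norm.sf(z)  # P(Z >= z), upper tail
    return z, p_one_tailed

# Values from the IQ illustration below: mu = 100, sigma = 15, n = 25, sample mean 108.
z, p = z_for_sample_mean(108, 100, 15, 25)
print(round(z, 3), round(p, 4))      # roughly z = 2.667, p = 0.0038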
An example. Here is a (fictitious) newspaper advertisement for a program designed to increase the intelligence of school children². As an expert on IQ, you know that in the general population of children, the mean IQ = 100, and the population SD = 15 (for the WISC, at least). You also know that IQ is (approximately) normally distributed in the population.

Equipped with this information, you can now address questions such as: If the n=25 children from Dundas are a random sample from the general population of children,

a) What is the probability of getting a sample mean of 108 or higher?
b) What is the probability of getting a sample mean of 92 or lower?
c) How high would the sample mean have to be for you to say that the probability of getting a mean that high (or higher) was 0.05 (or 5%)?
d) How low would the sample mean have to be for you to say that the probability of getting a mean that low (or lower) was 0.05 (or 5%)?

² I cannot find the original source for this example, but I believe I got it from Dr. Geoff Norman, McMaster University.

…The criterion that most disciplines use by convention is this: the difference between X̄ and μ must be large enough that the probability it occurred by chance (given a true null hypothesis) is 5% or less.

The observed sample mean for this example was 108. As we saw earlier, this corresponds to a z-score of 2.667, and p(z ≥ 2.667) = 0.0038. Therefore, we could reject H₀, and we would act as if the sample was drawn from a population in which mean IQ is greater than 100.

Version 2: Another directional alternative hypothesis

    H₀: μ ≥ 100
    H₁: μ < 100

This pair of hypotheses would be used if we expected Dr. Duntz's program to lower IQ, and if we were willing to include an increase in IQ (no matter how large) under the null hypothesis. Given a sample mean of 108, we could stop without calculating z, because the difference is in the wrong direction. That is, to have any hope of rejecting H₀, the observed difference must be in the direction specified by H₁.

Version 3: A non-directional alternative hypothesis

    H₀: μ = 100
    H₁: μ ≠ 100

In this case, the null hypothesis states that the 25 children are a random sample from a population with mean IQ = 100, and the alternative hypothesis says they are not--but it does not specify the direction of the difference from 100. In the first directional test, we needed X̄ > 100 by a sufficient amount, and in the second directional test, X̄ < 100 by a sufficient amount, in order to reject H₀. But in this case, with a non-directional alternative hypothesis, we may reject H₀ if X̄ < 100 or if X̄ > 100, provided the difference is large enough.

Because differences in either direction can lead to rejection of H₀, we must consider both tails of the standard normal distribution when calculating the p-value--i.e., the probability of the observed outcome, or a more extreme outcome favourable to H₁. For symmetrical distributions like the standard normal, this boils down to taking the p-value for a directional (or 1-tailed) test and doubling it.

For this example, the sample mean = 108. This represents a difference of +8 from the population mean (under a true null hypothesis). Because we are interested in both tails of the distribution, we must figure out the probability of a difference of +8 or greater, or a difference of -8 or greater. In other words,

    p = p(\bar{X} \ge 108) + p(\bar{X} \le 92) = .0038 + .0038 = .0076
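Questions (c) and (d) above run the same calculation in reverse: instead of converting a sample mean into a tail probability, we fix the tail probability at .05 and ask which sample mean sits at that cut-off. The following sketch (an illustration, not the handout's own worked solution) uses the inverse of the standard normal distribution; the standard error of 3 follows from the WISC figures above.

from scipy import stats

mu, sigma, n = 100, 15, 25
se = sigma / n ** 0.5                 # sigma_xbar = 3

z_crit = stats.norm.ppf(0.95)         # about 1.645 for a one-tailed .05 cut-off
print(mu + z_crit * se)               # (c): a sample mean of roughly 104.9 or higher
print(mu - z_crit * se)               # (d): a sample mean of roughly 95.1 or lower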
Single sample t-test (when σ is not known)

In many real-world cases of hypothesis testing, one does not know the standard deviation of the population. In such cases, it must be estimated using the sample standard deviation. That is, s (calculated with division by n-1) is used to estimate σ. Other than that, the calculations are as we saw for the z-test for a single sample--but the test statistic is called t, not z:

    t_{(df = n-1)} = \frac{\bar{X} - \mu}{s_{\bar{X}}}, \quad \text{where } s_{\bar{X}} = \frac{s}{\sqrt{n}} \text{ and } s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{\frac{SS}{n-1}}        (1.9)

In equation (1.9), notice the subscript written beside the t. It says "df = n-1". The "df" stands for degrees of freedom. "Degrees of freedom" can be a bit tricky to grasp, but let's see if we can make it clear.

Degrees of Freedom

Suppose I tell you that I have a sample of n=4 scores, and that the first three scores are 2, 3, and 5. What is the value of the 4th score? You can't tell me, given only that n = 4. It could be anything. In other words, all of the scores, including the last one, are free to vary: df = n for a sample mean.

To calculate t, you must first calculate the sample standard deviation. The conceptual formula for the sample standard deviation is:

    s = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}}        (1.10)

Suppose that the last score in my sample of 4 scores is a 6. That would make the sample mean equal to (2+3+5+6)/4 = 4. As shown in Table 2, the deviation scores for the first 3 scores are -2, -1, and 1.

Table 2: Illustration of degrees of freedom for the sample standard deviation

    Score   Mean   Deviation from Mean
      2       4           -2
      3       4           -1
      5       4            1
     --      --           x₄

Using only the information shown in the final column of Table 2, you can deduce that x₄, the 4th deviation score, is equal to -2. How so? Because by definition, the sum of the deviations about the mean = 0. This is another way of saying that the mean is the exact balancing point of the distribution. In symbols:

    \sum (X - \bar{X}) = 0        (1.11)

So, once you have n-1 of the (X - X̄) deviation scores, the final deviation score is determined. That is, the first n-1 deviation scores are free to vary, but the final one is not. There are n-1 degrees of freedom whenever you calculate a sample variance (or standard deviation).

The sampling distribution of t

To calculate the p-value for a single sample z-test, we used the standard normal distribution. For a single sample t-test, we must use a t-distribution with n-1 degrees of freedom. As this implies, there is a whole family of t-distributions, with degrees of freedom ranging from 1 to infinity (∞ is the symbol for infinity). All t-distributions are symmetrical about 0, like the standard normal. In fact, the t-distribution with df = ∞ is identical to the standard normal distribution. But as shown in Figure 2 below, t-distributions with df < ∞ have lower peaks and thicker tails than the standard normal distribution. To use the technical term, they are leptokurtic. (The normal distribution is said to be mesokurtic.) As a result, the critical values of t are further from 0 than the corresponding critical values of z. Putting it another way, the absolute value of critical t is greater than the absolute value of critical z for all t-distributions with df < ∞:

    \text{For } df < \infty, \; |t_{critical}| > |z_{critical}|        (1.12)

Figure 2: Probability density functions of the standard normal distribution, the t-distribution with df = 10, and the t-distribution with df = 2.

One-Sample Test (SPSS output, Test Value = 63)

    Variable   t       df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
    HEIGHT     1.387    7   .208              1.25              -.88, 3.38
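The HEIGHT output above comes from SPSS; the same single-sample test is easy to reproduce elsewhere. As a hedged illustration (not part of the handout), the following Python snippet computes t both "by hand" from equation (1.9) and with scipy's built-in one-sample test; it reuses the four scores (2, 3, 5, 6) from the degrees-of-freedom example and a made-up test value of 3.

import numpy as np
from scipy import stats

scores = np.array([2, 3, 5, 6])      # the n=4 scores from the degrees-of-freedom example
mu0 = 3                              # hypothetical test value, chosen only for this sketch

# By hand, per equation (1.9)
s = scores.std(ddof=1)               # sample SD, division by n-1
se = s / np.sqrt(len(scores))        # standard error of the mean
t_by_hand = (scores.mean() - mu0) / se
df = len(scores) - 1

# Built-in equivalent
t_scipy, p_two_tailed = stats.ttest_1samp(scores, popmean=mu0)

print(df, round(t_by_hand, 3), round(t_scipy, 3), round(p_two_tailed, 3))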
Paired (or related samples) t-test

Another common application of the t-test occurs when you have either 2 scores for each person (e.g., before and after), or when you have matched pairs of scores (e.g., husband and wife pairs, or twin pairs). The paired t-test may be used in this case, given that its assumptions are met adequately. (More on the assumptions of the various t-tests later.)

Quite simply, the paired t-test is just a single-sample t-test performed on the difference scores. That is, for each matched pair, compute a difference score. Whether you subtract 1 from 2 or vice versa does not matter, so long as you do it the same way for each pair. Then perform a single-sample t-test on those differences.

The null hypothesis for this test is that the difference scores are a random sample from a population in which the mean difference has some value which you specify. Often, that value is zero--but it need not be. For example, suppose you found some old research which reported that on average, husbands were 5 inches taller than their wives. If you wished to test the null hypothesis that the difference is still 5 inches today (despite the overall increase in height), your null hypothesis would state that your sample of difference scores (from husband/wife pairs) is a random sample from a population in which the mean difference = 5 inches.

In the equations for the paired t-test, X̄ is often replaced with D̄, which stands for the mean difference:

    t = \frac{\bar{D} - \mu_D}{s_{\bar{D}}} = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}        (1.16)

where
    D̄ = the (sample) mean of the difference scores
    μ_D = the mean difference in the population, given a true H₀ {often = 0, but not always}
    s_D = the sample SD of the difference scores (with division by n-1)
    n = the number of matched pairs; the number of individuals = 2n
    s_D̄ = the SE of the mean difference
    df = n - 1

Example of paired t-test

This example is from the Study Guide to Pagano's book Understanding Statistics in the Behavioral Sciences (3rd Edition). A political candidate wishes to determine if endorsing increased social spending is likely to affect her standing in the polls. She has access to data on the popularity of several other candidates who have endorsed increased spending. The data were available both before and after the candidates announced their positions on the issue (see Table 4).

Table 4: Data for paired t-test example.

    Popularity Ratings
    Candidate   Before   After   Difference
        1         42       43        1
        2         41       45        4
        3         50       56        6
        4         52       54        2
        5         58       65        7
        6         32       29       -3
        7         39       46        7
        8         42       48        6
        9         48       47       -1
       10         47       53        6

I entered these BEFORE and AFTER scores into SPSS, and performed a paired t-test as follows:

T-TEST PAIRS= after WITH before (PAIRED)
  /CRITERIA=CIN(.95)
  /MISSING=ANALYSIS.

This yielded the following output.

Paired Samples Statistics

    Pair 1    Mean    N    Std. Deviation   Std. Error Mean
    AFTER     48.60   10   9.489            3.001
    BEFORE    45.10   10   7.415            2.345

Paired Samples Correlations

    Pair 1            N    Correlation   Sig.
    AFTER & BEFORE    10   .940          .000

Paired Samples Test

    Pair 1: AFTER - BEFORE (Paired Differences)
    Mean   Std. Deviation   Std. Error Mean   95% CI of the Difference (Lower, Upper)   t       df   Sig. (2-tailed)
    3.50   3.567            1.128             .95, 6.05                                 3.103    9   .013

The first output table gives descriptive information on the BEFORE and AFTER popularity ratings, and shows that the mean is higher after politicians have endorsed increased spending. The second output table gives the Pearson correlation (r) between the BEFORE and AFTER scores. (The correlation coefficient is a measure of the direction and strength of the linear relationship between two variables. I will say more about it in a later section called Testing the significance of Pearson r.) The final output table shows descriptive statistics for the AFTER - BEFORE difference scores, and the t-value with its degrees of freedom and p-value.

The null hypothesis for this test states that the mean difference in the population is zero. In other words, endorsing increased social spending has no effect on popularity ratings in the population from which we have sampled. If that is true, the probability of seeing a difference of 3.5 points or more is 0.013 (the p-value). Therefore, the politician would likely reject the null hypothesis, and would endorse increased social spending.
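As a cross-check on the output above (an illustrative aside, not part of the handout), equation (1.16) can be applied directly to the Table 4 difference scores; the by-hand numbers should match the SPSS Paired Samples Test row (mean difference 3.50, SE 1.128, t = 3.103).

import numpy as np

diff = np.array([1, 4, 6, 2, 7, -3, 7, 6, -1, 6])   # AFTER - BEFORE, from Table 4

d_bar = diff.mean()                  # mean difference: 3.5
s_d = diff.std(ddof=1)               # SD of the differences (division by n-1): about 3.567
se_d = s_d / np.sqrt(len(diff))      # SE of the mean difference: about 1.128
t = (d_bar - 0) / se_d               # equation (1.16) with mu_D = 0 under H0
df = len(diff) - 1                   # df = 9

print(d_bar, round(s_d, 3), round(se_d, 3), round(t, 3), df)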
The same example done using a one-sample t-test

Earlier, I said that the paired t-test is really just a single-sample t-test done on difference scores. Let's demonstrate for ourselves that this is really so. Here are the BEFORE, AFTER, and DIFF scores from my SPSS file. (I computed the DIFF scores using "compute diff = after - before.")

    BEFORE   AFTER   DIFF
      42       43      1
      41       45      4
      50       56      6
      52       54      2
      58       65      7
      32       29     -3
      39       46      7
      42       48      6
      48       47     -1
      47       53      6

    Number of cases read: 10    Number of cases listed: 10

I then ran a single-sample t-test on the difference scores using the following syntax:

* …the mean difference = 5 inches.
* Suppose we had found that the difference in height between husbands and wives really had
* decreased dramatically. In that case, we might have found a mean difference close to 0,
* which might have allowed us to reject H0. An example of this scenario follows below.

    COUPLE   HUSBAND   WIFE    DIFF
      1.00    68.78    75.34   -6.56
      2.00    66.09    67.57   -1.48
      3.00    71.99    69.16    2.83
      4.00    74.51    69.17    5.34
      5.00    67.31    68.11    -.80
      6.00    64.05    68.62   -4.57
      7.00    66.77    70.31   -3.54
      8.00    75.33    72.92    2.41
      9.00    74.11    73.10    1.01
     10.00    75.71    62.66   13.05
     11.00    69.01    76.83   -7.82
     12.00    67.86    63.23    4.63
     13.00    66.61    72.01   -5.40
     14.00    68.64    76.10   -7.46
     15.00    78.74    68.53   10.21
     16.00    71.66    62.65    9.01
     17.00    73.43    70.46    2.97
     18.00    70.39    79.99   -9.60
     19.00    70.15    64.27    5.88
     20.00    71.53    69.07    2.46
     21.00    57.49    81.21  -23.72
     22.00    68.95    69.92    -.97
     23.00    77.60    70.70    6.90
     24.00    72.36    67.79    4.57
     25.00    72.70    67.50    5.20

    Number of cases read: 25    Number of cases listed: 25

T-TEST
  /TESTVAL=5          /* H0: Mean difference = 5 inches (as in past) */
  /MISSING=ANALYSIS
  /VARIABLES=diff     /* perform analysis on the difference scores */
  /CRITERIA=CIN(.95).

One-Sample Statistics

    Variable   N    Mean    Std. Deviation   Std. Error Mean
    DIFF       25   .1820   7.76116          1.55223

One-Sample Test (Test Value = 5)

    Variable   t        df   Sig. (2-tailed)   Mean Difference   95% CI of the Difference (Lower, Upper)
    DIFF       -3.104   24   .005              -4.8180           -8.0216, -1.6144

* The mean difference is about 0.2 inches (very close to 0).
* Yet, because H0 stated that the mean difference = 5, we are able to reject H0 (p = 0.005).
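The equivalence demonstrated in SPSS above can also be checked with a short Python sketch (illustrative, not part of the original handout): a paired t-test on the Table 4 scores and a one-sample t-test on their differences give identical results, and a one-sample test against a test value of 5 mirrors the TESTVAL=5 run on the husband-wife differences listed above.

import numpy as np
from scipy import stats

before = np.array([42, 41, 50, 52, 58, 32, 39, 42, 48, 47])
after  = np.array([43, 45, 56, 54, 65, 29, 46, 48, 47, 53])
diff   = after - before

# The paired test and the one-sample test on the differences are the same test.
print(stats.ttest_rel(after, before))          # roughly t = 3.103, p = .013
print(stats.ttest_1samp(diff, popmean=0))      # identical t and p

# Testing a non-zero null value, as in the husband/wife example (H0: mean difference = 5).
height_diff = np.array([-6.56, -1.48, 2.83, 5.34, -0.80, -4.57, -3.54, 2.41, 1.01, 13.05,
                        -7.82, 4.63, -5.40, -7.46, 10.21, 9.01, 2.97, -9.60, 5.88, 2.46,
                        -23.72, -0.97, 6.90, 4.57, 5.20])
print(stats.ttest_1samp(height_diff, popmean=5))   # roughly t = -3.104, p = .005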
Unpaired (or independent samples) t-test

Another common form of the t-test may be used if you have 2 independent samples (or groups). The formula for this version of the test is given in equation (1.17):

    t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}        (1.17)

The left side of the numerator, (X̄₁ - X̄₂), is the difference between the means of two (independent) samples, or the difference between group means. The right side of the numerator, (μ₁ - μ₂), is the difference between the corresponding population means, assuming that H₀ is true. The denominator is the standard error of the difference between two independent means. It is calculated as follows:

    s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}}        (1.18)

where
    s²_pooled = pooled variance estimate = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{SS_{Within\,Groups}}{df_{Within\,Groups}}
    SS₁ = \sum (X - \bar{X}_1)^2 for Group 1
    SS₂ = \sum (X - \bar{X}_2)^2 for Group 2
    n₁ = sample size for Group 1
    n₂ = sample size for Group 2
    df = n₁ + n₂ - 2

As indicated above, the null hypothesis for this test specifies a value for (μ₁ - μ₂), the difference between the population means. More often than not, H₀ specifies that (μ₁ - μ₂) = 0. For that reason, most textbooks omit (μ₁ - μ₂) from the numerator of the formula, and show it like this:

    t_{unpaired} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left( \frac{SS_1 + SS_2}{n_1 + n_2 - 2} \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}        (1.19)

I prefer to include (μ₁ - μ₂) for two reasons. First, it reminds me that the null hypothesis can specify a non-zero difference between the population means. Second, it reminds me that all t-tests have a common format, which I will describe in a section to follow.

The unpaired (or independent samples) t-test has df = n₁ + n₂ - 2. As discussed under the single-sample t-test, one degree of freedom is lost whenever you calculate a sum of squares (SS). To perform an unpaired t-test, we must first calculate both SS₁ and SS₂, so two degrees of freedom are lost.

Example of unpaired t-test

The following example is from Understanding Statistics in the Behavioral Sciences (3rd Ed), by Robert R. Pagano. A nurse was hired by a governmental ecology agency to investigate the impact of a lead smelter on the level of lead in the blood of children living near the smelter. Ten children were chosen at random from those living near the smelter. A comparison group of 7 children was randomly selected from those living in an area relatively free from possible lead pollution. Blood samples were taken from the children, and lead levels were determined. The following are the results (scores are in micrograms of lead per 100 milliliters of blood):

    Lead Levels
    Children Living Near Smelter   Children Living in Unpolluted Area
                18                               9
                16                              13
                21                               8
                14                              15
                17                              17
                19                              12
                22                              11
                24
                15
                18

Using α = 0.01 (2-tailed), what do you conclude?
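As a rough cross-check (a sketch, not the textbook's worked solution), the same pooled-variance test can be run on the lead-level data with scipy, comparing the computed t against the two-tailed .01 critical value with df = n₁ + n₂ - 2 = 15.

import numpy as np
from scipy import stats

smelter    = np.array([18, 16, 21, 14, 17, 19, 22, 24, 15, 18])
unpolluted = np.array([9, 13, 8, 15, 17, 12, 11])

# Pooled-variance (equal variances assumed) independent samples t-test, eqs. (1.17)-(1.19).
t, p = stats.ttest_ind(smelter, unpolluted, equal_var=True)
df = len(smelter) + len(unpolluted) - 2

t_crit = stats.t.ppf(1 - 0.01 / 2, df)   # two-tailed .01 critical value
print(round(t, 3), round(p, 4), df, round(t_crit, 3))
# Reject H0 if |t| exceeds t_crit (equivalently, if p < .01); here t works out to
# roughly 4.0, comfortably beyond a critical value of about 2.95.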
You may recall the following output from the first example of a paired t-test (with BEFORE and AFTER scores):

Paired Samples Correlations

    Pair 1            N    Correlation   Sig.
    AFTER & BEFORE    10   .940          .000

The number in the "Correlation" column is a Pearson r. It indicates that there is a very strong, positive linear relationship between BEFORE and AFTER scores for the 10 politicians. The p-value (Sig.) is for a t-test of the null hypothesis that there is no linear relationship between BEFORE and AFTER scores in the population of politicians from which the sample was drawn. The p-value indicates the probability of observing a correlation of 0.94 or greater (or -0.94 or less, because it's two-tailed) if the null hypothesis is true.

General format for all z- and t-tests

You may have noticed that all of the z- and t-tests we have looked at have a common format. The formula always has 3 components, as shown below:

    z \text{ or } t = \frac{\text{statistic} - \text{parameter} \mid H_0}{\text{SE of the statistic}}        (1.21)

The numerator always has some statistic (e.g., a sample mean, or the difference between two independent sample means) minus the value of the corresponding parameter, given that H₀ is true. The denominator is the standard error of the statistic in the numerator. If the population standard deviation is known (and used to calculate the standard error), the test statistic is z. If the population standard deviation is not known, it must be estimated with the sample standard deviation, and the test statistic is t with some number of degrees of freedom.

Table 5 lists the 3 components of the formula for the t-tests we have considered in this chapter. For all of these tests but the first, the null hypothesis most often specifies that the value of the parameter is equal to zero. But there may be exceptions.

Similarity of standard errors for single-sample and unpaired t-tests

People often fail to see any connection between the formula for the standard error of the mean and the formula for the standard error of the difference between two independent means. Nevertheless, the two formulae are very similar, as shown below:

    s_{\bar{X}} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}        (1.22)

Given how s_X̄ is expressed in equation (1.22), it is clear that s_{X̄₁-X̄₂} is a fairly straightforward extension of s_X̄ (see Table 5).

Table 5: The 3 components of the t-formula for t-tests described in this chapter.

    Name of Test                        Statistic     Parameter|H₀   SE of the Statistic
    Single-sample t-test                X̄             μ              s_X̄ = s / √n
    Paired t-test                       D̄             μ_D            s_D̄ = s_D / √n
    Independent samples t-test          (X̄₁ - X̄₂)     (μ₁ - μ₂)      s_{X̄₁-X̄₂} = √(s²_pooled/n₁ + s²_pooled/n₂)
    Test of significance of Pearson r   r             ρ              s_r = √((1 - r²) / (n - 2))
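The last row of Table 5 can be used to verify the Sig. value reported for the .940 correlation above. A hedged sketch (this is the standard calculation implied by Table 5, not a procedure spelled out step by step in the handout): form t = r / s_r with df = n - 2, and compare it with scipy's built-in test.

import numpy as np
from scipy import stats

before = np.array([42, 41, 50, 52, 58, 32, 39, 42, 48, 47])
after  = np.array([43, 45, 56, 54, 65, 29, 46, 48, 47, 53])
n = len(before)

r = np.corrcoef(before, after)[0, 1]          # Pearson r, about .940
se_r = np.sqrt((1 - r ** 2) / (n - 2))        # SE of r, from the last row of Table 5
t = r / se_r                                  # t with df = n - 2, rho = 0 under H0
p = 2 * stats.t.sf(abs(t), df=n - 2)          # two-tailed p-value

print(round(r, 3), round(t, 2), round(p, 5))
print(stats.pearsonr(before, after))          # built-in r and two-tailed p for comparison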
Assumptions of t-tests

All t-ratios are of the form "t = (statistic - parameter under a true H₀) / SE of the statistic". The key requirement (or assumption) for any t-test is that the statistic in the numerator must have a sampling distribution that is normal. This will be the case if the populations from which you have sampled are normal. If the populations are not normal, the sampling distribution may still be approximately normal, provided the sample sizes are large enough. (See the discussion of the Central Limit Theorem earlier in this chapter.) The assumptions for the individual t-tests we have considered are given in Table 6.

A little more about assumptions for t-tests

Many introductory statistics textbooks list the following key assumptions for t-tests:

1. The data must be sampled from a normally distributed population (or populations, in the case of a two-sample test).
2. For two-sample tests, the two populations must have equal variances.
3. Each score (or difference score for the paired t-test) must be independent of all other scores.

The third of these is by far the most important assumption. The first two are much less important than many people realize.

Table 6: Assumptions of various t-tests.

    Type of t-test                       Assumptions
    Single-sample                        • You have a single sample of scores
                                         • All scores are independent of each other
                                         • The sampling distribution of X̄ is normal (the Central Limit Theorem tells you when this will be the case)
    Paired t-test                        • You have matched pairs of scores (e.g., two measures per person, or matched pairs of individuals, such as husband and wife)
                                         • Each pair of scores is independent of every other pair
                                         • The sampling distribution of D̄ is normal (see Central Limit Theorem)
    Independent samples t-test           • You have two independent samples of scores--i.e., there is no basis for pairing of scores in sample 1 with those in sample 2
                                         • All scores within a sample are independent of all other scores within that sample
                                         • The sampling distribution of X̄₁ - X̄₂ is normal
                                         • The populations from which you sampled have equal variances
    Test for significance of Pearson r   • The sampling distribution of r is normal
                                         • H₀ states that the correlation in the population = 0

Let's stop and think about this for a moment in the context of an unpaired t-test. A normal distribution ranges from minus infinity to positive infinity. So in truth, none of us who are dealing with real data ever sample from normally distributed populations. Likewise, it is a virtual impossibility for two populations (at least of the sort that would interest us as researchers) to have exactly equal variances. The upshot is that we never really meet the assumptions of normality and homogeneity of variance.

Therefore, what the textbooks ought to say is that if one were able to sample from two normally distributed populations with exactly equal variances (and with each score being independent of all others), then the unpaired t-test would be an exact test. That is, the sampling distribution of t under a true null hypothesis would be given exactly by the t-distribution with df = n₁ + n₂ - 2. Because we can never truly meet the assumptions of normality and homogeneity of variance, t-tests on real data are approximate tests. In other words, the sampling distribution of t under a true null hypothesis is approximated by the t-distribution with df = n₁ + n₂ - 2. So, rather than getting ourselves worked into a lather over normality and homogeneity of variance (which we know are not true), we ought instead to concern ourselves with the conditions under which the approximation is good enough to use (much like we do when using other approximate tests, such as Pearson's chi-square).

Two guidelines
Appendix: The Unequal Variances t-test in SPSS

Denominator for the independent groups t-test, equal variances assumed:

    SE = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}
       = \sqrt{\frac{SS_{within\,groups}}{n_1 + n_2 - 2} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}
       = \sqrt{s^2_{pooled} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}
       = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}}        (A-1)

Denominator for the independent t-test, unequal variances assumed:

    SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}        (A-2)

• Suppose the two sample sizes are quite different.
• If you use the equal variance version of the test, the larger sample contributes more to the pooled variance estimate (see the first line of equation A-1).
• But if you use the unequal variance version, the two samples contribute to the standard error through their own separate variances; there is no pooling.

Sampling distribution of t

• For the equal variance version of the test, the sampling distribution of the t-value you calculate (under a true null hypothesis) is the t-distribution with df = n₁ + n₂ - 2.
• For the unequal variance version, there is some dispute among statisticians about what the appropriate sampling distribution of t is.
• There is general agreement that it is distributed as t with df < n₁ + n₂ - 2.
• There have been a few attempts to define exactly how much less.
• Some books (e.g., Biostatistics: The Bare Essentials) describe a method that involves use of the harmonic mean.
• SPSS uses the Welch-Satterthwaite solution. It calculates the adjusted df as follows:

    df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}        (A-3)

For this course, you don't need to know the details of how the df are computed. It is sufficient to know that when the variances are too heterogeneous, you should use the unequal variances version of the t-test.
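As a closing illustration (a sketch, not part of the handout), scipy exposes both versions of the test discussed in this appendix: equal_var=True gives the pooled-variance test with df = n₁ + n₂ - 2, and equal_var=False gives the Welch (unequal variances) test. The Welch-Satterthwaite df of equation (A-3) can also be computed directly; the lead-level data from the unpaired example are reused here.

import numpy as np
from scipy import stats

smelter    = np.array([18, 16, 21, 14, 17, 19, 22, 24, 15, 18])
unpolluted = np.array([9, 13, 8, 15, 17, 12, 11])

print(stats.ttest_ind(smelter, unpolluted, equal_var=True))    # pooled variances, df = n1 + n2 - 2
print(stats.ttest_ind(smelter, unpolluted, equal_var=False))   # Welch (unequal variances) test

# Welch-Satterthwaite adjusted df, per equation (A-3)
v1 = smelter.var(ddof=1) / len(smelter)        # s1^2 / n1
v2 = unpolluted.var(ddof=1) / len(unpolluted)  # s2^2 / n2
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(smelter) - 1) + v2 ** 2 / (len(unpolluted) - 1))
print(df_welch)                                # somewhat less than n1 + n2 - 2 = 15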