Source URL: http://www.comfsm.fm/~dleeling/statistics/text.html#page-111
Saylor URL: http://saylor.org/courses/bus204
Attributed to: [Dana Lee Ling] Saylor.org

11 Testing the two sample means with the t-test

11.1 Paired differences: Dependent samples

Many studies investigate systems in which measurements are taken before and after. Usually there is an experimental treatment or process between the two measurements. A typical example is a pre-test and a post-test, with an educational or training event in between. One could examine each student's score on the pre-test and the post-test. Even if everyone did better on the post-test, one would still have to show that the difference was statistically significant and not just the result of random variation.

These studies are called "paired t-tests" or "inferences from matched pairs". Each element in the sample is treated as a pair of scores. The null hypothesis is that the average difference for all the pairs is zero: there is no difference. For a confidence interval test, the confidence interval for the mean difference would include zero if there is no statistically significant difference.

If the difference for each data pair is referred to as $d$, then the mean difference is written $\bar{d}$. The hypothesis test asks whether this sample mean difference $\bar{d}$ could have come from a population with a mean difference $\mu_d$ equal to zero (the null hypothesis). If $\bar{d}$ could not plausibly have come from a population with $\mu_d = 0$, then the change is statistically significant.
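As a rough illustration of the paired procedure, the sketch below computes each difference $d$, the mean difference $\bar{d}$, its standard error $s_d/\sqrt{n}$, the t-statistic for $\mu_d = 0$, and the confidence interval check. The scores are made up, the function names are hypothetical, and the critical value 2.776 (two-tailed, 95%, 4 degrees of freedom) is assumed to come from a standard t-table rather than being computed.

```python
import math
import statistics

def paired_summary(before, after):
    """Mean difference, its standard error, and t for matched pairs.

    Each difference is d = before - after, so mu_d = mu_before - mu_after.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)        # sample standard deviation (n - 1 divisor)
    se = s_d / math.sqrt(n)          # standard error of the mean difference
    t = d_bar / se                   # t-statistic for H0: mu_d = 0
    return d_bar, se, t, n

def paired_interval(d_bar, se, t_c):
    """Confidence interval d_bar - E < mu_d < d_bar + E, with E = t_c * se."""
    e = t_c * se
    return d_bar - e, d_bar + e

# Hypothetical pre-test and post-test scores (not data from the text).
pre = [62, 70, 55, 81, 66]
post = [68, 71, 60, 85, 65]
d_bar, se, t, n = paired_summary(pre, post)

# 2.776: two-tailed 95% critical t for n - 1 = 4 degrees of freedom (from a t-table).
low, high = paired_interval(d_bar, se, 2.776)
significant = not (low < 0 < high)   # an interval containing zero means "not significant"
print(f"d_bar = {d_bar}, t = {t:.3f}, interval = ({low:.3f}, {high:.3f}), significant: {significant}")
```

With these made-up scores the interval contains zero, so the change would not be significant at the 95% level. Because $d$ = before − after, a negative $\bar{d}$ here means the scores rose.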
In the diagram above, the mean difference $\mu_d$ is equal to $\mu_{before} - \mu_{after}$.

Confidence interval test

Consider the paired data below. The first column contains female body fat measurements from the beginning of a term. The second column contains the body fat measurements sixteen weeks later. The third column is the difference $d$ for each pair.

p-value                       0.14
Maximum confidence level c    0.86

The p-value confirms the confidence interval analysis: we fail to reject the null hypothesis. At a 5% risk of a type I error we would fail to reject the null hypothesis. We can have a maximum confidence of only 86% ($c = 1 - p$), not the 95% standard typically employed.

Some would argue that our concern for limiting the risk of rejecting a true null hypothesis (a type I error) has led to a higher risk of failing to reject a false null hypothesis (a type II error). Some would argue that because of other known factors (the high rates of diabetes, high blood pressure, heart disease, and other non-communicable diseases), one should accept a higher risk of a type I error. The average shows an increase in body fat. Given the short time frame (a single term), some might argue for reacting to this number and intervening to reduce body fat. They would argue that, given other information about this population's propensity towards obesity, 86% is "good enough" to show a developing problem. Ultimately these debates cannot be resolved by statisticians.

11.2 T-test for means for independent samples

One of the more common situations is comparing two independent samples to determine whether their means are statistically significantly different. In this case the samples may differ in sample size n, sample mean, and sample standard deviation. In this text the two samples are referred to as the x data and the y data.
The sample size for the x data is $n_x$, the sample mean is $\bar{x}$, and the sample standard deviation is $s_x$. For the y data, the sample size is $n_y$, the sample mean is $\bar{y}$, and the sample standard deviation is $s_y$.

Two possibilities exist. Either the two samples come from the same population and the population mean difference is zero, or the two samples come from different populations and the population mean difference is not zero.

Confidence Interval test

Each sample has a range of probable values for its population mean μ. If the confidence interval for the difference of the sample means includes zero, then there is no statistically significant difference between the means of the two samples. If the confidence interval does not include zero, then the difference in the means is statistically significant.

Note that the margin of error E for the mean difference is still $t_c$ multiplied by the standard error. The standard error formula changes to account for the differences in sample size and standard deviation. Thus the margin of error E can be calculated using:

$$E = t_c \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$

For the degrees of freedom in the t-critical $t_c$ calculation, use n − 1 for the sample with the smaller size. This produces a conservative estimate of the degrees of freedom.
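A minimal sketch of the margin-of-error calculation for two independent samples, assuming the unpooled standard error $\sqrt{s_x^2/n_x + s_y^2/n_y}$ and the conservative degrees-of-freedom rule described above. The summary statistics are hypothetical, the function names are illustrative only, and the critical value 2.306 (two-tailed, 95%, 8 degrees of freedom) is assumed to be read from a t-table.

```python
import math

def standard_error(s_x, n_x, s_y, n_y):
    """Standard error of x_bar - y_bar without assuming equal variances."""
    return math.sqrt(s_x ** 2 / n_x + s_y ** 2 / n_y)

def conservative_df(n_x, n_y):
    """The text's rule: n - 1 for the smaller of the two samples."""
    return min(n_x, n_y) - 1

def margin_of_error(s_x, n_x, s_y, n_y, t_c):
    """E = t_c * standard error; t_c is the critical t for conservative_df(n_x, n_y)."""
    return t_c * standard_error(s_x, n_x, s_y, n_y)

# Hypothetical summary statistics (not from the text).
s_x, n_x = 4.0, 16
s_y, n_y = 3.0, 9
df = conservative_df(n_x, n_y)                  # smaller sample has 9 values, so df = 8
e = margin_of_error(s_x, n_x, s_y, n_y, 2.306)  # 2.306: two-tailed 95% critical t, df = 8
print(f"df = {df}, SE = {standard_error(s_x, n_x, s_y, n_y):.4f}, E = {e:.4f}")
```

Taking $t_c$ as an argument keeps the sketch to the arithmetic in the formula; in practice the value comes from a table or a spreadsheet function.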
Advanced statistical software uses a more complex formula to determine the degrees of freedom.

The confidence interval is calculated from:

$$(\bar{x} - \bar{y}) - E < (\mu_x - \mu_y) < (\bar{x} - \bar{y}) + E$$

where $\bar{x}$ is the sample mean of one data set and $\bar{y}$ is the sample mean of the other. Some texts use the symbol $\bar{x}_d$ for this difference and $\mu_d$ for the hypothesized difference in the population means. This leads to the more familiar-looking formulation:

$$\bar{x}_d - E < \mu_d < \bar{x}_d + E$$

where $\mu_d = \mu_x - \mu_y$ and $\bar{x}_d = \bar{x} - \bar{y}$.

As noted above, spreadsheets provide a function to calculate p-values. If the p-value is less than your chosen risk of a type I error α, then the difference is significant. The function takes as inputs the data for one of the two samples (data_range_x), the data for the other sample (data_range_y), the number of tails, and a final argument that specifies the type of test. A t-test for means from independent samples is test type number three.

=TTEST(data_range_x,data_range_y,number_of_tails,3)

For the above data, the p-value is given in the following table:

p-value                       0.02
Maximum confidence level c    0.98

The TTEST function does not use the smaller sample size to determine the degrees of freedom. Instead, it uses a different formula that yields a larger number of degrees of freedom, which has the effect of reducing the p-value. Thus the confidence interval test could produce a failure to reject the null hypothesis while TTEST produces a rejection of the null hypothesis. This only occurs when the p-value is close to your chosen α.

[Optional material!] If you have doubts and want to explore further, take the difference of the means and divide by the standard error to obtain the t-statistic t.
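The text does not name the "more complex formula" for the degrees of freedom. A common choice for unequal-variance tests is the Welch-Satterthwaite approximation, and the sketch below assumes that is the kind of formula meant, comparing it with the conservative smaller-n − 1 rule and then forming the interval for $\mu_x - \mu_y$. The functions `welch_df` and `difference_interval`, the means, and the margin of error 3.26 are all hypothetical.

```python
def welch_df(s_x, n_x, s_y, n_y):
    """Welch-Satterthwaite approximate degrees of freedom (usually not an integer)."""
    v_x = s_x ** 2 / n_x
    v_y = s_y ** 2 / n_y
    return (v_x + v_y) ** 2 / (v_x ** 2 / (n_x - 1) + v_y ** 2 / (n_y - 1))

def difference_interval(x_bar, y_bar, e):
    """(x_bar - y_bar) - E < mu_x - mu_y < (x_bar - y_bar) + E."""
    x_d = x_bar - y_bar
    return x_d - e, x_d + e

# Hypothetical summary statistics (not from the text).
s_x, n_x, s_y, n_y = 4.0, 16, 3.0, 9
print("conservative df:", min(n_x, n_y) - 1)
print("Welch df:", round(welch_df(s_x, n_x, s_y, n_y), 2))

# Hypothetical sample means and margin of error E.
low, high = difference_interval(50.0, 46.0, 3.26)
print("interval excludes zero:", not (low < 0 < high))
```

For these hypothetical values the Welch approximation gives about 20.9 degrees of freedom against the conservative 8. More degrees of freedom mean a smaller critical t and a smaller p-value, which is why a spreadsheet result can reject where the conservative confidence interval test does not.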
Then use the TDIST function to determine the p-value, using the smaller sample size minus one as the degrees of freedom.

Note that $(\mu_x - \mu_y)$ is presumed to be equal to zero. Thus the t-statistic is the difference of the means divided by the standard error (given further above):

$$t = \frac{\bar{x}_d}{\text{standard error}}$$

Once t is calculated, use the TDIST function to determine the p-value, where n is the smaller sample size:

=TDIST(ABS(t),n-1,2)

Technical side-note: TTEST type three does not presume that the population standard deviations $\sigma_x$ and $\sigma_y$ are equal. This is in keeping with modern practice and reality. TTEST type two presumes $\sigma_x = \sigma_y$. One rarely knows either value, and if one did know those values, why would they not also know the actual population means? With the true population means in hand, any difference would be significant.
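For readers working outside a spreadsheet, the TDIST step can be reproduced in Python. The sketch below is a stand-in, not the spreadsheet's own implementation: it computes the two-tailed Student's t p-value from the identity $p = I_x(\tfrac{df}{2}, \tfrac{1}{2})$ with $x = df/(df + t^2)$, evaluating the regularized incomplete beta function $I_x$ with a standard continued-fraction method. All function names and the summary statistics in the usage lines are hypothetical.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction directly or via the symmetry I_x(a,b) = 1 - I_(1-x)(b,a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def tdist_two_tailed(t, df):
    """Two-tailed p-value of Student's t, in the spirit of =TDIST(ABS(t), df, 2)."""
    t = abs(t)
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

def conservative_p_value(x_bar, y_bar, se, n_x, n_y):
    """t = (x_bar - y_bar) / SE assuming mu_x - mu_y = 0; df = smaller n - 1."""
    t = (x_bar - y_bar) / se
    return t, tdist_two_tailed(t, min(n_x, n_y) - 1)

# Hypothetical summary statistics (not from the text).
se = math.sqrt(4.0 ** 2 / 16 + 3.0 ** 2 / 9)
t, p = conservative_p_value(50.0, 46.0, se, 16, 9)
print(f"t = {t:.3f}, p = {p:.4f}")
```

The continued fraction is a widely used way to evaluate the incomplete beta function in double precision; a statistics library would serve equally well if one is available.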