Statistical Hypothesis Testing: T-Test & Confidence Intervals for Mean Difference

Tags: Hypothesis Testing, Statistical Inference, Confidence Intervals, Two-Sample t-Test

This document explains how to test whether the means of two normal distributions are equal, using the two-sample t-test. It also derives confidence intervals for the difference of the means when the variances are equal and when they are not. The document assumes the reader has a basic understanding of normal distributions, random variables, and hypothesis testing.

What you will learn

  • What is the alternative hypothesis in the two-sample t-test?
  • How to calculate the test statistics T and T' in the two-sample t-test?
  • What is the null hypothesis in the two-sample t-test?


Math 541: Statistical Theory II
Hypothesis Testing Based on Two Samples
Instructor: Songfeng Zheng

It is quite common to compare the properties of two distributions; for example, we would like to see which distribution has a higher mean, or which has a higher variance. This note describes how to conduct hypothesis testing regarding the mean and variance when the two distributions under consideration are normal.

1 Comparing the Means of Two Normal Distributions

Let us consider a problem in which random samples are available from two normal distributions. The problem is to determine whether the means of the two distributions are equal. Specifically, we assume that the random variables $X_1, \dots, X_m$ form a random sample of size $m$ from a normal distribution for which both the mean $\mu_1$ and the variance $\sigma_1^2$ are unknown, and that the variables $Y_1, \dots, Y_n$ form another, independent random sample of size $n$ from another normal distribution for which both the mean $\mu_2$ and the variance $\sigma_2^2$ are unknown. In this section, we will discuss several frequently seen cases.

1.1 The Case $\sigma_1^2 = \sigma_2^2 = \sigma^2$

We shall assume for the moment that the variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is the same for both distributions, even though its exact value is unknown. Suppose it is desired to test the following hypotheses at a specified level of significance $\alpha$ ($0 < \alpha < 1$):

$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \neq \mu_2.$$

Intuitively, it makes sense to reject $H_0$ if $\bar{X}_m - \bar{Y}_n$ is very different from zero, where $\bar{X}_m$ and $\bar{Y}_n$ are the means of the two samples, respectively. In the spirit of the t test, we define

$$S_X^2 = \sum_{i=1}^{m} (X_i - \bar{X}_m)^2 \quad \text{and} \quad S_Y^2 = \sum_{j=1}^{n} (Y_j - \bar{Y}_n)^2.$$

Then the test statistic we shall use is

$$T = \frac{(m+n-2)^{1/2}\,(\bar{X}_m - \bar{Y}_n)}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2} (S_X^2 + S_Y^2)^{1/2}}.$$

Next, let us derive the distribution of $T$. For each pair of values $\mu_1$ and $\mu_2$, and for each $\sigma^2$, the sample mean $\bar{X}_m$ has a normal distribution with mean $\mu_1$ and variance $\sigma^2/m$, i.e.

$$\bar{X}_m \sim N(\mu_1, \sigma^2/m), \quad \text{and similarly} \quad \bar{Y}_n \sim N(\mu_2, \sigma^2/n),$$

because both samples are from normal distributions. Furthermore, $\bar{X}_m$ and $\bar{Y}_n$ are independent. It follows that the difference $\bar{X}_m - \bar{Y}_n$ has a normal distribution with mean $\mu_1 - \mu_2$ and variance $[(1/m) + (1/n)]\sigma^2$, i.e.,

$$\bar{X}_m - \bar{Y}_n \sim N\!\left(\mu_1 - \mu_2,\ \left[\tfrac{1}{m} + \tfrac{1}{n}\right]\sigma^2\right).$$

Therefore, when the null hypothesis is true, i.e. $\mu_1 = \mu_2$, the following random variable $Z$ has a standard normal distribution:

$$Z = \frac{\bar{X}_m - \bar{Y}_n}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2} \sigma} \sim N(0, 1).$$

Also, for all values of $\mu_1$, $\mu_2$, and $\sigma^2$, the random variable $S_X^2/\sigma^2$ has a $\chi^2$ distribution with $m-1$ degrees of freedom, $S_Y^2/\sigma^2$ has a $\chi^2$ distribution with $n-1$ degrees of freedom, and the two random variables are independent. By the additive property of the $\chi^2$ distribution, the following random variable $W$ has a $\chi^2$ distribution with $m+n-2$ degrees of freedom:

$$W = \frac{S_X^2}{\sigma^2} + \frac{S_Y^2}{\sigma^2} = \frac{S_X^2 + S_Y^2}{\sigma^2} \sim \chi^2_{m+n-2}.$$

Furthermore, the four random variables $\bar{X}_m$, $\bar{Y}_n$, $S_X^2$, and $S_Y^2$ are independent. This is because: (i) $\bar{X}_m$ and $S_X^2$ are functions of $X_1, \dots, X_m$, while $\bar{Y}_n$ and $S_Y^2$ are functions of $Y_1, \dots, Y_n$, and we know that $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ are independent; therefore $\{\bar{X}_m, S_X^2\}$ and $\{\bar{Y}_n, S_Y^2\}$ are independent. (ii) By the properties of the sample mean and sample variance, $\bar{X}_m$ and $S_X^2$ are independent, and $\bar{Y}_n$ and $S_Y^2$ are independent. It follows that $Z$ and $W$ are independent.

When the null hypothesis is true, i.e. $\mu_1 = \mu_2$, $Z \sim N(0,1)$ and $W \sim \chi^2_{m+n-2}$. By the definition of the t distribution, we have

$$T = \frac{(m+n-2)^{1/2}\,(\bar{X}_m - \bar{Y}_n)}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2} (S_X^2 + S_Y^2)^{1/2}}
= \frac{\dfrac{\bar{X}_m - \bar{Y}_n}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2} \sigma}}{\left[\dfrac{(S_X^2 + S_Y^2)/\sigma^2}{m+n-2}\right]^{1/2}}
= \frac{Z}{[W/(m+n-2)]^{1/2}} \sim t_{m+n-2}.$$

Thus, to test the hypotheses $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$, we reject $H_0$ when $|T| \geq t(1-\alpha/2,\, m+n-2)$.
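As a numerical sanity check, the statistic $T$ can be computed directly from the definitions above. The following is a minimal Python sketch (numpy and scipy are assumed to be available; the function name `two_sample_t` and the simulated data are illustrative, not part of the original note). Because the note's $S_X^2$ and $S_Y^2$ are raw sums of squared deviations, the statistic coincides with the usual pooled two-sample t-test, so the result should match `scipy.stats.ttest_ind` with `equal_var=True`.

```python
import numpy as np
from scipy import stats

def two_sample_t(x, y, alpha=0.05):
    """Pooled two-sample t-test of H0: mu1 = mu2 (Section 1.1).

    S_X^2 and S_Y^2 are raw sums of squared deviations, matching the
    note's definitions (no division by m-1 or n-1).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    ssx = np.sum((x - x.mean()) ** 2)   # S_X^2
    ssy = np.sum((y - y.mean()) ** 2)   # S_Y^2
    t_stat = (np.sqrt(m + n - 2) * (x.mean() - y.mean())
              / (np.sqrt(1 / m + 1 / n) * np.sqrt(ssx + ssy)))
    p_val = 2 * stats.t.sf(abs(t_stat), df=m + n - 2)
    crit = stats.t.ppf(1 - alpha / 2, df=m + n - 2)   # t(1-alpha/2, m+n-2)
    return t_stat, p_val, abs(t_stat) >= crit

# Illustrative data: two normal samples with a common variance.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=12)
y = rng.normal(0.8, 1.0, size=15)
print(two_sample_t(x, y))
print(stats.ttest_ind(x, y, equal_var=True))  # cross-check: same t and p
```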
The same construction applies to testing $H_0: \mu_1 - \mu_2 = \lambda$ vs. $H_a: \mu_1 - \mu_2 \neq \lambda$ for a given constant $\lambda$: it can be shown that when the null hypothesis $H_0$ is true, the corresponding statistic $T$ (with $\bar{X}_m - \bar{Y}_n$ replaced by $\bar{X}_m - \bar{Y}_n - \lambda$) follows $t_{m+n-2}$, and we can make our decision based on the value of the test statistic and the null distribution $t_{m+n-2}$.

When the variances of the two normal distributions are not equal, but their relation is known to be $\sigma_2^2 = k\sigma_1^2$, we can define the test statistic as

$$T' = \frac{(m+n-2)^{1/2}\,(\bar{X}_m - \bar{Y}_n - \lambda)}{\left(\frac{1}{m} + \frac{k}{n}\right)^{1/2} \left(S_X^2 + \frac{S_Y^2}{k}\right)^{1/2}}.$$

It can be shown that when the null hypothesis $H_0$ is true, $T' \sim t_{m+n-2}$, and we can make our decision based on the value of the test statistic and the null distribution $t_{m+n-2}$. Similarly, we can define the test procedure for the one-sided hypotheses.

1.4 Confidence Intervals for $\mu_1 - \mu_2$

As we studied before, confidence intervals and the hypothesis testing process are equivalent. Therefore, from the testing process, we can construct $1-\alpha$ confidence intervals for the difference of the means of two normal distributions.

First, in the case $\sigma_1^2 = \sigma_2^2$, the $1-\alpha$ confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_m - \bar{Y}_n) \pm t\!\left(1 - \frac{\alpha}{2},\ m+n-2\right) \sqrt{\frac{S_X^2 + S_Y^2}{m+n-2}}\ \sqrt{\frac{1}{m} + \frac{1}{n}}.$$

Indeed, in this case

$$h(X_1, \dots, X_m, Y_1, \dots, Y_n, \mu_1, \mu_2) = \frac{(m+n-2)^{1/2}\left[(\bar{X}_m - \bar{Y}_n) - (\mu_1 - \mu_2)\right]}{\left(\frac{1}{m} + \frac{1}{n}\right)^{1/2} (S_X^2 + S_Y^2)^{1/2}}$$

is the pivot used to construct the confidence interval.

Similarly, in the case $\sigma_2^2 = k\sigma_1^2$, the $1-\alpha$ confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{X}_m - \bar{Y}_n) \pm t\!\left(1 - \frac{\alpha}{2},\ m+n-2\right) \sqrt{\frac{S_X^2 + S_Y^2/k}{m+n-2}}\ \sqrt{\frac{1}{m} + \frac{k}{n}},$$

and, in this case,

$$h(X_1, \dots, X_m, Y_1, \dots, Y_n, \mu_1, \mu_2) = \frac{(m+n-2)^{1/2}\left[(\bar{X}_m - \bar{Y}_n) - (\mu_1 - \mu_2)\right]}{\left(\frac{1}{m} + \frac{k}{n}\right)^{1/2} (S_X^2 + S_Y^2/k)^{1/2}}$$

is the pivot used to construct the confidence interval.
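Both intervals can be computed in a few lines. Here is a minimal Python sketch (numpy/scipy assumed; the function name `mean_diff_ci` and the sample data are illustrative): the parameter `k` is the known variance ratio $\sigma_2^2 = k\sigma_1^2$, and `k=1.0` recovers the equal-variance formula.

```python
import numpy as np
from scipy import stats

def mean_diff_ci(x, y, alpha=0.05, k=1.0):
    """1 - alpha confidence interval for mu1 - mu2 (Section 1.4).

    k is the assumed known variance ratio sigma2^2 = k * sigma1^2;
    k = 1.0 gives the equal-variance interval.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    ssx = np.sum((x - x.mean()) ** 2)   # S_X^2
    ssy = np.sum((y - y.mean()) ** 2)   # S_Y^2
    t_q = stats.t.ppf(1 - alpha / 2, df=m + n - 2)  # t(1-alpha/2, m+n-2)
    half = t_q * np.sqrt((ssx + ssy / k) / (m + n - 2)) * np.sqrt(1 / m + k / n)
    center = x.mean() - y.mean()
    return center - half, center + half

# Illustrative use: a 95% interval that should typically cover mu1 - mu2 = -0.8.
rng = np.random.default_rng(2)
lo, hi = mean_diff_ci(rng.normal(0.0, 1.0, 12), rng.normal(0.8, 1.0, 15))
print(lo, hi)
```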
2 Comparing the Variances of Two Normal Distributions

Assume that the random variables $X_1, \dots, X_m$ form a random sample of size $m$ from a normal distribution for which both the mean $\mu_1$ and the variance $\sigma_1^2$ are unknown, and that the variables $Y_1, \dots, Y_n$ form another, independent random sample of size $n$ from another normal distribution for which both the mean $\mu_2$ and the variance $\sigma_2^2$ are unknown. In this section, we will develop a testing procedure to compare the two variances.

2.1 F Distribution

We first define and discuss the properties of a family of probability distributions called the F distribution (Fisher-Snedecor distribution). This distribution arises in many important hypothesis-testing problems in which two or more normal distributions are to be compared on the basis of random samples from each of the distributions.

Consider two independent random variables $Y$ and $W$, such that $Y$ has a $\chi^2$ distribution with $m$ degrees of freedom and $W$ has a $\chi^2$ distribution with $n$ degrees of freedom, where $m$ and $n$ are given positive integers. Define a new random variable $X$ as follows:

$$X = \frac{Y/m}{W/n} = \frac{nY}{mW}.$$

Then the distribution of $X$ is called an F distribution with $m$ and $n$ degrees of freedom. It can be proved that if a random variable $X$ has an F distribution with $m$ and $n$ degrees of freedom, i.e. $X \sim F(m, n)$, then its p.d.f. is

$$f(x) = \begin{cases} \dfrac{\Gamma\!\left(\frac{m+n}{2}\right) m^{m/2} n^{n/2}}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \cdot \dfrac{x^{(m/2)-1}}{(mx+n)^{(m+n)/2}}, & x > 0, \\[2ex] 0, & x \leq 0. \end{cases}$$

Please note that when we speak of an F distribution with $m$ and $n$ degrees of freedom, the order in which the numbers $m$ and $n$ are given is important. As we can see from the definition or the p.d.f. of $X$, when $m \neq n$, the F distribution with $m$ and $n$ degrees of freedom and the F distribution with $n$ and $m$ degrees of freedom are two different distributions. In fact, from the definition it is straightforward to see that if $X \sim F(m, n)$, then the reciprocal $1/X$ has an $F(n, m)$ distribution.

From the definitions of the t distribution and the F distribution, it is easy to see that if $X \sim t(n)$, then $X^2 \sim F(1, n)$. Indeed, since $X \sim t(n)$, $X$ can be written as

$$X = \frac{Z}{\sqrt{Y/n}}, \quad \text{where } Z \sim N(0,1) \text{ and } Y \sim \chi^2_n.$$

Then

$$X^2 = \frac{Z^2}{Y/n} = \frac{Z^2/1}{Y/n},$$

where $Z^2 \sim \chi^2_1$, so by the definition of an F random variable, $X^2 \sim F(1, n)$.

2.2 Comparing the Variances of Two Normal Distributions

Now let us go back to the problem of comparing the variances of the two normal distributions posed at the beginning of this section. Suppose that the following hypotheses are to be tested at a given level of significance $\alpha$:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{vs.} \quad H_a: \sigma_1^2 \neq \sigma_2^2.$$

As in the previous section, we define

$$S_X^2 = \sum_{i=1}^{m} (X_i - \bar{X}_m)^2 \quad \text{and} \quad S_Y^2 = \sum_{j=1}^{n} (Y_j - \bar{Y}_n)^2.$$

As proved before, $S_X^2/(m-1)$ and $S_Y^2/(n-1)$ are unbiased estimators of $\sigma_1^2$ and $\sigma_2^2$, respectively. Intuitively, it makes sense that if the null hypothesis is true, the ratio

$$V = \frac{S_X^2/(m-1)}{S_Y^2/(n-1)}$$

should be close to 1. Therefore, we reject the null hypothesis if the test statistic $V$ is too small or too big. Under the null hypothesis, $\sigma_1^2 = \sigma_2^2$, so

$$V = \frac{S_X^2/(m-1)}{S_Y^2/(n-1)} = \frac{\left[S_X^2/\sigma_1^2\right]/(m-1)}{\left[S_Y^2/\sigma_2^2\right]/(n-1)}.$$

It is well known that

$$\frac{S_X^2}{\sigma_1^2} \sim \chi^2_{m-1} \quad \text{and} \quad \frac{S_Y^2}{\sigma_2^2} \sim \chi^2_{n-1},$$

so by the definition of the F distribution, $V \sim F(m-1, n-1)$.

By our intuitive decision rule, we reject $H_0$ if either $V \leq c_1$ or $V \geq c_2$, where $c_1$ and $c_2$ are two constants such that $P(V \leq c_1) + P(V \geq c_2) = \alpha$. The most convenient choice of $c_1$ and $c_2$ is the one that makes $P(V \leq c_1) = P(V \geq c_2) = \alpha/2$; that is, choose $c_1 = F(\alpha/2, m-1, n-1)$, the $\alpha/2$ percentile of $F(m-1, n-1)$, and $c_2 = F(1-\alpha/2, m-1, n-1)$, the $1-\alpha/2$ percentile of $F(m-1, n-1)$.

This procedure can also be generalized to test the hypotheses

$$H_0: \sigma_1^2 = r\sigma_2^2 \quad \text{vs.} \quad H_a: \sigma_1^2 \neq r\sigma_2^2,$$

where $r$ is a positive constant; we notice that when $r = 1$ this reduces to the previous case. Under this new null hypothesis, the value of $V$ would be close to $r$, and therefore $V/r$ would be close to 1. Let us have a closer look at the statistic $V/r$:

$$\frac{V}{r} = \frac{S_X^2/(m-1)}{r\,S_Y^2/(n-1)} = \frac{\left[\frac{S_X^2}{\sigma_1^2}\right]/(m-1)}{r\left[\frac{S_Y^2}{\sigma_1^2}\right]/(n-1)} = \frac{\left[S_X^2/\sigma_1^2\right]/(m-1)}{\left[S_Y^2/\sigma_2^2\right]/(n-1)},$$

since $\sigma_2^2 = \sigma_1^2/r$ under $H_0$. Hence $V/r \sim F(m-1, n-1)$ under the null hypothesis, and we reject $H_0$ if $V/r \leq F(\alpha/2, m-1, n-1)$ or $V/r \geq F(1-\alpha/2, m-1, n-1)$.
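The F-test above is also easy to compute directly. Here is a minimal Python sketch (numpy/scipy assumed; the function name `variance_ratio_test` and the simulated data are illustrative). The parameter `r` is the hypothesized variance ratio $\sigma_1^2 = r\sigma_2^2$, with `r=1.0` giving the usual test of equal variances.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y, alpha=0.05, r=1.0):
    """Two-sided F-test of H0: sigma1^2 = r * sigma2^2 (Section 2.2).

    Returns V/r and whether H0 is rejected at level alpha.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    ssx = np.sum((x - x.mean()) ** 2)   # S_X^2
    ssy = np.sum((y - y.mean()) ** 2)   # S_Y^2
    v_over_r = (ssx / (m - 1)) / (r * ssy / (n - 1))  # ~ F(m-1, n-1) under H0
    c1 = stats.f.ppf(alpha / 2, m - 1, n - 1)         # F(alpha/2, m-1, n-1)
    c2 = stats.f.ppf(1 - alpha / 2, m - 1, n - 1)     # F(1-alpha/2, m-1, n-1)
    return v_over_r, (v_over_r <= c1) or (v_over_r >= c2)

# Illustrative use: sigma1 = 2, sigma2 = 1, so H0: sigma1^2 = sigma2^2
# should usually be rejected here.
rng = np.random.default_rng(3)
print(variance_ratio_test(rng.normal(0, 2.0, 30), rng.normal(0, 1.0, 30)))
```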