Inferencing Between Two Samples: Z Tests & Confidence Intervals for Difference of Means, Study notes of Data Analysis & Statistical Methods

Purdue University Data Analysis & Statistical Methods

Prof. Mihails Levins

A lecture note from Dr. Levine's Statistics 511 class at Purdue University, Fall 2006. It covers inference based on two samples, focusing on z tests and confidence intervals for the difference of two population means. It shows how to calculate the natural estimator of the difference between two population means and its standard deviation, and gives the standard normal distribution of the test statistic under the assumption of known variances. It also discusses the rejection regions for upper-tailed, lower-tailed, and two-tailed tests, as well as the calculation of the Type II error probability and the choice of sample size.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

Statistics 511: Statistical Methods
Dr. Levine, Purdue University, Fall 2006
Lecture 18: Inferences Based on Two Samples
Devore: Sections 9.1-9.2
Aug, 2006

z Tests and Confidence Intervals for a Difference Between Two Population Means

• An example of such a hypothesis would be $\mu_1 - \mu_2 = 0$ or $\sigma_1 > \sigma_2$. It may also be appropriate to estimate $\mu_1 - \mu_2$ and compute its $100(1 - \alpha)\%$ confidence interval.
• Assumptions:
  1. $X_1, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$.
  2. $Y_1, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$.
  3. The X and Y samples are independent of one another.

The Case of Normal Populations with Known Variances

• As before, this assumption is a simplification.
• Under this assumption,
  $$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} \tag{1}$$
  has a standard normal distribution.
• The null hypothesis $\mu_1 - \mu_2 = 0$ is a special case of the more general $\mu_1 - \mu_2 = \Delta_0$. Replacing $\mu_1 - \mu_2$ in (1) with $\Delta_0$ gives us a test statistic.

• The following summary considers all possible types of alternatives:
  1. $H_a: \mu_1 - \mu_2 > \Delta_0$ has the rejection region $z \ge z_\alpha$.
  2. $H_a: \mu_1 - \mu_2 < \Delta_0$ has the rejection region $z \le -z_\alpha$.
  3. $H_a: \mu_1 - \mu_2 \ne \Delta_0$ has the rejection region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$.

Example

• Consider Ex. 9.1 in Devore. Sample sizes are $m = 20$ and $n = 25$. Note that $m \ne n$; it is not important now but will be later.
• Note that the normality suggestion is based on some exploratory data analysis.
• The hypotheses are $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 \ne 0$.
• The test statistic is
  $$z = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

• Similar results can be easily obtained for the other two possible alternatives. In particular, if $H_a: \mu_1 - \mu_2 < \Delta_0$, we have
  $$\beta(\Delta') = 1 - \Phi\left(-z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$$
  where $\sigma = \sigma_{\bar{X} - \bar{Y}} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.
• If $H_a: \mu_1 - \mu_2 \ne \Delta_0$, the probability of Type II Error is
  $$\beta(\Delta') = \Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$$

Example

• Consider Example 9.3 from Devore. Suppose that the probability of detecting a difference of 5 between the two means should be .90. Can the .01 level test with $m = 20$ and $n = 25$ support this?
• For a two-tailed test we have
  $$\beta(5) = \Phi\left(2.58 - \frac{5 - 0}{1.34}\right) - \Phi\left(-2.58 - \frac{5 - 0}{1.34}\right) = .1251$$
• Because the rejection region is symmetric, we have $\beta(-5) = \beta(5)$, and, therefore, the probability of detecting a difference of 5 is $1 - \beta(5) = .8749$.
• Since .8749 falls short of the required .90, we can conclude that slightly larger sample sizes are needed.

• To determine a sample size that satisfies $P(\text{Type II Error when } \mu_1 - \mu_2 = \Delta') = \beta$ we need to solve
  $$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}$$
• For two equal sample sizes this yields
  $$m = n = \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\Delta' - \Delta_0)^2}$$

Example

• A company claims that its light bulbs are superior to those of its main competitor. If a study showed that a sample of $n_1 = 40$ of its bulbs has a mean lifetime of 647 hours of continuous use with a standard deviation of 27 hours, while a sample of $n_2 = 40$ bulbs made by its main competitor had a mean lifetime of 638 hours of continuous use with a standard deviation of 31 hours, does this substantiate the claim at the 0.05 level of significance?
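As a numerical check, the Type II error of Example 9.3 and the test statistic for the light-bulb question can be reproduced with a short Python sketch. It uses only the standard library. The function names (`two_sample_z`, `upper_tail_p_value`, `beta_two_tailed`) are illustrative and not part of the lecture, and the sample standard deviations 27 and 31 are assumed to stand in for the population values, which is a large-sample approximation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def two_sample_z(xbar, ybar, var1, var2, m, n, delta0=0.0):
    """Return z = (xbar - ybar - delta0) / sqrt(var1/m + var2/n)."""
    return (xbar - ybar - delta0) / sqrt(var1 / m + var2 / n)


def upper_tail_p_value(z):
    """Return P(Z >= z), the p-value for Ha: mu1 - mu2 > delta0."""
    return 1.0 - STD_NORMAL.cdf(z)


def beta_two_tailed(delta_prime, delta0, sigma_diff, z_crit):
    """Type II error probability of the two-tailed test when mu1 - mu2 = delta_prime.

    sigma_diff is the standard deviation of Xbar - Ybar and
    z_crit is the critical value z_{alpha/2}.
    """
    shift = (delta_prime - delta0) / sigma_diff
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


# Light-bulb question: upper-tailed test at alpha = 0.05.
# Assumption: the sample variances 27^2 and 31^2 replace sigma1^2 and sigma2^2.
z_bulbs = two_sample_z(647, 638, 27**2, 31**2, 40, 40)
p_bulbs = upper_tail_p_value(z_bulbs)
reject_bulbs = z_bulbs >= STD_NORMAL.inv_cdf(0.95)  # critical value z_.05

# Example 9.3: the values 1.34 and 2.58 are taken from the notes as given.
beta_5 = beta_two_tailed(delta_prime=5, delta0=0, sigma_diff=1.34, z_crit=2.58)

print(f"light bulbs: z = {z_bulbs:.2f}, p-value = {p_bulbs:.4f}, reject H0: {reject_bulbs}")
print(f"Example 9.3: beta(5) = {beta_5:.4f}, power = {1 - beta_5:.4f}")
```

Because the sketch does not round intermediate values before evaluating $\Phi$, its results can differ from table-based figures in the third or fourth decimal place.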
• $H_0: \mu_1 - \mu_2 = 0$ and $H_a: \mu_1 - \mu_2 > 0$
• Reject $H_0$ if $z > 1.645$.
• Calculations:
  $$z = \frac{647 - 638}{\sqrt{\dfrac{27^2}{40} + \dfrac{31^2}{40}}} = 1.38$$
• Decision: $H_0$ cannot be rejected at $\alpha = 0.05$; the p-value is 0.0838.

Confidence intervals for $\mu_1 - \mu_2$

• Since the test statistic $Z$ that we just described is exactly standard normal when $\sigma_1^2$ and $\sigma_2^2$ are known,
  $$P\left(-z_{\alpha/2} < \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}} < z_{\alpha/2}\right) = 1 - \alpha$$
• The $100(1 - \alpha)\%$ CI is easy to derive from this probability statement; it is
  $$\bar{x} - \bar{y} \pm z_{\alpha/2}\,\sigma_{\bar{X} - \bar{Y}}, \qquad \sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$

• The point estimate of $\mu_B - \mu_A$ is $\bar{x}_B - \bar{x}_A = 42 - 36 = 6$. For $\alpha = 0.04$, we find the critical value $z_{.02} = 2.05$.
• Thus, the confidence interval is
  $$6 \pm 2.05\sqrt{\frac{64}{75} + \frac{36}{50}} = (3.43, 8.57)$$

The Two-Sample t-test

• Assumptions: Both populations are normal, so that $X_1, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$s and another of the $y_i$s.

• When the population distributions are both normal, the standardized variable
  $$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
  has approximately a t distribution with $\nu$ df.
• $\nu$ can be estimated from data as
  $$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}}$$
• $\nu$ has to be rounded down to the nearest integer...why not up?

Example

• Consider Example 9.6 in Devore.
The following table helps to illustrate it:

Fabric Type   Sample Size   Sample Mean   Sample Standard Deviation
Cotton        10            51.71         .79
Triacetate    10            136.14        3.59

• We assume that porosity distributions for both types of fabric are normal; then, the two-sample t test (CI) can be used. Note that we do not assume anything about the variances of the two populations concerned.
• The number of df is
  $$\nu = \frac{\left(\dfrac{.6241}{10} + \dfrac{12.8881}{10}\right)^2}{\dfrac{(.6241/10)^2}{9} + \dfrac{(12.8881/10)^2}{9}} = 9.87$$
  and we use $\nu = 9$.

• The resulting CI is
  $$51.71 - 136.14 \pm (2.262)\sqrt{\frac{.6241}{10} + \frac{12.8881}{10}} = (-87.06, -81.80)$$
• Conclusion...

Remarks

• Traditionally, this test has been recommended as the first to use when comparing two different means. It has a number of advantages over the two-sample t test: it is a likelihood ratio test, it is an exact test and it is easier to use!
• However, this test has a major problem: it is not robust to violation of the equal-variance assumption. When $\sigma_1^2 = \sigma_2^2$, its gains in power are small when compared to the two-sample t test. That is why today it is often recommended to use the two-sample t test in most cases. This is especially true when the sample sizes are different.

• It may seem to be a plausible idea that one could first test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ and then choose the type of t test based on the outcome.
• Unfortunately, the most common type of test used for this purpose (we will consider it at the very end of the course) is very sensitive to violation of the normality assumption and is often not very reliable as a result.
• Yet another warning concerns normality of the data.
If the distribution of the data is strongly asymmetric, both of these tests will prove unreliable. The alternative is to use a special class of tests that do not use any distribution assumptions at all (so-called nonparametric tests).

Analysis of Paired Data

• The data consist of $n$ independently selected pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ with $E X_i = \mu_1$ and $E Y_i = \mu_2$. The differences $D_i = X_i - Y_i$ are assumed to be normally distributed with mean value $\mu_D = \mu_1 - \mu_2$ and variance $\sigma_D^2$. The last requirement is usually a consequence of the $X$'s and $Y$'s being normally distributed themselves.
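Returning to the two-sample t procedure for the fabric data (Example 9.6) earlier in these notes, the degrees-of-freedom estimate and the confidence interval can be checked with a small Python sketch. The helper names `welch_df` and `welch_ci` are illustrative, not from the lecture. The sketch assumes the triacetate mean is 136.14, the value used in the interval calculation, and it takes the critical value 2.262 from the notes rather than computing a t quantile, since the standard library has no t distribution.

```python
from math import floor, sqrt


def welch_df(s1, s2, m, n):
    """Estimated degrees of freedom for the two-sample t statistic."""
    a = s1**2 / m  # estimated variance of the first sample mean
    b = s2**2 / n  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))


def welch_ci(xbar, ybar, s1, s2, m, n, t_crit):
    """Two-sided CI for mu1 - mu2 given a t critical value t_crit."""
    diff = xbar - ybar
    margin = t_crit * sqrt(s1**2 / m + s2**2 / n)
    return diff - margin, diff + margin


# Fabric data: cotton (mean 51.71, sd .79) vs triacetate (mean 136.14, sd 3.59),
# with 10 observations in each sample.
nu = welch_df(0.79, 3.59, 10, 10)
nu_used = floor(nu)  # round down, as the notes prescribe

# t_crit = 2.262 is the value quoted in the notes for nu = 9.
low, high = welch_ci(51.71, 136.14, 0.79, 3.59, 10, 10, t_crit=2.262)

print(f"nu = {nu:.2f}, use nu = {nu_used}")
print(f"CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

Rounding $\nu$ down gives a larger critical value and hence a slightly wider interval, so the stated confidence level is not overstated.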
