Understanding Hypothesis Testing and t-Tests with an Example, Study notes of Statistics

An explanation of hypothesis testing and t-tests using an example. The example involves testing whether the mean weight of all blocks is equal to a specified value under the null hypothesis, and calculating the P-value and confidence interval using a one-sample t test. The document also discusses the difference between the normal distribution and the t-distribution and how to find areas under the t-distribution using Table C or Minitab.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

koofers-user-4sp

Partial preview of the text

Statistics 528: Data Analysis I
Lecture #8, July 18, 2006
Christopher Holloman, The Ohio State University, Summer 2006

Overview of Today's Lecture

- IPS Sections 6.2, 7.1
- Tests of Significance
- Inference for the Mean of a Population

Intro. to Hypothesis Tests

Two of the most common types of statistical inference:

1. Confidence intervals
   Goal is to estimate (and communicate uncertainty in our estimate of) a population parameter.
2. Tests of significance
   Goal is to assess the evidence provided by the data about some claim concerning the population.

Basic Idea of Tests of Significance

Example: Each day Tom and Heather decide who pays for lunch based on a toss of Tom's favorite quarter.

   Heads: Tom pays
   Tails: Heather pays

- Tom claims that heads and tails are equally likely outcomes for this quarter.
- Heather thinks she pays more often.

Performing a Hypothesis Test

1. State Hypotheses

State your research question as two hypotheses: the null and the alternative hypotheses. These hypotheses are written in terms of the population parameters.

The null hypothesis (H0) is the statement being tested. This is assumed "true" and compared to the data to see if there is evidence against it.

A null hypothesis that we will see often is that the mean µ is equal to some standard value. Usually, null hypotheses give a statement of "no difference" or "no effect."

Suppose we want to test the null hypothesis that µ is some specified value, say µ0. Then

   H0: µ = µ0

Note: We will always express H0 using an equality sign.

The alternative hypothesis (Ha) is the statement about the population parameter that we hope or suspect is true. We are interested in seeing if the data support this hypothesis.

- Ha can be one-sided: Ha: µ > µ0 or Ha: µ < µ0
- Ha can be two-sided: Ha: µ ≠ µ0

Example: Strawberry Bars

Kellogg's says that its strawberry bars weigh, on average, 16 oz. 10TV's consumer reporter is suspicious that the bars weigh less than what is claimed. In order to check his suspicion, he weighs the contents of 20 randomly chosen bars. These 20 bars have an average weight of 15.6 oz. Assume that the weights follow a normal distribution with a standard deviation of 0.7 oz. Is there evidence that the reporter's suspicion is correct?

- The hypotheses are:
     H0: µ = 16
     Ha: µ < 16
- Is this a one-sided test or a two-sided test?
  This is a one-sided test. The reporter thought the bars weighed less than 16 oz.

2. Calculate P-value

We ask: "Does the sample give evidence against the null hypothesis?"

P-value: The probability that the sample mean would take a value as extreme or more extreme than the one we actually observed, assuming H0 is true.

P-values in terms of the test statistic:

   Ha          P-value          Area under curve
   µ < µ0      P(Z ≤ z)         Left of z
   µ > µ0      P(Z ≥ z)         Right of z
   µ ≠ µ0      2P(Z ≥ |z|)      Tails

where z is the observed value of the test statistic and the probabilities are found using the standard normal distribution given in Table A.

- A P-value is exact if the population distribution is normal.
- If the population is not normal, the P-value approximates the true probability for large n because of the Central Limit Theorem.

3. State Your Conclusions

- The final step is to decide whether the evidence is strong enough to reject H0 in favor of Ha. This is accomplished using the P-value.
- In our example, we got a P-value = 0.0052. What does this tell us?
  If H0 is true (i.e., the true mean weight is 16 oz), then the chance of getting a sample whose mean weight is 15.6 oz or less is 0.52%.

Does it give evidence against H0?
Yes, it is very unlikely that we would observe a sample mean as low as we did if H0 is true.

Conclusion: We reject the null hypothesis.

- A small P-value is strong evidence against H0. Such a P-value says that if H0 is true, then the observed data are unlikely to occur just by chance.
- The smaller the P-value, the stronger the evidence against the null.
- Question: How small does the P-value need to be?
  Prior to testing, it is determined how small the P-value must be to be considered decisive evidence against H0. This value is called the significance level and is usually represented as α. Typical α levels used are 0.1, 0.05 and 0.01.

   If the P-value ≤ α, reject the null hypothesis.
   If the P-value > α, do not reject the null hypothesis.

Example: EPA

The EPA limit on concentration of PCB in drinking water is 5 ppm. Wells are regularly tested to make sure they are not over the limit. A random sample of 100 water specimens from a well was collected and has an average PCB concentration of 5.1 ppm. Is there evidence at the 5% level that the well is over the limit? Assume that the PCB concentration varies with standard deviation 0.8.
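
As a rough numerical check on the z-test examples, the three P-value rules from the table above can be sketched in Python. This is an illustration only: the lecture itself uses Table A and Minitab, and the function name `one_sample_z_test` and its argument names are hypothetical. It assumes a known population standard deviation, as both examples do.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p_value) for a one-sample z test with known sigma.

    `alternative` is "less", "greater", or "two-sided" (names chosen here
    for illustration; they are not from the lecture).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # standard normal, N(0, 1)
    if alternative == "less":          # Ha: mu < mu0, area left of z
        p_value = std_normal.cdf(z)
    elif alternative == "greater":     # Ha: mu > mu0, area right of z
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":   # Ha: mu != mu0, area in both tails
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Strawberry bars: H0: mu = 16 vs. Ha: mu < 16
z_bars, p_bars = one_sample_z_test(15.6, 16, 0.7, 20, "less")
# EPA wells: H0: mu = 5 vs. Ha: mu > 5
z_pcb, p_pcb = one_sample_z_test(5.1, 5, 0.8, 100, "greater")

print(f"bars: z = {z_bars:.2f}, P-value = {p_bars:.4f}")
print(f"PCB:  z = {z_pcb:.2f}, P-value = {p_pcb:.4f}")
```

For the strawberry bars, the unrounded z gives a P-value near 0.0053; rounding z to -2.56, as a Table A lookup requires, gives the 0.0052 quoted in the lecture.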

- State the hypotheses:
     H0: µ = 5
     Ha: µ > 5
- Calculate the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5.1 - 5}{0.8/\sqrt{100}} = 1.25$$

- Calculate the P-value:
  The P-value is the area under the normal curve to the right of 1.25.
  P-value = 0.1056
- Since the P-value is larger than 0.05, we do not have evidence at the 5% level that the PCB level exceeds the limit. We do not reject the null hypothesis.

Tests from Confidence Intervals

- We have covered two types of statistical inference procedures for the population mean µ: confidence intervals (CI) and tests of significance.
- Question: Is there any relationship between hypothesis tests and confidence intervals?
  Answer: Yes, a level α two-sided test rejects a hypothesis H0: µ = µ0 exactly when the value µ0 falls outside a level (1-α) CI for µ.

Example: Concrete Block

Bud's Home Center sells concrete blocks. Bud wants to estimate the average weight of all blocks in stock. A sample of 64 blocks has a mean weight of 65.5 lbs. Assume that the weights of blocks vary with standard deviation 4.6 lbs.

- Construct a 95% CI for the mean weight of all blocks.

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 65.5 \pm (1.96)\frac{4.6}{\sqrt{64}}$$

- (64.373, 66.627) is a 95% CI for µ.

- These four statements also convey the same conclusion:
  1. The test is not significant.
  2. Do not reject the null hypothesis.
  3. The data do not show evidence against H0.
  4. The P-value is larger than α.
- Usually 1, 2, or 3 is given as the conclusion, and 4 is given as the explanation of the conclusion.

Confidence Intervals and Hypothesis Tests in Minitab

1. Use Minitab to get descriptive statistics and then use formulas.
2. Use Minitab directly to compute confidence intervals and perform tests:
   Stat → Basic Statistics → 1-Sample Z

Note: This function is for computing confidence intervals and hypothesis tests of µ, the population mean, assuming the population standard deviation is known. (Section 6.1 and Section 6.2)

Stat → Basic Statistics → 1-Sample Z

- Variables: enter column of data
- Sigma: known value of the population standard deviation
- Test Mean: value of the mean under the null hypothesis (H0)

Click on the Options box.

Confidence level: level C of a confidence interval or level (1-α) of a hypothesis test

Alternative: form of the alternative hypothesis

- Not equal → Ha: µ ≠ µ0
- Less than → Ha: µ < µ0
- Greater than → Ha: µ > µ0

Note: You need to select "not equal" as the alternative to calculate an equal-tails confidence interval (like the ones we've been doing).

t-Tests

Previously, when making inferences about the population mean, µ, we were assuming:

1. Our data (observations) are an SRS of size n from the population.
2. The observations come from a normal distribution with parameters µ and σ.
3. The population standard deviation σ is known.

To perform statistical inference, we were using the test statistic (one-sample z statistic):

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

which has a standard normal distribution when H0 is true. This holds approximately for large samples even if assumption 2 is not satisfied. Why? The Central Limit Theorem.

- The spread of a t-distribution is larger than that of a standard normal distribution. That is, there is more probability in the tails of a t-distribution.
- This makes sense because the t statistic should have more variability than the test statistic z that we used in Chapter 6.
- Why?
  There is added variability in the t statistic since it uses s, an estimate of σ, rather than a known, fixed value of σ.

- Notation: tk or t(k) represents the t-distribution with k d.f.
- As the d.f. k increases, the tk distribution approaches the N(0,1) distribution. As the sample size increases, s estimates σ more accurately, so there is little extra variation.

Finding Areas Under the t Distributions

- We use Table C to find areas under the t distributions.
  Note: Table C works very differently from Table A.
- Table C gives critical values of t-distributions for various d.f.
- The numbers in the middle of the chart are values from t distributions. Each row corresponds to a t-distribution with the degrees of freedom given at the beginning of the row. The numbers in the top row are right-tail areas.

Areas under t distributions can also be found using Minitab:
   Stat → Probability Distributions → t

Example: If we go across the row for nine degrees of freedom and down the column for an area of .05, we get a t value of 1.833. That means that for a t9 distribution the area under the curve to the right of 1.833 is 0.05.

Note: There are no negative t values in the table. To get the area to the left of a negative t-statistic, take advantage of the symmetry of the distribution.

Inference Procedures using the t Statistics

We use the t distributions to do the same types of inferences we did in Chapter 6 (confidence intervals and hypothesis tests). The procedures are very similar to what we learned in Chapter 6, only we replace the σ's with s's and z's with t's.

Both confidence intervals and hypothesis tests can be performed using Minitab:
   Stat → Basic Statistics → 1-Sample t

This function is almost identical to the 1-Sample Z function.
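
The t9 lookup and the heavier-tails claim can be checked numerically without a table. The sketch below is an illustration, not the method used in the lecture (which relies on Table C and Minitab): it integrates the t density with Simpson's rule using only the Python standard library, and the function names `t_pdf` and `t_right_tail` are hypothetical.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    # lgamma avoids overflow of the gamma function for large df.
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """Approximate P(T >= t) with Simpson's rule on [0, |t|] plus symmetry.

    `steps` must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # area between 0 and |t|
    # Half the area lies on each side of 0 because the density is symmetric.
    return 0.5 - central if t >= 0 else 0.5 + central


area_t9 = t_right_tail(1.833, 9)            # right of 1.833 under t9
area_z = 1 - NormalDist().cdf(1.833)        # right of 1.833 under N(0, 1)
area_left = 1 - t_right_tail(-1.833, 9)     # left of -1.833 under t9
area_t1000 = t_right_tail(1.833, 1000)      # large d.f., close to normal

print(f"t9 right tail:    {area_t9:.4f}")
print(f"N(0,1) right tail: {area_z:.4f}")
print(f"t9 left of -1.833: {area_left:.4f}")
print(f"t1000 right tail: {area_t1000:.4f}")
```

The t9 area comes out close to 0.05 and exceeds the corresponding normal area, the area to the left of -1.833 matches the area to the right of 1.833, and with 1000 d.f. the t area is nearly the same as the normal one.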

Confidence Interval for µ when σ is Unknown

A (1-α) confidence interval for µ is given by

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where t* is the upper α/2 critical value for the t(n-1) distribution, i.e., the area between -t* and t* is 1-α.

Example: Suppose testing H0: µ = 0 vs. Ha: µ > 0 yields a one-sample t statistic of 1.82 from a sample of size 15.

- What are the degrees of freedom for this statistic?
- Give two critical values t* from Table C that bracket t. What are the right-tail probabilities for these two entries?
- Between what two values does the P-value fall?
- Is t = 1.82 statistically significant at the 5% level? At 1%?

Example: The National Center for Health Statistics reports that the mean systolic blood pressure for males aged 35-44 years is 128. The medical director of a large company looks at the medical records of 21 executives in this age group and finds that the mean systolic blood pressure is 127.07 and the standard deviation is 15 in this sample. Is there evidence that the company's executives have a different mean blood pressure than the general population?

- State the hypotheses:
     H0: µ = 128
     Ha: µ ≠ 128
- Calculate the t statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{127.07 - 128}{15/\sqrt{21}} = -0.28$$

- Estimate the P-value and state the conclusion of the test:
  There are 20 degrees of freedom. Look to see where 0.28 falls in the row for 20 d.f.
  - It's off the chart!
  - The area to the left of -0.28 is larger than 0.25.
  - Our P-value is larger than 2 x 0.25 = 0.50.
  - We do not reject the null. We do not have evidence that this company's executives have a different mean blood pressure.
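
The Table C lookups in the two examples above can be mimicked in a few lines of Python. This is a sketch under stated assumptions, not part of the lecture: the critical values for 14 and 20 degrees of freedom are typed in by hand from a standard t table (the course's Table C is assumed to use the same columns), and the names `TAIL_PROBS`, `T_ROWS`, and `bracket_right_tail` are hypothetical.

```python
from math import sqrt

# Right-tail probabilities across the top of a standard t table (assumed layout).
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025,
              0.02, 0.01, 0.005, 0.0025, 0.001, 0.0005]

# Critical values for two rows of the table, entered by hand.
T_ROWS = {
    14: [0.692, 0.868, 1.076, 1.345, 1.761, 2.145,
         2.264, 2.624, 2.977, 3.326, 3.787, 4.140],
    20: [0.687, 0.860, 1.064, 1.325, 1.725, 2.086,
         2.197, 2.528, 2.845, 3.153, 3.552, 3.850],
}


def bracket_right_tail(t, df):
    """Return (lower, upper) bounds on P(T >= |t|) from the table row for df."""
    t = abs(t)  # the table has no negative values; use symmetry
    lower, upper = 0.0, 0.5
    for crit, prob in zip(T_ROWS[df], TAIL_PROBS):
        if t >= crit:
            upper = prob   # tail area is at most this column's probability
        else:
            lower = prob   # first critical value above t bounds the area below
            break
    return lower, upper


# Exercise: t = 1.82 from a sample of size 15, so d.f. = 15 - 1 = 14.
lo_14, hi_14 = bracket_right_tail(1.82, 14)
print(f"one-sided P-value is between {lo_14} and {hi_14}")

# Blood pressure example: n = 21, so d.f. = 20; Ha is two-sided.
se_bp = 15 / sqrt(21)
t_bp = (127.07 - 128) / se_bp
lo_20, hi_20 = bracket_right_tail(t_bp, 20)
print(f"t = {t_bp:.2f}, two-sided P-value is between {2 * lo_20} and {2 * hi_20}")

# 95% t confidence interval: t* is the upper 0.025 critical value for 20 d.f.
t_star = T_ROWS[20][TAIL_PROBS.index(0.025)]
ci_bp = (127.07 - t_star * se_bp, 127.07 + t_star * se_bp)
print(f"95% CI for mu: ({ci_bp[0]:.2f}, {ci_bp[1]:.2f})")
```

With these table values, t = 1.82 on 14 d.f. falls between the 0.05 and 0.025 columns, so it is significant at the 5% level but not at 1%. For the blood pressure data, the two-sided P-value is above 0.50 and the 95% interval contains 128, which is consistent with not rejecting H0.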

Cautions about Using the t Procedures

1. If the sample size is less than 15 and the data are close to normal, it is okay to use the t procedure. Do not use it if the data are clearly nonnormal or if outliers exist.
2. If the sample size is ≥ 15, it is okay to use the t procedure unless outliers or strong skewness exist.
3. If the sample size is ≥ 40, it is okay to use the t procedure even if strong skewness exists.