Inferential Statistics (Hypothesis Testing), Study notes of Statistics
Chapter 7: Inferential Statistics (Hypothesis Testing)

The crux of neuroscience is estimating whether a treatment group differs from a control group on some response, whether different doses of a drug are associated with a systematic difference in response, or a host of other questions. All of these queries have one thing in common: they ask the scientist to make inferences about the descriptive statistics from a study. This is the domain of inferential statistics, and the generic topic is hypothesis testing. Many topics in this chapter have been touched on in earlier discussions of parametric statistics (Section X.X), terminology (Section X.X), and distributions (Section X.X). Here these disparate concepts are presented together in a unified framework.

In this chapter, we first present the logic behind hypothesis testing, a section that can be skimmed without any loss of information provided that one pays attention to the definitions. We then outline the three major modes of hypothesis testing: the test statistic/p-value approach, the critical value approach, and the confidence limits approach. With modern computers, almost everyone uses the test statistic/p-value approach. Having some familiarity with the other two approaches, however, increases understanding of inferential statistics.

7.1 Logic of Hypothesis Testing

In the hierarchy of mathematics, statistics is a subset of probability theory. Thus, inferential statistics always involves the probability distribution for a statistic. It is easier to examine this distribution using a specific statistic than it is to treat it in general terms. Hence, we start with the mean.

[Figure 7.1: Sampling Distribution of the Means. Schematic: scores X1, X2, ..., XN are drawn from a hat of individuals and their mean, X̄ = (X1 + X2 + ... + XN)/N, is tossed into a hat of means.]
7.1.1 The Mean and Its Sampling Distribution

Figure 7.1 gives a schematic for sampling means, a topic that we touched on in Section X.X. There is a large hat containing an infinite number of observations, say people in this case. We reach into the hat, randomly select a person, and record their score on a variable, X. We do this for N individuals. We then calculate the mean of the scores on a separate piece of paper and toss that into another hat, the hat of means. Finally, we repeat this an infinite number of times. The distribution of means in the hat of means is called the sampling distribution of the mean. In general, if we were to perform this exercise for any statistic, the distribution in the hat of statistics is called the sampling distribution of that statistic.

Now comes the trick. We can treat all the means in the hat of means as if they were raw scores. Hence, we can ask questions such as "What is the probability of randomly picking a mean greater than 82.4?" To answer such a question we must know the distribution in the hat. That distribution depends on two things: (1) the distribution of the raw scores, and (2) the sample size. Let µ and σ denote, respectively, the mean and standard deviation of the raw scores in the hat of individuals. If these raw scores have a normal distribution, then the means in the hat of means will also have a normal distribution.

We now want to define the "remote probabilities" of this curve. Naturally, these will be at the two tails of the curve. (At this point, you may wonder why we use both tails when the mean IQ for Mrs. Smith's students is clearly above the mean. Why not just use the upper tail? The answer is that we should always set up the hypothesis tests before gathering or looking at the data.
Hence, we want the most unlikely outcomes at both ends of the curve because the class could be unrepresentative by having either low IQs or high IQs.)

Now the question becomes, "Just how remote should the probabilities be?" You may be surprised to learn that there is no rigorous, mathematical answer to this question. Instead, many decades ago, scientists arrived at an "educated guess" that won consensus, and that tradition has been carried on to this day. For most purposes, scientists consider a "remote probability" to be the 5% most unlikely outcomes. We will have more to say about this later. Let us just accept this criterion for the time being.

Given that the remote probabilities are divided between the two tails of the normal curve, we want to find the lower cutoff for a normal curve with a mean of 100 and a standard deviation of 3.128 that has 2.5% of the curve below it. Then we must find the upper cutoff so that 2.5% of the curve is above it. We start with the lower cutoff and must find the Z score that separates the bottom 2.5% of the normal curve from the upper 97.5%.[2] Using the appropriate function provided with statistical packages, that Z score is -1.96. Now we use Equation 7.2 to solve for X̄:

    -1.96 = (X̄ - 100) / 3.128
    X̄ = (-1.96)(3.128) + 100 = 93.87

We now repeat this exercise for the upper portion of the curve. The Z score separating the bottom 97.5% from the top 2.5% is 1.96. Substituting this into Equation X.X gives the upper cutoff as 106.13. The shaded areas at the tails of Figure 7.2 give these cutoffs, and Table X.X presents code in SAS and R that calculates these cut points.

Hence, we will reject the hypothesis that the students in Mrs. Smith's class are representative of the general population if their mean is less than 93.87 or greater than 106.13. Their observed mean is 108.6, which is greater than 106.13. Hence, we reject the hypothesis and conclude that the students are not representative of the general population.
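The same cut points can be reproduced with Python's standard library (a sketch, not from the text, paralleling the SAS and R code in Table 7.1; the values 100, 15, and N = 23 come from the Mrs. Smith example):

```python
from math import sqrt
from statistics import NormalDist

N = 23               # class size in the Mrs. Smith example
alpha = 0.05         # conventional two-tailed alpha level
mu = 100             # population mean IQ under the hypothesis being tested
std = 15 / sqrt(N)   # standard error of the mean (15 = population SD of IQ)

z = NormalDist()                  # standard normal distribution
z_lo = z.inv_cdf(alpha / 2)       # Z cutting off the bottom 2.5% (about -1.96)
z_hi = z.inv_cdf(1 - alpha / 2)   # Z cutting off the top 2.5% (about +1.96)

xbar_lo = mu + std * z_lo         # lower critical mean
xbar_hi = mu + std * z_hi         # upper critical mean
print(round(xbar_lo, 2), round(xbar_hi, 2))   # 93.87 106.13
```

The observed class mean of 108.6 exceeds the upper cut point, matching the rejection decision in the text.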
[2] Most statistical packages have routines that can directly find the cutoff without first converting to Z scores. The longer route is taken here because it will assist learning in other topics about hypothesis testing.

Table 7.1: SAS and R code for establishing critical values in the Mrs. Smith example.

SAS code:

    DATA _null_;
      N = 23;
      alpha = .05;
      mu = 100;
      std = 15/SQRT(N);
      ZCritLo = PROBIT(.5*alpha);
      ZCritHi = PROBIT(1 - .5*alpha);
      XbarCritLo = mu + std*ZCritLo;
      XbarCritHi = mu + std*ZCritHi;
      PUT XbarCritLo= XbarCritHi=;
    RUN;

R code:

    N <- 23
    alpha <- .05
    mu <- 100
    std <- 15 / sqrt(N)
    ZCrit <- c(qnorm(.5*alpha), qnorm(1 - .5*alpha))
    XbarCrit <- mu + std*ZCrit
    XbarCrit

7.1.3 More Statisticalese

If you grasp the logic behind this, then fine: you know the meaning of statistical inference. This is an intolerable situation for statisticians, and so they have developed more jargon so that students are forced to memorize the terms and regurgitate them on tests.

The hypothesis that Mrs. Smith's students are representative of the population is called the null hypothesis, which is denoted as H0 (the subscript being a zero and not an uppercase letter O). From a mathematical standpoint, the null hypothesis is a hypothesis that provides concrete numerical values so that the sampling distribution of a statistic (e.g., the mean in the Mrs. Smith example) can be calculated. From a common-sense view, the "null" in the null hypothesis means that the hypothesis lacks positive, distinguishing characteristics. It is an "empty" hypothesis. The purpose in research is to reject the null hypothesis and conclude that there is evidence for the hypothesis logically opposite to the null hypothesis. This logical alternative is called the alternative hypothesis, usually denoted as HA.
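For the Mrs. Smith example, H0 and HA can also be tested directly with the test-statistic/p-value approach described in Section 7.2. A sketch in Python (not from the text; the mean 108.6 and standard error 3.128 are the example's values):

```python
from statistics import NormalDist

mu0 = 100      # mean IQ under the null hypothesis H0
se = 3.128     # standard error of the mean (15 / sqrt(23))
xbar = 108.6   # observed class mean

z = (xbar - mu0) / se                    # test statistic, about 2.75
p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p value, about .006
print(round(z, 2), round(p, 3))
```

Because p is below the conventional .05 level, H0 is rejected in favor of HA, agreeing with the critical value approach.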
The percentage of outcomes regarded as "remote" or, in other words, the percentage of most unlikely outcomes, is called the alpha or α level. By convention, the α level is set at .05, or 5%. In special cases, it may be set to lower or higher values. The α level may also be looked upon as the false positive rate: if the null hypothesis is true, then we will incorrectly reject it in a proportion α of cases.

A test of a hypothesis that splits the α level in half, one half used for the upper tail of the sampling distribution and the other half for the lower tail, is called a two-tailed test or two-sided test. Such a hypothesis is called a nondirectional hypothesis. When a hypothesis clearly pertains to only one side of the sampling distribution, it is called a directional hypothesis and the test is termed a one-tailed test or one-sided test. An example would be a hypothesis that administration of a drug might increase locomotor activity.

There are more terms to learn, but they will be introduced in the course of the remaining discussion of hypothesis testing.

7.2 The Three Approaches to Hypothesis Testing

The three approaches to hypothesis testing are: (1) the critical values approach; (2) the test-statistic/p-value approach; and (3) the confidence limits approach. All three are mathematically equivalent and will always lead to the same conclusion. In this section, these approaches are outlined in the abstract. This is meant as a reference section, so there is no need to commit these approaches to memory. Applications and examples of the methods will be provided later when specific types of problems for hypothesis testing are discussed. By far the most prevalent approach is the test-statistic/p-value one, because this is the way in which modern statistical software usually presents results.

One important issue is the symmetry of critical values and confidence limits.
The examples used below all use symmetrical critical values and confidence intervals. That is because we deal with issues about the mean. For other statistics, like the correlation coefficient, they may not be symmetrical. When the correlation is small, critical values and confidence limits are for all practical purposes symmetrical.

Table 7.4: Steps in the confidence limit approach to hypothesis testing.

1. State the null hypothesis (H0) and the alternative hypothesis (HA).
2. Establish whether the test is one-tailed or two-tailed. (NB: all statistical packages default to two-tailed testing, so most statisticians recommend two-tailed testing.)
3. Establish the probability of a false positive finding (aka the α level).
4. Establish the sample size (see Section X.X).
5. Calculate the observed descriptive statistic.
6. Find the α most unlikely outcomes on the distribution around the observed statistic. Note that confidence intervals are always calculated as two-tailed probabilities.
7. If the value of the statistic under the null hypothesis is not located within this interval, then reject H0.

Interval estimates are provided in phrases such as "the estimate of the mean was 27.6 ± 4.2." Confidence limits are always given in terms of (1 − α) units expressed as a percentage. Hence, if α is .05, we speak of the 95% confidence interval. A confidence limit is a plus-or-minus interval such that, if an infinite number of random samples were selected, the interval would capture the population parameter (1 − α) percent of the time. That is a mouthful, so let's step back and explain it.

Suppose that we repeatedly sampled 25 people from the general population and recorded their mean IQ. The means in the hat of means in this case would be normally distributed with a mean of 100 and a standard deviation of 15/√25 = 3.
To establish the 95% confidence limits, we want to establish the cutoffs in a normal distribution with a mean of 100 and a standard deviation of 3 that separate the middle 95% from the lower and upper 2.5%. The equivalent Z values in the standard normal distribution are -1.96 and 1.96. Hence, the lower confidence limit will be 100 − 1.96(3) = 94.12 and the upper limit will be 100 + 1.96(3) = 105.88. Hence, if we repeatedly sampled means based on an N of 25, then 95% of the time that mean should fall in the interval between 94.12 and 105.88.

This is a concocted example meant to illustrate. No one calculates a confidence interval around the expected value of a statistic under the null hypothesis. Why? You can answer this yourself by verifying that such a confidence interval will always equal the critical values. Instead, the confidence interval is calculated around the observed statistic. We can calculate the confidence interval for Mrs. Smith's class using Equation 7.2 but substituting the class mean for µ and solving for a lower X̄ and an upper X̄. The Z that separates the lower 2.5% of the distribution from the upper 97.5% is -1.96. Hence,

    Z_L = -1.96 = (X̄_L - µ) / σ_X̄ = (X̄_L - 108.6) / 3.128

Solving for X̄_L gives the lower confidence limit as X̄_L = 102.47. The Z separating the upper 2.5% from the lower 97.5% is 1.96. Substituting this into Equation 7.2 gives the upper confidence limit as X̄_U = 114.73. Consequently, the confidence interval for the mean IQ in Mrs. Smith's class is between 102.47 and 114.73.

The final step in using a confidence interval is to examine whether the interval includes the statistic for the null hypothesis. If the statistic is not located within the interval, then reject the null hypothesis. Otherwise, do not reject the null hypothesis.

Table 7.5: Correct and incorrect decisions in hypothesis testing. (Cell entries are probabilities conditional on the state of H0.)

    State of H0    Reject H0    Do not reject H0    Total
    True           α            (1 − α)             1.0
    False          (1 − β)      β                   1.0
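The confidence limit approach for Mrs. Smith's class can be checked numerically (a sketch, not from the text; 108.6 and 3.128 are the observed mean and standard error given above):

```python
from statistics import NormalDist

xbar = 108.6   # observed class mean
se = 3.128     # standard error of the mean
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
lower = xbar - z * se                     # lower 95% confidence limit
upper = xbar + z * se                     # upper 95% confidence limit
print(round(lower, 2), round(upper, 2))   # 102.47 114.73

# The null-hypothesis value (100) lies outside the interval, so reject H0.
print(not (lower <= 100 <= upper))        # True
```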
The mean IQ for the general population (100) is the statistic for the null hypothesis. It is not located within the confidence interval. Hence, we reject the null hypothesis that Mrs. Smith's class is a random sample of the general population.

7.3 Issues in Hypothesis Testing

7.3.0.1 The Yin and Yang (or α and β) of Hypothesis Testing

In hypothesis testing there are two hypotheses, the null hypothesis and the alternative hypothesis. Because we test the null hypothesis, there are two decisions about it: we can either reject the null hypothesis or fail to reject it. This leads to two other possible decisions about the alternative hypothesis: reject it or do not reject it. The two-by-two contingency table given in Table 7.5 summarizes these decisions. The probabilities are stated as conditional probabilities given the null hypothesis. For example, given that the null hypothesis is true, the probability of rejecting it (and generating a false positive decision) is α. The probability of not rejecting it must be (1 − α).

The table introduces one more statistical term, β, the probability of a false negative error. Here, the null hypothesis is false but we err in failing to reject it. To complicate matters, the term Type I error is used as a synonym for a false positive judgment, that is, the rejection of a null hypothesis when the null hypothesis is in fact true. In slightly different words, you conclude that your substantive hypothesis has been confirmed when, in fact, that hypothesis is false. A Type II error is equivalent to a false negative error, that is, the failure to reject the null hypothesis when, in fact, the null hypothesis is false. Here, you conclude that there is no evidence for your substantive hypothesis when, in fact, the hypothesis was correct all along. In terms of notation, α is the probability of a Type I error and β is the probability of a Type II error.
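The first row of Table 7.5 can be illustrated by simulation (a sketch with assumed parameters from the Mrs. Smith example: when H0 is really true, sample means of N = 23 IQs should land outside the critical values 93.87 and 106.13 about 5% of the time):

```python
import random

random.seed(1)                  # fixed seed for reproducibility
N, mu, sigma = 23, 100, 15      # H0 is true: population mean really is 100
lo, hi = 93.87, 106.13          # critical values from the text
reps = 20_000

false_positives = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(N)) / N
    if xbar < lo or xbar > hi:  # reject H0 even though it is true
        false_positives += 1

rate = false_positives / reps
print(round(rate, 3))           # close to alpha = .05
```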
One final aspect of terminology: previously we defined power as the ability to reject the null hypothesis when the null hypothesis is false (see Section X.X). In more common-sense terms, power is the likelihood that the statistical test will confirm your substantive hypothesis. In Table 7.5, the power of a test equals the quantity (1 − β).

The dilemma for hypothesis testing is that, for a given sample size and given effect size, α is negatively correlated with β. Upon their introduction to hypothesis testing, many students question why one does not set α as small as possible to avoid false positives. The reason is that a decrease in α increases β. Because power equals (1 − β), decreasing α results in a loss of power. Hence, there is a yin/yang relationship in decision making and hypothesis testing. A decrease in the false positive rate diminishes power and results in a high rate of false negative findings. An increase in power increases α and results in a high rate of false positive findings. Compromise is in order, and scientists have accepted an α level of .05 as a satisfactory middle ground between the extremes of false positive and false negative errors.

7.3.0.2 Multiple Hypothesis Testing

Often one sees a table with a large number of brain areas as the rows and the means on an assay for a control and (one or more) treatment groups as the columns. The researchers perform a number of t-tests or ANOVAs, one for each brain area, and then highlight those that reach statistical significance with an α of .05. This illustrates the problem of multiple comparisons or multiple hypothesis testing that occurs when a large number of statistical tests is performed in a single study or experiment. (The specific topic of multiple testing in regression, ANOVA, and the GLM will be discussed later in Section X.X.) The nature of this problem is that as more and more hypotheses are tested, the probability that at least one of the null hypotheses will be incorrectly rejected increases.
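For m independent tests with every null hypothesis true, this family-wise probability of at least one incorrect rejection is 1 − (1 − α)^m, which grows rapidly with m (a sketch; the values of m are illustrative):

```python
# Family-wise error rate: the chance of at least one false positive among
# m independent tests when every null hypothesis is true.
alpha = 0.05

def fwer(m, alpha=alpha):
    return 1 - (1 - alpha) ** m

for m in (1, 10, 100):
    print(m, round(fwer(m), 3))   # rises from .05 toward 1 as m grows
```

With 100 tests the family-wise rate is essentially 1, which is why corrections such as those discussed next are needed.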
For example, if the null hypothesis were true and 100 independent statistical tests were performed with an α level of .05, then 5% of those tests would be significant by chance alone.

One simple approach is to use a Bonferroni correction. This adjusts the alpha level by dividing it by the number of statistical tests. For example, if the initial alpha level were .05 and 12 statistical tests were performed, then the Bonferroni-adjusted alpha level would be .05/12 = .004. Only those tests that achieved a p value of .004 or lower would be significant. While the Bonferroni correction has the benefit of simplicity, it can significantly reduce statistical power, especially when the number of statistical tests is large.

A recent advance for multiple testing is to control the false discovery rate or FDR (Benjamini and Hochberg, 1995). The FDR estimates the proportion …
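The Bonferroni correction, and for contrast the Benjamini-Hochberg step-up procedure that controls the FDR, can be sketched as follows (the 12-test figure is the text's example; the p values are made up for illustration):

```python
def bonferroni_alpha(alpha, m):
    """Bonferroni-adjusted per-test alpha level: divide alpha by the
    number of tests m."""
    return alpha / m

def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure, which controls the false discovery rate at q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:                  # step-up criterion
            k_max = rank
    return sorted(order[:k_max])                      # reject k_max smallest

# The text's example: 12 tests at an initial alpha of .05.
print(round(bonferroni_alpha(0.05, 12), 4))           # 0.0042 (about .004)

# Hypothetical p values from six tests.
p = [0.001, 0.011, 0.02, 0.03, 0.04, 0.50]
print(benjamini_hochberg(p, q=0.05))
```

On these hypothetical p values, Bonferroni (per-test alpha .05/6 ≈ .0083) rejects only the first hypothesis, while the FDR procedure rejects the first five, illustrating the power that Bonferroni sacrifices.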