Statistics and Probability: Sampling Distributions and Hypothesis Testing (Study Notes)

An overview of statistical concepts related to sampling distributions and hypothesis testing. The notes cover empirical and theoretical probability distributions, discrete and continuous distributions, population parameters, expected value and mean, variance and standard deviation, Chebyshev's inequality, the normal distribution, hypothesis testing with known and unknown standard errors, and the t distribution. They also discuss the central limit theorem and the sampling distribution of sample means.


Lecture POLI 7962: Seminar in Quantitative Political Analysis
August 2-6, 2007

I. Sample Estimation of Population Parameters

A. Introduction
   1. In testing hypotheses about the real world, it is unusual for the researcher to have collected data on the population of all possible relevant cases. For instance, if we are interested in the determinants of voting behavior among American citizens, it is impossible to collect data on all eligible citizens of voting age. Instead, researchers must rely on samples of the population in order to make inferences about the parameters (characteristics) of that population. The researcher hopes that the distribution of outcomes observed in the sample is a good approximation of the population distribution from which the sample was drawn. If it is, the sample mean and standard deviation, for example, would also be expected to be good approximations of the population mean and standard deviation.
      a. Random sampling
   2. Inferential statistics: numbers that represent generalizations, or inferences, drawn about some characteristic (parameter) of a population, based on evidence from a sample of observations from that population.
      a. Inference: the process of making generalizations or drawing conclusions about the attributes of a population based on evidence contained in a sample.
      b. More broadly, an inference is the process of drawing conclusions about that which is not observed directly.

B. Probability Distributions
   1. A probability distribution is a set of outcomes, each of which has an associated probability of occurrence.
      a. Empirical probability distribution: a probability distribution constructed from a set of observed data.
      b. Theoretical probability distribution: a probability distribution derived from a mathematical model rather than from observation.
   2. Discrete vs. continuous probability distributions.

C. Describing Discrete Probability Distributions
   1. Population parameters: descriptive characteristics of a population (such as its mean, variance, or a correlation), usually designated by a Greek letter.
   2. Expected value and mean of a probability distribution
      a. The single outcome that best describes a probability distribution is its expected value, which is also the mean of the probability distribution:

            E(Y) = Σ Yi p(Yi)
            μ = E(Y)

      b. If one is looking for a single number that (1) characterizes a given distribution, (2) minimizes the sum of squared deviations, and (3) represents the best "guess" of a randomly selected value from a given distribution, the mean (i.e., the expected value) is that number.
   3. Variance (σ²) and standard deviation (σ) of the population: because researchers do not ordinarily observe entire populations, these parameters are largely of theoretical interest.

D. Chebyshev's Inequality Theorem
   1. Observations that are distant from the mean of a distribution occur, on average, less frequently than those close to the mean. In general, the more distant an outcome is from its mean, the lower the probability of observing it; hence extreme scores are less likely to occur.
   2. Chebyshev's inequality: a theorem which states that, regardless of the shape of a distribution, the probability that an observation falls k or more standard deviations from the population mean (in either direction) is less than or equal to 1/k². (See the numerical sketch below.)
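To make these quantities concrete, the following is a minimal Python sketch; the five-outcome distribution and its probabilities are invented purely for illustration. It computes the expected value, variance, and standard deviation of a discrete distribution and checks Chebyshev's bound for k = 2:

```python
# Expected value, variance, and Chebyshev's bound for a small
# discrete distribution (outcomes and probabilities are hypothetical).
outcomes = [0, 1, 2, 3, 4]
probs = [0.10, 0.20, 0.40, 0.20, 0.10]  # must sum to 1

# E(Y) = sum of Yi * p(Yi)
mu = sum(y * p for y, p in zip(outcomes, probs))

# Population variance and standard deviation
var = sum(p * (y - mu) ** 2 for y, p in zip(outcomes, probs))
sigma = var ** 0.5

# Chebyshev: P(|Y - mu| >= k*sigma) <= 1/k^2, for any k > 1
k = 2
tail = sum(p for y, p in zip(outcomes, probs) if abs(y - mu) >= k * sigma)

print(f"E(Y) = {mu:.2f}, variance = {var:.2f}, sigma = {sigma:.3f}")
print(f"P(|Y - mu| >= {k} sigma) = {tail:.3f} <= 1/k^2 = {1 / k**2:.3f}")
```

The mean computed here also illustrates property (2) above: replacing mu with any other value in the variance sum would make that sum larger.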
E. Normal Distributions
   1. Normal distribution: a smooth, bell-shaped theoretical probability distribution for continuous variables that ranges from -∞ to +∞.
      a. The shape of any given normal curve is determined by two values: the population mean and variance.
      b. The values of any normal distribution can easily be converted to Z-scores using the formula:

            Z = (Y - μY) / σY

F. The Central Limit Theorem
   1. Central limit theorem: a mathematical theorem which states that if repeated random samples of size N are selected from any population with mean μ and standard deviation σ, then as N gets large the means of the samples will be normally distributed with mean μ and standard error σ/√N. In other words, the mean of the distribution of all the sample means of a given sample size drawn at random will equal the mean of the population from which the samples were drawn. Moreover, the standard deviation of this new hypothetical distribution (the standard error) is smaller than the population standard deviation by exactly a factor of 1/√N; equivalently, its variance is smaller by a factor of 1/N. The theorem makes no assumption about the shape of the population from which the samples are drawn.
   2. Sampling distribution of sample means: the distribution of all possible means for samples of size N selected from a population.
   3. Standard error: the standard deviation of the sampling distribution. For the μ parameter, the standard error is given by:

            σM = σ / √N

      a. The central limit theorem guarantees that a given sample mean can be made to come close to the population mean in value simply by choosing an N large enough, since the variance of the sampling distribution of means becomes small as N gets larger.
      b. Based on what is known about normal distributions, one should expect 95% of all sample means to fall within 1.96 standard errors of the population mean.

G. Sample Point Estimates and Confidence Intervals
   1. Point estimate: a sample statistic used to estimate a population parameter. For instance, the sample mean (M) can be used to estimate the population mean (μ). This estimate carries some uncertainty, so it becomes crucial to specify that uncertainty so that one can judge the quality of the parameter estimate.
   2. Confidence interval: a range of values constructed around a point estimate that makes it possible to state the probability that the interval contains the population parameter between its upper and lower confidence limits.
      a. In general: M ± (Zα/2)(σM)
      b. For a 95% confidence interval: M ± (1.96)(σM)
   3. In general, the larger the sample size, the narrower the interval around the sample mean at a given confidence level, primarily because the standard error is an inverse function of sample size.
   4. Use of the standard error: computing the standard error requires knowledge of the population standard deviation, which is not generally known. When N is "large" (100 or more is a rough rule of thumb), however, we can be confident that the sample standard deviation is a good estimate of the population standard deviation. When this is the case, one can use the Z distribution to test hypotheses about sample means. (The simulation below illustrates both the central limit theorem and a 95% confidence interval.)
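The sketch below simulates the central limit theorem with a deliberately non-normal population and then builds a 95% confidence interval from a single sample. The exponential population, the seed, and the sizes N and REPS are arbitrary illustration choices, not anything prescribed by the notes:

```python
# Simulating the sampling distribution of the mean (central limit
# theorem) and computing a 95% confidence interval from one sample.
import random
import statistics

random.seed(42)

# A skewed, non-normal "population" (exponential, mean = sd = 1).
population = [random.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

N = 100       # sample size
REPS = 2_000  # number of repeated samples

# Means of repeated random samples of size N.
sample_means = [statistics.mean(random.sample(population, N))
                for _ in range(REPS)]

print(f"population mean      = {pop_mean:.3f}")
print(f"mean of sample means = {statistics.mean(sample_means):.3f}")
print(f"sigma / sqrt(N)      = {pop_sd / N ** 0.5:.3f}")
print(f"sd of sample means   = {statistics.pstdev(sample_means):.3f}")

# 95% CI from a single sample: M +/- 1.96 * (s / sqrt(N)).
sample = random.sample(population, N)
M = statistics.mean(sample)
se = statistics.stdev(sample) / N ** 0.5
print(f"95% CI for mu: [{M - 1.96 * se:.3f}, {M + 1.96 * se:.3f}]")
```

Despite the skewed population, the mean of the sample means should land near the population mean, and their standard deviation near σ/√N ≈ 0.1, which is exactly the theorem's claim.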
H. Desirable Properties of Estimators
   1. Lack of bias: an unbiased estimator of a population parameter has an expected value equal to the parameter's true value.
   2. Efficiency: an efficient estimator of a population parameter has the smallest sampling variance among all possible estimators.
   3. Consistency: a consistent estimator of a population parameter approximates the parameter more closely as N gets large.

I. Hypothesis Testing
   1. Researchers are often interested in testing hypotheses about population means based on information collected from sample data. To test such a hypothesis, it is necessary to construct a null hypothesis as well as an alternative hypothesis.
      a. Null hypothesis: a hypothesis that the population mean equals a specific value, usually one associated with a null effect.
      b. Alternative hypothesis: a secondary hypothesis about the value of the population parameter, one that typically mirrors the research or operational hypothesis.

J. Hypothesis testing when the standard error is known: The Z distribution
   1. When is the standard error known? The standard error is known when one knows σY, the population standard deviation. Normally one would not expect to know σY, since population data are rarely available. However, when the sample size is sufficiently large (say, N = 100), the sample standard deviation is a relatively accurate, unbiased estimator of the population standard deviation. Given this, one can substitute the sample standard deviation for the population standard deviation in calculating the standard error and conducting hypothesis tests.
   2. The general logic behind hypothesis testing is to establish the null and alternative (working) hypotheses, determine the level of significance to be used, ascertain the critical value of the test statistic (i.e., the value associated with the significance level), and then calculate the test statistic, which expresses the difference between the sample estimate and the null-hypothesized population value in standard-error units. Typically the test statistic takes the following general form:

            test statistic = (sample estimate - null-hypothesized value) / standard error

      One then compares the test statistic with the critical value. If the test statistic is further away from 0 than the critical value, we say that the observed sample value is sufficiently removed from the null-hypothesized population value that it is highly unlikely that the null-hypothesized population could have generated the observed sample value.
   3. In testing hypotheses about means, the steps in the process are as follows (a worked sketch appears after this list):
      (1) Specify the null hypothesis and the alternative hypothesis. In testing hypotheses about μ, the null hypothesis (H0) is a hypothesized value of μ, usually associated with a null effect. The alternative hypothesis (H1) represents the research hypothesis.
      (2) Choose a significance level: the probability of falsely rejecting the null hypothesis that one is willing to accept. Conventionally, the .05 level is chosen.
      (3) Establish the critical test statistic (in this case, a Z value) associated with the significance level from step 2. This will be compared with the test statistic to determine whether the observed mean is sufficiently different from the null-hypothesized mean to justify rejecting the null hypothesis.
      (4) Calculate the test Z score, using the following formula:

            Z = (M - μH0) / σM

      (5) Compare the critical Z and the test Z. If the test statistic is further away from 0 than the critical Z, one rejects the null hypothesis and "accepts" the working hypothesis; otherwise, one is unable to reject the null hypothesis.
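Here is a minimal Python sketch of the five steps as a two-tailed Z test; the sample values (M = 52.3, s = 10, N = 100) and the hypothesized mean μH0 = 50 are hypothetical numbers chosen only to walk through the procedure:

```python
# A two-tailed Z test following the five steps above
# (all numbers are hypothetical).
from statistics import NormalDist

# Step 1: hypotheses. H0: mu = 50; H1: mu != 50.
mu_H0 = 50.0

# Step 2: significance level.
alpha = 0.05

# Step 3: critical Z for a two-tailed test (about 1.96 at alpha = .05).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Step 4: test Z = (M - mu_H0) / sigma_M, with sigma_M = s / sqrt(N).
M, s, N = 52.3, 10.0, 100
sigma_M = s / N ** 0.5
z_test = (M - mu_H0) / sigma_M

# Step 5: compare the test Z against the critical Z.
print(f"test Z = {z_test:.2f}, critical Z = +/-{z_crit:.2f}")
if abs(z_test) > z_crit:
    print("Reject H0: the sample mean is too far from mu_H0.")
else:
    print("Unable to reject H0.")
```

With these numbers the test Z works out to 2.30, beyond the 1.96 cutoff, so H0 would be rejected at the .05 level.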
K. Hypothesis testing when the standard error is unknown: The t distribution
   1. Thus far it has been assumed either that the researcher knows the standard error of the sampling distribution of the mean, or that N is large enough for the sample variance to serve as a reasonable estimate of the population variance. When neither assumption holds, the Z distribution is inappropriate for hypothesis testing, and researchers must rely on the t distribution:
      a. t distribution: a test statistic used with small samples selected from a normally distributed population or, for large samples, drawn from a population with any shape.
   2. t variable (t score): a transformation of the scores of a continuous frequency distribution, derived by subtracting the mean and dividing by the estimated standard error. For hypothesis tests involving the mean, the test t statistic is calculated with the following equation, where sM = s/√N is the standard error estimated from the sample:

            t = (M - μH0) / sM
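A matching sketch for the small-sample case, where the standard error must be estimated from the data; the eight observations and the hypothesized mean are made up:

```python
# A one-sample t statistic with the standard error estimated
# from a small (hypothetical) sample.
import math
import statistics

data = [4.8, 5.6, 5.1, 4.9, 6.0, 5.4, 4.7, 5.2]
mu_H0 = 5.0
N = len(data)

M = statistics.mean(data)
s_M = statistics.stdev(data) / math.sqrt(N)  # estimated standard error

t = (M - mu_H0) / s_M
print(f"t = {t:.3f} with {N - 1} degrees of freedom")
# Compare against a critical t from a table, e.g. about 2.365 for a
# two-tailed test at alpha = .05 with 7 degrees of freedom.
```

The comparison logic is the same as in the Z case; only the reference distribution changes, with fatter tails at small N.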