Sampling & Hypothesis Testing: Population, Sample, Estimators, & Confidence Intervals - Study notes of Introduction to Econometrics

An in-depth exploration of sampling and hypothesis testing concepts. It covers the properties of estimators, the central limit theorem, and the calculation of confidence intervals for population means and proportions. The document also discusses the importance of knowing the mean, standard error, and shape of sampling distributions.

Sampling and Hypothesis Testing
Allin Cottrell

Population and sample

Population: an entire set of objects or units of observation of one sort or another.

Sample: subset of a population.

Parameter versus statistic:

                  size    mean    variance    proportion
    Population:   N       µ       σ²          π
    Sample:       n       x̄       s²          p

Properties of estimators: sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

To make inferences regarding the population mean, µ, we need to know something about the probability distribution of this sample statistic, x̄. The distribution of a sample statistic is known as a sampling distribution. Two of its characteristics are of particular interest: the mean or expected value, and the variance or standard deviation.

E(x̄): Thought experiment: sample repeatedly from the given population, each time recording the sample mean, and take the average of those sample means. If the sampling procedure is unbiased, deviations of x̄ from µ in the upward and downward directions should be equally likely; on average, they should cancel out.

$$E(\bar{x}) = \mu = E(X)$$

The sample mean is then an unbiased estimator of the population mean.

Efficiency

One estimator is more efficient than another if its values are more tightly clustered around its expected value. E.g. alternative estimators for the population mean: x̄ versus the average of the largest and smallest values in the sample. The degree of dispersion of an estimator is generally measured by the standard deviation of its probability distribution (sampling distribution). This goes under the name standard error.

Standard error of sample mean

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

• The more widely dispersed are the population values around their mean (larger σ), the greater the scope for sampling error (i.e. drawing by chance an unrepresentative sample whose mean differs substantially from µ).
• A larger sample size (greater n) narrows the dispersion of x̄.

Other statistics

Population proportion, π. The corresponding sample statistic is the proportion of the sample having the characteristic in question, p. The sample proportion is an unbiased estimator of the population proportion: E(p) = π. Its standard error is given by

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$

Population variance, σ²:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Estimator, sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Generalizing the idea

Let θ denote a "generic parameter".

1. Find an estimator (preferably unbiased) for θ.
2. Generate θ̂ (point estimate).
3. Set confidence level, 1 − α.
4. Form interval estimate (assuming a symmetrical distribution):

θ̂ ± maximum error for (1 − α) confidence

"Maximum error" equals so many standard errors of such and such a size. The number of standard errors depends on the chosen confidence level (possibly also the degrees of freedom). The size of the standard error, σθ̂, depends on the nature of the parameter being estimated and the sample size.

z-scores

Suppose the sampling distribution of θ̂ is Gaussian. The following notation is useful:

$$z = \frac{x - \mu}{\sigma}$$

The "standard normal score" or "z-score" expresses the value of a variable in terms of its distance from the mean, measured in standard deviations.

Example: µ = 1000 and σ = 50. The value x = 850 has a z-score of −3.0: it lies 3 standard deviations below the mean.
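None of the following code is in the original notes; it is a minimal NumPy sketch of the ideas above. The population parameters reuse the µ = 1000, σ = 50 of the z-score example, while the sample size n = 25 and the number of replications are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative choices (not from the notes): population mean and sd reuse
# the z-score example above; n and reps are arbitrary.
mu, sigma, n, reps = 1000, 50, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))

# Unbiasedness: the average of many sample means sits close to mu.
xbar = samples.mean(axis=1)
print(xbar.mean())                       # ~1000

# Standard error: the spread of the sample means is close to sigma/sqrt(n).
print(xbar.std(), sigma / np.sqrt(n))    # both ~10

# Efficiency: the midrange (average of smallest and largest values) is also
# centred on mu for this symmetric population, but it is more widely
# dispersed than the sample mean, i.e. less efficient.
midrange = (samples.min(axis=1) + samples.max(axis=1)) / 2
print(midrange.std())                    # noticeably larger than 10

# z-score of x = 850 in this population: 3 sd below the mean.
print((850 - mu) / sigma)                # -3.0
```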
Where the distribution of θ̂ is Gaussian, we can write the 1 − α confidence interval for θ as

$$\hat{\theta} \pm \sigma_{\hat{\theta}}\, z_{\alpha/2}$$

[Figure: the standard normal density, with z.975 = −1.96 and z.025 = 1.96 marked on the horizontal axis; the central area between them is 0.95.]

This is about as far as we can go in general terms. The specific formula for σθ̂ depends on the parameter.

The logic of hypothesis testing

There is an analogy between the set-up of a hypothesis test and a court of law. The defendant on trial in the statistical court is the null hypothesis, some definite claim regarding a parameter of interest. Just as the defendant is presumed innocent until proved guilty, the null hypothesis (H0) is assumed true (at least for the sake of argument) until the evidence goes against it.

                         H0 is in fact:
    Decision             True                      False
    Reject               Type I error (P = α)      Correct decision
    Fail to reject       Correct decision          Type II error (P = β)

1 − β is the power of a test; there is a trade-off between α and β.

Choosing the significance level

How do we get to choose α (the probability of Type I error)? The calculations that compose a hypothesis test are condensed in a key number, namely a conditional probability: the probability of observing the given sample data, on the assumption that the null hypothesis is true. This is called the p-value. If it is small, we can place one of two interpretations on the situation:

(a) The null hypothesis is true and the sample we drew is an improbable, unrepresentative one.
(b) The null hypothesis is false.

The smaller the p-value, the less comfortable we are with alternative (a). To reach a conclusion we must specify the limit of our comfort zone, a p-value below which we'll reject H0.

Say we use a cutoff of .01: we'll reject the null hypothesis if the p-value for the test is ≤ .01. If the null hypothesis is in fact true, what is the probability of our rejecting it? It's the probability of getting a p-value less than or equal to .01, which is (by definition) .01. In selecting our cutoff we selected α, the probability of Type I error.

Example of hypothesis test

A maker of RAM chips claims an average access time of 60 nanoseconds (ns) for the chips. Quality control has the job of checking that the production process is maintaining acceptable access speed: they test a sample of chips each day. Today's sample information is that with 100 chips tested, the mean access time is 63 ns with a standard deviation of 2 ns. Is this an acceptable result?

Should we go with the symmetrical hypotheses H0: µ = 60 versus H1: µ ≠ 60? Well, we don't mind if the chips are faster than advertised. So instead we adopt the asymmetrical hypotheses:

H0: µ ≤ 60 versus H1: µ > 60

Let α = 0.05.

The p-value is P(x̄ ≥ 63 | µ ≤ 60), where n = 100 and s = 2.

• If the null hypothesis is true, E(x̄) is no greater than 60.
• The estimated standard error of x̄ is s/√n = 2/10 = 0.2.
• With n = 100 we can take the sampling distribution to be normal.
• With a Gaussian sampling distribution the test statistic is the z-score:

$$z = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}} = \frac{63 - 60}{0.2} = 15$$
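As a cross-check on this arithmetic (not part of the original notes), here is a minimal sketch of the one-tailed z test using scipy.stats; the numbers are those of the chip example:

```python
from math import sqrt
from scipy.stats import norm

n, xbar, s, mu0 = 100, 63.0, 2.0, 60.0   # chip example; mu0 is the H0 value

se = s / sqrt(n)          # estimated standard error: 2/10 = 0.2
z = (xbar - mu0) / se     # test statistic: 15.0

p_value = norm.sf(z)      # one-tailed p-value, P(Z >= 15): effectively zero

print(z, p_value)         # far below alpha = 0.05, so reject H0
```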
Variations on the example

Suppose the test were as described above, except that the sample was of size 10 instead of 100. Given the small sample and the fact that the population standard deviation, σ, is unknown, we could not justify the assumption of a Gaussian sampling distribution for x̄. Rather, we'd have to use the t distribution with df = 9. The estimated standard error is sx̄ = 2/√10 = 0.632, and the test statistic is

$$t(9) = \frac{\bar{x} - \mu_{H_0}}{s_{\bar{x}}} = \frac{63 - 60}{0.632} = 4.74$$

The p-value for this statistic is 0.000529: a lot larger than for z = 15, but still much smaller than the chosen significance level of 5 percent, so we still reject the null hypothesis.

In general the test statistic can be written as

$$\text{test} = \frac{\hat{\theta} - \theta_{H_0}}{s_{\hat{\theta}}}$$

That is, the sample statistic minus the value stated in the null hypothesis (which by assumption equals E(θ̂)), divided by the (estimated) standard error of θ̂. The distribution to which "test" must be referred, in order to obtain the p-value, depends on the situation.

Another variation

We chose an asymmetrical test setup above. What difference would it make if we went with the symmetrical version, H0: µ = 60 versus H1: µ ≠ 60?

We have to think: what sort of values of the test statistic should count against the null hypothesis? In the asymmetrical case only values of x̄ greater than 60 counted against H0. A sample mean of (say) 57 would be consistent with µ ≤ 60; it is not even prima facie evidence against the null. Therefore the critical region of the sampling distribution (the region containing values that would cause us to reject the null) lies strictly in the upper tail. But if the null hypothesis were µ = 60, then values of x̄ both substantially below and substantially above 60 would count against it. The critical region would be divided into two portions, one in each tail of the sampling distribution.

[Figure: two sampling distributions. Top: H0: µ = 60, a two-tailed test; both high and low values count against H0, with area α/2 in each tail. Bottom: H0: µ ≤ 60, a one-tailed test; only high values count against H0, with area α in the upper tail.]
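Tying the variations together, here is a short sketch (again an illustration, not from the notes, and assuming SciPy is available) that reproduces the t statistic and one-tailed p-value quoted above, and shows how the corresponding two-tailed p-value for H0: µ = 60 would differ:

```python
from math import sqrt
from scipy.stats import t

n, xbar, s, mu0 = 10, 63.0, 2.0, 60.0

se = s / sqrt(n)               # 2/sqrt(10) = 0.632
t_stat = (xbar - mu0) / se     # 4.74
df = n - 1                     # 9 degrees of freedom

p_one_tailed = t.sf(t_stat, df)            # 0.000529, as quoted above
p_two_tailed = 2 * t.sf(abs(t_stat), df)   # ~0.00106: both tails count

print(t_stat, p_one_tailed, p_two_tailed)
```

Even the two-tailed p-value is far below α = 0.05 here, so the choice of tails does not change the verdict in this example, though it can when the test statistic falls closer to the critical region.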