Central Limit Theorem and Confidence Intervals in Statistics, Study notes of Data Analysis & Statistical Methods

These notes cover the central limit theorem, which states that the distribution of sample means is approximately normal, regardless of the underlying distribution of the population, as long as the sample size is large enough. They also cover confidence intervals, which combine a point estimate with an estimate of dispersion to form an interval estimate around the population mean.

Typology: Study notes

2009/2010

Uploaded on 04/12/2010

koofers-user-wca-1

Estimating with Confidence, Part II

Review
• We use y-bar to estimate a population mean, µ.
• When sampling from a population with true mean µ, the true mean of the distribution of y-bar is µ.
• On average, the mean of means from larger samples will be closer to the true mean than the mean of means from smaller samples.

Sample Size
• The rule of thumb is that, in most practical situations, n = 30 is satisfactory.
• As a practical matter, though, if the original distribution is severely non-normal, it may take much more than 30 observations to assure us that the sample mean will be normally distributed.

Central Limit Theorem
• More formally, what we've been discussing is the implications of the Central Limit Theorem (CLT).
• The CLT is the only theorem we'll cover in BST 621 (because it's that important).

CLT
• Draw a simple random sample of size n from any population whatsoever with mean µ and finite standard deviation σ.
• When n is large, the sampling distribution of the sample mean y-bar is approximately normally distributed with mean µ and standard deviation σ/√n (Daniel, p. 134).
• However, there is no way to rescue a study using data collected haphazardly.
• Such data will have unknown biases, and no fancy formula can rescue badly produced data: garbage in, garbage out.
• From here on, let's assume the data are representative.
• So far, our estimation methods have resulted in point estimates; confidence intervals are even more useful.

Confidence Intervals
• Confidence intervals use point estimates and an estimate of dispersion to form interval estimates.
• Recall that estimating a parameter with an interval involves three components:

General Form
estimate ± (reliability coefficient) × (standard error)
• This yields two values, a lower limit and an upper limit, around the point estimate.
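The CLT can be illustrated with a quick simulation. This is a sketch in Python (the notes themselves use JMP); the exponential population, sample size, and number of replications are my own illustrative choices, not from the notes:

```python
import numpy as np

# CLT sketch: means of samples drawn from a skewed (exponential) population
# should be approximately normal with mean mu and sd sigma / sqrt(n).
rng = np.random.default_rng(0)
mu = sigma = 5.0                 # exponential(scale=5) has mean = sd = 5
n, reps = 30, 20_000

# Draw `reps` samples of size n and compute each sample's mean
means = rng.exponential(scale=5.0, size=(reps, n)).mean(axis=1)

print(means.mean())              # close to mu = 5
print(means.std(ddof=1))         # close to sigma / sqrt(n), about 0.913
```

A histogram of `means` would look roughly bell-shaped even though the underlying exponential population is strongly right-skewed.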
• The confidence interval will, with specified reliability, contain the true (unknown) population mean.

Known Variance
• So, if we know the population standard deviation, σ, then a 95% confidence interval for the population mean is:
y-bar ± 1.96 × σ/√n

Examples
• In our example population the known σ is 45.9194.
• Using a sample of size n = 9, the first simulated experiment yielded a y-bar of 217.6, giving the interval [187.6, 247.6].
• Notice that this interval covers the true mean of 205.7.

Unknown Variance
• In practice, we never know σ.
• The obvious solution is to use the estimated standard deviation, s, computed from our sample.
• But this alone does not work: the reliability coefficient (1.96) is now wrong.
• It's wrong because two random terms now enter the confidence interval, y-bar and s, and both are subject to random fluctuation.

Solution
• Gosset, a statistician who worked at the Guinness brewery, figured out the solution to this problem: the t-distribution.
• To keep from getting fired, he had to publish the work under the pseudonym "Student"; thus, you may have seen references to "Student's t."
• The t-distribution is very close to the z, but has wider tails, reflecting the extra variability ignored by z.

[Figure: the normal (z) curve overlaid with a t-distribution, whose tails are visibly wider.]

• In Appendix Table E, Daniel gives the appropriate t-values for various df.
• For a 95% CI, use the value labeled t.975: with α = 0.05, (1 − α/2) = 0.975.
• Notice that as n gets larger, the t value gets closer to the z value.

Using JMP
• JMP automatically calculates the 95% confidence interval on the mean and shows it in the Distribution of Y report window.
• For instance, in the Moments report from the first n = 9 cholesterol sample, the 95% confidence interval is [173.4, 261.7].
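A minimal check of the intervals above, using the numbers from the notes (y-bar = 217.6, σ = 45.9194, n = 9). SciPy's t distribution stands in for Daniel's Appendix Table E; the use of Python here is my own choice:

```python
import math
from scipy import stats

ybar, sigma, n = 217.6, 45.9194, 9     # values from the notes' first sample
se = sigma / math.sqrt(n)              # standard error of the mean

# Known sigma: z-based 95% CI with reliability coefficient 1.96
lo, hi = ybar - 1.96 * se, ybar + 1.96 * se
print(round(lo, 1), round(hi, 1))      # 187.6 247.6, matching the notes

# Unknown sigma: swap 1.96 for the t critical value with df = n - 1
t_crit = stats.t.ppf(0.975, df=n - 1)
print(round(t_crit, 3))                # 2.306 -- larger than 1.96, wider CI
```

Note how the t coefficient (2.306 at df = 8) exceeds 1.96, which is exactly the "wider tails" correction discussed above.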
[Figure: 95% confidence intervals plotted for repeated samples of n = 25 and n = 100; the n = 100 intervals are markedly narrower.]

Sample Size and Confidence
• A 95% confidence interval implies that we're 95% sure that the interval covers the true (but unknown) mean.
• On the other hand, it also means that 5% of the intervals we calculate will not cover the true mean.
• This is true whether we use n = 2 or n = 2,000,000.

[Figure: confidence intervals from repeated samples of n = 9.]

• Notice how much more variable the interval widths are with n = 9.
• The first sample's y-bar estimate was 164.6, with estimated standard deviation s = 101.2.
• The second sample's y-bar estimate was 323.2, with estimated standard deviation s = 383.3.
• With the larger estimates, you're seeing the effect that an outlier can have.

[Figure: confidence intervals from repeated samples of n = 25.]

Summary
• Sample estimates have distributions that are affected by the underlying distribution and the sample size.
• Estimates may be totally worthless if obtained from a haphazard "sample" with unknowable bias.
• But if the data are representative of the population, then we can rely on the sample mean to estimate the center of the distribution.
• The sample mean is unbiased.

Summary (cont.)
• Further, if the population is known to be normal, then a sample mean will also be normal.
• If the population distribution has an unknown shape then, with a sufficient sample size, we can rely on the CLT and trust that a sample mean will also be normally distributed.

Assessing Normality
• Use the Normal Quantile Plot in JMP to assess whether a distribution appears normal.
• In the t-based interval, y-bar is the sample mean, s is the sample standard deviation, and the t reliability coefficient is the (1 − α/2) percentile of the t-distribution with df = n − 1.
• When describing a confidence interval in a sentence or table, be sure to indicate the level of confidence and the sample size.
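The claim that 95% coverage holds at any sample size can be sanity-checked by simulation. This sketch uses a normal population with µ = 205.7 and σ = 45.9194 to mirror the notes' example; the replication count and random seed are my own choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, reps = 205.7, 45.9194, 10_000   # population values from the notes

def coverage(n):
    """Fraction of t-based 95% CIs covering mu, over `reps` samples of size n."""
    samples = rng.normal(mu, sigma, size=(reps, n))
    ybar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)
    half = stats.t.ppf(0.975, n - 1) * s / np.sqrt(n)
    return np.mean((ybar - half <= mu) & (mu <= ybar + half))

print(coverage(9), coverage(100))          # both near 0.95
```

The small-n intervals are much wider on average, but both sample sizes miss the true mean about 5% of the time, just as the slide says.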
• Always be aware that the shape of the underlying distribution and the size of your sample will directly affect the believability of your point and interval estimates.

Example Write-ups
• In the case where you judge that the distribution is markedly non-normal (skewed), say we begin with the following raw data.
• Since the sample was small and the distribution was skewed, the distribution of the sample is described by the median and range:
• "A random sample of n = 20 subjects was assessed for serum triglycerides. The median triglyceride was 115 and the values ranged between 31 and 755. Half of the values were between 91.25 and 195.0."

Another Example
[Figure: JMP Distribution report for a cholesterol sample, showing a normal quantile plot, quantiles, and a Moments table (mean 201.5, std dev 33.24, 95% confidence limits of roughly 176.9 and 226.7).]
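The skewed-data write-up above can be produced for any sample. A sketch, using a hypothetical right-skewed (lognormal) sample standing in for the triglyceride data, so the printed numbers will not match the notes':

```python
import numpy as np

# Hypothetical skewed sample standing in for the serum triglyceride data
rng = np.random.default_rng(2)
tri = rng.lognormal(mean=4.8, sigma=0.7, size=20)

# Median, range, and quartiles: the summaries appropriate for skewed data
median = np.median(tri)
q1, q3 = np.percentile(tri, [25, 75])
print(f"Median {median:.1f}; values ranged {tri.min():.1f} to {tri.max():.1f}; "
      f"half of the values were between {q1:.1f} and {q3:.1f}.")
```

This mirrors the structure of the sentence in the notes: median first, then range, then the middle half (the quartiles), with no mean or standard deviation reported for the skewed sample.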