Estimating Population Percentages: Lecture Notes (STAT 2)

Class: Introduction to Statistics (STAT 2); University of California, Berkeley

Uploaded on 09/07/2009 by koofers-user-8re

From yesterday's USA Today/Gallup Poll
“Obama was ahead 47%-44% among registered voters, down from a 6-percentage point lead he had last month. McCain led 49%-45% among likely voters, reversing a 5-point Obama lead among that group. In both cases, the margin of error is +/-4 points.”
“To determine whether they were likely voters, poll participants were asked how much thought they had given the election, how often they voted in the past and whether they plan to vote this fall... there was an even number of likely voters from each party.”

Recall: population and samples
● We find the statistic(s) of a sample so that we can estimate the parameter(s) of the population
● Unless you perform a census of the whole population, there will always be some chance error, since the sample is only part of the population

Recall: survey box model
● If we are surveying to find the percentage of a population with a certain characteristic, think of a box in which all tickets are 0 or 1
● One ticket for each member of the population
● Usually draw without replacement

Recall: drawing with replacement
● EV of percentage is box percentage
● SE of percentage is $\sqrt{\frac{p(1-p)}{n}} \times 100\%$, where p is the proportion of 1s in the box and n is the number of draws
● If n is large, the distribution of the percentage will be approximately normal

Today
● Yesterday: going from populations to samples
● Today: going from samples to populations

I Why everything we looked at yesterday doesn't work

The problem
● We found the EV and SE of the sample percentage, and hence its distribution, from calculations based on the box (i.e. the population)
● But we don't know about the population! That's why we're doing the survey!
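The with-replacement formulas recalled above can be checked by simulation, but only because a simulation gets to know the box, which is exactly what a real survey lacks. Below is a minimal Python sketch; the box (25% of tickets are 1s), the 1500 draws per survey, the number of repetitions and the function names `se_percent` and `simulate_percentages` are illustrative assumptions, not part of the notes.

```python
import random
import statistics
from math import sqrt


def se_percent(p: float, n: int) -> float:
    """SE of the sample percentage for n draws with replacement from a 0-1 box
    whose proportion of 1s is p (returned in percentage points)."""
    return sqrt(p * (1 - p) / n) * 100


def simulate_percentages(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Repeat the survey `reps` times: each time draw n tickets with replacement
    from the box and record the percentage of 1s in the sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        ones = sum(1 for _ in range(n) if rng.random() < p)  # each draw is a 1 with chance p
        results.append(100 * ones / n)
    return results


# Hypothetical box: 25% of tickets are 1s; 1500 draws per survey.
p, n = 0.25, 1500
sims = simulate_percentages(p, n, reps=1000)
print(f"formula SE:   {se_percent(p, n):.2f}%")
print(f"simulated SE: {statistics.pstdev(sims):.2f}%")
print(f"average sample percentage: {statistics.mean(sims):.2f}% (EV = {100 * p:.0f}%)")
```

The formula line prints 1.12%; the simulated SE and average should land close to 1.12% and 25%, with the exact digits depending on the seed.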
The solution
● Since the sample should be representative of the population, let's pretend the sample is the box (for the purpose of estimating the standard error of the sample percentage)
● In general, this technique is called the bootstrap

The equation
● In that case, we can still use the equation for the SE, except that we let p be the sample proportion instead of the population proportion:
$SE \approx \sqrt{\frac{p(1-p)}{n} \times \frac{N-n}{N-1}} \times 100\%$
where n is the sample size and N is the population size; the second factor under the square root corrects for drawing without replacement

Example
● I survey a simple random sample of 1500 Berkeley students (size of population = 34953), and find that 25% of the sample watch Hannah Montana
● What can I say about the percentage of all Berkeley students who watch Hannah Montana?

The Central Limit Theorem, again
● We know that if the sample size is large and the population is much larger, the sample percentage has an approximately normal distribution
● So we can estimate how large the chance error is likely to be

Example
For the Hannah Montana example, we estimated the SE to be $\sqrt{\frac{0.25 \times 0.75}{1500} \times \frac{34953-1500}{34953-1}} \times 100\% \approx 1.1\%$. By the 68-95-99.7 rule:
● There's about a 68% probability that the chance error is between -1.1% and +1.1%
● There's about a 95% probability that the chance error is between -2.2% and +2.2%

Example
From this, we make a confidence interval: 25% +/- 2.2%, that is, from 22.8% to 27.2%
● This is a 95% confidence interval for the population percentage
● The confidence level of the interval is 95%

Example
● A simple random sample of 1600 Americans finds that 20% eat ice cream daily. The population is large compared to the sample, so the estimated SE is $\sqrt{0.2 \times 0.8 / 1600} \times 100\% = 1\%$
● The 95% confidence interval is 20% +/- 2 × 1%, that is, from 18% to 22%
● We're 95% confident this interval includes the population percentage

Other confidence levels
● We can calculate confidence intervals at other levels. For example, to find a 90% confidence interval, find the z-values for the 5th and 95th percentiles of the normal curve: z = -1.65 and +1.65
● The CI is then 20% +/- 1.65 × 1%, or 18.35% to 21.65%

III Issues of interpretation

What assumptions do our techniques require?
● Assume the statistic is an unbiased estimate of the parameter (if it is known to be biased, the interval needs to be adjusted)
● Assume the sample percentage is at least close to the true percentage; otherwise the estimated SE will be wrong (example: if the sample percentage is 0, the estimated SE is 0)

The two parts of a confidence interval
● The interval gives you the range for a population parameter (which could be a percentage, count, average, sum, or something else entirely)
● The confidence level gives you the approximate chance that the interval includes the true value of the parameter
● There's a chance your interval is wrong: for a 95% confidence interval, there's a 5% chance it won't include the population parameter
● 95% is often used because it balances interval width with the chance of being wrong*
*More honestly, it's used because z = 2 is convenient

IV Beyond the simple random sample

SRS vs other sampling
● The formulae for standard error (and thus confidence intervals) that we have seen apply only to simple random samples
● Different sampling methods will produce different standard errors

SE for stratified sampling
● Find the standard error for each stratum's sample
● Multiply each by the proportion of the population in its stratum
● Square these results, sum them, and take the square root: this is the SE
I'll give you a formula in the unlikely event I ask you to calculate this in a quiz or the final

Example: stratified sampling
● SE for whole sample = $\sqrt{(1.3/2)^2 + (1.5/2)^2} \approx 0.995\%$
● This is very slightly less than the SE for a simple random sample (1%)
● We estimate the percentage of Americans who eat ice cream daily is 20%, give or take 0.995%

In general
● Stratified random samples usually have slightly lower standard errors than SRSs
● Cluster samples usually have slightly higher standard errors than SRSs (and also require more care to avoid selection bias)

Confidence intervals
● As long as you can assume normality, you can calculate the confidence interval from the SE the usual way (no matter how you found the SE)
● For the previous example: the CI is 20% +/- 2 × 0.995% = 18.01% to 21.99%

Recap: Estimating the SE
● If we don't know the true SE (because we don't know the true population percentage), we can estimate it by plugging the sample proportion into the standard error formula
● This is an example of a bootstrap estimate

Recap: Confidence intervals
● A confidence interval spans from a few SEs below the estimate to a few SEs above the estimate, where “a few” is found from a normal table (for example, 2 SEs for a 95% CI)
● Relies on large sample and population sizes

Recap: Chance and confidence
● When interpreting confidence intervals, remember that in frequentist statistics, statistics are random but parameters are not
● This is why we talk about confidence rather than probability
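The estimation recipe in these notes (bootstrap SE, normal-approximation confidence interval, stratified SE) can be sketched in Python as below. This is a minimal illustration, not the course's code: all function names are hypothetical, and `z_for_level` uses the exact normal quantile from `statistics.NormalDist` (about 1.96 for 95% and 1.645 for 90%) rather than the rounded 2 and 1.65 used on the slides, so its intervals can differ in the last digit. `coverage` draws many samples from a hypothetical known box and counts how often a z = 2 interval contains the true percentage, illustrating that the interval is random while the parameter is fixed.

```python
import random
from math import sqrt
from statistics import NormalDist
from typing import Optional


def estimated_se_percent(sample_p: float, n: int, pop_size: Optional[int] = None) -> float:
    """Bootstrap estimate of the SE of a sample percentage: plug the sample
    proportion into the box formula. If pop_size is given, also apply the
    correction factor sqrt((N - n) / (N - 1)) for drawing without replacement."""
    se = sqrt(sample_p * (1 - sample_p) / n)
    if pop_size is not None:
        se *= sqrt((pop_size - n) / (pop_size - 1))
    return se * 100  # in percentage points


def z_for_level(level: float) -> float:
    """z such that the middle `level` of the standard normal curve lies in [-z, z]."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def confidence_interval(estimate_pct: float, se_pct: float, z: float) -> tuple[float, float]:
    """Normal-approximation CI: estimate +/- z SEs."""
    return estimate_pct - z * se_pct, estimate_pct + z * se_pct


def stratified_se_percent(stratum_ses: list[float], pop_shares: list[float]) -> float:
    """Weight each stratum's SE by its share of the population, square,
    sum, and take the square root."""
    return sqrt(sum((w * se) ** 2 for se, w in zip(stratum_ses, pop_shares)))


def coverage(p: float, n: int, z: float, reps: int, seed: int = 0) -> float:
    """Fraction of simulated intervals that contain the (known) box percentage.
    Each repetition draws n tickets with replacement from a hypothetical 0-1 box."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample_p = sum(rng.random() < p for _ in range(n)) / n
        lo, hi = confidence_interval(100 * sample_p, estimated_se_percent(sample_p, n), z)
        hits += lo <= 100 * p <= hi  # True counts as 1
    return hits / reps


if __name__ == "__main__":
    # Hannah Montana: 25% of an SRS of 1500 students, population 34953
    se_hm = estimated_se_percent(0.25, 1500, pop_size=34953)
    lo, hi = confidence_interval(25, se_hm, 2)
    print(f"Hannah Montana SE: {se_hm:.2f}%, 95% CI (z = 2): {lo:.1f}% to {hi:.1f}%")

    # Ice cream: 20% of an SRS of 1600, population much larger
    se_ic = estimated_se_percent(0.20, 1600)
    lo, hi = confidence_interval(20, se_ic, z_for_level(0.90))
    print(f"Ice cream 90% CI (exact z): {lo:.2f}% to {hi:.2f}%")

    # Stratified: two strata, each half of the population
    print(f"Stratified SE: {stratified_se_percent([1.3, 1.5], [0.5, 0.5]):.3f}%")

    # How often does a z = 2 interval cover a known 20%?
    print(f"Coverage: {coverage(0.20, 1600, 2, reps=1000):.3f}")
```

The Hannah Montana line prints an SE of 1.09% and a z = 2 interval of 22.8% to 27.2%. With the stratum SEs as printed on the slide (1.3% and 1.5%), `stratified_se_percent` gives about 0.992%; the small gap from the slide's 0.995% may come from rounding of the stratum SEs. The simulated coverage should be close to 0.95 but varies with the seed.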