Sampling Distribution: Understanding Random Samples and Statistic Calculation - Prof. Edit, Lab Reports of Statistics

This lab introduces the concept of a sampling distribution: a random sample is drawn from a population to estimate population parameters such as the mean age and the proportion of smokers. It covers why random sampling is used, which statistics to calculate, and the concept of sampling variability. It also discusses the sampling distributions of the sample mean and the sample proportion, and how they can be used to solve problems.

Typology: Lab Reports

2009/2010

Uploaded on 02/25/2010

koofers-user-80c

Math 1530 - Lab - Introducing the idea of sampling distribution (Chapter 18)

Drawing a random sample IS a random experiment

Imagine you have a population of individuals and you will select a random sample of size n to ask them a few questions, for example their age and whether they are or have been smokers at some point in their life. Before drawing the sample we know that n of them are going to be in the sample, but we don’t know exactly WHO is going to be in the sample.

1. Why do we select a random sample? Population parameters.

We select a random sample when we want to know something about the population but we don’t have the time or money to ask everybody in the population. The things we want to know about the population are the population parameters, in this case the ‘mean age in the population’ and the ‘proportion of smokers in the population’.

2. What statistics to calculate from the sample?

Assume that you will take a sample of n individuals and ask them the questions ‘What is your age (in years)?’ and ‘Have you smoked more than 100 cigarettes in your life?’ (the official definition of ‘being a smoker’), and that you want to summarize the data in the sample.

What type of variable is age? Quantitative or categorical? _____________________

What type of variable is ‘being a smoker’? Quantitative or categorical? ___________________

Considering the type of variable, which statistic do you consider appropriate to summarize the information in the sample?

For age ________________________________

For smokers ____________________________

3. Taking samples and calculating statistics

As you can imagine, the mean age in the sample and the proportion of smokers in the sample depend on who is in the sample. For simplicity, let’s assume that we have a population of 50 individuals and that you will select a sample of 5 individuals.
In real life we only know the answers to the questions for those individuals in the sample, but here, just as an exercise, you can see below the age and smoking status of all 50 individuals in the population. This population is in the file agesmoke.mtw, available on our web page.

ID  Age  Smoker    ID  Age  Smoker    ID  Age  Smoker    ID  Age  Smoker
 1   34  NO        14   43  YES       27   38  YES       40   24  YES
 2   39  YES       15   24  NO        28   69  YES       41   25  YES
 3   37  NO        16   25  YES       29   68  NO        42   47  NO
 4   46  NO        17   43  NO        30   21  NO        43   45  NO
 5   31  NO        18   29  NO        31   82  NO        44   42  YES
 6   32  NO        19   31  NO        32   32  YES       45   81  NO
 7   36  YES       20   58  YES       33   23  NO        46   43  NO
 8   51  NO        21   76  YES       34   51  NO        47   39  NO
 9   93  YES       22   65  YES       35   45  NO        48   34  YES
10   66  YES       23   39  YES       36   26  NO        49   71  NO
11   50  YES       24   38  NO        37   35  NO        50   31  NO
12   32  NO        25   37  YES       38   26  NO
13   31  YES       26   27  NO        39   35  NO

Using the random digit table or Minitab, select two different samples of size 5, and report the observations and the value of the statistic for each sample.

Sample 1
           Person 1   Person 2   Person 3   Person 4   Person 5   Value of the statistic
ID
Age                                                               Mean =
Smoker?                                                           Proportion =

Sample 2
           Person 1   Person 2   Person 3   Person 4   Person 5   Value of the statistic
ID
Age                                                               Mean =
Smoker?                                                           Proportion =

Notice something interesting for categorical variables with two possible answers (‘success’ or ‘failure’). In this example the variable Smoker has two categories: YES and NO. In the samples above, replace YES by 1 and NO by 0, and call that new variable Y. Counting the number of YES answers is equivalent to adding the 1s and 0s corresponding to the answers. For example, if the answers to the question ‘Have you smoked more than 100 cigarettes in your life?’ are YES, NO, YES, NO, NO, the values of Y would be 1, 0, 1, 0, 0, and

$$\hat{p} = \frac{\#\,\text{successes}}{n} = \frac{2}{5}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{1+0+1+0+0}{5} = \frac{2}{5}$$

The sample proportion can therefore also be understood as the sample mean of a variable that only takes the values 1 and 0 (for success and failure, respectively).

Below you see the distribution of age for the population. The population mean, 42.92, is marked with an arrow.
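As a brief aside, the 0/1 coding of the Smoker variable described above can be sketched in Python. This is only an illustration, not part of the original lab (which uses the random digit table or Minitab). The helper name `encode` is hypothetical, and the second sample (IDs 1 to 5 from the table) is hand-picked for illustration rather than randomly drawn.

```python
from statistics import mean

def encode(answers):
    """Map YES/NO answers to the 0/1 variable Y (hypothetical helper)."""
    return [1 if a == "YES" else 0 for a in answers]

# Worked example from the text: YES, NO, YES, NO, NO
y = encode(["YES", "NO", "YES", "NO", "NO"])
p_hat = sum(y) / len(y)   # number of successes divided by n
y_bar = mean(y)           # sample mean of the 0/1 variable

# Illustrative sample (hand-picked, NOT random): IDs 1-5 from the table
sample_ages = [34, 39, 37, 46, 31]
sample_smoker = encode(["NO", "YES", "NO", "NO", "NO"])
sample_mean_age = mean(sample_ages)   # statistic for the quantitative variable
sample_prop = mean(sample_smoker)     # statistic for the categorical variable

print("Y =", y, "p_hat =", p_hat, "y_bar =", y_bar)
print("mean age =", sample_mean_age, "proportion of smokers =", sample_prop)
```

Both `p_hat` and `y_bar` come out as 2/5, which is the point of the coding: the proportion is just a mean of 0s and 1s.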
Mark (on the X axis) the values of the sample means for the two samples you got. How far were the sample means from the population mean?

[Figure: histogram "Age (in years) of 50 individuals"; X axis: Age (15 to 95), Y axis: Frequency (0 to 15); the population mean is marked with an arrow.]

We know that a proportion can only take values between 0 and 1. Below, on a line that goes from 0 to 1, we have marked the proportion of smokers in this small population (40% of the 50 individuals are or have been smokers). In the same graph, mark the proportion of smokers in the two samples you obtained.

[Figure: number line from 0 to 1 with the population proportion 0.4 marked.]

4. Sampling Variability

In the samples you selected in the previous section, be aware of two things:

1) The value of the statistic is not necessarily equal to the value of the parameter we want to estimate (actually we would be VERY LUCKY if this happened), especially when the sample size is as small as the one we are working with (n = 5).
2) The values of the statistics were different for the two samples. Compare your values with the values obtained by the other students.

That IS SAMPLING VARIABILITY: THE VALUES OF THE STATISTICS DIFFER FROM SAMPLE TO SAMPLE. The statistics, such as the sample mean or the sample proportion, are RANDOM VARIABLES because we don’t know exactly what value they will take until we select the sample.

5. Sampling Distribution of the sample mean and sample proportion

As for other random variables, we are interested in the probability distribution of the statistics (sample mean or sample proportion). That distribution is called the SAMPLING DISTRIBUTION; i.e., we want to know what values the sample mean or the sample proportion (of samples of size 5) can take, and with what probability.

Now, instead of taking 2 samples of size 5, we will take 1000 samples of size 5. Doing it by hand would be too time-consuming, but we can use the computer. Next you will see the results for 1000 random samples of size 5 taken from the population of 50 individuals.
In the appendix you can see how these samples were generated with the computer, and you can generate your own samples if you wish.
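For readers without Minitab, the same experiment can be sketched in Python. This is a minimal sketch and not the procedure from the appendix: it assumes simple random sampling without replacement, the function name `simulate` and the seed value are arbitrary choices for illustration, and the ages and smoking statuses are copied from the population table (YES coded as 1, NO as 0).

```python
import random
from collections import Counter
from statistics import mean

# Ages and smoking status (1 = YES, 0 = NO) for IDs 1..50, copied from the table.
AGES = [
    34, 39, 37, 46, 31, 32, 36, 51, 93, 66, 50, 32, 31,   # IDs 1-13
    43, 24, 25, 43, 29, 31, 58, 76, 65, 39, 38, 37, 27,   # IDs 14-26
    38, 69, 68, 21, 82, 32, 23, 51, 45, 26, 35, 26, 35,   # IDs 27-39
    24, 25, 47, 45, 42, 81, 43, 39, 34, 71, 31,           # IDs 40-50
]
SMOKER = [
    0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1,   # IDs 1-13
    1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0,   # IDs 14-26
    1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,   # IDs 27-39
    1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0,         # IDs 40-50
]

def simulate(n_samples=1000, n=5, seed=0):
    """Draw n_samples random samples of size n (without replacement, an
    assumption) and return the sample means and sample proportions."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    positions = range(len(AGES))
    means, props = [], []
    for _ in range(n_samples):
        chosen = rng.sample(positions, n)
        means.append(mean(AGES[i] for i in chosen))
        props.append(sum(SMOKER[i] for i in chosen) / n)
    return means, props

means, props = simulate()

print("population mean age:", mean(AGES))                         # 42.92
print("population proportion of smokers:", sum(SMOKER) / len(SMOKER))  # 0.4
print("average of the 1000 sample means:", round(mean(means), 2))
print("smallest and largest sample mean:", min(means), max(means))
print("counts of each sample proportion:", sorted(Counter(props).items()))
```

With n = 5 the sample proportion can only take the values 0, 0.2, 0.4, 0.6, 0.8 and 1, so its sampling distribution is a table of six counts. The individual sample means vary widely, while their average should land near the population mean.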