PSY 394U – Do-It-Yourself Statistics

Chapter 2: Some Preliminaries

Populations, samples, parameters, and sampling distributions

In science, we are interested in collecting data that will tell us about important features of the world, such as whether two groups of measurements are different or not (e.g. the speed of light measured in one direction vs. another), or whether one variable influences another or not (e.g. the influence of axon diameter on conduction velocity). In order to truly know the exact state of some aspect of the world, we would have to make every possible measurement, and this collection of measurements is known as a population. In rare cases it is possible to measure an entire population, such as "the average height of seniors at Austin High School." Much more often, however, measuring the population is either theoretically possible but impractical (e.g. the average height of adults alive today living in Austin, Texas) or impossible (e.g. the average height of adult humans). Thus, in most experiments, we collect a sample of data and, if the experiment is done well, the properties of the sample will embody all of the important properties of the population that we wish to examine.

The relationship between a population and random samples is fairly easy to appreciate, and it depends most critically on the sample size. Consider the following digital image.

[Figure: a full-resolution image of Mickey Mouse]

This full-resolution image can be thought of as a "population" of pixels. By looking at the entire image – the whole population of pixels – it is easy to determine things about the image that are over and above the pixel values per se. For example, it is easy to see that Mickey is smiling or that there are two buttons on the front of his overalls. Now consider the following four images, which consist of random samples of 40%, 10%, 5%, and 2% of the pixels (the remaining unsampled pixels have been set to gray).

[Figure: the same image randomly sampled at 40%, 10%, 5%, and 2% of its pixels]

Notice that, as the sample size decreases, the picture becomes increasingly ragged and, importantly, it becomes progressively more difficult to make judgments about the picture. For example, our judgment about the number of buttons would probably change to "one" and "zero" for the 5% and 2% cases, respectively. Obviously, these answers would be wrong, and they would be wrong because the sample sizes were too small to allow us to correctly make the relevant decision.

The relationship between samples of experimental data and populations of interest is directly analogous to the relationship between the original Mickey image and the sampled versions. Notice that there is information even in the rightmost sampled image; the pixels are black where the ears are, yellow where the shoes are, etc. In fact, some people might be able to guess that this was a picture of Mickey Mouse if they were told beforehand that it was a famous cartoon character. In other words, the rightmost sampled image does represent the original image to some extent; it just does not allow us to make decisions about the original image with the same accuracy as does an image containing a greater number of samples (such as the leftmost image). In exactly the same way, small samples of experimental data do not allow us to make judgments about aspects of the population with the same accuracy as do larger samples.
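The same point can be made numerically. The sketch below is my own illustration, not part of the original chapter; the population of heights, its mean and spread, and the 1,000-measurement "budget" are all invented numbers. It draws random samples of decreasing size from a large simulated population, mirroring the 40%, 10%, 5%, and 2% images, and shows how the estimate of the mean typically drifts further from the true value as the sample shrinks.

import numpy as np

rng = np.random.default_rng(0)

# A hypothetical "population": one height measurement (in cm) for each of
# one million people. The values are invented for illustration.
population = rng.normal(loc=170, scale=10, size=1_000_000)
true_mean = population.mean()

# Mirror the image analogy: treat 1,000 measurements as the full-resolution
# "image" and draw random samples of 40%, 10%, 5%, and 2% of that budget.
for fraction in (0.40, 0.10, 0.05, 0.02):
    n = int(1000 * fraction)
    sample = rng.choice(population, size=n, replace=False)
    error = abs(sample.mean() - true_mean)
    print(f"{fraction:4.0%}: n = {n:3d}, sample mean = {sample.mean():6.2f}, error = {error:4.2f}")

Just as with the 2% Mickey image, even the smallest sample still carries information about the population; it simply pins down the mean less accurately than the larger samples do.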
In science, we are generally interested in aspects of the population distribution such as the mean, the variance, the median, etc. These are called parameters of the distribution because, like the number of buttons Mickey has, they are not directly obtainable from the actual measurements; they must be computed or inferred from those measurements. The word parameter, in fact, means beyond (para) measurement (meter). In a scientific experiment, we collect a sample of data from a population, and then estimate some parameter of that population (the mean, for example) by computing the value of that parameter from our sample.

There is an additional step, however. Once we have estimated some parameter from our data, it would be extremely informative to know how confident we are in that estimate. Consider Mickey again. If you determine the number of buttons by looking at the 40% image, your answer will be "two." If I do the same on the 5% image, my answer will be "one." Clearly, your answer is better than mine in some respect, and it would be extremely valuable to quantify this in some way, that is, to compute not only the value of our parameter (the number of buttons), but also how confident we are in our estimate given our sample size and other factors. This computation – the computation of how certain we are about our parameter estimates – is the key benefit that a statistical analysis yields. Simply put, we wish not only to compute the estimate of our parameter of interest, we also wish to compute the distribution of that parameter if we were to repeat our experiment many times, that is, its sampling distribution.

Figure 2.1 Histogram of reaction times for two age groups.

So why are these two groups actually different in terms of statistical significance when they are obviously so similar? The answer is that tests of significance are usually concerned with quantities (parameters) like the mean, and, since the sample size was so large (10,000 measurements per group), we know each group's mean with very, very high confidence or, equivalently, we have a very small margin of error.

This brings us to what "statistical significance" and the associated probability actually mean. When you read that "… and this is significant at the 0.05 level," what the author is actually saying is "If there were actually no difference between the means of the two populations, then there is less than a 5% chance that I would observe a difference between the means of the two groups at least as large as the one I obtained." The first part of that sentence is a statement of the null hypothesis and is, unfortunately, often substituted for a prediction-generating theory. In terms of the calculations, it is taken literally; under the null hypothesis, the two group means are identical to any decimal place. A moment's thought should convince you that the probability of this actually being true in the real world is (usually) vanishingly small, and it is thus no surprise that, if you collect enough data, you will eventually reach statistical significance – there is almost always some difference between two groups out in the real world. The real question is whether the difference is important.

So why do statistics at all? Briefly, the reason is that if we do see a pattern of data that we consider to be important, statistics can tell us how likely it is that the pattern actually arose from the vagaries of chance, or from one set of assumptions about the world vs. another.
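The reaction-time example can be reproduced in miniature. The following sketch is my own illustration, not code from the chapter; the 2 ms true difference, the 50 ms standard deviation, and the 300 ms baseline are invented numbers. With 10,000 measurements per group, a t-test will usually flag even this trivially small difference as significant at the 0.05 level, while the confidence interval for the difference shows how unimportant the difference is.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two simulated reaction-time groups (in ms): a tiny 2 ms true difference,
# but 10,000 measurements per group.
young = rng.normal(loc=300, scale=50, size=10_000)
older = rng.normal(loc=302, scale=50, size=10_000)

result = stats.ttest_ind(young, older)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")  # usually p < 0.05

# An approximate 95% confidence interval for the difference between means:
diff = older.mean() - young.mean()
se = np.sqrt(young.var(ddof=1) / young.size + older.var(ddof=1) / older.size)
print(f"difference = {diff:.2f} ms, 95% CI = ({diff - 1.96*se:.2f}, {diff + 1.96*se:.2f}) ms")

A difference of roughly 2 ms comes out "significant" here only because the margin of error on each mean is about half a millisecond; statistically detectable and practically important are different claims.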
I mentioned above that we are very good at detecting patterns. Sometimes we are too good, and statistics can help us avoid being fooled by random fluctuations. Consider a simple situation in which we are looking for a difference between two means. Once we do a statistical test (a null hypothesis significance test, abbreviated NHST), we will find ourselves in one of four possible situations, illustrated in the following table (a small simulation of these four cells appears at the end of this section).

Possible States of the World and an Experimental Outcome

                                    Is there an important effect actually present?

                                    Yes                              No

Was a significant      Yes    Enough data and (hopefully)     Too much data and lack of
effect found in an            careful thought                 thought, or bad luck, or both
NHST analysis?
                       No     Not enough data (due to lack    Enough data and careful
                              of thought) or bad luck (the    thought, or not enough data
                              data collected should have      and lack of thought
                              been sufficient to detect the
                              difference, but by chance
                              were not)

One of the keys to being a good experimental scientist is to keep yourself in the yes/yes and no/no categories for the right reasons. The ways you do this are to carefully distinguish between a "statistically significant" effect and an important one, and to make sure that the sampling distributions that you derive from your data grant you the confidence to say whether or not an important difference between your experimental groups exists. This brings us to the next chapter, in which we will more fully explore the concept of sampling distributions.
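The simulation promised above is a sketch of my own, with invented effect sizes and sample sizes; "enough data" is played by the per-group n. Each call runs many simulated two-group experiments and reports how often a t-test declares significance, first when a real effect is present and then when it is not.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def fraction_significant(true_diff, n_per_group, n_experiments=2000, alpha=0.05):
    """Fraction of simulated experiments in which an NHST comes out significant."""
    count = 0
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(true_diff, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < alpha:
            count += 1
    return count / n_experiments

# Effect truly present: too little data usually lands in the no/yes cell
# (a missed effect); more data moves us toward the yes/yes cell.
print("effect present, n = 10: ", fraction_significant(0.5, 10))   # often missed
print("effect present, n = 100:", fraction_significant(0.5, 100))  # usually found

# Effect truly absent: about alpha (5%) of experiments still land in the
# yes/no cell through bad luck alone, no matter the sample size.
print("effect absent,  n = 100:", fraction_significant(0.0, 100))

Note that no amount of data drives the yes/no cell to zero; careful thought about what counts as an important effect has to do the rest.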