Proceedings of Statistics Canada Symposium 2014
Beyond traditional survey taking: adapting to a changing world

Explorations in Non-Probability Sampling Using the Web

J. Michael Brick¹

Abstract

Although estimating finite population characteristics from probability samples has been very successful for large samples, inferences from non-probability samples may also be possible. Non-probability samples have been criticized for self-selection bias and for the lack of methods for estimating the precision of the estimates. Widespread access to the Web, and the ability to do very inexpensive data collection on the Web, has reinvigorated interest in this topic. We review non-probability sampling strategies and summarize some of the key issues. We then propose conditions under which non-probability sampling may be a reasonable approach, and we conclude with ideas for future research.

Key Words: Inference, representativeness, self-selection bias

1. Introduction

Probability sampling is generally accepted as the most appropriate method for making inferences that can be generalized to a finite population. The method has a rich history and a solid theoretical foundation that has proven effective in numerous empirical studies. With a probability sample, every unit in the population has a known, non-zero chance of being sampled, and in the design-based framework these probabilities are the basis for the inferences (Hansen, Hurwitz, and Madow, 1953; Särndal, Swensson, and Wretman, 1992; Lohr, 2009). Almost all official statistics use this methodology, and many national statistical offices require probability sampling for making inferences. But probability sampling is not the only method for drawing samples and making inferences.
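The design-based logic described above can be made concrete with a small sketch of the Horvitz-Thompson estimator, which weights each sampled value by the inverse of its known inclusion probability. The population values and inclusion probabilities below are invented purely for illustration; this is a sketch of the general idea, not of any survey discussed in the paper.

```python
import random

# Toy population of 1,000 units with a known value for each
# (values and probabilities are invented for this illustration).
random.seed(42)
population = [random.gauss(50, 10) for _ in range(1000)]
true_total = sum(population)

# Known, unequal inclusion probabilities: in the design-based framework,
# these probabilities (not a model for the data) drive the inference.
probs = [0.2] * 500 + [0.3] * 500

# Poisson sampling: each unit enters the sample independently
# with its own inclusion probability.
sample = [(y, p) for y, p in zip(population, probs) if random.random() < p]

# Horvitz-Thompson estimator of the population total: weight each
# sampled value by the reciprocal of its inclusion probability.
ht_total = sum(y / p for y, p in sample)

print(f"true total ~ {true_total:.0f}, HT estimate ~ {ht_total:.0f}")
```

Because the probabilities are known and non-zero for every unit, the estimator is design-unbiased, which is precisely the property that self-selected samples lack.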
In fact, the twentieth-century shift to probability sampling began well after Neyman (1934) published its theoretical basis. Quota samples, which only require that the sample meet target numbers of individuals with specific characteristics such as age and sex, have been used for many years, especially in market research. Stephan and McCarthy (1958) review this method of non-probability sampling in election and other types of surveys in the middle of the 20th century in the U.S.

The type of non-probability sampling used in commercial and market research practice changed dramatically in the last twenty years as access to the Internet became more common in North America and many parts of Europe. Especially in the last decade, online surveys, with respondents drawn from "opt-in" panels, have become extremely popular. The vast majority of these surveys are not probability samples. The reason for their popularity is the low cost per completed interview, which is much lower than even low-cost probability sample survey methods such as mail.

Some of the attractiveness of probability samples has also been lost due to rising nonresponse (Brick and Williams, 2013) and concerns about frame undercoverage. These issues raise concerns about the validity of inferences from a probability sample. Even staunch advocates of probability sampling have been forced to confront the question of whether a probability sample with a low response or coverage rate retains the highly valued properties of a valid probability sample (Groves, 2006).

The next section summarizes some important findings from a non-probability sampling task force commissioned by the American Association for Public Opinion Research (AAPOR). This serves as a prelude to some current methods and avenues for further research.

¹ Westat and JPSM, 1600 Research Blvd., Rockville, MD USA 20850
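The quota-filling mechanism described in this section can be sketched in a few lines. The quota targets and the stream of arriving volunteers below are invented for illustration; the point of the sketch is that selection *within* each quota cell is uncontrolled, which is the root of the self-selection concern.

```python
import random

# Hypothetical quota targets by age group and sex (counts are invented).
quotas = {("18-34", "F"): 3, ("18-34", "M"): 3,
          ("35+", "F"): 4, ("35+", "M"): 4}

def quota_sample(stream, quotas):
    """Accept arriving respondents until every quota cell is full.

    Within a cell, whoever arrives first is taken: the selection
    mechanism inside each cell is uncontrolled, unlike a probability
    sample where inclusion probabilities are known by design."""
    remaining = dict(quotas)
    accepted = []
    for person in stream:
        cell = (person["age"], person["sex"])
        if remaining.get(cell, 0) > 0:
            accepted.append(person)
            remaining[cell] -= 1
        if all(v == 0 for v in remaining.values()):
            break
    return accepted

# Simulated stream of volunteers; arrival order stands in for self-selection.
random.seed(1)
stream = [{"age": random.choice(["18-34", "35+"]),
           "sex": random.choice(["F", "M"])} for _ in range(500)]

sample = quota_sample(stream, quotas)
print(len(sample))  # equals the 14 quota slots once every cell fills
```

The resulting sample matches the population margins on age and sex by construction, but nothing constrains who, within each cell, ends up in the sample.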
2. Task Force Report

The AAPOR Task Force was asked "to examine the conditions under which various survey designs that do not use probability samples might still be useful for making inferences to a larger population." The task force report, completed in early 2013, can be downloaded from that organization's web site (www.aapor.org). Baker et al. (2013) summarized the report; comments from five experts in the field and a rejoinder are published in the same issue of the journal. Rather than repeat all the findings, we have chosen a few critical ones (in quotes below) that have been the topic of several discussions subsequent to the publication of the report and its summary.

"Unlike probability sampling, there is no single framework that adequately encompasses all of non-probability sampling."

The point of this statement is sometimes misunderstood. The intent is to highlight that discussing all non-probability methods together is of little value because the methods are so different. Issues and concerns about respondent-driven sampling methods and opt-in Web panels are very different. Even within the generic term of opt-in Web panels, the methods used to select respondents and produce estimates may be distinctive.

"The most promising non-probability methods for surveys are those that are based on models that attempt to deal with challenges to inference in both the sampling and estimation stages."

This finding is more hypothesized than based on empirical results. In many ways it parallels the expectation that responsive design may lead to lower nonresponse bias in probability samples (Lundquist and Särndal, 2013). The rationale is that a more diverse set of respondents will reduce biases, given an equivalent weighting scheme. While this seems reasonable, it has not yet been consistently validated in either probability samples (with responsive design) or non-probability samples.
"If non-probability samples are to gain wider acceptance among survey researchers there must be a more coherent framework and accompanying set of measures for evaluating their quality."

No single study, or set of studies, can prove that a data collection and estimation strategy will produce estimates that are reasonable for most uses. For example, the Literary Digest had correctly predicted the winner of every election from 1920 until its infamous error in predicting Landon as a landslide winner in 1936. Empirical results are important, but there must also be a set of principles that support the data collection and estimation process so that failures can be explained. Probability sampling has such a foundation, and this theory is why, when probability sample estimates are not accurate, the failures can be linked to deviations such as nonresponse and the theory does not have to be discarded.

"Non-probability samples may be appropriate for making statistical inferences, but the validity of the inferences rests on the appropriateness of the assumptions underlying the model and how deviations from those assumptions affect the specific estimates."

The members of the task force believed this finding would be the most controversial (Bethlehem and Cobben, 2013). While it was a contentious issue when the report was first released, we found that many agreed with the position, including most of the experts in the discussion of the journal article.

Another area of statistical research that is in much the same position as non-probability sampling is observational studies. Madigan et al. (2014) commented that "Threats to the validity of observational studies on the effects of interventions raise questions about the appropriate role of such studies in decision making.
Nonetheless, scholarly journals in fields such as medicine, education, and the social sciences feature many such studies, often with limited exploration of these threats, and the lay press is rife with news stories based on these studies…the introspective and ad hoc nature of the design of these analyses appears to elude any meaningful objective assessment of their performance..." Despite these concerns about the validity of observational studies, researchers in that area understand the critical importance of these studies and are focused on assessing what can be done to improve the science. Our view is that the same sense of urgency is needed to improve non-probability samples, rather than simply disregarding all forms of non-probability sampling as unsound.

There is evidence that work on inference from non-probability samples is continuing, although much of it is more empirical than theoretical. For example, Barratt, Ferris and Lenton (2014) use an online sample to estimate the size and characteristics of a rare subpopulation. Their evaluation method is similar to many previous studies: they compare the online sample estimates to those of a probability sample and find some important differences. Even though there are differences, they suggest the online sample can be useful when combined with a probability sample.

[…] types of studies. Coverage adjustment, estimated control totals, and nonignorable nonresponse adjustments might be estimation methods of practical importance. In addition to these survey methods, other areas offer ideas that should be considered. Causal inference has a rich tradition of dealing with selection bias, and newer methods continue to be introduced and explored. Fields such as cognitive psychology and behavioral research have also expanded since they were first introduced as a toolkit into survey research in the 1970s.
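One of the estimation methods mentioned above, adjustment of weights to external control totals, can be illustrated with simple post-stratification. The control totals, age groups, and sample composition below are entirely invented; this is a minimal sketch of the weighting arithmetic, not a full adjustment procedure.

```python
from collections import Counter

# Invented control totals (e.g., census counts) for three age groups.
control_totals = {"18-34": 30000, "35-54": 40000, "55+": 30000}

# Invented non-probability sample in which younger people are overrepresented.
sample = ([{"age": "18-34"} for _ in range(60)]
          + [{"age": "35-54"} for _ in range(25)]
          + [{"age": "55+"} for _ in range(15)])

# Start every respondent with an equal base weight summing to the population size.
base_weight = sum(control_totals.values()) / len(sample)

# Adjustment factor per cell: control total divided by the weighted sample total.
cell_counts = Counter(r["age"] for r in sample)
factors = {cell: control_totals[cell] / (cell_counts[cell] * base_weight)
           for cell in control_totals}

for r in sample:
    r["weight"] = base_weight * factors[r["age"]]

# Weighted cell totals now reproduce the control totals exactly.
for cell in control_totals:
    print(cell, round(sum(r["weight"] for r in sample if r["age"] == cell)))
```

Matching the margins removes bias only for variables related to the adjustment cells; the validity of the resulting inference still rests on the model assumptions emphasized in the task force findings.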
Of course, information science has undergone a revolution, and new areas, such as Big Data, and old ideas with new technologies, like administrative records, could provide new insights and need to be considered. While this is not a recipe for improving non-probability sampling inference, it does imply that research is possible and essential. Tools and methods exist that may help provide the framework for making inferences from non-probability samples, but without innovative research we will remain in the current muddle.

References

Baker, Reg, J. Michael Brick, Nancy A. Bates, Mike Battaglia, Mick P. Couper, Jill A. Dever, Krista J. Gile, and Roger Tourangeau. (2013), "Summary Report of the AAPOR Task Force on Non-probability Sampling", Journal of Survey Statistics and Methodology, 1, 90-143.

Barratt, Monica J., Jason A. Ferris, and Simon Lenton. (2014), "Hidden Populations, Online Purposive Sampling, and External Validity: Taking off the Blindfold", Field Methods, 1525822X14526838.

Berzofsky, Marcus, Rick Williams, and Paul Biemer. (2009), "Combining Probability and Non-Probability Sampling Methods: Model-Aided Sampling and the O*NET Data Collection Program", Survey Practice, 2(6). Downloaded December 15, 2015 from www.surveypractice.org/index.php/SurveyPractice/article/view/184/html.

Bethlehem, Jelke. (2008), "Can we make official statistics with self-selection web surveys?", in Proceedings of Statistics Canada Symposium.

Bethlehem, Jelke, and Fannie Cobben. (2013), "Web Panels for Official Statistics?", Proceedings of the 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong. Downloaded May 1, 2014 from http://2013.isiproceedings.org/Files/IPS064-P1-S.pdf.

Brick, J. M. (2014), "On Making Inferences from Non-Probability Samples", Washington Statistical Society 2014 President's Invited Seminar, Washington, DC (March 26, 2014).

Brick, J. M., and Douglas Williams.
(2013), "Explaining rising nonresponse rates in cross-sectional surveys", The ANNALS of the American Academy of Political and Social Science, 645, 36-59.

Callegaro, Mario, R. Baker, J. Bethlehem, A. Göritz, J. Krosnick, and P. Lavrakas, eds. (2014), Online Panel Research: A Data Quality Perspective, John Wiley & Sons.

Cochran, William. (1977), Sampling Techniques, New York: Wiley and Sons.

Dever, Jill A., Ann Rafferty, and Richard Valliant. (2008), "Internet Surveys: Can Statistical Adjustments Eliminate Coverage Bias?", Survey Research Methods, 2, 47-62.

Groves, Robert M. (2006), "Nonresponse rates and nonresponse bias in household surveys", Public Opinion Quarterly, 70, 646-675.

Hansen, Morris H. (1987), "Some History and Reminiscences on Survey Sampling", Statistical Science, 2(2), 180-190.

Hansen, Morris H., William N. Hurwitz, and William G. Madow. (1953), Sample Survey Methods and Theory, Vol. I, New York: John Wiley and Sons.

Hansen, Morris H., William G. Madow, and Benjamin J. Tepping. (1983), "An evaluation of model-dependent and probability-sampling inferences in sample surveys", Journal of the American Statistical Association, 78(384), 776-793.

Knaub, J. (2007), "Cutoff Sampling and Inference", InterStat, April.

Lohr, Sharon. (2009), Sampling: Design and Analysis, Cengage Learning.

Lundquist, Peter, and Carl-Erik Särndal. (2013), "Aspects of responsive design with applications to the Swedish Living Conditions Survey", Journal of Official Statistics, 29, 557-582.

Madigan, David, P. Stang, J. Berlin, M. Schuemie, J. M. Overhage, M. Suchard, B. Dumouchel, A. Hartzema, and P. Ryan. (2014), "A Systematic Statistical Approach to Evaluating Evidence from Observational Studies", Annual Review of Statistics and Its Application, 1, 11-39.

Messer, Benjamin L., and Don A. Dillman. (2011), "Surveying the general public over the Internet using address-based sampling and mail contact procedures", Public Opinion Quarterly, 75, 429-457.

Neyman, Jerzy. (1934), "On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection", Journal of the Royal Statistical Society, 97, 558-625.

Rivers, Douglas, and Delia Bailey. (2009), "Inference from Matched Samples in the 2008 U.S. National Elections", paper presented at the 64th Annual Conference of the American Association for Public Opinion Research, Hollywood, Florida, May.

Särndal, Carl-Erik, Bengt Swensson, and Jan Wretman. (1992), Model Assisted Survey Sampling, Springer.

Squire, Peverill. (1988), "Why the 1936 Literary Digest Poll Failed", Public Opinion Quarterly, 52, 125-133.

Stephan, Frederick F., and Philip J. McCarthy. (1958), Sampling Opinions: An Analysis of Survey Procedure, New York: John Wiley and Sons.

Wang, Wei, David Rothschild, Sharad Goel, and Andrew Gelman. (2014), "Forecasting Elections with Non-Representative Polls", International Journal of Forecasting, doi:10.1016/j.ijforecast.2014.06.001.