Survey Design and Weighting: Representative Sampling and Statistical Analysis, Study notes of Economics

These notes cover the design considerations and implications of conducting surveys, including representativeness, clustering, oversampling, and unit non-response. They also cover sampling plans such as simple random sampling, stratified random sampling, and multi-stage sampling, the use of weights in statistical analyses, poststratification, and adjusting for clustering.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-182

Survey Design and Weighting
(material drawn from Korn & Graubard 1999)

A. Some basic design considerations and their implications
   1. Many considerations arise when conducting a survey
      a. we want the final results to be representative of an analysis population
      b. we may want the survey to include specific subpopulations
         i. racial/ethnic diversity
         ii. economic diversity
         iii. age diversity
         iv. programmatic diversity
      c. we want to keep costs down by surveying many people in a limited number of areas
      d. there may be other design issues; e.g., AddHealth samples within schools and also samples peer networks
      e. rates of unit non-response may differ across groups
   2. These design issues affect subsequent statistical analyses
      a. they affect the representativeness of the observations
      b. they also affect the independence of the observations
   3. Consider some general issues
      a. clustering (selecting subjects from a few areas) may lead to spatial correlation in the data
         i. although this might not bias estimates of population parameters, it can affect the calculation of standard errors
         ii. essentially, observations within a cluster resemble one another, so the sample contains less independent variation than a sample of the same size taken completely at random
      b. oversampling particular populations, which is done to ensure adequate representation of those populations, yields a sample that does not represent the general population
      c. different rates of unit non-response can also reduce representativeness
   4. In analyses, we address these issues by
      a. including survey weights
      b. including design variables

B. Sampling plans
   1. Simple random sampling
      a. consider a given population of size N
      b. choose a subset of n individuals such that every possible subset of size n is equally likely to be sampled
      c. individuals are chosen without replacement (we won't refer to this distinction subsequently)
      d. the ratio n/N is called the sampling rate or inclusion probability
      e. most sample estimators, such as the usual mean and variance estimators, assume this type of sampling
   2. Stratified simple random sampling
      a. the population is first divided into mutually exclusive and exhaustive strata
      b. simple random sampling is then carried out within each stratum
      c. sampling rates may (and likely will) vary across strata
         i. for example, if the populations of the strata differ but the sample sizes don't, the sampling rates will […]

[…] unequally, or can be adjusted to reflect the larger population, it makes sense to use weights to reduce biases
   2. Weights are usually the product of
      a. inverse sampling probabilities, sometimes referred to as the base weight (note: these sampling probabilities can themselves be products of probabilities if multi-stage sampling is used),
      b. inverse response probabilities, sometimes referred to as non-response adjustments, and
      c. poststratification adjustments
   3. For most cross-sectional statistics, weighted estimates are straightforward to calculate
      a. let X_i be a variable for subject i and let W_i be the associated weight
      b. the weighted mean is
            $\bar{X}_w = \frac{1}{\sum_i W_i} \sum_i W_i X_i$
      c. let x_i and y_i be the deviations of variables X_i and Y_i from their weighted means; the slope coefficient from a weighted regression of Y_i on X_i is
            $\hat{\beta}_w = \frac{\sum_i W_i x_i y_i}{\sum_i W_i x_i^2}$
   4. In SAS, most statistical procedures include a WEIGHT statement
      a. syntax: WEIGHT <weight_variable>;
      b. weight_variable is the SAS variable containing the weights
   5. If the sampling design indicates that weights should be included, you should include them, right?

E. An alternative to weighting
   1. An alternative to weighting is to model the survey design in your statistical procedure
   2. In a multivariate model, this is accomplished by including measures of the characteristics that enter the weighting procedure as additional explanatory variables in an unweighted model
      a. coefficients on these variables will confound genuine effects with survey design effects
      b. however, coefficients on the other variables should be purged of the influence of design effects
      c. this assumes that you have modeled the design correctly
      d. it also assumes that the design variables don't introduce other problems
   3. Weights are often based on characteristics that we would include in models anyway, such as age, race/ethnicity, and socioeconomic status
   4. This suggests that weights might not be especially useful in multivariate analyses
   5. Other considerations might lead us to drop weights
      a. dropping incomplete cases (cases with item non-response) from a weighted sample changes the response pattern in that sample
         i. formally, the sample should be reweighted to reflect the new sample
         ii. however, this is rarely done
         iii. the result is incorrect weights that might not reduce bias
      b. in some cases, weights can increase, rather than reduce, the variance of estimators; we might then weigh bias against efficiency (a mean squared error trade-off)

F. Adjusting for clustering
   1. If a clustered survey design is used, observations may not be independent within clusters
      a. this can lead to incorrect (usually downward-biased) standard errors
      b. it also means that the estimation procedure is inefficient
      c. in the simplest cases, the problem is similar to that of a random effects specification
      d. the issues become more complicated if weights and other design features enter
   2. Two types of corrections are possible
      a. just fix the standard errors; this yields correct standard errors but does not address the efficiency concerns
      b. estimate a feasible generalized least squares (FGLS) specification
         i. this addresses both the standard errors and efficiency
         ii. however, it requires you to take a stand on the precise source of the spatial correlation
      c. most researchers simply choose the first correction
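Simple random sampling without replacement (section B.1) can be sketched with Python's standard library. This is a minimal illustration, assuming the population is represented as a list of hypothetical unit IDs; `random.sample` draws n distinct units so that every subset of size n is equally likely, and each unit's inclusion probability is n/N.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every size-n subset is equally likely."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)      # distinct units, no replacement
    inclusion_prob = n / len(population)    # sampling rate n/N
    return sample, inclusion_prob

# Hypothetical population of 1,000 unit IDs
population = list(range(1000))
sample, p = simple_random_sample(population, 50, seed=1)
print(len(sample), len(set(sample)), p)  # 50 50 0.05
```

Because every unit has the same inclusion probability, every unit would receive the same base weight, which is why the usual unweighted estimators are appropriate under this design.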
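For stratified simple random sampling (section B.2), a sketch like the following shows how equal sample sizes in strata of unequal size produce unequal sampling rates. It assumes hypothetical strata stored as a dict of unit-ID lists and a hypothetical function name; the base weight N_h/n_h is the inverse of each stratum's sampling rate.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Simple random sampling within each stratum (hypothetical helper).

    strata: dict mapping stratum name -> list of unit IDs.
    n_per_stratum: dict mapping stratum name -> sample size n_h.
    Returns (unit, stratum, sampling_rate, base_weight) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for name, units in strata.items():
        n_h = n_per_stratum[name]
        rate = n_h / len(units)          # stratum-specific rate n_h / N_h
        weight = len(units) / n_h        # base weight N_h / n_h = 1 / rate
        for unit in rng.sample(units, n_h):
            rows.append((unit, name, rate, weight))
    return rows

# Hypothetical strata of unequal size, each sampled with n_h = 100
strata = {"large": list(range(10_000)), "small": list(range(10_000, 11_000))}
rows = stratified_sample(strata, {"large": 100, "small": 100}, seed=2)
```

Here the small stratum is sampled at ten times the rate of the large one, so an unweighted mean would overrepresent it; the base weights in each stratum sum to that stratum's population size, which undoes the imbalance.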
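The three weight components listed before section E (inverse sampling probability, inverse response probability, poststratification adjustment) can be combined as in this sketch. It assumes, hypothetically, that response probabilities are already estimated within adjustment cells and that population counts for each poststratum are known (e.g. from a census); the field names and cell labels are illustrative only. Poststratification is implemented here as a simple ratio rescaling within cells.

```python
from collections import defaultdict

def build_weights(records, response_rates, pop_totals):
    """Combine base weight, non-response adjustment, and poststratification.

    records: dicts with 'p_select' (selection probability),
             'nr_cell' (non-response cell), 'ps_cell' (poststratum).
    response_rates: nr_cell -> estimated response probability (assumed).
    pop_totals: ps_cell -> known population count (assumed).
    """
    # Base weight times non-response adjustment
    w = [(1 / r["p_select"]) * (1 / response_rates[r["nr_cell"]]) for r in records]

    # Poststratification: rescale so weights in each cell sum to its known total
    cell_sums = defaultdict(float)
    for wi, r in zip(w, records):
        cell_sums[r["ps_cell"]] += wi
    return [wi * pop_totals[r["ps_cell"]] / cell_sums[r["ps_cell"]]
            for wi, r in zip(w, records)]

# Hypothetical respondents
records = [
    {"p_select": 0.01, "nr_cell": "urban", "ps_cell": "age<40"},
    {"p_select": 0.01, "nr_cell": "rural", "ps_cell": "age<40"},
    {"p_select": 0.05, "nr_cell": "urban", "ps_cell": "age40+"},
    {"p_select": 0.05, "nr_cell": "rural", "ps_cell": "age40+"},
]
weights = build_weights(records, {"urban": 0.5, "rural": 0.8},
                        {"age<40": 300.0, "age40+": 100.0})
```

After poststratification the weights in each cell sum to the assumed population count, while the relative weights within a cell still reflect the differing response rates.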
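The weighted mean and weighted slope formulas (point 3, just before section E) translate directly into code. This is a minimal sketch with hypothetical data; it computes point estimates only and makes no attempt at design-based standard errors.

```python
def weighted_mean(x, w):
    """Weighted mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_slope(x, y, w):
    """Slope from a weighted regression of y on x (with an intercept)."""
    xbar, ybar = weighted_mean(x, w), weighted_mean(y, w)
    dx = [xi - xbar for xi in x]   # x_i: deviations from the weighted mean of X
    dy = [yi - ybar for yi in y]   # y_i: deviations from the weighted mean of Y
    num = sum(wi * a * b for wi, a, b in zip(w, dx, dy))
    den = sum(wi * a * a for wi, a in zip(w, dx))
    return num / den

# Hypothetical data and weights
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 5.0, 9.0]
W = [1.0, 1.0, 2.0, 4.0]
print(weighted_mean(X, W))  # 3.125
```

With all weights equal, `weighted_slope` reduces to the ordinary least squares slope, which is a quick sanity check on the formula.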
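Section E.2 suggests modeling the design instead of weighting. As an illustration, assume (hypothetically) that the design works only through a stratum indicator. Including stratum dummies in an unweighted regression then gives the same slope as regressing within-stratum deviations of Y on within-stratum deviations of X (the Frisch-Waugh-Lovell result), which this sketch computes with the standard library. The function name and data are hypothetical.

```python
from collections import defaultdict

def slope_with_stratum_dummies(x, y, strata):
    """Unweighted OLS slope on x with a separate intercept per stratum.

    Computed by demeaning x and y within each stratum (Frisch-Waugh-Lovell).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # stratum -> [sum x, sum y, count]
    for xi, yi, s in zip(x, y, strata):
        sums[s][0] += xi
        sums[s][1] += yi
        sums[s][2] += 1
    num = den = 0.0
    for xi, yi, s in zip(x, y, strata):
        sx, sy, n = sums[s]
        dx, dy = xi - sx / n, yi - sy / n      # within-stratum deviations
        num += dx * dy
        den += dx * dx
    return num / den

# Hypothetical data: strata differ in level, share a slope of 2; "B" is oversampled
x = [1, 2, 3, 5, 6, 7, 8, 9]
y = [3, 5, 7, 20, 22, 24, 26, 28]
strata = ["A"] * 3 + ["B"] * 5
beta = slope_with_stratum_dummies(x, y, strata)
```

In line with E.2.a, the stratum intercepts absorb both any genuine level differences and the oversampling, while the slope on x is unaffected by how heavily each stratum was sampled.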
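Point E.5.b (weights can increase variance) can be quantified with Kish's approximate effective sample size, (sum of W_i)^2 / (sum of W_i^2). This is a standard rule of thumb that is not drawn from these notes: it equals n when all weights are equal and falls as the weights become more variable. The sketch uses hypothetical weights.

```python
def kish_effective_n(w):
    """Kish's approximate effective sample size for a weighted sample."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)

equal = [1.0] * 100
unequal = [1.0] * 50 + [9.0] * 50
print(kish_effective_n(equal))    # 100.0
print(kish_effective_n(unequal))  # about 61
```

In the unequal case, 100 observations carry roughly the information of 61 equally weighted ones, which is the efficiency cost to set against any bias reduction the weights provide.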
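The first correction in section F.2 ("just fix the standard errors") is commonly implemented with a cluster-robust (sandwich) variance estimator. The sketch below applies it to an unweighted sample mean and compares it with the usual standard error. It assumes hypothetical data and cluster labels, at least two clusters, and a G/(G-1) small-sample factor; that factor is one common convention, not necessarily the one a given package uses.

```python
import math
from collections import defaultdict

def naive_se_mean(y):
    """Usual SE of a mean, which assumes independent observations."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    return math.sqrt(s2 / n)

def cluster_se_mean(y, clusters):
    """Cluster-robust SE of the mean (requires at least two clusters).

    Residuals are summed within each cluster before squaring, which allows
    arbitrary correlation within clusters; uses a G/(G-1) factor.
    """
    n = len(y)
    ybar = sum(y) / n
    totals = defaultdict(float)
    for yi, g in zip(y, clusters):
        totals[g] += yi - ybar
    G = len(totals)
    var = G / (G - 1) * sum(t * t for t in totals.values()) / n ** 2
    return math.sqrt(var)

# Hypothetical data: strong similarity within each of 4 clusters
y = [1, 1, 1, 5, 5, 5, 3, 3, 3, 7, 7, 7]
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
se_naive = naive_se_mean(y)
se_cluster = cluster_se_mean(y, clusters)
```

Here the clustered standard error is almost twice the naive one, illustrating the downward bias noted in F.1.a. When every observation is its own cluster, the two estimators coincide. The point estimate itself is unchanged, which is why this correction does not address efficiency.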