
Book Summary: Econometrics for Dummies

Yan Zeng

Version 1.0.3, last revised on 2016-08-04.

Abstract. Summary of Pedace [3] and [4].

Contents

I Getting Started with Econometrics
1 Econometrics: The Economist's Approach to Statistical Analysis
2 Getting the Hang of Probability
3 Making Inferences and Testing Hypotheses

II Building the Classical Linear Regression Model
4 Understanding the Objectives of Regression Analysis
5 Going Beyond Ordinary with the Ordinary Least Squares Technique
6 Assumptions of OLS Estimation and the Gauss-Markov Theorem
7 The Normality Assumption and Inference with OLS

III Working with the Classical Regression Model
8 Functional Form, Specification, and Structural Stability
9 Regression with Dummy Explanatory Variables

IV Violations of Classical Regression Model Assumptions
10 Multicollinearity
11 Heteroskedasticity
12 Autocorrelation

V Discrete and Restricted Dependent Variables in Econometrics
13 Qualitative Dependent Variables
14 Limited Dependent Variable Models

VI Extending the Basic Econometric Model
15 Static and Dynamic Models
16 Diving into Pooled Cross-Section Analysis
17 Panel Econometrics

VII The Part of Tens
18 Ten Components of a Good Econometrics Research Project
19 Ten Common Mistakes in Applied Econometrics

VIII Appendices
A Specifying Your Econometrics Regression Model
B Choosing the Functional Form of Your Regression Model
C Working with Special Dependent Variables in Econometrics
D Choose a Forecasting Method in Econometrics
E Econometrics for Dummies Cheat Sheet
E.1 The CLRM assumptions
E.2 Useful formulas in econometrics
E.3 Common functional forms for regression
E.4 Typical problems estimating econometric models

5 Going Beyond Ordinary with the Ordinary Least Squares Technique

Regression coefficients in a model with one independent variable:

β̂1 = ∑_{i=1}^n (Yi − Ȳ)(Xi − X̄) / ∑_{i=1}^n (Xi − X̄)² = ŝ²_XY / ŝ²_X,   β̂0 = Ȳ − β̂1 X̄.

The intercept term is usually ignored in applied work, because situations where all of the explanatory variables equal zero are unlikely to occur.

Justifying the least squares principle. In most situations, OLS remains the most popular technique for estimating regressions, for the following three reasons:
• Using OLS is easier than the alternatives. Other techniques require more mathematical sophistication and more computing power.
• OLS is sensible. You avoid positive and negative residuals canceling each other out and find a regression line that's as close as possible to the observed data points.
• OLS results have desirable characteristics.
✓ The regression line always passes through the sample means of Y and X, i.e. Ȳ = β̂0 + β̂1 X̄ (the point (X̄, Ȳ) falls on the line y = β̂0 + β̂1 x); this follows from the definition of β̂0 and β̂1.
✓ The mean of the estimated (predicted) Y values is equal to the mean of the actual Y values: the average of Ŷi = β̂0 + β̂1 Xi is β̂0 + β̂1 X̄ = Ȳ.
✓ The mean of the residuals is zero: the average of ε̂i is Ȳ − (β̂0 + β̂1 X̄) = 0.
✓ The residuals are uncorrelated with the predicted Y, or ∑_{i=1}^n (Ŷi − Ȳ) ε̂i = 0.
✓ The residuals are uncorrelated with the observed values of the independent variable, or ∑_{i=1}^n ε̂i Xi = 0.
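These closed-form estimates and the listed properties are easy to verify numerically. Below is a minimal numpy sketch on simulated data; the data and variable names are illustrative, not from the book.

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 + 1.5 * x + rng.normal(size=200)

# Closed-form OLS estimates for one regressor.
b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
resid = y - y_hat

print(np.isclose(y_hat.mean(), y.mean()))                     # mean of fitted = mean of Y
print(np.isclose(resid.mean(), 0.0))                          # mean residual = 0
print(np.isclose(np.sum((y_hat - y.mean()) * resid), 0.0))    # residuals uncorrelated with fitted Y
print(np.isclose(np.sum(resid * x), 0.0))                     # residuals uncorrelated with X
```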
Standardizing regression coefficients. Comparing coefficient values is not as straightforward as you may first think. Here are a few reasons why:
• In standard OLS regression, the coefficient with the largest magnitude is not necessarily associated with "the most important" variable.
• Coefficient magnitudes can be affected by changing the units of measurement; in other words, scale matters.
• Even variables measured on similar scales can have different amounts of variability.

If you want to compare coefficient magnitudes in a multiple regression, you need to calculate the standardized regression coefficients. You can do so in two ways:
• Calculate a Z-score for every variable of every observation and then perform OLS with the Z values rather than the raw data.
• Obtain the OLS regression coefficients using the raw data and then multiply each coefficient by (σ̂_{Xk} / σ̂_Y).

Mathematically, you transform the original regression equation

Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βp Xip + εi

to

(Yi − Ȳ)/σ̂_Y = β1 ((Xi1 − X̄1)/σ̂_{X1})(σ̂_{X1}/σ̂_Y) + β2 ((Xi2 − X̄2)/σ̂_{X2})(σ̂_{X2}/σ̂_Y) + · · · + βp ((Xip − X̄p)/σ̂_{Xp})(σ̂_{Xp}/σ̂_Y) + ε̂i/σ̂_Y,

where we have taken advantage of one of the desirable OLS properties, namely that the average residual is zero. Note that regular OLS coefficients and standardized regression coefficients do not have the same meaning. A standardized regression coefficient estimates the standard-deviation change in your dependent variable for a 1-standard-deviation change in the independent variable, holding the other variables constant.

Measuring goodness of fit.
• Explained sum of squares (ESS), residual sum of squares (RSS), and total sum of squares (TSS):

ESS = ∑_{i=1}^n (Ŷi − Ȳ)²,  RSS = ∑_{i=1}^n (Yi − Ŷi)² = ∑_{i=1}^n ε̂i²,  TSS = ∑_{i=1}^n (Yi − Ȳ)² = ESS + RSS.

• Coefficient of determination (R-squared) and adjusted R-squared (adjusted by degrees of freedom):

R² = ESS/TSS = 1 − RSS/TSS,  R²_adj = 1 − [RSS/(n − p − 1)] / [TSS/(n − 1)],

where n is the number of observations and p is the number of independent variables in the model.

Increasing the number of explanatory variables in a regression model can only increase the R-squared value or leave it unchanged; it can never decrease it. When you add more variables, however, you lose degrees of freedom (the number of observations above and beyond the number of estimated coefficients). Fewer degrees of freedom make your estimates less reliable (for more on this topic, turn to Chapter 6). In order to compare two models on the basis of R-squared (adjusted or not), the dependent variable and sample size must be the same.

Here are a few reasons why you shouldn't use R-squared (adjusted or not) as the only measure of your regression's quality:
• A regression may have a high R-squared but no meaningful interpretation because the model equation is not supported by economic theory or common sense.
• Using a small data set or one that includes inaccuracies can lead to a high R-squared value but deceptive results.
• Obsessing over R-squared may cause you to overlook important econometric problems.

In economic settings, a high R-squared (close to 1) is more likely to indicate that something is wrong with the regression than to show that it's of high quality. High R-squared values may be associated with regressions that violate assumptions and/or have nonsensical results (coefficients with the wrong sign, unbelievable magnitudes, and so on).
When evaluating regression quality, give these outcomes more weight than the R-squared. 6 Assumptions of OLS Estimation and the Gauss-Markov Theo- rem The OLS/CLRM assumptions and their intuition. • The model is linear in parameters and has an additive error term. Other techniques, such as maximum likelihood (ML) estimation, can be used when the function you need to estimate is not linear in parameters. • The value for the independent variables are derived from a random sample of the population and contain variability. Strictly speaking, the CLRM assumes that the values of the independent variables are fixed in repeated random samples. The more common version of the assumption is that the values of the independent variable are random from sample to sample but independent of the error term. The weaker version is equivalent asymptotically (with large samples). This assumption isn’t likely to hold when you use lagged values of your dependent variable as an indepen- dent variable (autoregression) or when the value of your dependent variable simultaneously affects the value of one (or more) of your independent variables (simultaneous equations). Therefore, OLS is inappropriate in these situations. In practice, for each random sample Xi we often observe Y only once. So we either assume a simple parametric model, e.g. linear regression, or use points in a neighborhood of Xi for averaging, e.g. K-nearest neighbor regression (KNN regression). See James et al. [2, page 104] for details. • No independent variable is a perfect linear function of any other independent variable(s) (no perfect collinearity). If you have perfect collinearity, the software program you use to calculate regression results cannot estimate the regression coefficients, since perfect collinearity causes you to lose linear independence and the computer can’t identify the unique effect of each variable. In applied cases, high collinearity is much more common than perfect collinearity. • The model is correctly specified and the error term has a zero conditional mean. 6 E[ε|X = x] = 0 means for given x, the residuals ε(x) = y − (β0 + β1x) oscillate around 0 with average equal to 0. Graphically, this means the values of the dependent variable oscillate around the regression line with averages falling on the regression line. This assumption may fail if you have misspecification (you fail to include a relevant independent variable or you use an incorrect functional form) or a restricted dependent variable (namely, a qualitative or limited dependent variable). • The error term has a constant variance (no heteroskedasticity). Graphically, this means the “scatteredness” of the values of the independent variable around the regression line is approximately the same everywhere. Heteroskedasticity is a common problem for OLS regression estimation, espcially with cross-sectional and panel data. • The values of the error term aren’t correlated with each other (no autocorrelation or no serial correlation). Graphically, no autocorrelation means the scatter plot of (εi−k, εi) ∞ i=k+1 spreads out homogeneously in all directions, for any k ≥ 1. Autocorrelation can be quite common when you are estimating models with time-series data, because when observations are collected over time, they are unlikely to be independent from one another. The Gauss-Markov Theorem. This theorem states that the ordinary least squares (OLS) estimators are the best linear unbiased estimators (BLUE) given the assumptions of the CLRM. 
• Linearity of OLS (as a function of the observed Y values): β̂1 = n∑ i=1 ci(Yi − Y ), β̂0 = Y − [ n∑ i=1 ci(Yi − Y ) ] X, where ci = Xi−X∑n i=1(Xi−X)2 , i = 1, · · · , n. • Unbiasedness: E[β̂1] = β1, E[β̂0] = β0. • Best means achieving the smallest possible variance among all similar estimators. Var(β̂1) = σ2 ε∑n i=1(Xi −X)2 . When judging how good or bad an estimator is, econometricians usually evaluate the amount of bias and variance of that estimator. The BLUE property of OLS estimators is viewed as the gold standard. Econometricians have devised methods to deal with failures of the CLRM assumptions, but they aren’t always successful in proving that the alternative method produces a BLUE. In those cases, they usually settle for an asymptotic property known as consistency. Estimators are consistent if, as the sample size approaches infinity, the variance of the estimator gets smaller and the value of the estimator approaches the true population parameter value. Also refer to Table 6-1: Summary of Gauss-Markov Assumptions [3], page 19. 7 The Normality Assumption and Inference with OLS The normality assumption. The normality assumption in econometrics states that, for any given X value, the error term follows a normal distribution with a zero mean and constant variance: ε|X ∼ N(0, σ2 ε). The normality assumption isn’t required for performing OLS estimation. It’s necessary only when you want to produce confidence intervals and/or perform hypothesis tests with your OLS estimates. In some applications, the assumption of a normal distribution for the error term may be difficult to justify. These situations typically involve a dependent variable Y that has limited or highly skewed values. Econometricians have shown that with large sample sizes, normality is not a major issue because the OLS estimators are approximately normal even if the errors are not normal. The sampling distribution of OLS coefficients. All OLS coefficients are a linear function of the error term. If you assume that the error term has a normal distribution, you’re also assuming that the 7 Part III Working with the Classical Regression Model 8 Functional Form, Specification, and Structural Stability Functional Form. • Dimension/unit/scale. Change in absolute amount or in percentage? XLog-log model (elasticity, i.e. the estimated percentage change in the dependent variable for a percentage change in the independent variable). XLog-linear model (the estimated percentage change in the dependent variable for a unit change in the independent variable). XLinear-log model (the estimated unit change in the dependent variable for a percentage change in the independent variable). • Graph of the dependent-independent variable chart. XQuadratic function (best for finding minimums and maximums). XCubic function (good for inflexion). XInverse function (limiting the value of the dependent variable). XLinear-log model (the impact of the independent variable on the dependent variable decreases as the value of the independent variable increases). Misspecification. • Omitting relevant variables. You have an omitted variable bias if an excluded variable has some effect on your dependent variable and it’s correlated with at least one of your independent variables. The intuition is best illustrated by projection in Hilbert space. • Including irrelevant variable. 
The estimated coefficients remain unbiased, but the standard errors are increased. The estimated standard error for any given regression coefficient is

σ̂_{β̂k} = √( σ̂²_ε / [ ∑_{i=1}^n (Xik − X̄k)² (1 − R²_k) ] ),

where R²_k is the R-squared from the regression of Xk on the other independent variables. Including irrelevant variables does not change σ̂²_ε, but it increases R²_k.

Just because your estimated coefficient isn't statistically significant doesn't make it irrelevant. A well-specified model usually includes some variables that are statistically significant and some that aren't. Additionally, variables that aren't statistically significant can contribute enough explained variation to have no detrimental impact on the standard errors.

Structural Stability.
• Perform a RESET to test the severity of specification issues. Ramsey's regression specification error test (RESET) is conducted by adding a quartic function of the fitted values of the dependent variable (Ŷi², Ŷi³, and Ŷi⁴) to the original regression and then testing the joint significance of the coefficients for the added variables. The logic of using a quartic of your fitted values is that they serve as proxies for variables that may have been omitted. Because the proxies are essentially nonlinear functions of your Xs, RESET is also testing misspecification from functional form.
1. Estimate the model you want to test for specification error, e.g. Yi = β0 + β1 Xi1 + · · · + εi.
2. Obtain the fitted values after estimating your model and estimate Yi = β0 + β1 Xi1 + · · · + α Ŷi² + γ Ŷi³ + δ Ŷi⁴ + εi.
3. Test the joint significance of the coefficients on the fitted-value terms (α, γ, and δ) using an F-statistic.
A RESET allows you to identify whether misspecification is a serious problem with your model, but it doesn't allow you to determine the source.
• Use the Chow test to determine structural stability (see the code sketch at the end of this section). Sometimes specification issues arise because the parameters of the model either aren't stable or they change. We can conduct a Chow test for structural stability between any two groups (A and B) in just three steps:
1. Estimate your model combining all data and obtain the residual sum of squares (RSSr) with degrees of freedom n − p − 1.
2. Estimate your model separately for each group and obtain the residual sum of squares for group A, RSSur,A, with degrees of freedom nA − p − 1, and the residual sum of squares for group B, RSSur,B, with degrees of freedom nB − p − 1.
3. Compute the F-statistic by using this formula:

F = [ (RSSr − (RSSur,A + RSSur,B)) / (p + 1) ] / [ (RSSur,A + RSSur,B) / (n − 2p − 2) ].

The null hypothesis for the Chow test is structural stability. The larger the F-statistic, the more evidence you have against structural stability and the more likely the coefficients are to vary from group to group. Note that the F-statistic for the Chow test assumes homoskedasticity. A large F-statistic only informs you that the parameters vary between the groups; it doesn't tell you which specific parameter(s) is (are) the source(s) of the structural break.
• Robustness/sensitivity analysis. If the coefficients of your core variables aren't sensitive (they maintain the same sign with similar magnitudes and levels of significance), then they are considered robust. Some variables, despite not being of primary interest (that is, despite not being core), are likely to be essential control variables that would be included in any analysis of your outcome of interest (you should rely on economic theory and your common sense here).
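A minimal numpy/scipy sketch of the Chow test just described, assuming two groups that share the same regressors; the data, function names, and seed are illustrative, not from the book.

```python
import numpy as np
from scipy import stats

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_test(y_a, X_a, y_b, X_b):
    """F-statistic and p-value for H0: coefficients are identical in groups A and B."""
    y = np.concatenate([y_a, y_b])
    X = np.vstack([X_a, X_b])
    n, k = X.shape                              # k = p + 1 estimated parameters
    rss_r = rss(y, X)                           # restricted: one model for all data
    rss_ur = rss(y_a, X_a) + rss(y_b, X_b)      # unrestricted: separate fits per group
    F = ((rss_r - rss_ur) / k) / (rss_ur / (n - 2 * k))
    return F, stats.f.sf(F, k, n - 2 * k)

# Illustrative use: two groups with different slopes should give a large F.
rng = np.random.default_rng(0)
x_a = rng.normal(size=60)
y_a = 1.0 + 0.5 * x_a + rng.normal(scale=0.3, size=60)
x_b = rng.normal(size=60)
y_b = 1.0 + 1.5 * x_b + rng.normal(scale=0.3, size=60)
X_a = np.column_stack([np.ones(60), x_a])
X_b = np.column_stack([np.ones(60), x_b])
print(chow_test(y_a, X_a, y_b, X_b))
```

Under the null of stable parameters the statistic follows an F distribution with p + 1 and n − 2p − 2 degrees of freedom, matching the formula above.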
9 Regression with Dummy Explanatory Variables

Interpretation.
• The coefficient for your dummy variable(s) in a regression containing a quantitative variable shifts the regression function up or down. The same holds true when there's more than one dummy variable.
• The inclusion of an interaction term in your econometric model allows the regression function to have a different intercept and slope for each group identified by your dummy variables. The coefficient for your dummy variable(s) shifts the intercept, and the coefficient for your interaction term changes the slope (which is the impact of your quantitative variable).
• The inclusion of interacted dummy variables in your econometric model allows the regression function to have different intercepts for each combination of qualitative attributes. The coefficients for your dummy variables and their interaction shift the intercept by the estimated magnitude.

Testing for significance.
• Testing the joint significance of a group of dummy variables in a regression model is accomplished by generalizing the F-test of overall significance to

F = [ (RSSr − RSSur) / q ] / [ RSSur / (n − p − 1) ] = [ (ESSur − ESSr) / q ] / [ RSSur / (n − p − 1) ] ∼ F_{q, n−p−1},

where RSSr is the residual sum of squares for the restricted model (the model excluding the dummy variables), RSSur is the residual sum of squares for the unrestricted model (the model including the dummy variables), n is the number of sample measurements, p is the number of independent variables in the unrestricted model, and q is the number of dummy variables added in your unrestricted model that are not contained in your restricted model.
• Using a dummy variable and interaction terms, a test of joint significance can be equivalent to performing a Chow test.
1. Create a dummy variable (D) that identifies any two groups suspected of a structural break.
2. Create interaction variables with your dummy variable and every other variable in your model.
3. Estimate the regression model that includes the quantitative, dummy, and interaction variables.
4. Test the joint significance of the dummy variable identifying the two groups and all the interaction terms that include this dummy variable.
The advantage of the dummy variable approach to testing for structural stability is that it allows you to identify the source of the difference between the groups. The disadvantage is that it may not be practical if you're working with numerous independent variables.

Part IV Violations of Classical Regression Model Assumptions

10 Multicollinearity

Multicollinearity refers to a linear relationship between two or more independent variables in a regression model. There are two types of multicollinearity:

Perfect multicollinearity. When perfectly collinear variables are included as independent variables, you can't use the OLS technique to estimate the value of the parameters. Your regression coefficients are indeterminate and their standard errors are infinite.

High multicollinearity. It's much more common than its perfect counterpart and can be equally problematic when it comes to estimating an econometric model. Technically, the presence of high multicollinearity doesn't violate any CLRM assumptions. Consequently, OLS estimates can be obtained and are BLUE with high multicollinearity. The larger variances (and standard errors) of the OLS estimators are the main reason to avoid high multicollinearity.
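The summary does not name it, but the variance inflation factor VIF_k = 1/(1 − R²_k), built from the same auxiliary R²_k that appears in the coefficient standard-error formula, is a standard way to quantify how much collinearity inflates coefficient variances. A minimal numpy sketch, with illustrative data and names:

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_k = 1 / (1 - R^2_k), where R^2_k comes from regressing column k of X
    on the remaining columns plus a constant. X holds the regressors only,
    one column per variable, without a constant column."""
    n, p = X.shape
    vifs = []
    for k in range(p):
        y_k = X[:, k]
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y_k, rcond=None)
        resid = y_k - others @ beta
        r2_k = 1.0 - (resid @ resid) / np.sum((y_k - y_k.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2_k))
    return np.array(vifs)

# Two nearly collinear regressors give large VIFs; an unrelated one stays near 1.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # almost a copy of x1
x3 = rng.normal(size=200)
print(variance_inflation_factors(np.column_stack([x1, x2, x3])))
```

A common rule of thumb treats VIF values above roughly 10 as a sign of problematic collinearity, though the threshold is a convention rather than a formal test.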
When econometricians point to a multicollinearity issue, they’re typically referring to high multicollinear- ity rather than perfect multicollinearity. Most econometric software programs identify perfect multicollinear- ity and drop one (or more) variables prior to providing the estimation results. • Causes of multicollinearity include XYou use variables that are lagged values of one another. XYou use variables that share a common time trend component. XYou use variables that capture similar phenomena. • Consequences of high multicollinearity include XLarger standard errors and insignificant t-statistics: σ2 β̂k = σ̂2 ε∑ (Xik −Xk)2(1−R2 k) , where σ̂2 ε is the mean squared error (MSE) and R2 k is the R-squared value from regressing Xk on the other Xs. Higher multicollinearity results in a larger R2 k, which increases the standard error of the coefficient. Because the t-statistic associated with a coefficient is tk = β̂k σ̂β̂k , high multicollinearity also tends to result in insignificant t-statistics. XCoefficient estimates that are sensitive to changes in specification. If the independent variables are highly collinear, the estimates must emphasize small differences in the variables in order to assign an independent effect to each of them. XNonsensical coefficient signs and magnitudes. With higher multicollinearity, the variance of the estimated coefficients increases, which in turn increases the chances of obtaining coefficient estimates with extreme values. 12 where ε̂2i are calculated from the residuals and used as proxies for ε2i . Alternatively, a White test can be performed by estimating ε̂2i = δ0 + δ1Ŷi + δ2Ŷ 2 i where Ŷi represents the predicted values from Ŷi = β̂0 + β̂1Xi1 + · · ·+ β̂pXip. Here’s how to perform a White test: 1. Estimate your model, Yi = β0 + β1Xi1 + · · ·+ βpXip + εi, using OLS. 2. Obtain the predicted Y values (Ŷi) after estimating your model. 3. Estimate the model ε̂2i = δ0 + δ1Ŷi + δ2Ŷ 2 i using OLS. 4. Retain the R-squared value (R2 ε̂2) from this regression. 5. Calculate the F -statistic, F = R2 ε̂2 2 (1−R2 ε̂2 ) n−3 , or the chi-squared statistic, χ2 = nR2 ε̂2 . If either of these test statistics is significant, then you have evidence of heteroskedasticity. • The Goldfeld-Quandt test. The Goldfeld-Quandt (GQ) test begins by assuming that a defining point exists and can be used to differentiate the variance of the error term. Sample observations are divided into two groups, and evidence of heteroskedasticity is based on a comparison of the residual sum of squares (RSS) using the F -statistic. 1. Estimate your model separately for each group and obtain the residual sum of squares for Group A (RSSA) and the residual sum of squares for Group B (RSSB). 2. Compute the F -statistic by F = RSSA n−p−1 RSSB n−p−1 . The null hypothesis for the GQ test is homoskedasticity. The larger the F -statistic, the more evidence you’ll have against the homoskedasticity assumption. • The Park test. The Park test assumes that the heteorskedasticity may be proportional to some power of an independent variable (Xk) in the model: σ2 iε = σ2 εX α ik. 1. Estimate the model Yi = β0 + β1Xi1 + · · ·+ βpXip + εi using OLS. 2. Obtain the squared residuals, ε̂2i , after estimating your model. 3. Estimate the model ln ε̂2i = γ + α lnXik + ui using OLS. 4. Examine the statistical significance of α using the t-statistic: t = α̂ σ̂α̂ . If the estimate of α coefficient is statistically significant, then you have evidence of heteroskedasticity. 
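A minimal numpy/scipy sketch of the simplified White test described above: regress the squared OLS residuals on the fitted values and their squares and use the chi-squared statistic nR² with 2 degrees of freedom. The data and names are illustrative, not from the book.

```python
import numpy as np
from scipy import stats

def white_test(y, X):
    """Simplified White test: chi2 = n * R^2 from regressing squared residuals
    on the fitted values and their squares. X must include a constant column."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid2 = (y - fitted) ** 2
    Z = np.column_stack([np.ones(n), fitted, fitted ** 2])
    gamma, *_ = np.linalg.lstsq(Z, resid2, rcond=None)
    u = resid2 - Z @ gamma
    r2 = 1.0 - (u @ u) / np.sum((resid2 - resid2.mean()) ** 2)
    chi2 = n * r2
    return chi2, stats.chi2.sf(chi2, df=2)

# Simulated heteroskedastic data: the error spread grows with x.
rng = np.random.default_rng(2)
x = rng.uniform(1, 5, size=300)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5 * x, size=300)
X = np.column_stack([np.ones_like(x), x])
print(white_test(y, X))   # a significant chi-squared is evidence of heteroskedasticity
```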
Correcting your regression model for the presence of heteroskedasticity. • Weighted least squares (WLS). The goal of the WLS transformation is to make the error term in the original econometric model homoskedastic. First, you assume that the heteroskedasticity is determined proportionally from some function of the independent variables: V ar(ε|Xi) = σ2 εh(Xi). Then you use knowl- edge of this relationship to divide both sides of the original model by the component of heteroskedasticity that give the error term a constant variance. More specifically, the objective of OLS is min ∑( Yi − β̂0 − β̂1Xi1 − · · · − β̂pXip )2 . The objective of WLS is min ∑( Yi − β̂0 − β̂1Xi1 − · · · − β̂pXip )2 h(Xi) . In practice, knowing the exact functional form of h(Xi) is impossible. In applied settings, the exponential function is the most common approach to modeling heteroskedasticity: V ar(ε|Xi) = σ2 ε exp(α0 + α1Xi1 + · · ·+ αpXip). 1. Estimate the original model, Yi = β0 + β1Xi1 + · · ·+ βpXip + εi, and obtain the residuals, ε̂i. 2. Square the residuals and take their natural log to generate ln ε̂2i . 3. Estimate the regression ln ε̂2i = γ + δ1Xi1 + · · ·+ δpXip + vi or ln ε̂2i = γ + ϕ1Ŷi + ϕ2Ŷ 2 i + ui and obtain the fitted values: ĝi = γ̂ + ϕ̂1Ŷi + ϕ̂2Ŷ 2 i . 4. Take the inverse natural log of the fitted residuals exp(ĝi) to obtain ĥi. 5. Estimate the regression Yi = β0 + β1Xi1 + · · ·+ βpXip + εi by WLS using ĥi as weights. 15 If the proposed model of heteroskedasticity is misspecified, then WLS may not be more efficient than OLS. The problem is that misspecificaiton of the heteroskedasticity is difficult to identify. A large difference between OLS and WLS coefficients is more likely to imply that the model suffers from functional form specification bias than to suffer from heteroskedasticity. • Robust standard errors (White-corrected standard errors, heteroskedasticity-corrected standard errors). In a model with one independent variable and homoskedasticity, the variance of the estimator can be reduced to V ar(β̂1) = σ2 ε ∑ c2i ; with heteroskedasticity, the variance of the estimator is V ar(β̂i) = ∑ c2iσ 2 iε. In applied settings, the squared residuals (ε̂2i ) are used as estimates of σ2 iε. In a model with one independent variable, the robust standard error is se(β̂i)HC = √√√√ ∑ (Xi −X)2ε̂2i(∑ (Xi −X)2 )2 . Generalizing this result to a multiple regression model, the robust standard error is se(β̂k)HC = √∑ ω̂2 ikε̂ 2 i ( ∑ ω̂2 ik) 2 where the ω̂2 ik are the residuals obtained from the auxiliary regression of Xj on all the other independent variables. Here’s how to calculate robust standard errors: 1. Estimate your original multivariate model, Yi = β0 + β1Xi1 + · · · + βpXip + εi, and obtain the squared residuals, ε̂2i . 2. Estimate p auxiliary regressions of each independent variable on all the other independent variables and retain all p squared residuals (ω̂2 ik). 3. For any independent variable, calculate the robust standard errors: se(β̂k)HC = √∑ ω̂2 ikε̂ 2 i ( ∑ ω̂2 ik) 2 . Numerous versions of robust standard errors exist for the purpose of improving the statistical properties of the heteroskedasticity correction; no form of robust standard error is preferred above all others. 12 Autocorrelation Patterns of autocorrelation. The CLRM assumes there’s no autocorrelation: Cov(εt, εs) = 0 or Corr(εt, εs) = 0 for all t ̸= s. When the error term exhibits no autocorrelation, the positive and negative error values are random. 
If autocorrelation is present, positive autocorrelation is the most likely outcome. Positive autocorrelation occurs when an error of a given sign tends to be followed by an error of the same sign, which is called sequencing. Negative autocorrelation occurs when an error of a given sign tends to be followed by an error of the opposite sign, which is called switching. When you’re drawing conclusions about autocorrelation using the error pattern, all other CLRM assump- tions must hold, especially the assumption that the model is correctly specified. If a model isn’t correctly specified, you may mistakenly identify the model as suffering from autocorrelation. Misspecification is a more serious issue than autocorrelation. Effect of autoregressive errors. In the presence of autocorrelation, the OLS estimators may not be efficient. In addition, the estimated standard errors of the coefficients are biased, which results in unreliable hypothesis tests (t-statistics). The OLS estimates, however, remain unbiased. Typically, autocorrelation is assumed to be represented by a first-order autoregression: Yt = β0 + p∑ i=1 βiXti + εt 16 with εt = ρεt−1 + ut, where −1 < ρ < 1 and ut is a random error that satisfies the CLRM assumptions; namely E[ut|εt−1] = 0, V ar(ut|εt−1) = σ2 u, and Cov(ut, us) = 0 for all t ̸= s. By repeated substitution, we obtain εt = ut + ρut−1 + ρ2ut−2 + ρ3ut−3 + · · · . Therefore E[εt] = 0, V ar(εt) = σ2 u + ρ2σ2 u + ρ4σ2 u + · · · = σ2 u 1− ρ2 . The stationarity assumption (|ρ| < 1) is necessary to constrain the variance from becoming an infinite value. OLS assumes no autocorrelation; that is, ρ = 0 in the expression σ2 ε = σ2 u 1−ρ2 . Consequently, in the presence of autocorrelation, the estimated variances and standard errors from OLS are underestimated. Test for autocorrelation. • Graphical inspection of residuals. Look for sequencing or switching of residual errors if autocorre- lation is present. • The run test (the Geary test). You want to use the run test if you’re uncertain about the nature of the autoregressive process (no assumptions about the ρ values). A run is defined as a sequence of positive or negative residuals. The hypothesis of no autocorrelation isn’t sustainable if the residuals have too many or too few runs. The most common version of the test assumes that runs are distributed normally. If the assumption of no autocorrelation is sustainable, with 95% confidence, the number of runs should be between µr ± 1.96σr where µr is the expected number of runs and σr is the standard deviation. These values are calculated by µr = 2T1T2 T1 + T2 + 1, σr = √ 2T1T2(2T1T2 − T1 − T2) (T1 + T2)2(T1 + T2 − 1) where r is the number of observed runs, T1 is the number of positive residuals, T2 is the number of negative residuals, and T is the total number of observations. If the number of observed runs is below the expected interval, it’s evidence of positive autocorrelation; if the number of runs exceeds the upper bound of the expected interval, it provides evidence of negative autocorrelation. • The Durbin-Watson test for AR(1) processes. The Durbin-Watson (DW) test begins by assuming that if autocorrelation is present, then it can be described by an AR(1) process: Yt = β0 + p∑ i=1 βiXti + εt, εt = ρεt−1 + ut. The value produced by the DW test is called d statistic and is calculated as follows: d = ∑T t=2(ε̂t − ε̂t−1) 2∑T t=1 ε̂ 2 t = ∑T t=2 ε̂ 2 t∑T t=1 ε̂ 2 t + ∑T t=2 ε̂ 2 t−1∑T t=1 ε̂ 2 t − 2 ∑T t=2 ε̂tε̂t−1∑T t=1 ε̂ 2 t ≈ 1 + 1− 2 ρ̂σ̂2 u 1−ρ̂2 σ̂2 u 1−ρ̂2 ≈ 2(1− ρ̂). 
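The d statistic is straightforward to compute from the OLS residuals. A minimal numpy sketch on simulated AR(1) errors (the data and the ρ = 0.7 value are illustrative):

```python
import numpy as np

def durbin_watson(residuals):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, approximately 2 * (1 - rho_hat).
    Values near 2 suggest no AR(1) autocorrelation, near 0 positive, near 4 negative."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Residuals from an AR(1) process with rho = 0.7 give d well below 2.
rng = np.random.default_rng(3)
u = rng.normal(size=500)
e = np.zeros(500)
for t in range(1, 500):
    e[t] = 0.7 * e[t - 1] + u[t]
print(durbin_watson(e))                      # roughly 2 * (1 - 0.7) = 0.6
print(durbin_watson(rng.normal(size=500)))   # close to 2 for white noise
```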
where T represents the last observation in the time series. From the approximate formula d ≈ 2(1 − ρ̂), the closer d is to 2, the stronger the evidence of no autocorrelation; the closer d is to 0, the more likely positive autocorrelation. If d is closer to 4, then no autocorrelation is rejected in favor of negative autocorrelation. 17 • Accurate prediction defined as (a) P̂i ≥ Y and Y = 1 or (b) P̂i < Y and Y = 0. Accurate predictions aggregated by calculating the total number of accurate predictions as a percentage of the total number of observations. • Accurate prediction defined as P̂i ≥ 0.5 and Y = 1 or P̂i < 0.5 and Y = 0. Accurate predictions aggregated by calculating the percent of accurte predictions in each group (for Y = 0 and Y = 1) and weighting the percent of observations in each group. • Accurate prediction defined as P̂i ≥ Y and Y = 1 or P̂i < Y and Y = 0. Accurate predictions aggregated by calculating the percent of accurate predictions in each group (for Y = 0 and Y = 1) and weighting the percent of observations in each group. The three main LPM problems. • Non-normality of the error term. The assumption that the error is normally distributed is critical for performing hypothesis tests. The error term of an LPM has a binomial distribution instead of a normal distribution. It implies that the traditional t-tests for individual significance and F -test for overall significance are invalid. • Heteroskedasticity. The assumption of homoskedasticity is required to prove that the OLS estimators are efficient. The presence of heteroskedasticity can cause the Gauss-Markov theorem to be violated and lead to other undesirable characteristics for the OLS estimators. The error term in an LPM is heteroskedastic because its variance isn’t constant: V ar(εi) = (β0 + β1Xi)(1− β0 − β1Xi). • Unbounded predicted probabilities. The probit and logit models. In a probit or logit model, we estimate E[Y |Xi] = P (Y = 1|Xi) = F (β0 + β1Xi), where F is a monotone increasing function with range (0, 1). For a probit model, F is the CDF of a standard normal: F (x) = 1√ 2π ∫ x −∞ e−ξ2/2dξ; for a logit model, F (x) = ex 1+ex . Probit and logit functions are both nonlinear in parameters, so OLS can’t be used to estimate the βs. Instead, we use maximum likelihood estimation: we solve for arg max β0,β1 (probability of observing Y1, · · · , Yn) = arg max β0,β1 n∏ i=1 F (β0 + β1Xi) Yi [1− F (β0 + β1Xi)] 1−Yi . Finding the optimal values for the β̂ terms requires solving the following first-order conditions ∂ ln L̂ ∂β̂0 = ∑n i=1 [ YiF ′(β̂0+β̂1Xi) F (β̂0+β̂1Xi) − (1−Yi)F ′(β̂0+β̂1Xi) 1−F (β̂0+β̂1Xi) ] = 0 ∂ ln L̂ ∂β̂1 = ∑n i=1 [ YiF ′(β̂0+β̂1Xi) F (β̂0+β̂1Xi) − (1−Yi)F ′(β̂0+β̂1Xi) 1−F (β̂0+β̂1Xi) ] Xi = 0 Probit and logit estimation always produces a Pseudo R-squared measure of fit: R̃2 = 1 − ln L̂ur ln L̂0 , where lnLur is the log likelihood for the estimated model and lnL0 is the log likelihood in the model with only an intercept. You can obtain more appropriate measures of fit for probit and logit models by comparing the model’s predicted probabilities to the observed Y values. Appropriate measures of fit typically capture the fraction of times the model accurately predicts the outcome, e.g. the four measures of fit used for the LPM. 14 Limited Dependent Variable Models Limited dependent variables. • Censored dependent variables. With a censored dependent variable, some of the actual values for the dependent variable are limited to a minimum and/or maximum threshold value. 
This leads to nonzero 20 conditional mean of the error and correlation between the value of the error and the value of the independent variable. • Truncated dependent variables. With a truncated dependent variable, some of the values for the variables are missing (meaning they aren’t observed if they are above or below some threshold). Sometimes observations included in the sample have missing values for both the independent and dependent variables, and in other cases only the values for the dependent variable are missing. Common scenarios resulting in truncation include nonrandom sample selection and self-selection. Truncated data leads to nonzero conditional mean of the error and correlation between the value of the error and the value of the independent variable. The primary difference between a truncated and a censored variable is that the value of a truncated variable isn’t observed at all. However, a value is observed for a censored variable, but it’s suppressed for some observations at the threshold point. Regression analysis for limited dependent variables. • Tobin’s Tobit for censored dependent variables. If you use OLS estimation with the observed data as if they’re all uncensored values, you get biased coefficients. To avoid them, the estimation procedure must properly account for the censoring of the dependent variable. Maximum likelihood (ML) estimation does so. Suppose you have the following model with upper-limit censoring (the most common type): Y ∗ i = β0 + β1Xi + εi, ε ∼ N(0, σ2 ε), Yi = { Y ∗ i Y ∗ i < b b Y ∗ i ≥ b. Using the probability of censorship, estimation is accomplished with ML, where the log likelihood function to be maximized is lnL = n∑ i=1 { lnF ( β0 + β1Xi − b σε ) + ln [ 1 σε F ′ ( Yi − β0 − β1Xi σε )]} where F denotes the standard normal CDF Tobit estimation produces a likelihood ratio chi-squared statistic. It’s analogous to the F -statistic in OLS, and it tests the null hypothesis that the estimated model doesn’t produce a higher likelihood than a model with only a constant term. • Truncated regression for truncated dependent variables with unobserved independent variables. In this case, you can’t apply OLS estimation to the observed data as if it’s representative of the entire population. If you do, you’ll wind up with biased coefficients. Instead, you need to use maximum likelihood (ML) estimation so you can properly account for the truncation by rescaling the normal distribution so that the cumulative probabilities add up to one over the restricted area. Consider the following model Y ∗ i = β0 + β1Xi + εi, ε ∼ N(0, σ2 ε), Yi = { Y ∗ i Y ∗ i < b · Y ∗ i ≥ b. The dot (·) represents a missing value at and above the truncation point. Using a rescaling of the normal distribution, estimation is accomplished with ML, where the log likelihood function to be maximized is lnL = −n 2 ln(2πσ2 ε)− 1 2σ2 ε n∑ i=1 (Yi − β0 − β1Xi) 2 − n∑ i=1 lnF ( b− β0 − β1Xi σε ) where F denotes the standard normal CDF. Truncated normal estimation also produces a chi-squared statistic, which is like the F -statistic in OLS. It confirms or rejects the null hypothesis that the estimated model doesn’t produce a higher likelihood than a model with only a constant term. 21 Ignoring the truncation and estimating the model using OLS will produce coefficients biased toward finding no relationship (smaller coefficients/effects). • Heckman’s selection bias correction for truncated dependent variables with observed independent variables. 
Assume we work with the following model: Y ∗ i = β0 + β1Xi + εi, ε ∼ N(0, σ2 ε) with self-selection defined by Si = γ0 + γ1Wi1 + γ2Wi2 + · · ·+ ui, Si = { 1 if Y ∗ i observed 0 if Y ∗ i not observed, u ∼ N(0, 1), Corr(ε, u) = ρ. The log likelihood function that’s maximized is lnL = n∑ i=1 { lnF [ ((γ0 + γ1Wi1 + γ2Wi2 + · · · ) + (Y ∗ i − β0 − β1Xi)ρ)/σε√ 1− ρ2 ] −1 2 ( Y ∗ i − β0 − β1Xi σε )2 − ln( √ 2πσε) + lnF (−γ0 − γ1Wi1 − γ2Wi2 − · · · ) } where F denotes the standard normal CDF. In a Heckman model, the variables that influence truncation usually aren’t identical to those that influence the value of the dependent variable (in contrast to the Tobit model, where they’re assumed to be the same). Sometimes the ML estimation fails to converge, and an alternative is to use the Heckit model. It can be accomplished by following these steps: 1. Estimate the selection equation Si = γ0 + γ1Wi1 + γ2Wi2 + · · ·+ u with a probit model. 2. Compute the inverse Mills ratio: λ̂i = F ′(γ̂0 + γ̂1Wi1 + γ̂2Wi2 + · · · ) F (γ̂0 + γ̂1Wi1 + γ̂2Wi2 + · · · ) where F is the standard normal CDF. 3. Estimate the model Yi = β0 + β1Xi + β2λ̂i + εi using the selected sample. Estimation of a Heckman selection model also produces a chi-squared statistic, which is similar to the F -statistic in OLS and tests the null hypothesis that esttimated model doesn’t produce a higher likelihood than a model with only a constant term. Part VI Extending the Basic Econometric Model 15 Static and Dynamic Models Using contemporaneous and lagged variables in regression analysis. • Problems with dynamic models. When you’re using time-series data, you can assume that the independent variables have a contemporaneous (static) or lagged (dynamic) effect on our dependent variable. A generic dynamic model is a distributed lag model. You can specify it as Yt = α+ δ0Xt + δ1Xt−1 + δ2Xt−2 + · · ·+ δrXt−r + εt. In practice, distributed lag models can be plagued by estimation problems. The two most common issues are high multicollinearity and the loss of degrees of freedom: high multicollinearity usually causes the coefficient estimates to display erratic behavior, while loss of degrees of freedom increases the standard errors and reduces the chances of finding statistically significant coefficients. 22 XDummy variable (DV) regression. XThe fixed effects (FE) estimator (the method most commonly used by applied econometricians). First difference (FD) transformation. In order to use the FD approach, we rely on a couple of assumptions. First, we assume that the values for the unobserved variable remain constant through time for a given subject, but vary across subjects; ωit = ωi ∀t. Second, we assume that the model doesn’t change over time. Under these two assumptions, we can take the first difference (FD) of individual observations over time: Yit = β0 + β1Xit + β2wit + εit and Yit−1 = δ0 + β1Xit + β2wit + εit, and obtain ∆Yi = Yit − Yit−1 = (β0 − δ0) + β1(Xit −Xit−1) + β2(ωit − ωit−1) + (εit − εit−1) = α0 + β1∆Xi +∆εi. Dummy variable (DV) regression. A DV model can be represented as Yit = n∑ i=1 αi0Ai + p∑ k=1 βkXit,k + εit where A = 1 for any observation that pertains to individual i and 0 otherwise. Fixed effects (FE) estimator. FE estimation is applied by time demeaning the data. Demeaning deals with unobservable factors because it takes out any component that is constant over time. By assumption, that would be the entire amount of the unobservable variable. Typically, FE model also include time effect controls. 
You can add them by adding dummy variables for each time period in which cross-sectional observations were obtained. Increasing the efficiency of estimation with random effects. If you have panel data, your econo- metric model can explicitly estimate the unobserved effects associated with your cross-sectional unit using the fixed effects (FE) model: Yit = β0 + β1Xit + β2ωit + εit, where ωit = ωi are unobserved characteristics for each cross-sectional unit that don’t vary over time. On the other hand, your econometric model can allow all unobserved effects to be relegated to the error term by specifying the model as Yit = β0 + β1Xit + vit where vit = ωit + εit. This approach is known as the random effects (RE) model. With panel data, the advantage of the RE model over the FE model is more efficient estimates of the regression parameters. The RE technique doesn’t estimate the fixed effects separately for each cross-sectional unit, so you get fewer estimated parameters, increased degrees of freedom, and smaller standard errors. A critical assumption of the RE model is that the unobserved individual effect (ωi) isn’t correlated with the independent variable(s). In addition, for the homoskedasticity assumption to hold, we must also impose a constant variance on the individual effects. Although εit satisfies the classical linear regression model (CLRM) assumptions, the inclusion of ωi in the composite error vit = ωi + εit results in a CLRM assumption violation. If you relegate the individual effects (ωi) to the error term, you create positive serial correlation in the composite error. As a result, RE estimation requires feasible generalized least squares (FGLS) rather than OLS to appropriately eliminate serial correlation in the error term and to produce the correct standard errors and test statistics. Testing efficiency against consistency with the Hausman test. The RE model produces more efficient estimates than the FE model. However, if individual fixed effects are correlated with the independent variable(s), then the RE estimates will be biased. In that case, the FE estimates would be preferred. The Hausman test checks the RE assumptions and helps you decide between RE and FE estimation. Note if heteroskedasticity is present, the Hausman test results could be misleading. In a model with one independent variable, the Haussman test statistic is defined as H = (β̂1(FE) − β̂1(RE)) 2 σ2 β̂1(FE) − σ2 β̂1(RE) ∼ χ2 1 25 Part VII The Part of Tens 18 Ten Components of a Good Econometrics Research Project • Introducing Your Topic and Posing the Primary Question of Interest. • Discussing the Relevance and Importance of Your Topic. • Reviewing the Existing Literature. Sources for references include XGoogle Scholar (scholar.google.com) lets you search by keyword. XSocial Science Research Network (www.ssrn.com) contains a repository of working papers with the latest research findings. XEconomic Journals on the web (http://www.oswego.edu/∼economic/journals.htm) provides a list of economic journals. XEconLit (www.aeaweb.org/econlit/) lists sources of economic research and is available through most electronic resources of university libraries. • Describing the Conceptual or Theoretical Framework. One of the characteristics that differentiates applied research in econometrics from other applications of statistical analysis is a theoretical structure supporting the empirical work, rather than focus only on the statistical fit between variables. • Explaining Your Econometric Model. 
You should explain and justify any specification characteristics of the econometric model (logs, quadratic functions, qualitative dependent variables, and so on) that aren’t directly addressed by the conceptual framework. This can be achieved with intuition, scatter plots, and/or conventions derived by researchers in previously published work. If there are contesting theories, then you should explain whether this implies that you could end up with different estimates of the relationship between the variables in a single model or if you should estimate more than one model. • Discussing the Estimation Method(s). Estimation problems arising from a failure of the CLRM as- sumptions are common in applied econometric research. It’s usually a good idea to estimate your model using OLS to obtain baseline results, even if you ultimately decide to use a different estimation technique. You may find that the results are similar and OLS is the easiest to interpret. • Providing a Detailed Description of Your Data. XHow the dataset was acquired and its source(s) XThe nature of the data (cross sectional, time series, or panel) XThe time span covered by the data XHow and with what frequency the data was collected XThe number of observations present XWhether any observations were thrown out and why XSummary statistics for any variables used in your econometric model(s) • Constructing Tables and Graphs to Display Your Results. • Interpreting the Reported Results. Reporting your econometric results is not enough; you also need to decipher the results for your readers. The most important element is the evaluation of statistical signifi- cance and magnitude for the primary variables of interest. The discussion should include an explanation of magnitude, directionality (positive/negative effects), statistical significance, and the relationship with the research question and theoretical hypotheses posed earlier in your paper. • Summarizing What You Learned. Synthesize your results and explain how they’re connected to your primary question. Avoid Xfocusing on variables with coefficients that are statistically significant even when the magnitude of their effect on the dependent variable is negligible (nearly no effect); Xignoring variables with statistically insignificant coefficients–finding no-relationship between vari- ables is important when economic theory or the prevailing wisdom says differently. 26 19 Ten Common Mistakes in Applied Econometrics • Failing to Use Your Common Sense and Knowledge of Economic Theory. One of the characteristics that differentiate applied research in econometrics from other applications of statistical analysis is the use of economic theory and common sense to motivate the connection between the independent and dependent variables. • Asking the Wrong Questions First. Conceptual questions are more important to ask than technical ones. • Ignoring the Work and Contributions of Others. • Failing to Familiarize Yourself with the Data. Do some exploratory work that includes descriptive statistics, line charts (for time-series data), frequency distributions, and even listing of some individual data values. Notable issues include XVariables you thought were measured continuously are actually in categories or groups. XMeasurements that you believed were real values are actually missing values. XData values that appear perfectly legitimate are actually censored values. • Making It Too Complicated. 
The art of econometrics lies in finding the appropriate specification or functional form to model your particular outcome of interest. Given the uncertainty of choosing the “perfect” specification, many applied econometricians make the mistake of overspecifying their models or favor complicated estimation methods over more straightforward techniques. If theory and common sense aren’t fairly conclusive about the hypothesized effect of a variable, it’s probably best to refrain from including it. Consequently, additional sophistication in your model should be introduced as necessary and not simply to exhibit your econometric skills. • Being inflexible to Real-World Complications. The ceteris paribus assumption often does not hold. Use proxies that seem appropriate and that others would find acceptable. Avoid forcing a particular dataset into estimation that isn’t appropriate for the research question. • Looking the Other Way When You See Bizarre Results. If some results don’t pass a common-sense test, then the statistical tests are likely to be meaningless and may even indicate that you’ve made a mistake with your variables, the estimation technique, or both. • Obsessing over Measures of Fit and Statistical Significance. The importance of your results shouldn’t be determined on the basis of fit (R-squared values) or statistical significance alone. The primary finding in many of the best papers using econometrics involves findings of statistical insignificance. • Forgetting about Economic Significance. The most important element in the discussion of your results is the evaluation of statistical significance and magnitude for the primary variables of interest. If a variable has a statistically significant coefficient but the magnitude is too small to be of any importance, then you should be clear about its lack of economic significance. • Assuming Your Results Are Robust. You want to perform robustness (or sensitivity) analysis to show that your model estimates aren’t sensitive (are robust) to slight variations in specification. Part VIII Appendices A Specifying Your Econometrics Regression Model As you define your regression model, you need to consider several elements: • Economic theory, intuition, and common sense should all motivate your regression model. • The most common regression estimation technique, ordinary least squares (OLS), obtains the best estimates of your model if the classical linear regression model (CLRM) assumptions hold. • Assuming a normal distribution of the error term is important for hypothesis testing and predic- tion/forecasting. 27
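As a closing illustration of these points, here is a minimal numpy/scipy sketch of a baseline OLS fit with the conventional normality-based t-statistics and a residual normality check. The data and names are illustrative, and the Jarque-Bera test is not discussed in the summary; it is simply one common way to probe the normality assumption mentioned above.

```python
import numpy as np
from scipy import stats

# Simulated data for illustration only.
rng = np.random.default_rng(5)
n = 200
x1, x2 = rng.normal(size=n), rng.uniform(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=0.7, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)        # OLS: (X'X)^{-1} X'y
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])       # error-variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

print("coefficients:", beta)
print("t-statistics:", beta / se)

# Jarque-Bera: H0 is normally distributed residuals, which underpins the usual
# t- and F-based inference in small samples.
jb = stats.jarque_bera(resid)
print("Jarque-Bera p-value:", jb.pvalue)
```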