Panel Data Analysis: Understanding Cross-Sectional and Time Series Effects (Lecture Notes)

An overview of panel data analysis, a statistical technique used to study the relationship between variables for the same units over multiple time periods. Panel data allows researchers to examine both cross-sectional variation and time series effects, as well as investigate issues such as heterogeneity, hierarchical structures, dynamics in economic behavior, and individual/group effects. The document also covers panel data models, including the SUR model and pooled OLS model, and discusses methods for dealing with heteroscedasticity and persistent common shocks.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

riciard

Lecture 15: Panel Data Models

• A panel, or longitudinal, data set is one where there are repeated observations on the same units: individuals, households, firms, countries, or any set of entities that remain stable through time.

• Repeated observations create potentially very large data sets. With N units and T time periods, the number of observations is NT.
– Advantage: a large sample, great for estimation.
– Disadvantage: dependence. Observations are very likely not independent.

• Modeling the potential dependence creates different models.

Panel Data Sets

• The National Longitudinal Survey (NLS) of Youth is an example. The same respondents were interviewed every year from 1979 to 1994. Since 1994 they have been interviewed every two years.

• Panel data allows a researcher to study both cross-section effects (along N, variation across units, say firms) and time series effects (along T, variation across time). Arranging the observations in a matrix, each column is one unit's time series and each row is a cross section:

    Y = [ y_11  y_21  ...  y_i1  ...  y_N1 ]
        [ y_12  y_22  ...  y_i2  ...  y_N2 ]
        [  ...   ...        ...        ... ]
        [ y_1T  y_2T  ...  y_iT  ...  y_NT ]

• A standard panel data model stacks the yi's and the xi's:

    y = Xβ + c + ε

where X is a ΣiTi x k matrix, β is a k x 1 vector, c is a ΣiTi x 1 vector associated with unobservable variables, and y and ε are ΣiTi x 1 vectors.

• Notation: for each unit i, yi = (yi1, ..., yiTi)' stacks the Ti observations on the dependent variable, and Xi stacks the rows xit'; then y, X and ε stack the yi, Xi and εi over i.

Panel Data Models: Example 2 - Pooling

• Assumptions:
(A1) yit = xit'β + zi'γ + εit  – the DGP.
     i = 1, 2, ..., N  – we think of i as individuals or groups.
     t = 1, 2, ..., Ti  – usually, N >> T.
(A2) E[εi|X, z] = 0  – X and z are exogenous.
(A3) Var[εi|X, z] = σ²I  – heteroscedasticity can be allowed.
(A4) Rank(X) = full rank.

• We think of X as a vector of observed characteristics. For example, firm size, market-to-book, Z-score, R&D expenditures, etc.

• We think of z as a vector of unobserved characteristics (individual effects). For example, quality of management, growth opportunities, etc.

Panel Data Models: Basic Model

– Indices:
  i: individuals (the unit of observation); t: time period; j: observed explanatory variables; p: unobserved explanatory variables.

• The DGP (A1) is linear:

    yit = β1 + Σj=2..k βj xjit + Σp=1..s γp zpi + δ t + εit

– The time trend t allows for a shift of the intercept over time, capturing time effects (technological change, regulations, etc.). But if the implicit assumption of a constant rate of change (= δ) is too strong, we use a set of time dummy variables, one for each period except the reference period.

• We can rewrite the regression model as:

    yit = β1 + Σj=2..k βj xjit + δ t + ci + εit,   where ci = Σp=1..s γp zpi.

– X: the variables of interest; β is the vector of parameters of interest.
– Z: the variables responsible for unobserved heterogeneity (and dependence on the yi's). Usually, a nuisance component of the model.

• The zp variables are unobserved: it is impossible to obtain information about the Σp γp zpi component of the model.
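As a concrete aside, the stacked layout described above (one row per (i, t) pair, NT rows in total) can be built from simulated data. This is a minimal numpy sketch; all names, dimensions, and parameter values are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 100, 5, 2          # N units, T periods, k regressors

# Simulate a balanced panel: x varies over i and t, c_i is a unit effect.
x = rng.normal(size=(N, T, k))
c = rng.normal(size=N)                      # unobserved individual effect
eps = rng.normal(size=(N, T))
beta = np.array([1.0, -0.5])
y = x @ beta + c[:, None] + eps             # y_it = x_it'beta + c_i + e_it

# Stack into the (NT x k) form used in the notes: one row per (i, t).
X = x.reshape(N * T, k)
Y = y.reshape(N * T)
```

With a balanced panel, ΣiTi = NT, so `X` has NT rows and k columns, matching the dimensions stated above.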
We define a term c the unobserved effect, representing the joint impact of the Zp variables on yi – like an index of unobservables for individual i: Panel Data Models: Basic Model iti k j jitjit tcXy    2 1 30 Note: If the Xj’s are so comprehensive that they capture all relevant characteristics of individual i, c can be dropped and, then, pooled OLS may be used. But, this is situation is very unlikely. • In general, dropping c leads to missing variables problem: bias! • We usually think of c as contemporaneously exogenous to the conditional error. That is, E[it|c ] = 0, t =1,..., T • A stronger assumption: Strict exogeneity can also be imposed. Then, E[it|xi1, xi2,...,xiT, c ] = 0, t =1,..., T Panel Data Models: Basic Model RS-15 7 30 • Strict exogeneity conditions on the whole history of xi. Under this assumption:  The βj’s are partial effects holding c constant. • Violations of strict exogeneity are not rare. For example, if xit contains lagged dependent variables or if changes in it affect xit+1 (a “feedback” effect). • But to estimate β we still need to say something about the relation between xit and c. Different assumptions will give rise to different models. Panel Data Models: Basic Model tcXcyE i k j jitjiitit    2 1],|[ x • The basic DGP: yit = xit’ + zi’ γ + it & (A2)-(A4) apply. Depending on how we model the heterogeneity in the panel, we have different models. • Four Popular Models: (1) Pooled (Constant Effect) Model zi’γ is a constant. zi = α (and uncorrelated with xit!). Dependence on the yit may enter through the variance. That is, repeated observations on individual i are linearly independent. In this case, yit = xit’ + α + it  OLS estimates α and  consistently. We estimate k+1 parameters. Panel Data Models: Types 31 RS-15 10 Assumptions for Asymptotics (Greene) • Convergence of moments involving cross section Xi. Usually, we assume N increasing, T or Ti assumed fixed. 
– "Fixed-T asymptotics" (see Greene).
– Time series characteristics are not relevant (the data may be nonstationary).
– If T is also growing, we need to treat the data as a multivariate time series.

• Rank of matrices. X must have full column rank. Xi may not, if Ti < K.

• Strict exogeneity and dynamics. If xit contains yi,t-1, then xit cannot be strictly exogenous: xit will be correlated with the unobservables in period t-1. Inconsistent OLS estimates! (To be revisited later.)

Panel Data Models: (A3') - No Homoscedasticity

• We can relax assumption (A3). The new DGP model:

    y = X* β* + ε,   with X* = [X ι]  – a ΣiTi x (k+1) matrix,
                          β* = [β' α]'  – a (k+1) x 1 vector.

Now, we assume

    (A3') E[εε'|X] = Σ ≠ σ² I  (I of dimension ΣiTi).

• Potentially, there are a lot of different elements in E[εε'|X] in a panel:
- Individual heteroscedasticity: the usual groupwise heteroscedasticity.
- Autocorrelation (individual/group/firm) effects: errors have arbitrary correlation across time for a particular individual i.
- Temporal correlation (time) effects: errors have arbitrary correlation across individuals at a moment in time (SUR-type correlation).
- Persistent common shocks: errors have some correlation between different firms in different time periods (but these shocks are assumed to die out over time, and may be ignored after L periods).

Panel Data Models: (A3') - Error Structures

• To understand the different elements in Σ, consider the following DGP for the errors εit:

    εit = θi'ft + ηit,    ft ~ D(0, σf²)
    ηit = φ ηit-1 + ςit,  ςit ~ D(0, σςi²)

ft: vector of random factors common to all individuals/groups/firms.
θi: vector of factor loadings, specific to individual i.
ςit: random shocks to individual i, uncorrelated across both i and t.
ηit: random shocks to i; this term generates autocorrelation effects within i.

• θi'ft generates both contemporaneous (SUR) and time-varying cross-correlations between i and j. (Autocorrelations die out after L periods.)
- If ft is uncorrelated across t ⇒ only contemporaneous (SUR) effects.
- If ft is persistent in t ⇒ both SUR and persistent common effects.

• Different forms for E[εε'|X]:
- Individual heteroscedasticity: E[εi²|X] = σi²  ⇒ standard groupwise heteroscedasticity, driven by ςit.
- Autocorrelation (individual) effects: E[εit εis|X] ≠ 0 (t ≠ s)  ⇒ auto-/time-correlation in the errors, driven by ηit.
- Temporal correlation effects: E[εit εjt|X] ≠ 0 (i ≠ j)  ⇒ contemporaneous cross-correlation in the errors, driven by ft.
- Persistent common shocks: E[εit εjs|X] ≠ 0 (i ≠ j and |t - s| < L)  ⇒ time-varying cross-correlation in the errors, driven by ft.

• Remark: Heteroscedasticity points to GLS for efficient estimation but, as before, for consistent inferences we can use OLS with White or Newey-West (NW) SEs, adjusted for panels:
- White SEs adjust only for heteroscedasticity:

    S0 = (1/T) Σi ei² xi xi'

- NW SEs adjust for heteroscedasticity and autocorrelation:

    ST = S0 + (1/T) Σl wL(l) Σt=l+1..T (xt-l et-l et xt' + xt et et-l xt-l')

• But cross-sectional (SUR) or "spatial" dependencies are ignored. If present, the White or NW HAC estimators need to be adjusted.

• Simple intuition: Repeating a dataset 10 times should not increase the precision of the parameter estimates. However, the i.i.d. assumption will do exactly this: now we divide by NT, not T or N. ⇒ We cannot ignore the dependence in the data. The obvious solution is to aggregate the repeated data, i.e., aggregate in groups.

Panel Data Models: (A3') – Clustered SE

• In general, the observations are not identical, but correlated within a cluster, i.e., a group that shares a certain characteristic. We assume correlation within a cluster, but independence across clusters.

• Simple idea: Aggregate over the clusters. The key is how to cluster.
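The White S0 correction above can be sketched for a pooled regression. This is a minimal numpy illustration on simulated heteroscedastic data (the NW lag terms are omitted for brevity; all names and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 2000, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Heteroscedastic errors: the error sd grows with |x|.
e = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
y = X @ np.array([1.0, 2.0]) + e

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
# White's S0: the "meat" of the sandwich, sum_i e_i^2 x_i x_i'.
S0 = (X * resid[:, None] ** 2).T @ X
V_white = XtX_inv @ S0 @ XtX_inv
se_white = np.sqrt(np.diag(V_white))

# Conventional OLS SEs for comparison.
sigma2 = resid @ resid / (n - k)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))
```

Because the error variance rises with |x|, the White SE on the slope exceeds the conventional OLS SE, which is exactly the situation the correction is designed for.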
• Canonical example: We want to study the effect of class size on 1st graders' grades. The unobservables of 1st graders belonging to the same classroom will be correlated (say, through teacher quality or recess routines), but will not be correlated with those of 1st graders in faraway classrooms. Then, we can cluster by school/teacher.

• In finance, it is reasonable to expect that shocks to firms in the same industry are not independent. Then, we can cluster by industry.

• Before calculating the NW SEs, we cluster the data to remove the dependence caused by the within-group correlation of the data.

• We can cluster the SEs by one variable (say, industry) or by several variables (say, year and industry), called "multi-level clustering." If these variables are nested (say, industry and state), cluster at the highest level.

• We assume that the correlations within a cluster (a group of firms, a region, different years for the same firm, different years for the same region) are the same for different observations.

Panel Data Models: PCSE – Clustering

• Different clusterings can produce very different SEs. We want to cluster in groups that produce correlated errors. Usually, we cluster using economic theory (by industry, year, or industry and year).

• Since we allow for correlation between observations, clustered SEs will widen CIs. The higher the clustering level, the larger the resulting SEs.

• Thus, different clusterings can produce different SEs. Rely when possible on economic theory/intuition to cluster. It is not a bad idea to try different ways of defining clusters and see how the estimated SEs are affected. Be conservative: report the largest SEs.

• Practical rules:
- If aggregate variables (say, by industry or zip code) are used in the model, clustering should be done at that level.
- When the data correlate in more than one way, we have two cases:
  - If nested (say, city and state), cluster at the highest level of aggregation.
  - If not nested (e.g., time and industry), use "multi-level clustering."

Pooled Model

• General DGP: yit = xit'β + ci + εit, and (A2)-(A4) apply.

• The pooled model assumes that the unobservable characteristics are uncorrelated with xit. We can rewrite the panel DGP as:

    yit = xit'β + vit,   where vit = ci + εit (the compound error).

To get a consistent estimator of β, we need E[xit' vit] = 0.

Note: E[xit' εit] = 0 is derived from (A2): E[εit|xit, ci] = 0. Then, to get consistency, we need E[xit' ci] = 0 for all t.

• Given the assumptions, we can set ci = α, a constant independent of i. That is, no heterogeneity. Then:

    yit = xit'β + α + εit  ⇒ the CLM, with k + 1 parameters.

• We have the CLM, estimating k + 1 parameters ⇒ pooled OLS is BLUE and consistent. Stacking the variables in matrices:

    y = Xβ + α ι + ε

Dimensions: y, ι and ε are ΣiTi x 1; X is ΣiTi x k; β is k x 1. We can rewrite the pooled equation model as

    y = X* β* + ε,   X* = [X ι]  – a ΣiTi x (k+1) matrix,   β* = [β' α]'  – a (k+1) x 1 vector.

• In this context, OLS produces a BLUE and consistent estimator. In this model, we refer to pooled OLS estimation.

• Of course, if our assumption regarding the unobservable variables is wrong, we are in the presence of an omitted variable, ci. Then, we have potential bias and inconsistency of pooled OLS. The magnitude of these problems depends on how the true model behaves: 'fixed' or 'random.'

• In the pooled model, there is no model for group/individual heterogeneity.
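The omitted-effect bias just described is easy to reproduce by simulation. This is a hypothetical numpy sketch (the helper `pooled_ols` and all parameter values are illustrative, not from the notes): pooled OLS recovers β when the unit effect is uncorrelated with x, and is biased when it is not.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 500, 5
c = rng.normal(size=N)                      # unobserved effect c_i
eps = rng.normal(size=(N, T))
beta = 2.0

def pooled_ols(x, y):
    # OLS of stacked y on [1, x]: returns (alpha_hat, beta_hat).
    Z = np.column_stack([np.ones(x.size), x.ravel()])
    return np.linalg.lstsq(Z, y.ravel(), rcond=None)[0]

# Case 1: c_i uncorrelated with x_it -> pooled OLS is consistent.
x1 = rng.normal(size=(N, T))
y1 = beta * x1 + c[:, None] + eps

# Case 2: x_it correlated with c_i -> omitted-variable bias.
x2 = rng.normal(size=(N, T)) + c[:, None]
y2 = beta * x2 + c[:, None] + eps

b1 = pooled_ols(x1, y1)[1]   # close to the true beta
b2 = pooled_ols(x2, y2)[1]   # biased upward, since c_i loads on x2
```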
Thus, pooled regression may result in heterogeneity bias:

    Pooled regression: yit = β0 + β1 xit + εit

[Figure: scatter of y against x for four firms. Each firm's true regression line has its own intercept; the single pooled regression line fit through all the points has a very different slope. Heterogeneity bias.]

Useful Analysis of Variance Notation (Greene)

• Decomposition of total variation:

    Σi Σt (zit - z̄)² = Σi Σt (zit - z̄i)² + Σi Ti (z̄i - z̄)²

    Total variation = Within-groups variation + Between-groups variation

• Interpretation:
- Within-group variation measures the variation of individuals over time.
- Between-group variation measures the variation of the means across individuals.

• The variance (total variation) quantifies the idea that each individual i, say each firm, differs from the overall average. We can decompose the variance into two parts: a within-group/individual part and a between-group/individual part.

WHO Data (Greene)

Note: The variability is driven by between-groups variation.

Pooled Model: Living with (A3')

• We start with the pooled model: y = X*β* + ε, with X* = [X ι] (a ΣiTi x (k+1) matrix) and β* = [β' α]' (a (k+1) x 1 vector). Now, we allow E[εi εj'|Xi] = σij Ωij.

• Potentially, there are a lot of different forms for E[εi εj'|Xi] in a panel:
- Individual heteroscedasticity: E[εi²|Xi] = σi²
- Individual/group effects: E[εit εis|Xi] ≠ 0 (t ≠ s)
- Time (SUR or spatial) effects: E[εit εjt|Xi] ≠ 0 (i ≠ j)
- Persistent common shocks: E[εit εjs|Xi] ≠ 0 (i ≠ j and |t - s| < L)

• Heteroscedasticity points to GLS for efficient estimation but, for consistent inferences, we can use OLS with clustered White/NW SEs.

Pooled OLS: Clustered SE – Results (Greene)

Ordinary least squares regression. LHS = LWAGE, mean = 6.67635. Residual sum of squares = 522.20082; standard error of e = .35447; R-squared = .41121; model test F[8, 4156] = 362.8 (prob .0000).

Unconditional ANOVA for LWAGE (no regressors):

    Source     Variation    Deg. Free.   Mean Square
    Between    646.25374       594.        1.08797
    Residual   240.65119      3570.         .06741
    Total      886.90494      4164.         .21299

Coefficient estimates, with conventional OLS standard errors and clustered standard errors:

    Variable   Coefficient   OLS Std.Error   Clustered Std.Error
    Constant    5.40160       .04839          .10156
    EXP          .04085       .00219          .00432
    EXPSQ       -.00069       .0000480        .0000984
    OCC         -.13830       .01480          .02773
    SMSA         .14856       .01207          .02424
    MS           .06798       .02075          .04382
    FEM         -.40020       .02526          .04962
    UNION        .09410       .01253          .02423
    ED           .05812       .00260          .00556

All coefficients are significant at the 1% level with conventional SEs; with clustered SEs, MS becomes insignificant (p = .12).

Note: Clustered SEs tend to be bigger. The more correlation allowed, the higher the SEs.

Pooled Model: PCSE – Remarks

• The bigger the cross-sectional correlation, the bigger the SEs. That is, NW SEs tend to be smaller than Driscoll and Kraay SEs.

• In simulations it is found, as expected, that PCSEs perform better when there is cross-sectional dependence in the data. But when there is no dependence in the cross section, the standard White or NW SEs do better.
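The cluster-robust computation discussed in the clustering sections, summing the score contributions over clusters before forming the sandwich, can be sketched as follows. Simulated data with a cluster-level regressor and a common within-cluster shock; all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
G, m = 50, 20                       # 50 clusters, 20 observations each
n = G * m
group = np.repeat(np.arange(G), m)

# A common shock per cluster makes errors correlated within clusters.
u = rng.normal(size=G)
e = u[group] + rng.normal(size=n)
X = np.column_stack([np.ones(n), rng.normal(size=G)[group]])  # cluster-level regressor
y = X @ np.array([1.0, 1.0]) + e

b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Clustered "meat": sum over clusters g of (X_g' e_g)(X_g' e_g)'.
S = np.zeros((2, 2))
for g in range(G):
    idx = group == g
    s = X[idx].T @ resid[idx]
    S += np.outer(s, s)
V_cl = XtX_inv @ S @ XtX_inv
se_cl = np.sqrt(np.diag(V_cl))

sigma2 = resid @ resid / (n - 2)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))
```

With a cluster-level regressor and positive within-cluster error correlation, the clustered SE on the slope is much larger than the i.i.d. OLS SE, illustrating the "repeating the dataset" intuition above.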
In some cases, these differences can be significant.

• Testing for cross-sectional dependence may be a good idea, especially when results are not robust to different SEs. LM tests can be easily implemented; Pesaran (2004) proposes an easy test.

Pooled Model with (A3') - GLS

• We start with the pooled model: y = X*β* + ε, with X* = [X ι] and β* = [β' α]'. Now, we allow E[εi εj'|Xi] = σij Ωij.

• We can use OLS with PCSEs, or we can do GLS. Note: Why GLS? Efficiency.

Pooled Model with SUR - GLS

• Suppose Ωij = IT. Then, we only have cross-equation correlation, not time correlation. We are back in the (aggregation) SUR framework:

    β̂GLS = (X'V⁻¹X)⁻¹ X'V⁻¹y = [X'(Σ⁻¹ ⊗ I)X]⁻¹ X'(Σ⁻¹ ⊗ I)y

• For FGLS, use the pooled OLS residuals ei and ej to estimate the covariances σij:

    Σ̂ = (1/T) Σt=1..T et et' = (1/T) E'E

where E is a T x N matrix of residuals and et = [e1t e2t ... eNt]' is an N x 1 vector. We need to invert Σ̂, an N x N matrix.

Note: In general, rank(E) ≤ T. Then, rank(Σ̂) ≤ T < N ⇒ singularity, and FGLS cannot be computed. This is a problem of the data, not the model.

Pooled Model with Heteroscedasticity - GLS

• Now, suppose we have groupwise heteroscedasticity. That is,

    E[εi εj'|Xi] = 0 for i ≠ j;   E[εi²|Xi] = σi².

• We do FGLS, as usual, using the pooled OLS residuals ei to estimate the variances σi² and, thus, to estimate Σ:

    Σ̂ = [ σ̂1² I   0      ...  0     ]
        [ 0       σ̂2² I  ...  0     ]
        [ ...                  ...   ]
        [ 0       0      ...  σ̂N² I ]

• We can test this model with H0: σ1² = σ2² = ... = σN². We can use the Wald statistic

    W = Σi (si² - s²pooled)² / Var(si²) → χ²N

where si² is computed using the pooled OLS residuals ei.

• Now, suppose we have individual autocorrelation.
That is, E[it js'|Xi ] = 0 for i≠j E[it it-p'|Xi ] ≠ 0 -for example, Var[it|Xi ] = σ2 • We do FGLS, as usual, using the pooled OLS residuals ei to estimate the ρi and, thus, to estimate Σi: Pooled Model with Autocorrelation - GLS                   1... ......... 0...1 ...1 1 )( 21 1 2 2 T i T i i T ii i u i i      • We can test this model with H0: ρ1 = ρ2 =...= ρN=0. We can use an LM test to test H0. ititiit u 1 RS-15 27 Pooled OLS with First Differences • From the general DGP: yit = xit’ + c + it & (A2)-(A4) apply. It may still be possible to use OLS to estimate , when we have individual heterogeneity. We can use OLS if we eliminate the cause of heterogeneity: c We can do this by taking first differences of the DGP. That is, Δyit = yit – yit-1 = (xit – xit-1)’  + Δ c + Δ it = Δxit’  + uit Note: All time invariant variables, including cdisappear from the model (one “diff”). If the model has a time trend –economic fluctuations –, it also disappear, it become the constant term (the other “diff”). Thus, this method is usually called “diffs in diffs” (DD or DiD). • With strict exogeneity of (Xi,ci), the OLS regression of Δyit on Δxit is unbiased and consistent, but inefficient. • Why? The error is not longer it, but uit. The Var[u] is given by: i i 2 2 i,2 i,1 2 2 2 i,3 i,2 2 2 2 2 i,T i,T 1 2 0 0 2Var (Toeplitz form) 0 0 2                                                • That is, first differencing produces heteroscedasticity. Efficient estimation method: GLS. • It turns out that GLS is complicated. Use OLS in first differences and use Newey-West SE/PCSE with one lag. Pooled OLS with First Differences RS-15 30 OLS with First Diffs: Natural Experiment • In Finance & Economics, especially in Corporate Finance, we apply the DD method when we use natural experiments (a change in a law, a policy or a regulation) to study the effect of Xt on yt. 
(Recall Lecture 8.) • We have two periods: Before and after the natural experiment (the treatment). • If we also have a well-defined control group, where the treatment was not administered –i.e., the natural experiment never occurred–, then, we can use DD estimation. • The number of groups, S, (treated & not treated) under consideration is usually small –typically 2. N is usually very large. Diffs in Diffs: Natural Experiment - 1 Example 1: We are interested in the effect of labor shocks on wages and employments. Natural experiment: The 1980 Mariel boatlifs, a temporary lifting of emigration restrictions in Cuba. Most of the marielitos (the 1980 Cuban immigrants) settled in Miami. • Two periods: Before and after the 1980 Mariel boatlifs. • Control group: Low skilled workers in Houston, LA and Atlanta. • Calculate unemployment and wages of low skilled workers in both periods. Then, regress yit against a set of control variables (industry, education, age, etc.) and a treatment dummy: yit = yi2 – yi1 = δ0 + δ1 Treatmenti + (xi2'– xi1' )  uit • H0: δ1=0. Card (1990) found no effect of massive immigration. RS-15 31 Diffs in Diffs: Natural Experiment - 2 Example 2: Suppose we are interested in the effect of a substantial increase in bank deposit affect lending practices. We can use the shale revolution, which started around 2011, as a natural experiment. • Two periods: Before and after shale revolution (say, 2011). • Control group (banks outside shale formation areas). • Measure lending practices (amount lent, FICO scores of loans, etc.), yi, in both periods & regress yit against a set of control variables (size of county, size of bank, experience of bank employees, etc.) and a treatment dummy: yit = yi2 – yi1 = δ0 + δ1 Treatmenti + (xi2'– xi1' )  uit • H0: δ1=0. Diffs in Diffs: Remarks • We express the DGP in terms of i (indiviuals), s (groups), and t (time): yist = δs + δt + δ1 Treatmentst + xist'  𝜀ist • Usually, we have small S and T; but large N. 
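Returning to the basic two-period setup from the examples above, a minimal DD estimation can be sketched in numpy. The treatment assignment, effect size, and common trend below are illustrative assumptions for the demo, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1000
treated = (np.arange(N) < N // 2).astype(float)   # first half gets the treatment
a_i = rng.normal(size=N)                          # individual effects
delta1 = 0.7                                      # assumed true treatment effect

y1 = a_i + rng.normal(size=N)                               # period 1 (before)
y2 = a_i + 0.3 + delta1 * treated + rng.normal(size=N)      # period 2 (after)

# First-differencing removes a_i; the common trend (0.3) lands in the
# intercept, and the treatment dummy picks up delta1.
dy = y2 - y1
Z = np.column_stack([np.ones(N), treated])
d0, d1 = np.linalg.lstsq(Z, dy, rcond=None)[0]
```

`d1` estimates the treatment effect δ1, and the H0: δ1 = 0 test would be the usual t-test on that coefficient.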
Since, in general, we have within-group correlation (treated individuals show similar errors), the asymptotics of the t-test are driven by S*T.

• Donald and Lang (2004): Under the usual (generous) assumptions, the t-statistic converges to a normal distribution (a t with ST-K degrees of freedom may work better).

• Intuition: Suppose that within the (s, t) groups the errors are perfectly correlated. Then, we only have S*T independent observations!

• Given the potential (time-varying) correlations in the errors, OLS SEs can be terrible. PCSEs tend to do better.

Dealing with Attrition

• Attrition problem: If an unbalanced panel is the result of some selection process related to εit, then endogeneity is present and needs to be dealt with using some correction method. Otherwise, we have attrition bias.

• Example: In the "Quality of Life for cancer patients" study discussed in Greene, appearance for the second interview was low for people with initial low QOL (death or depression) or with initial high QOL (they did not need the treatment).

• Solutions to the attrition problem:
– Heckman selection model (used in the study):
  • Prob[Present at exit|covariates] = Φ(zi'θ)  (probit model)
  • An additional variable is added to the difference model: λi = φ(zi'θ)/Φ(zi'θ)
– The FDA solution: fill with zeros. (!)

Pooled Model: ML Estimation

• In the pooled model, y = Xβ + ε, we assume εt ~ N(0, Σ), where εt = [ε1t, ε2t, ..., εNt]' and Σ is an N x N matrix.

• We can write the log likelihood function as:

    log L(β, Σ|X) = -(NT/2) ln(2π) - (T/2) ln|Σ| - ½ Σt εt'Σ⁻¹εt

• The ML estimator is equal to the iterated FGLS estimator.

• Testing is straightforward with the likelihood ratio test.

Example: H0: No cross-correlation across equations, i.e., the off-diagonal elements of Σ are zero. Then

    LR = T (ln|Σ̂R| - ln|Σ̂U|) = T (Σi ln si² - ln|Σ̂U|)

which follows a χ² with N(N - 1)/2 degrees of freedom.

Fixed Effects Model (FEM)

• The FE model assumes ci = αi (a constant; it does not vary with t):

    yi = Xiβ + di αi + εi,  for each individual i,

where di is a dummy variable for individual i.
• Stacking:

    [y1]   [X1  d1  0   ...  0 ] [β ]   [ε1]
    [y2] = [X2  0   d2  ...  0 ] [α1] + [ε2]
    [...]  [...                ] [...]  [...]
    [yN]   [XN  0   0   ... dN ] [αN]   [εN]

or, compactly, y = [X, D] [β' α']' + ε = Zδ + ε, where D collects the N individual dummies.

FEM: Estimation

• The FEM is the CLM, but with many independent variables: k + N. ⇒ OLS is unbiased, consistent, and efficient, but impractical if N is large.

• The OLS estimates of β and α are given by (using the Frisch-Waugh theorem):

    b = [X'MD X]⁻¹ X'MD y
    a = (D'D)⁻¹ D'(y - Xb)

• In practice, we do not estimate a (the ci's); they are not very interesting. Moreover, since we are in a fixed-T situation, a is unbiased, but not consistent. In addition, there is the potential incidental parameters problem.

Note (Greene): LS is an estimator, not a model. Given the formulation with a lot of dummy variables, this particular LS estimator is called the Least Squares Dummy Variable (LSDV) estimator.

• The matrix MD is block diagonal (the dummy variables are orthogonal):

    MD = diag(MD1, ..., MDN),   MDi = ITi - (1/Ti) di di'

so MDi demeans the data for individual i. The cross-products become within-group moments:

    (X'MD X)k,l = Σi Σt (xit,k - x̄i.,k)(xit,l - x̄i.,l)
    (X'MD y)k   = Σi Σt (xit,k - x̄i.,k)(yit - ȳi.)

• That is, we subtract the group mean from each individual observation. Then, the individual effects disappear. Now, OLS can easily be used to estimate the k β parameters using the demeaned data. We know this method: within-groups estimation.

FEM: Within Transformation Removes Effects

• Start from

    yit = β1 + Σj=2..k βj xjit + δ t + ci + εit

Averaging over t for each i and subtracting, the within-groups method estimates the parameters using demeaned data:

    yit - ȳi = Σj=2..k βj (xjit - x̄ji) + δ(t - t̄) + (εit - ε̄i)

Recall: It is called the within-groups/individuals method because it relies on variation within individuals rather than between individuals.
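The numerical equivalence between the within (demeaned) estimator and LSDV can be checked directly. A small numpy sketch with simulated data; dimensions and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 40, 6
x = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))   # x correlated with the effect
alpha = x.mean(axis=1) + rng.normal(size=N)             # fixed effects
y = 2.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Within estimator: demean y and x by individual means.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_within = (xd * yd).sum() / (xd * xd).sum()

# LSDV: regress stacked y on x plus N individual dummies.
D = np.kron(np.eye(N), np.ones((T, 1)))                 # NT x N dummy matrix
Z = np.column_stack([x.reshape(-1, 1), D])
b_lsdv = np.linalg.lstsq(Z, y.ravel(), rcond=None)[0][0]
```

By the Frisch-Waugh theorem the two estimates agree to machine precision, even though the LSDV regression carries N extra columns.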
• For the usual asymptotic results, we need:
– (A2) E[Δεit|Xi] = 0.
– (A3') E[εi'εi|Xi, ci] = Σ  – different formulations are OK.
– (A4) E[ΔXi'ΔXi] has full rank.

• There are costs to the simplicity of within-groups estimation:
1) All time-invariant variables (including the constant) for each individual i drop out of the model (gender, years of experience, previous job network, etc.). This eliminates all between-individuals variability (which may be contaminated by omitted variable bias) and leaves only the within-subject variability to analyze.
2) The dependent variables are likely to have smaller variances than in the original specification (measured as deviations from the individual mean).
3) The manipulation involves the loss of N degrees of freedom (we are estimating N means!).

FEM: LS Dummy Variable (LSDV) Estimator

• b is obtained by within-groups least squares (group mean deviations). Then, we can use the normal equations to estimate a:

    D'Xb + D'Da = D'y  ⇒  a = (D'D)⁻¹D'(y - Xb)
    ai = (1/Ti) Σt (yit - xit'b) = ȳi - x̄i'b

Note:
– This is simple algebra; the estimator is just OLS.
– Again, LS is an estimator, not a model.
– Note what ai is when Ti = 1: ai = yi1 - xi1'b, so yit - ai - xit'b = 0 if Ti = 1.

FEM: Calculation of Var[b|X]

• Since we have assumed strict exogeneity, Cov[εit, (xjs, cj)] = 0, we have OLS in a CLM. That is,

    Asy. Var[b|X] = (σ²/ΣiTi) plim[(1/ΣiTi) Σi Xi'MDi Xi]⁻¹

which is the usual estimator for OLS, with

    σ̂² = Σi Σt (yit - ai - xit'b)² / (ΣiTi - N - K)   (note the degrees of freedom correction)

• PCSE remark: All previous comments and remarks apply to the FEM. We build the SEs according to the type of data we have:
- If we do not suspect autocorrelated errors (not a strange situation), we can rely on clustered White SEs (S0).
- If we suspect autocorrelated errors, then Driscoll and Kraay SEs should be used.

FEM: Testing for Fixed Effects

• Under H0 (no FE): αi = α for all i. That is, we test whether to pool the data or not.

• Different tests:
– An F-test based on the LSDV dummy variable model: a constant, or zero coefficients, for D. The test follows an F(N-1, NT-N-K) distribution.
– An F-test of the FEM (the unrestricted model) against the pooled model (the restricted model). The test follows an F(N-1, NT-N-K) distribution.
– An LR test can also be done, usually assuming normality. The test follows a χ² distribution with N-1 degrees of freedom.

FEM: Hypothesis Testing

• Based on the estimated residuals of the fixed effects model:
(1) Estimate the FEM: yit = xit'β + αi + εit ⇒ keep the residuals eFE,it.
(2) Test as usual:
– Heteroscedasticity: Breusch and Pagan (1980).
– Autocorrelation, AR(1): Breusch and Godfrey (1981), with

    LM = [NT²/(T-1)] (eFE,-1' eFE / eFE' eFE)²  →d  χ²(1)

Application: Cornwell and Rupert Data (Greene)

Cornwell and Rupert returns to schooling data: 595 individuals, 7 years. Variables in the file are:

    EXP   = work experience; EXPSQ = EXP²
    WKS   = weeks worked (not used in regressions)
    OCC   = 1 if blue-collar occupation
    IND   = 1 if manufacturing industry (not used)
    SOUTH = 1 if resides in the South (not used)
    SMSA  = 1 if resides in a city (SMSA)
    MS    = 1 if married
    FEM   = 1 if female
    UNION = 1 if wage set by union contract
    ED    = years of education
    BLK   = 1 if individual is black (not used)
    LWAGE = log of wage (the dependent variable in the regressions)

These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155.
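The F-test of the pooled (restricted) model against the FEM (unrestricted) can be illustrated with simulated data. A minimal numpy sketch with a single regressor; the effect sizes are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, K = 30, 10, 1
x = rng.normal(size=(N, T))
alpha = 2.0 * rng.normal(size=N)            # sizeable individual effects
y = 1.0 * x + alpha[:, None] + rng.normal(size=(N, T))

# Restricted model: pooled OLS with a common intercept.
Xp = np.column_stack([np.ones(N * T), x.ravel()])
bp, *_ = np.linalg.lstsq(Xp, y.ravel(), rcond=None)
rss_r = ((y.ravel() - Xp @ bp) ** 2).sum()

# Unrestricted model: within (FE) residuals.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b = (xd * yd).sum() / (xd * xd).sum()
rss_u = ((yd - b * xd) ** 2).sum()

# F-test of H0: alpha_1 = ... = alpha_N (no fixed effects).
F = ((rss_r - rss_u) / (N - 1)) / (rss_u / (N * T - N - K))
```

With strong individual effects in the DGP, the restricted RSS is far larger than the FE RSS, so F is well above any conventional F(N-1, NT-N-K) critical value and H0 is rejected.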
Application: Cornwell and Rupert (Greene)

(1) Returns to schooling, pooled OLS results: K parameters; RSS and R² with X only.
(2) Returns to schooling, LSDV results: N + K parameters; RSS and R² with X and group effects.

REM: Error Components Model

• REM assumptions:

    yit = xit'β + ci + εit = xit'β + ui + εit = xit'β + wit
    E[εit|Xi] = 0;   E[εit²|Xi] = σε²
    E[ui|Xi] = 0;    E[ui²|Xi] = σu²
    E[ui εit|Xi] = E[ui εjt|Xi] = 0   – u and ε are independent.
    E[ui uj|Xi] = 0 (i ≠ j)    – no cross-correlation of the random effects.
    E[εit εjt|Xi] = 0 (i ≠ j)  – no cross-correlation of the errors.
    E[εit εjs|Xi] = 0 (t ≠ s)  – no autocorrelation of the errors.

• Then, for the composite error wit = ui + εit:

    Var[wit] = σu² + σε²
    Cov[wit, wis] = E[(ui + εit)(ui + εis)] = σu²   (t ≠ s)

REM: Notation (Greene)

• Stacking the Ti observations for each individual:

    yi = Xiβ + ui ι + εi = Xiβ + wi,  i = 1, ..., N  (Ti observations each)
    y = Xβ + w,  where w stacks the wi.

In all that follows, except where explicitly noted, X, Xi and xit contain a constant term as the first element. To avoid notational clutter, in those cases, xit etc. will simply denote the counterpart without the constant term. Use of the symbol K for the number of variables will thus be context-specific, but will usually include the constant term.

• The variance of each individual's composite error is

    Var[εi + ui ι] = Ωi = σε² ITi + σu² ιι'
                   = [ σε²+σu²   σu²       ...  σu²     ]
                     [ σu²       σε²+σu²   ...  σu²     ]
                     [ ...                          ... ]
                     [ σu²       σu²       ...  σε²+σu² ]

and the stacked variance is block diagonal:

    Var[w|X] = diag(Ω1, Ω2, ..., ΩN)   (the blocks differ only in the dimension Ti).

• Note: If E[εit εjt|Xi] ≠ 0 (i ≠ j) or E[εit εjs|Xi] ≠ 0 (t ≠ s), we no longer have this nice block-diagonal structure for Var[w|X].
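The error-components covariance structure above (Var[wit] = σu² + σε², Cov[wit, wis] = σu² for t ≠ s) can be verified by simulation. A minimal sketch with assumed values σu = 1 and σε = 0.5:

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 200, 5
sigma_u, sigma_e = 1.0, 0.5

u = sigma_u * rng.normal(size=N)            # random effect u_i
eps = sigma_e * rng.normal(size=(N, T))     # idiosyncratic error e_it
w = u[:, None] + eps                        # composite error w_it = u_i + e_it

var_w = w.var()                             # ~ sigma_u^2 + sigma_e^2 = 1.25
cov_ts = np.mean(w[:, 0] * w[:, 1])         # ~ sigma_u^2 = 1.0 for t != s
```

The shared `u_i` is what produces the constant off-diagonal covariance σu² inside each Ωi block.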
REM: Assumptions – Convergence of Moments
• With weights fi = Ti/Σi Ti:
Σi fi (Xi'Xi/Ti) → Q, a weighted sum of the individual moment matrices
Σi fi (Xi'ΩiXi/Ti) → Q*, a weighted sum of the individual moment matrices
Σi fi (Xi'ΩiXi/Ti) = Σi fi [σε² (Xi'Xi/Ti) + σu² Ti x̄i x̄i']
• Note: the asymptotics are with respect to N. Each matrix Xi'Xi/Ti collects the Ti moments for the Ti observations and should be 'well behaved' in micro-level data; the average of N such matrices should be likewise. Ti is assumed to be fixed (and small).

REM: Pooled OLS Estimation (Greene)
• Standard results for the pooled OLS estimator b in the GR model:
- Consistent and asymptotically normal
- Unbiased
- Inefficient
• We can use pooled OLS, but for inferences we need the true variance – i.e., the sandwich estimator:
Var[b|X] = (X'X)⁻¹ (Σi Xi'ΩiXi) (X'X)⁻¹ → (1/Σi Ti) Q⁻¹ Q* Q⁻¹ → 0 as N → ∞, with our convergence assumptions.

REM: Sandwich Estimator for OLS (Greene)
• Var[b|X] = (X'X)⁻¹ [Σi Xi'ΩiXi] (X'X)⁻¹, where Xi'ΩiXi = Xi' E[wi wi'|Xi] Xi.
• In the spirit of the White estimator, estimate Xi'ΩiXi by Xi' ŵi ŵi' Xi, with ŵi = yi − Xi b.
• Hypothesis tests are then based on Wald statistics. THIS IS THE 'CLUSTER' ESTIMATOR.
• Recall: With clustered standard errors (or PCSE) there is a grouping, or "cluster," within which the error term is possibly correlated, but outside of which (across groups) it is not.

REM: FGLS – Estimators for the Variances
• Feasible GLS requires (only) consistent estimators of σε² and σu².
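The cluster ("sandwich") estimator just described can be sketched with numpy. The simulated design is illustrative (cluster-level components are put in both X and the error so that clustering visibly matters), and the regression omits the constant purely to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 400, 5, 2                  # N clusters (units), T obs per cluster

u = rng.normal(0.0, 1.0, (N, 1))     # unit effect -> within-cluster error correlation
X = rng.normal(size=(N, 1, K)) + 0.5 * rng.normal(size=(N, T, K))  # X correlated within cluster
beta = np.array([1.0, 2.0])
y = X @ beta + u + rng.normal(0.0, 1.0, (N, T))

Z = X.reshape(N * T, K)
b = np.linalg.solve(Z.T @ Z, Z.T @ y.ravel())          # pooled OLS
e = (y.ravel() - Z @ b).reshape(N, T)                  # residuals by cluster

# Sandwich: (X'X)^-1 [ sum_i X_i' w_i w_i' X_i ] (X'X)^-1, with w_i = y_i - X_i b
bread = np.linalg.inv(Z.T @ Z)
meat = np.zeros((K, K))
for i in range(N):
    s = X[i].T @ e[i]                # the K-vector X_i' w_i for cluster i
    meat += np.outer(s, s)
se_cluster = np.sqrt(np.diag(bread @ meat @ bread))

# Naive OLS variance, wrongly assuming independent homoscedastic errors
s2 = (e ** 2).mean()
se_naive = np.sqrt(np.diag(s2 * bread))
print(se_cluster, se_naive)          # cluster SEs should come out larger here
```

In this design the naive SEs understate the true sampling variability; the cluster correction inflates them roughly by the usual Moulton-type factor.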
Candidates:
• From the robust LSDV estimator:
σ̂ε² = Σi Σt (yit − ai − xit'bLSDV)² / (Σi Ti − K − N)
• From the pooled OLS estimator:
σ̂ε² + σ̂u² = Σi Σt (yit − aOLS − xit'bOLS)² / (Σi Ti − K − 1)
• From the group means regression:
σ̂u² + σ̂ε²/T = Σi (ȳi − a − x̄i'b)² / (N − K − 1)
• (Wooldridge) Based on E[wit wis|Xi] = σu² if t ≠ s:
σ̂u² = [Σi Σt Σs>t ŵit ŵis] / [Σi Ti(Ti − 1)/2 − K]
There are many others.
Note: A slight change in notation, x' does not contain the constant term.

REM: Practical Problems with FGLS
• All of the preceding regularly produce negative estimates of σu². Estimation is made very complicated in unbalanced panels.
• A bulletproof solution (originally used in TSP, now LIMDEP and others):
From the robust LSDV estimator: σ̂ε² = Σi Σt (yit − ai − xit'bLSDV)² / Σi Ti
From the pooled OLS estimator: σ̂ε² + σ̂u² = Σi Σt (yit − aOLS − xit'bOLS)² / Σi Ti
⟹ σ̂u² = [Σi Σt (yit − aOLS − xit'bOLS)² − Σi Σt (yit − ai − xit'bLSDV)²] / Σi Ti ≥ 0
• That is, do not correct by degrees of freedom. Then, given that the unrestricted RSS (LSDV) will be lower than the restricted (pooled OLS) RSS, σ̂u² will be positive!

Application: Fixed Effects Estimates (Greene)
----------------------------------------------------------------------
Least Squares with Group Dummy Variables..........
LHS=LWAGE    Mean                = 6.67635
Residuals    Sum of squares      = 82.34912
             Standard error of e = .15205
These 2 variables have no within group variation: FEM, ED
F.E. estimates are based on a generalized inverse.
--------+-------------------------------------------------------------
Variable| Coefficient    Standard Error    b/St.Er.
P[|Z|>z]    Mean of X
--------+-------------------------------------------------------------
     EXP|     .11346***      .00247          45.982    .0000    19.8538
   EXPSQ|    -.00042***      .544864D-04     -7.789    .0000    514.405
     OCC|    -.02106         .01373          -1.534    .1251     .51116
    SMSA|    -.04209**       .01934          -2.177    .0295     .65378
      MS|    -.02915         .01897          -1.536    .1245     .81441
     FEM|     .000           ......(Fixed Parameter).......
   UNION|     .03413**       .01491           2.290    .0220     .36399
      ED|     .000           ......(Fixed Parameter).......
--------+-------------------------------------------------------------

REM: Computing Variance Estimators (Greene)
• Using the full list of variables (FEM and ED are time invariant):
OLS sum of squares = 522.2008
σ̂ε² + σ̂u² = 522.2008 / (4165 − 9) = 0.12565
• Using the full list of variables and a generalized inverse (same as dropping FEM and ED):
LSDV sum of squares = 82.34912
σ̂ε² = 82.34912 / (4165 − 8 − 595) = 0.023119
σ̂u² = 0.12565 − 0.023119 = 0.10253
• Both estimators are positive. We stop here. If σ̂u² were negative, we would use the estimators without DF corrections.

REM: Application (Greene)
----------------------------------------------------------------------
Random Effects Model: v(i,t) = e(i,t) + u(i)
Estimates:  Var[e]              = .023119
            Var[u]              = .102531
            Corr[v(i,t),v(i,s)] = .816006
Lagrange Multiplier Test vs. Model (3) = 3713.07
( 1 degrees of freedom, prob. value = .000000)
(High values of LM favor FEM/REM over CR model)
Fixed vs. Random Effects (Hausman)     =     .00 (Cannot be computed)
( 8 degrees of freedom, prob. value = 1.000000)
(High (low) values of H favor the F.E. (R.E.) model)
Sum of Squares   1411.241136
R-squared        -.591198
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 EXP         .08819204      .00224823       39.227   .0000    19.8537815
 EXPSQ      -.00076604      .496074D-04    -15.442   .0000    514.405042
 OCC        -.04243576      .01298466       -3.268   .0011     .51116447
 SMSA       -.03404260      .01620508       -2.101   .0357     .65378151
 MS         -.06708159      .01794516       -3.738   .0002     .81440576
 FEM        -.34346104      .04536453       -7.571   .0000     .11260504
 UNION       .05752770      .01350031        4.261   .0000     .36398559
 ED          .11028379      .00510008       21.624   .0000    12.8453782
 Constant   4.01913257      .07724830       52.029   .0000

Testing for Random Effects: LM Test
• We want to test for RE. That is, H0: σu² = 0.
• We can use the Breusch and Pagan (1980) Lagrange Multiplier test for RE effects. Similar to the LM-BP test for autocorrelation, it is based on the pooled OLS residuals, eit, and it is easy to compute. Assuming normality (and, for convenience now, a balanced panel):
LM = [NT / (2(T − 1))] × [ Σi (Σt eit)² / (Σi Σt eit²) − 1 ]²
which converges to a χ²(1) under the null hypothesis of no common effects.
(For unbalanced panels, the scale in front becomes (Σi Ti)² / [2 Σi Ti(Ti − 1)].)

Case against RE
• If either of the conditions for using RE is violated, we should use FE.
• Condition (1): Randomly drawn unobserved Zp variables. This is a reasonable assumption in many cases: many panels are designed to be a random sample (for example, the NLSY). But it would not be a reasonable assumption if the units of observation in the panel data set were data from the S&P 500 firms.
• Condition (2): Zp is independent of all of the Xj variables. A violation of condition (2) causes inconsistency in the RE estimation.

FE vs. RE
• FE estimation is always consistent; a violation of condition (2) makes RE inconsistent. That is, if there are omitted variables that are correlated with the xit in the model, the FEM provides a way of controlling for omitted variable bias: in a FEM, individuals serve as their own controls.
• Q: How can we tell if condition (2) is violated?
A: A DHW test can help.

DHW (Hausman) Specification Test: FE vs. RE

Estimator             | Random Effects: E[ci|Xi] = 0 | Fixed Effects: E[ci|Xi] ≠ 0
----------------------+------------------------------+--------------------------------
FGLS (Random Effects) | Consistent and Efficient     | Inconsistent
LSDV (Fixed Effects)  | Consistent, Inefficient      | Consistent, Possibly Efficient

• Under H0 (RE is true), we have one estimator that is efficient (RE) and one that is inefficient (LSDV). We can use a Durbin-Hausman-Wu test. As in its other applications, the DHW test determines whether the estimates of the coefficients, taken as a group, are significantly different in the two regressions.
• Basis for the test: q̂ = β̂FE − β̂RE.
Wald criterion: W = q̂' [Var(q̂)]⁻¹ q̂.
• A lemma (Hausman (1978)): under the null hypothesis (RE),
√(nT)(β̂RE − β) →d N[0, VRE]   (efficient)
√(nT)(β̂FE − β) →d N[0, VFE]   (inefficient)
Note: q̂ = (β̂FE − β) − (β̂RE − β). The lemma states that in the joint limiting distribution of √(nT)(β̂RE − β) and √(nT)q̂, the limiting covariance CQ,RE is 0. But CQ,RE = CFE,RE − VRE, so CFE,RE = VRE. It follows that
Var[q̂] = VFE + VRE − CFE,RE − CFE,RE' = VFE − VRE.
• Based on the preceding:
H = (β̂FE − β̂RE)' [Est.Var(β̂FE) − Est.Var(β̂RE)]⁻¹ (β̂FE − β̂RE)
Note: β does not contain the constant term.

DHW (Hausman) Specification Test: FE vs. RE
Computing the DHW Statistic
• Est.Var[β̂FE] = σ̂ε² [Σi Xi'(ITi − (1/Ti) iTi iTi')Xi]⁻¹
• Est.Var[β̂RE] = [Σi Xi'Ω̂i⁻¹Xi]⁻¹, with Ω̂i⁻¹ = (1/σ̂ε²)[ITi − (θ̂i/Ti) iTi iTi'], θ̂i = Ti σ̂u²/(σ̂ε² + Ti σ̂u²), 0 < θ̂i < 1.
• As long as σ̂ε² and σ̂u² are consistent, as N → ∞, Est.Var[β̂FE] − Est.Var[β̂RE] will be nonnegative definite. In a finite sample, to ensure this, both must be computed using the same estimate of σε². The one based on LSDV will generally be the better choice.
• Note that columns of zeros will appear in Est.Var[β̂FE] if there are time-invariant variables in X.
• Note: Pooled OLS is consistent, but inefficient under H0. Then, the RE estimation is GLS.

DHW Specification Test: Application (Hoechle)
• Bid-Ask Spread panel estimation.
• Rejection at the 5% level, like in this case, indicates that βFE ≠ βRE.
- Usually, this result is taken as an indication of a FEM.

Wu Test: Application 2 (Hoechle)
• Bid-Ask Spread Wu test estimation with PCSE's. [Slide shows the Stata code and output.]
• Bid-Ask Spread Wu test estimation with Driscoll and Kraay SE's. [Slide shows the Stata code for the auxiliary regression.]
• Now, you cannot reject the REM at the 5% level. Here you can say: "after accounting for cross-sectional and temporal dependence, the Hausman test indicates that the coefficient estimates from pooled OLS estimation are consistent."
• Different PCSE's can give different results.

DHW Specification Test: Remarks
• Issues with Hausman tests – as discussed in Wooldridge (2009):
(1) Fail to reject means either:
- FE and RE are similar – i.e., this is great!
- FE estimates are very imprecise.
- Large differences from RE are nevertheless insignificant.
- That can happen if the data are awful/noisy. Be careful.
(2) Watch for the difference between "statistical significance" and "practical significance."
- With a huge sample, the Hausman test may "reject" even though RE is nearly the same as FE.
- If the differences are tiny, you can feel comfortable using the REM.
(3) PCSE's matter ⟹ Q: Which ones to use?

Allison's Hybrid Approach
• Allison (2009) suggests a 'hybrid' approach that provides the benefits of FE and RE.
– Also discussed in the Gelman & Hill (2007) textbook.
– Builds on the idea of decomposing X into a mean and a deviation.
• Steps:
1. Compute case-specific mean variables.
2. Transform the X variables into deviations (within transformation).
3. Do not transform the dependent variable Y.
4. Include both the X-deviation and X-mean variables.
5. Estimate with a RE model.
– Arellano-Bond (1991): GMM estimator • A FD estimator. • Lag of levels as an instrument for differenced yi. – Arellano-Bover (1995)/Blundell-Bond: “System GMM” • Expand on this by using lags of differences and levels as IVs. • Generalized Method of Moments (GMM) estimation. Dynamic Panel Models      β xi,t i,t i,t 1 i,t i (Arellano/Bond/Bover, Journal of Econometrics, 1995) y y u Dynamic random effects model for panel data. Can't use least squares to estimate consistently. Can't use FGLS without esti       x x i,1 i i,1 i,1 i i,2 mates of parameters. Many moment conditions: What is orthogonal to the period 1 disturbance? E[( u ) ] 0 = K orthogonality conditions, K+1 parameters E[( u ) ] 0 = K more orthogonality   xi,1 i i,1 conditions, same K+1 parameters ... E[( u ) ] 0 = K orthogonality conditions, same K+1 parameters The same variables are orthogonal to the period 2 disturbance. There are hundreds, sometimes thousands of moment conditions, even for fairly small models. Dynamic Panel Models: GMM RS-15 66 • Key usual assumptions / issues – Serial correlation of differenced errors limited to 1 lag – No overidentifying restrictions (No Hansen - Sargan test) – Q: How many instruments? • Criticisms: - Angrist and Pichke (2009): Assumptions are not always plausible. - Allison (2009) - Bollen and Brand (2010): Hard to compare models. Dynamic Panel Models: GMM • General remarks: - Ignoring dynamics –i.e., lags– not a good idea: omitted variables problem. - It is important to think carefully about dynamic processes: • How long does it take things to unfold? • What lags does it make sense to include? • With huge datasets, we can just throw lots in – With smaller datasets, it is important to think things through. 
Dynamic Panel Models: Remarks I RS-15 67 • Traditional IV panel estimator: itiititit cYxy   • X = exogenous covariates • Y = other endogenous covariates (may be related to εit) • ci = unobserved unit-specific characteristic • εit = idiosyncratic error – Treat ci as random, fixed, or use differencing to wipe it out – Use contemporaneous or lagged X and (appropriate) lags of Y as instruments in two-stage estimation of yit. Note: This approach works well if lagged Y is plausibly exogenous. Dynamic Panel Models: IV Framework Time Series Cross Section (TSCS) Data • Time Series Cross Section (TSCS) Data - Panel Data with large T, small N - Example I: economic variables for industrialized countries Often 10-30 countries Often around 30 to 40 years of data - Example II: financial variables Often more than 1,000 firms Often 40-50 years of data for well-established markets (10-30 for emerging markets). – Beck’s (2001) advice: • No specific minimum for T; but be suspicious of T<10 • Large N is not required (though, it does not hurt)