Docsity
Economic Analysis of Wage Differentials: Human Capital and Labor Markets, Study notes of Economics

An analysis of wage differentials between educated and uneducated workers using a one task linear model. Topics include returns to education, instrumental variables estimation, and Nash equilibrium in labor markets. The document also discusses the role of risk aversion and contract design in labor markets.

Typology: Study notes

2021/2022

Uploaded on 08/05/2022

jacqueline_nel
Partial preview of the text

Lectures in Labor Economics

Daron Acemoglu
David Autor

Chapter 11. Basic Equilibrium Search Framework
1. Motivation
2. The Basic Search Model
3. Efficiency of Search Equilibrium
4. Endogenous Job Destruction
5. A Two-Sector Search Model

Chapter 12. Composition of Jobs
1. Endogenous Composition of Jobs with Homogeneous Workers
2. Endogenous Composition of Jobs with Heterogeneous Workers

Chapter 13. Wage Posting and Directed Search
1. Inefficiency of Search Equilibria with Investments
2. The Basic Model of Directed Search
3. Risk Aversion in Search Equilibrium

Part 1
Introduction to Human Capital Investments

empirical wage distributions, but there are many notable exceptions, some of which will be discussed later. Here it is useful to mention three:

(1) Compensating differentials: a worker may be paid less in money, because he is receiving part of his compensation in terms of other (hard-to-observe) characteristics of the job, which may include lower effort requirements, more pleasant working conditions, better amenities etc.
(2) Labor market imperfections: two workers with the same human capital may be paid different wages because jobs differ in terms of their productivity and pay, and one of them ended up matching with the high productivity job, while the other has matched with the low productivity one.
(3) Taste-based discrimination: employers may pay a lower wage to a worker because of the worker’s gender or race due to their prejudices.
In interpreting wage differences, and therefore in thinking of human capital investments and the incentives for investment, it is important to strike the right balance between assigning earning differences to unobserved heterogeneity, compensating wage differentials and labor market imperfections.

2. Uses of Human Capital

The standard approach in labor economics views human capital as a set of skills/characteristics that increase a worker’s productivity. This is a useful starting place, and for most practical purposes quite sufficient. Nevertheless, it may be useful to distinguish between some complementary/alternative ways of thinking of human capital. Here is a possible classification:

(1) The Becker view: human capital is directly useful in the production process. More explicitly, human capital increases a worker’s productivity in all tasks, though possibly differentially in different tasks, organizations, and situations. In this view, although the role of human capital in the production process may be quite complex, there is a sense in which we can think of it as representable by a unidimensional object, such as the stock of knowledge or skills, $h$, and this stock is directly part of the production function.
(2) The Gardner view: according to this view, we should not think of human capital as unidimensional, since there are many dimensions or types of skills. A simple version of this approach would emphasize mental vs. physical abilities as different skills. Let us dub this the Gardner view after the work by the psychologist Howard Gardner, who contributed to the development of multiple-intelligences theory, in particular emphasizing how many geniuses/famous personalities were very “unskilled” in some other dimensions.
(3) The Schultz/Nelson-Phelps view: human capital is viewed mostly as the capacity to adapt.
According to this approach, human capital is especially useful in dealing with “disequilibrium” situations, or more generally, with situations in which there is a changing environment, and workers have to adapt to this.
(4) The Bowles-Gintis view: “human capital” is the capacity to work in organizations, obey orders, in short, adapt to life in a hierarchical/capitalist society. According to this view, the main role of schools is to instill in individuals the “correct” ideology and approach towards life.
(5) The Spence view: observable measures of human capital are more a signal of ability than characteristics independently useful in the production process.

Despite their differences, the first three views are quite similar, in that “human capital” will be valued in the market because it increases firms’ profits. This is straightforward in the Becker and Schultz views, but also similar in the Gardner view. In fact, in many applications, labor economists’ view of human capital would be a mixture of these three approaches. Even the Bowles-Gintis view has very similar implications. Here, firms would pay higher wages to educated workers because these workers will be more useful to the firm as they will obey orders better and will be more reliable members of the firm’s hierarchy. The Spence view is different from the others, however, in that observable measures of human capital may be rewarded because they are signals about some other characteristics of workers. We will discuss different implications of these views below.

3. Sources of Human Capital Differences

It is useful to think of the possible sources of human capital differences before discussing the incentives to invest in human capital:

(1) Innate ability: workers can have different amounts of skills/human capital because of innate differences.
Research in biology/social biology has documented that there is some component of IQ which is genetic in origin (there is a heated debate about the exact importance of this component, and some economists have also taken part in this). The relevance of this observation for labor economics is twofold: (i) there is likely to be heterogeneity in human capital even when individuals have access to the same investment opportunities and the same economic constraints; (ii) in empirical applications, we have to find a way of dealing with this source of differences in human capital, especially when it is likely to be correlated with other variables of interest.
(2) Schooling: this has been the focus of much research, since it is the most easily observable component of human capital investments. It has to be borne in mind, however, that the $R^2$ of earnings regressions that control for schooling is relatively small, suggesting that schooling differences account for a relatively small fraction of the differences in earnings. Therefore, there is much more to human capital than schooling. Nevertheless, the analysis of schooling is likely to be very informative if we presume that the same forces that affect schooling investments are also likely to affect non-schooling investments. So we can infer from the patterns of schooling investments what may be happening to non-schooling investments, which are more difficult to observe.

where $s(t) \in [0, 1]$ is the fraction of time that the individual spends for investments in schooling, and $G : \mathbb{R}_+^2 \times [0, 1] \to \mathbb{R}_+$ determines how human capital evolves as a function of time, the individual’s stock of human capital and schooling decisions.
In addition, we can impose a further restriction on schooling decisions, for example,

(1.3)  $s(t) \in S(t)$,

where $S(t) \subset [0, 1]$. This may be useful to model constraints of the form $s(t) \in \{0, 1\}$, which would correspond to the restriction that schooling must be full-time (or other such restrictions on human capital investments).

The individual is assumed to face an exogenous sequence of wage per unit of human capital given by $[w(t)]_{t=0}^{T}$, so that his labor earnings at time $t$ are

$W(t) = w(t)\,[1 - s(t)]\,[h(t) + \omega(t)]$,

where $1 - s(t)$ is the fraction of time spent supplying labor to the market and $\omega(t)$ is non-human capital labor that the individual may be supplying to the market at time $t$. The sequence of non-human capital labor that the individual can supply to the market, $[\omega(t)]_{t=0}^{T}$, is exogenous. This formulation assumes that the only margin of choice is between market work and schooling (i.e., there is no leisure).

Finally, let us assume that the individual faces a constant (flow) interest rate equal to $r$ on his savings. Using the equation for labor earnings, the lifetime budget constraint of the individual can be written as

(1.4)  $\int_0^T \exp(-rt)\, c(t)\, dt \le \int_0^T \exp(-rt)\, w(t)\,[1 - s(t)]\,[h(t) + \omega(t)]\, dt$.

The Separation Theorem, which is the subject of this section, can be stated as follows:

Theorem 1.1. (Separation Theorem) Suppose that the instantaneous utility function $u(\cdot)$ is strictly increasing. Then the sequence $[\hat{c}(t), \hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ is a solution to the maximization of (1.1) subject to (1.2), (1.3) and (1.4) if and only if $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ maximizes

(1.5)  $\int_0^T \exp(-rt)\, w(t)\,[1 - s(t)]\,[h(t) + \omega(t)]\, dt$

subject to (1.2) and (1.3), and $[\hat{c}(t)]_{t=0}^{T}$ maximizes (1.1) subject to (1.4) given $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$. That is, human capital accumulation and supply decisions can be separated from consumption decisions.

Proof.
To prove the “only if” part, suppose that $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ does not maximize (1.5), but there exists $\hat{c}(t)$ such that $[\hat{c}(t), \hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ is a solution to (1.1). Let the value of (1.5) generated by $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ be denoted $Y$. Since $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ does not maximize (1.5), there exists $[s(t), h(t)]_{t=0}^{T}$ reaching a value of (1.5), $Y' > Y$. Consider the sequence $[c(t), s(t), h(t)]_{t=0}^{T}$, where $c(t) = \hat{c}(t) + \varepsilon$. By the hypothesis that $[\hat{c}(t), \hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ is a solution to (1.1), the budget constraint (1.4) implies

$\int_0^T \exp(-rt)\, \hat{c}(t)\, dt \le Y$.

Let $\varepsilon > 0$ and consider $c(t) = \hat{c}(t) + \varepsilon$ for all $t$. We have that

$\int_0^T \exp(-rt)\, c(t)\, dt = \int_0^T \exp(-rt)\, \hat{c}(t)\, dt + \frac{1 - \exp(-rT)}{r}\,\varepsilon \le Y + \frac{1 - \exp(-rT)}{r}\,\varepsilon$.

Since $Y' > Y$, for $\varepsilon$ sufficiently small, $\int_0^T \exp(-rt)\, c(t)\, dt \le Y'$ and thus $[c(t), s(t), h(t)]_{t=0}^{T}$ is feasible. Since $u(\cdot)$ is strictly increasing, $[c(t), s(t), h(t)]_{t=0}^{T}$ is strictly preferred to $[\hat{c}(t), \hat{s}(t), \hat{h}(t)]_{t=0}^{T}$, leading to a contradiction and proving the “only if” part.

The proof of the “if” part is similar. Suppose that $[\hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ maximizes (1.5). Let the maximum value be denoted by $Y$. Consider the maximization of (1.1) subject to the constraint that $\int_0^T \exp(-rt)\, c(t)\, dt \le Y$. Let $[\hat{c}(t)]_{t=0}^{T}$ be a solution. This implies that if $[c'(t)]_{t=0}^{T}$ is a sequence that is strictly preferred to $[\hat{c}(t)]_{t=0}^{T}$, then $\int_0^T \exp(-rt)\, c'(t)\, dt > Y$. This implies that $[\hat{c}(t), \hat{s}(t), \hat{h}(t)]_{t=0}^{T}$ must be a solution to the original problem, because any other $[s(t), h(t)]_{t=0}^{T}$ leads to a value of (1.5) $Y' \le Y$, and if $[c'(t)]_{t=0}^{T}$ is strictly preferred to $[\hat{c}(t)]_{t=0}^{T}$, then $\int_0^T \exp(-rt)\, c'(t)\, dt > Y \ge Y'$ for any $Y'$ associated with any feasible $[s(t), h(t)]_{t=0}^{T}$.
∎

The intuition for this theorem is straightforward: in the presence of perfect capital markets, the best human capital accumulation decisions are those that maximize the lifetime budget set of the individual. It can be shown that this theorem does not hold when there are imperfect capital markets. Moreover, this theorem also fails to hold when leisure is an argument of the utility function of the individual. Nevertheless, it is a very useful benchmark as a starting point of our analysis.

5. Schooling Investments and Returns to Education

We now turn to the simplest model of schooling decisions in partial equilibrium, which will illustrate the main tradeoffs in human capital investments. The model presented here is a version of Mincer’s (1974) seminal contribution. This model also enables a simple mapping from the theory of human capital investments to the large empirical literature on returns to schooling.

Let us first assume that $T = \infty$, which will simplify the expressions. The flow rate of death, $\nu$, is positive, so that individuals have finite expected lives. Suppose that (1.2) and (1.3) are such that the individual has to spend an interval $S$ with $s(t) = 1$, i.e., in full-time schooling, and $s(t) = 0$ thereafter. At the end of the schooling interval, the individual will have a schooling level of

$h(S) = \eta(S)$,

where $\eta(\cdot)$ is an increasing, continuously differentiable and concave function. For $t \in [S, \infty)$, human capital accumulates over time (as the individual works) according to the differential equation

(1.6)  $\dot{h}(t) = g_h h(t)$,

for some $g_h \ge 0$. Suppose also that wages grow exponentially,

(1.7)  $\dot{w}(t) = g_w w(t)$,

with boundary condition $w(0) > 0$. Suppose that

$g_w + g_h < r + \nu$,

(1.13)  $\ln c_i + \ln \hat{c}_i$

where $\hat{c}_i$ is the consumption of the offspring. There is heterogeneity among children, so the cost of education, $\theta_i$, varies with $i$.
In the second period skilled individuals (those with education) receive a wage $w_s$ and an unskilled worker receives $w_u$.

First, consider the case in which there are no credit market problems, so parents can borrow on behalf of their children, and when they do so, they pay the same interest rate, $r$, as the rate they would obtain by saving. Then, the decision problem of the parent with income $y_i$ is to maximize (1.13) with respect to $e_i$, $c_i$ and $\hat{c}_i$, subject to the budget constraint:

$c_i + \frac{\hat{c}_i}{1 + r} \le \frac{w_u}{1 + r} + e_i\,\frac{w_s - w_u}{1 + r} + y_i - e_i \theta_i$

Note that $e_i$ does not appear in the objective function, so the education decision will be made simply to maximize the budget set of the consumer. This is the essence of the Separation Theorem, Theorem 1.1 above. In particular, here parents will choose to educate their offspring only if

(1.14)  $\theta_i \le \frac{w_s - w_u}{1 + r}$

One important feature of this decision rule is that a greater skill premium as captured by $w_s - w_u$ will encourage schooling, while the higher interest rate, $r$, will discourage schooling (since schooling is a form of investment with upfront costs and delayed benefits).

In practice, this solution may be difficult to achieve for a variety of reasons. First, there is the usual list of informational/contractual problems, creating credit constraints or transaction costs that introduce a wedge between borrowing and lending rates (or even make borrowing impossible for some groups). Second, in many cases, it is the parents who make part of the investment decisions for their children, so the above solution involves parents borrowing to finance both the education expenses and also part of their own current consumption. These loans are then supposed to be paid back by their children. With the above setup, this arrangement works since parents are fully altruistic. However, if there are non-altruistic parents, this will create obvious problems. Therefore, in many situations credit problems might be important.
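The decision rule (1.14) can be illustrated with a minimal sketch. The function names and all numerical values below are assumptions chosen for illustration and do not come from the notes. The sketch compares the parent's lifetime resources (the right-hand side of the budget constraint) with and without education and checks that the choice coincides with (1.14) regardless of parental income.

```python
def lifetime_resources(e, y, theta, w_s, w_u, r):
    """Right-hand side of the parent's budget constraint for e in {0, 1}."""
    return w_u / (1 + r) + e * (w_s - w_u) / (1 + r) + y - e * theta


def educate_unconstrained(theta, w_s, w_u, r):
    """Decision rule (1.14): educate when theta <= (w_s - w_u) / (1 + r)."""
    return theta <= (w_s - w_u) / (1 + r)


def budget_maximizing_choice(y, theta, w_s, w_u, r):
    """Pick the education choice that yields the larger budget set.

    In an exact tie this returns 0, whereas (1.14) allows education;
    the illustrative values below stay away from the tie.
    """
    return max((0, 1), key=lambda e: lifetime_resources(e, y, theta, w_s, w_u, r))


if __name__ == "__main__":
    w_s, w_u, r = 2.0, 1.0, 0.05  # hypothetical wages and interest rate
    for theta in (0.5, 1.2):      # hypothetical education costs
        for y in (0.1, 1.0, 10.0):  # parental income does not change the choice
            print(theta, y, budget_maximizing_choice(y, theta, w_s, w_u, r))
```

With these values the threshold $(w_s - w_u)/(1 + r)$ is roughly 0.95, so the low-cost child is educated and the high-cost child is not, at every income level considered.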
Now imagine the same setup, but also assume that parents cannot have negative savings, which is a simple and severe form of credit market problems. This modifies the constraint set as follows

$c_i \le y_i - e_i \theta_i - s_i$
$s_i \ge 0$
$\hat{c}_i \le w_u + e_i (w_s - w_u) + (1 + r)\, s_i$

First note that for a parent with $y_i - e_i \theta_i > w_s$, the constraint of nonnegative savings is not binding, so the same solution as before will apply. Therefore, credit constraints will only affect parents who needed to borrow to finance their children’s education.

To characterize the solution to this problem, let us look at the utilities from investing and not investing in education of a parent. Also to simplify the discussion let us focus on parents who would not choose positive savings, that is, those parents with $(1 + r)\, y_i \le w_u$. The utilities from investing and not investing in education are given, respectively, by

$U(e = 1 \mid y_i, \theta_i) = \ln(y_i - \theta_i) + \ln w_s$, and
$U(e = 0 \mid y_i, \theta_i) = \ln y_i + \ln w_u$.

Comparison of these two expressions implies that parents with

$\theta_i \le y_i\,\frac{w_s - w_u}{w_s}$

will invest in education. It is then straightforward to verify that:

(1) This condition is more restrictive than (1.14) above, since $(1 + r)\, y_i \le w_u < w_s$.
(2) As income increases, there will be more investment in education, which contrasts with the non-credit-constrained case.

One interesting implication of the setup with credit constraints is that the skill premium, $w_s - w_u$, still has a positive effect on human capital investments. However, in more general models with credit constraints, the conclusions may be more nuanced. For example, if $w_s - w_u$ increases because the unskilled wage, $w_u$, falls, this may reduce the income level of many of the households that are marginal for the education decision, thus discouraging investment in education.

7. Evidence on Human Capital Investments and Credit Constraints

This finding, that income only matters for education investments in the presence of credit constraints, motivates investigations of whether there are significant differences in the educational attainment of children from different parental backgrounds as a test of the importance of credit constraints on education decisions. In addition, the empirical relationship between family income and education is interesting in its own right.

A typical regression would be along the lines of

schooling = controls + $\alpha \cdot$ log parental income

which leads to positive estimates of $\alpha$, consistent with credit constraints. The problem is that there are at least two alternative explanations for why we may be estimating a positive $\alpha$:

(1) Children’s education may also be a consumption good, so rich parents will “consume” more of this good as well as other goods. If this is the case, the positive relationship between family income and education is not evidence in favor of credit constraints, since the “separation theorem” does not apply when the decision is not a pure investment (enters directly in the utility function). Nevertheless, the implications for labor economics are quite similar: richer parents will invest more in their children’s education.
(2) The second issue is more problematic. The distribution of costs and benefits of education differs across families, and is likely to be correlated with income. That is, the parameter $\theta_i$ in terms of the model above will be

mobility may be very nonlinear, with a lot of mobility among middle income families, but very little at the tails.

Work by Solon and Zimmerman has dealt with the first two problems. They find that controlling for these issues increases the degree of persistence substantially to about 0.45 or even 0.55. The next figure shows Solon’s baseline estimates.
Figure 1.1

A paper by Cooper, Durlauf and Johnson, in turn, finds that there is much more persistence at the top and the bottom of the income distribution than at the middle.

That the difference between 0.3 and 0.55 is in fact substantial can be seen by looking at the implications of using $\alpha = 0.55$ in (1.16). Now the long-run income distribution will be substantially more dispersed than the transitory shocks. More specifically, we will have $\sigma_y^2 \approx 1.45 \cdot \sigma_\varepsilon^2$.

To deal with the second empirical issue, one needs a source of exogenous variation in incomes to implement an IV strategy. There are no perfect candidates, but some imperfect ones exist. One possibility, pursued in Acemoglu and Pischke (2001), is to exploit changes in the income distribution that have taken place over the past 30 years to get a source of exogenous variation in household income. The basic idea is that the rank of a family in the income distribution is a good proxy for parental human capital, and conditional on that rank, the income gap has widened over the past 20 years. Moreover, this has happened differentially across states. One can exploit this source of variation by estimating regressions of the form

(1.17)  $s_{iqjt} = \delta_q + \delta_j + \delta_t + \beta_q \ln y_{iqjt} + \varepsilon_{iqjt}$,

where $q$ denotes income quartile, $j$ denotes region, and $t$ denotes time. $s_{iqjt}$ is education of individual $i$ in income quartile $q$, region $j$, time $t$. With no effect of income on education, the $\beta_q$’s should be zero. With credit constraints, we might expect lower quartiles to have positive $\beta$’s. Acemoglu and Pischke report versions of this equation using data aggregated to income quartile, region and time cells. The estimates of $\beta$ are typically positive and significant, as shown in the next two tables.

However, the evidence does not indicate that the $\beta$’s are higher for lower income quartiles, which suggests that there may be more to the relationship between income and education than simple credit constraints.
Potential determinants of the relationship between income and education have already been discussed extensively in the literature, but we still do not have a satisfactory understanding of why parental income may affect children’s educational outcomes (and to what extent it does so).

8. The Ben-Porath Model

The baseline Ben-Porath model enriches the models we have seen so far by allowing human capital investments and non-trivial labor supply decisions throughout the lifetime of the individual. It also acts as a bridge to models of investment in human capital on-the-job, which we will discuss below.

Let $s(t) \in [0, 1]$ for all $t \ge 0$. Together with the Mincer equation (1.12) above, the Ben-Porath model is the basis of much of labor economics. Here it is sufficient to consider a simple version of this model where the human capital accumulation equation, (1.2), takes the form

(1.18)  $\dot{h}(t) = \phi(s(t) h(t)) - \delta_h h(t)$,

where $\delta_h > 0$ captures “depreciation of human capital,” for example because new machines and techniques are being introduced, eroding the existing human capital of the worker. The individual starts with an initial value of human capital $h(0) > 0$. The function $\phi : \mathbb{R}_+ \to \mathbb{R}_+$ is strictly increasing, continuously differentiable and strictly concave. Furthermore, we simplify the analysis by assuming that this function satisfies the Inada-type conditions,

$\lim_{x \to 0} \phi'(x) = \infty$ and $\lim_{x \to h(0)} \phi'(x) = 0$.

Figure 1.2

is the elasticity of the function $\phi'(\cdot)$ and is positive since $\phi'(\cdot)$ is strictly decreasing (thus $\phi''(\cdot) < 0$). Combining this equation with (1.20), we obtain

(1.23)  $\frac{\dot{x}(t)}{x(t)} = \frac{1}{\varepsilon_{\phi'}(x(t))}\left(r + \nu + \delta_h - \phi'(x(t))\right)$.

Figure 1.4 plots (1.18) and (1.23) in the $h$-$x$ space. The upward-sloping curve corresponds to the locus for $\dot{h}(t) = 0$, while (1.23) can only be zero at $x^*$, thus the locus for $\dot{x}(t) = 0$ corresponds to the horizontal line in the figure.
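The two loci in (1.18) and (1.23) can be made concrete with a small numerical sketch. Everything specific here is an assumption made for illustration: the functional form $\phi(x) = A x^{\gamma}$ (which satisfies the Inada condition at zero but not the one at $h(0)$), the parameter values, the reading of the elasticity as $-x\phi''(x)/\phi'(x)$ (equal to $1 - \gamma$ for this form), and the helper names. Under these assumptions, $x^*$ solves $\phi'(x^*) = r + \nu + \delta_h$ and $h^*$ solves $\phi(x^*) = \delta_h h^*$.

```python
from dataclasses import dataclass


@dataclass
class Params:
    # All values are illustrative assumptions, not taken from the notes.
    r: float = 0.05        # interest rate
    nu: float = 0.02       # flow rate of death
    delta_h: float = 0.03  # depreciation of human capital
    A: float = 0.5         # scale of the assumed phi(x) = A * x**gamma
    gamma: float = 0.5     # 0 < gamma < 1 makes phi strictly concave


def phi(x, p):
    return p.A * x ** p.gamma


def phi_prime(x, p):
    return p.A * p.gamma * x ** (p.gamma - 1.0)


def steady_state(p):
    """x* from phi'(x*) = r + nu + delta_h, h* from phi(x*) = delta_h * h*."""
    rho = p.r + p.nu + p.delta_h
    x_star = (p.A * p.gamma / rho) ** (1.0 / (1.0 - p.gamma))
    h_star = phi(x_star, p) / p.delta_h
    return x_star, h_star


def x_growth_rate(x, p):
    """Right-hand side of (1.23) with elasticity 1 - gamma for this phi."""
    return (p.r + p.nu + p.delta_h - phi_prime(x, p)) / (1.0 - p.gamma)


def simulate_h(h0, p, horizon=200.0, dt=0.01):
    """Euler path of (1.18) when investment is held at x = s * h = x*."""
    x_star, _ = steady_state(p)
    if h0 < x_star:
        raise ValueError("need h0 >= x* so that s = x*/h stays within [0, 1]")
    h, path = h0, [h0]
    for _ in range(int(round(horizon / dt))):
        h += dt * (phi(x_star, p) - p.delta_h * h)
        path.append(h)
    return path


if __name__ == "__main__":
    p = Params()
    x_star, h_star = steady_state(p)
    path = simulate_h(10.0, p)
    print(x_star, h_star, path[-1])
```

In this sketch $\dot{x}$ is negative below $x^*$ and positive above it, so only the horizontal line $x = x^*$ keeps investment bounded, and along that line $h(t)$ rises monotonically toward $h^*$ while $s(t) = x^*/h(t)$ falls.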
The arrows of motion are also plotted in this phase diagram and make it clear that the steady-state solution $(h^*, x^*)$ is globally saddle-path stable, with the stable arm coinciding with the horizontal line for $\dot{x}(t) = 0$. Starting with $h(0) \in (0, h^*)$, $s(0)$ jumps to the level necessary to ensure $s(0) h(0) = x^*$. From then on, $h(t)$ increases and $s(t)$ decreases so as to keep $s(t) h(t) = x^*$. Therefore, the pattern of human capital investments implied by the Ben-Porath model is one of high investment at the beginning of an individual’s life followed by lower investments later on.

In our simplified version of the Ben-Porath model this all happens smoothly. In the original Ben-Porath model, which involves the use of other inputs in the production of human capital and finite horizons, the constraint $s(t) \le 1$ typically binds early on in the life of the individual, and the interval during which $s(t) = 1$ can be interpreted as full-time schooling. After full-time schooling, the individual starts working (i.e., $s(t) < 1$). But even on-the-job, the individual continues to accumulate human capital (i.e., $s(t) > 0$), which can be interpreted as spending time in training programs or allocating some of his time on the job to learning rather than production. Moreover, because the horizon is finite, if the Inada conditions were relaxed, the individual could prefer to stop investing in human capital at some point. As a result, the time path of human capital generated by the standard Ben-Porath model may be hump-shaped, with a possibly declining portion at the end. Instead, the path of human capital (and the earning potential of the individual) in the current model is always increasing as shown in Figure 1.5.

The importance of the Ben-Porath model is twofold.
First, it emphasizes that schooling is not the only way in which individuals can invest in human capital and there is a continuity between schooling investments and other investments in human capital. Second, it suggests that in societies where schooling investments are high we may also expect higher levels of on-the-job investments in human capital. Thus there may be systematic mismeasurement of the amount or the quality of human capital across societies.

This model also provides us with a useful way of thinking of the lifecycle of the individual, which starts with higher investments in schooling, and then there is a period of “full-time” work (where $s(t)$ is low), but this is still accompanied by investment in human capital and thus increasing earnings. The increase in earnings takes place at a slower rate as the individual ages. There is also some evidence that earnings may start falling at the very end of workers’ careers, though this does not happen in the simplified version of the model presented here (how would you modify it to make sure that earnings may fall in equilibrium?).

The available evidence is consistent with the broad patterns suggested by the model. Nevertheless, this evidence comes from cross-sectional age-experience profiles, so it has to be interpreted with some caution (in particular, the decline at the very end of an individual’s life cycle that is found in some studies may be due to “selection,” as the higher-ability workers retire earlier).

Perhaps more worrisome for this interpretation is the fact that the increase in earnings may reflect not the accumulation of human capital due to investment, but either:

(1) simple age effects: individuals become more productive as they get older. Or
(2) simple experience effects: individuals become more productive as they get more experienced, independently of whether they choose to invest or not.
It is difficult to distinguish between the Ben-Porath model and the second explanation. But there is some evidence that could be useful to distinguish between age effects vs. experience effects (automatic or due to investment).

Josh Angrist’s paper on Vietnam veterans basically shows that workers who served in the Vietnam War lost the experience premium associated with the years they served in the war. This is shown in the next figure. Presuming that serving in the war has no productivity effects, this evidence suggests that much of the age-earnings profiles are due to experience, not simply due to age. Nevertheless, this evidence is consistent both with direct experience effects on worker productivity, and also with a Ben-Porath-type explanation where workers are purposefully investing in their human capital while working, and experience is proxying for these investments.

9. Selection and Wages: The One-Factor Model

Issues of selection bias arise often in the analysis of education, migration, labor supply, and sectoral choice decisions. This section illustrates the basic issues of selection using a single-index model, where each individual possesses a one-dimensional skill. Richer models, such as the famous Roy model of selection, incorporate multi-dimensional skills. While models with multi-dimensional skills make a range of additional predictions, the major implications of selection for interpreting wage differences across different groups can be derived using the single-index model.

Suppose that individuals are distinguished by an unobserved type, $z$, which is assumed to be distributed uniformly between 0 and 1. Individuals decide whether to obtain education, which costs $c$. The wage of an individual of type $z$ when he has no education is

$w_0(z) = z$

and when he obtains education, it is

(1.24)  $w_1(z) = \alpha_0 + \alpha_1 z$,

where $\alpha_0 > 0$ and $\alpha_1 > 1$.
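To see how selection on $z$ shapes observed wage differences, the following sketch works through the two wage schedules numerically. It assumes, as a labeled assumption not spelled out in the text above, that a worker obtains education exactly when $w_1(z) - c \ge w_0(z)$, which gives a cutoff $z^* = (c - \alpha_0)/(\alpha_1 - 1)$. The function names and parameter values are hypothetical.

```python
def cutoff_type(alpha0, alpha1, c):
    """Type z* indifferent between education and no education.

    Assumes education is chosen iff alpha0 + alpha1 * z - c >= z,
    and requires an interior cutoff so that both groups are non-empty.
    """
    z_star = (c - alpha0) / (alpha1 - 1.0)
    if not 0.0 < z_star < 1.0:
        raise ValueError("parameters do not give an interior cutoff")
    return z_star


def selection_summary(alpha0, alpha1, c):
    """Observed wage gap versus average gross returns, z uniform on [0, 1]."""
    z_star = cutoff_type(alpha0, alpha1, c)
    mean_z_educated = (1.0 + z_star) / 2.0   # types in [z*, 1]
    mean_z_uneducated = z_star / 2.0         # types in [0, z*)
    observed_gap = (alpha0 + alpha1 * mean_z_educated) - mean_z_uneducated
    # Gross return to education for type z is w1(z) - w0(z).
    return_educated = alpha0 + (alpha1 - 1.0) * mean_z_educated
    return_uneducated = alpha0 + (alpha1 - 1.0) * mean_z_uneducated
    return {
        "z_star": z_star,
        "observed_gap": observed_gap,
        "return_educated": return_educated,
        "return_uneducated": return_uneducated,
    }


if __name__ == "__main__":
    # Hypothetical parameter values with alpha0 > 0 and alpha1 > 1.
    print(selection_summary(alpha0=0.1, alpha1=1.5, c=0.4))
```

Under these assumptions the observed gap between the average wages of educated and uneducated workers exceeds the average gross return for either group, because the educated are drawn from higher types.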
$\alpha_0$ is the main effect of education on earnings, which applies irrespective of ability, whereas $\alpha_1$ interacts with ability. The assumption that $\alpha_1 > 1$ implies that education is complementary to ability, and will ensure that high-ability individuals are “positively selected” into education.

Table 5. Fixed effects regressions for the probability of attending college within two years of high school: effects by income quartile, region by income quartile cells, 1972-1992*

                                    Ever attending any college                      Ever attending four-year college
Independent variable                (1)         (2)         (3)         (4)         (5)         (6)         (7)         (8)
Log mean family income, Quartile 1  0.018       0.154       0.139       -0.039      0.010       0.108       0.064       -0.016
                                    (0.143)     (0.056)     (0.064)     (0.187)     (0.085)     (0.052)     (0.053)     (0.190)
Log mean family income, Quartile 2  0.229       0.189       0.167       0.201       0.151       0.128       0.087       -0.205
                                    (0.258)     (0.113)     [illegible] (0.334)     (0.153)     (0.105)     (0.101)     (0.339)
Log mean family income, Quartile 3  0.617       0.161       0.148       0.328       0.428       0.174       0.150       -0.039
                                    (0.273)     (0.116)     (0.129)     (0.283)     (0.162)     (0.107)     (0.112)     (0.287)
Log mean family income, Quartile 4  0.405       0.012       -0.005      0.231       0.392       0.212       0.183       0.147
                                    (0.152)     (0.071)     (0.072)     (0.132)     (0.092)     (0.066)     (0.063)     (0.134)
Return to college, Quartile 1       0.691                   1.049                   -0.053                  -1.577
                                    (1.052)                 (0.759)                 (0.623)                 (0.659)
Return to college, Quartile 2       1.144                   -1.032                  0.599                   1.121
                                    (0.938)                 (0.726)                 (0.556)                 (0.630)
Return to college, Quartile 3       0.481                   0.963                   [illegible]             [illegible]
                                    (1.050)                 (0.722)                 (0.622)                 (0.627)
Return to college, Quartile 4       1.367                   -0.438                  1.304                   0.226
                                    (0.952)                 (0.723)                 (0.564)                 (0.627)
Region effects                      Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Income quartile effects             Yes         Yes         Yes         Yes         Yes         Yes         Yes         Yes
Year effects                        No          Yes         Yes         Yes         No          Yes         Yes         Yes
Income quartile x Region effects    No          No          No          Yes         No          Yes         Yes         Yes
Income quartile x Year effects      No          No          No          Yes         No          Yes         Yes         Yes
Region x Year effects               No          No          No          Yes         No          No          No          Yes

* Data are cell level means for 4 Census regions, 4 years, and 4 quartiles for the income of the student’s family. Number of cells is 64. Dependent variable is the fraction of students enrolled in any college or in a four-year college within two years of high school graduation calculated from the NLS-72, HSB Senior and Sophomore cohorts, and the NELS. Students left high school in 1972, 1980, 1982, and 1992. Return to college is the relative wage of those with exactly 4 years of college to those with a high school degree (for workers with 1-5 years of experience) calculated from the Census for 1970, 1980, and 1990,

Figure 1.3

Figure 1.4. Steady state and equilibrium dynamics in the simplified Ben-Porath model. (Axes: $h(t)$ and $x(t)$; loci $\dot{h}(t) = 0$ and $\dot{x}(t) = 0$; marked points $h^*$, $x^*$, $h(0)$, $x'(0)$, $x''(0)$.)

Figure 1.5. Time path of human capital investments in the simplified Ben-Porath model. (Axes: $h(t)$ against $t$; marked levels $h(0)$ and $h^*$.)

CHAPTER 2
Human Capital and Signaling

1. The Basic Model of Labor Market Signaling

The models we have discussed so far are broadly in the tradition of Becker’s approach to human capital. Human capital is viewed as an input in the production process. The leading alternative is to view education purely as a signal. Consider the following simple model to illustrate the issues.

There are two types of workers, high ability and low ability. The fraction of high ability workers in the population is $\lambda$. Workers know their own ability, but employers do not observe this directly. High ability workers always produce $y_H$, while low ability workers produce $y_L$. In addition, workers can obtain education. The cost of obtaining education is $c_H$ for high ability workers and $c_L$ for low ability workers. The crucial assumption is that $c_L > c_H$, that is, education is more costly for low ability workers. This is often referred to as the “single-crossing” assumption, since it makes sure that in the space of education and wages, the indifference curves of high and low types intersect only once. For future reference, let us denote the decision to obtain education by $e = 1$.
For simplicity, we assume that education does not increase the productivity of either type of worker. Once workers obtain their education, there is competition among a large number of risk-neutral firms, so workers will be paid their expected productivity. More specifically, the timing of events is as follows:
• Each worker finds out their ability.
• Each worker chooses education, $e = 0$ or $e = 1$.
• A large number of firms observe the education decision of each worker (but not their ability) and compete à la Bertrand to hire these workers.

Clearly, this environment corresponds to a dynamic game of incomplete information, since individuals know their ability, but firms do not. The natural equilibrium concept in this case is the Perfect Bayesian Equilibrium. Recall that a Perfect Bayesian Equilibrium consists of a strategy profile $\sigma$ (designating a strategy for each player) and a belief profile $\mu$ (designating the beliefs of each player at each information set) such that $\sigma$ is sequentially rational for each player given $\mu$ (so that each player plays a best response at each information set given their beliefs) and $\mu$ is derived from $\sigma$ using Bayes’s rule whenever possible. While Perfect Bayesian Equilibria are straightforward to characterize and often reasonable, in incomplete information games where players with private information move before those without this information, there may also exist Perfect Bayesian Equilibria with certain undesirable characteristics. We may therefore wish to strengthen this notion of equilibrium (see below).

In general, there can be two types of equilibria in this game.
(1) Separating, where high and low ability workers choose different levels of schooling, and as a result, in equilibrium, employers can infer worker ability from education (which is a straightforward application of Bayesian updating).
(2) Pooling, where high and low ability workers choose the same level of education. 
In addition, there can be semi-separating equilibria, where some education levels are chosen by more than one type.

1.1. A separating equilibrium. Let us start by characterizing a possible separating equilibrium, which illustrates how education can be valued, even though it has no directly productive role. Suppose that we have

(2.1) $y_H - c_H > y_L > y_H - c_L$

To illustrate the main idea, let us simplify the discussion by slightly strengthening condition (2.1) to

(2.2) $y_H - c_H > (1 - \lambda) y_L + \lambda y_H$ and $y_L > y_H - c_L$.

Now take the pooling equilibrium above. Consider a deviation to $e = 1$. There is no circumstance under which the low type would benefit from this deviation: the most a worker could ever get is $y_H$, by assumption (2.2) we have $y_L > y_H - c_L$, and the low ability worker is currently getting $(1 - \lambda) y_L + \lambda y_H \geq y_L$. Therefore, firms can deduce that the deviation to $e = 1$ must be coming from the high type, and offer him a wage of $y_H$. Then (2.2) also ensures that this deviation is profitable for the high types, breaking the pooling equilibrium.

The reason why this refinement is referred to as “The Intuitive Criterion” is that it can be supported by a relatively intuitive “speech” by the deviator along the following lines: “you have to deduce that I must be the high type deviating to $e = 1$, since low types would never consider such a deviation, whereas I would find it profitable if I could convince you that I am indeed the high type.” You should bear in mind that this speech is used simply as a loose and intuitive description of the reasoning underlying this equilibrium refinement. In practice there are no such speeches, because the possibility of making such speeches has not been modeled as part of the game. Nevertheless, this heuristic device gives the basic idea. 
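The Intuitive Criterion argument above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names and the parameter values are hypothetical, and the check simply encodes condition (2.1) together with the two comparisons used in the deviation argument (the best a deviator can hope for is the wage $y_H$, net of the education cost).

```python
def separating_condition(y_h, y_l, c_h, c_l):
    """Condition (2.1): y_H - c_H > y_L > y_H - c_L."""
    return y_h - c_h > y_l > y_h - c_l


def pooling_wage(y_h, y_l, lam):
    """Expected productivity when both types choose e = 0."""
    return (1 - lam) * y_l + lam * y_h


def pooling_survives_intuitive_criterion(y_h, y_l, c_h, c_l, lam):
    """Return True if the e = 0 pooling outcome is NOT broken by the
    deviation argument in the text (illustrative helper, hypothetical name).

    The deviation to e = 1 breaks pooling when the high type could gain
    from it at the best possible wage y_H while the low type could not.
    """
    w_pool = pooling_wage(y_h, y_l, lam)
    low_could_gain = y_h - c_l > w_pool
    high_could_gain = y_h - c_h > w_pool
    return not (high_could_gain and not low_could_gain)


# Hypothetical numbers chosen only for illustration.
params = dict(y_h=10, y_l=5, c_h=2, c_l=6)
print(separating_condition(**params))
print(pooling_survives_intuitive_criterion(lam=0.5, **params))
print(pooling_survives_intuitive_criterion(lam=0.8, **params))
```

With these illustrative numbers, condition (2.2) holds when $\lambda = 0.5$ (the pooled wage is 7.5, below $y_H - c_H = 8$), so pooling is broken. When $\lambda = 0.8$ the pooled wage is 9 and the first part of (2.2) fails, so this particular argument no longer rules pooling out.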
The overall conclusion is that as long as the separating condition is satisfied, we expect the equilibrium of this economy to involve a separating allocation, where education is valued as a signal.

2. Generalizations

It is straightforward to generalize this equilibrium concept to a situation in which education has a productive role as well as a signaling role. Then the story would be one where education is valued for more than its productive effect, because it is also associated with higher ability.

Let me give the basic idea here. Imagine that education is continuous, $e \in [0, \infty)$, and that the cost functions for the high and low types are $c_H(e)$ and $c_L(e)$, which are both strictly increasing and convex, with $c_H(0) = c_L(0) = 0$. The single crossing property is that

$c_H'(e) < c_L'(e)$ for all $e \in [0, \infty)$,

that is, the marginal cost of investing in a given unit of education is always higher for the low type. Figure 2.1 shows these cost functions.

Figure 2.1

Moreover, suppose that the output of the two types as a function of their education is $y_H(e)$ and $y_L(e)$, with $y_H(e) > y_L(e)$ for all $e$. Figure 2.2 shows the first best, which would arise in the absence of incomplete information.

Figure 2.2. The first best allocation with complete information.

In particular, as the figure shows, the first best involves education levels $(e_l^*, e_h^*)$ such that

(2.3) $y_L'(e_l^*) = c_L'(e_l^*)$

and

(2.4) $y_H'(e_h^*) = c_H'(e_h^*)$.

With incomplete information, there are again many equilibria, some separating, some pooling and some semi-separating. But applying a stronger form of the Intuitive Criterion reasoning, we will pick the Riley equilibrium of this game, which is a particular separating equilibrium. It is characterized as follows. We first find the most preferred education level for the low type in the perfect information case, which coincides with the first best $e_l^*$ determined in (2.3). 
Then we can write the

where the first line is introduced by adding and subtracting $c_L(e_h)$. The second line follows from single crossing, since $c_H(e_h) - c_L(e_h) < c_H(e_l^*) - c_L(e_l^*)$ in view of the fact that $e_l^* < e_h$. The third line exploits (2.6), and the final line simply cancels the two $c_L(e_l^*)$ terms from the right hand side.

Figure 2.3 depicts this equilibrium diagrammatically (for clarity it assumes that $y_H(e)$ and $y_L(e)$ are linear in $e$). Notice that in this equilibrium, high type workers invest more than they would have done in the perfect information case, in the sense that $e_h$ characterized here is greater than the education level that high type individuals would choose with perfect information, given by $e_h^*$ in (2.4).

3. Evidence on Labor Market Signaling

Is the signaling role of education important? There are a number of different ways of approaching this question. Unfortunately, direct evidence is difficult to find since ability differences across workers are unobserved not only by firms, but also by econometricians. Nevertheless, a number of different strategies can be used to gauge the importance of signaling in the labor market. Here we will discuss several attempts to investigate the importance of labor market signaling. In the next section, we will discuss empirical work that may give a sense of how important signaling considerations are in the aggregate.

Before this discussion, note the parallel between the selection stories discussed above and the signaling story. In both cases, the observed earnings differences between high and low education workers will include a component due to the fact that the abilities of the high and low education groups differ. There is one important difference, however: in the selection stories, the market observed ability, and it was only us, the economists or the econometricians, who were unable to do so. 
In the signaling story, the market is also unable to observe ability, and is inferring it from education. For this reason, proper evidence in favor of the signaling story should go beyond documenting the importance of some type of “selection”.

There are four different approaches to determining whether signaling is important. The first line of work looks at whether degrees matter, in particular, whether a high school degree or the fourth year of college that gets an individual a university degree matters more than other years of schooling (e.g., Kane and Rouse). This approach suffers from two serious problems. First, the final year of college (or high school) may in fact be more useful than the third year, especially because it shows that the individual is able to learn all the required information that makes up a college degree. Second, and more serious, there is no way of distinguishing selection and signaling as possible explanations for these patterns. It may be that those who drop out of high school are observationally different to employers, and hence receive different wages, but these differences are not observed by us in the standard data sets. This is a common problem that will come back again: the implications of unobserved heterogeneity and signaling are often similar.

Second, a creative paper by Lang and Kropp tests for signaling by looking at whether compulsory schooling laws affect schooling above the regulated age. The reasoning is that if the 11th year of schooling is a signal, and the government legislates that everybody has to have 11 years of schooling, high ability individuals now have to get 12 years of schooling to distinguish themselves. They find evidence for this, which they interpret as supportive of the signaling model. The problem is that there are other reasons why compulsory schooling laws may have such effects. 
For example, an individual who does not drop out of 11th grade may then decide to complete high school. Alternatively, there can be peer group effects in that as fewer people drop out of school, it may become less socially acceptable to drop out even at later grades.

The third approach is the best. It is pursued in a very creative paper by Tyler, Murnane and Willett. They observe that passing scores in the General Educational Development (GED) exam differ by state. So an individual with the same score in the GED exam will get a GED in one state, but not in another. If the score in the exam is an unbiased measure of human capital, and there is no signaling, these two individuals should get the same wages. In contrast, if the GED is a signal, and employers do not know where the individual took the GED exam, these two individuals should get different wages.

Using this methodology, the authors estimate that there is a 10-19 percent return to a GED signal. The attached table shows the results.

An interesting result that Tyler, Murnane and Willett find is that there are no GED returns to minorities. This is also consistent with the signaling view, since it turns out that many minorities prepare for and take the GED exam in prison. Therefore, a GED would not only be a positive signal about ability, but also potentially a signal that the individual was at some point incarcerated. This latter feature makes a GED less of a positive signal for minorities.

the second period. The labor market is not competitive; instead, firms and workers are matched randomly, and each firm meets a worker. The only decision workers and firms make after matching is whether to produce together or not to produce at all (since there are no further periods). If firm $f$ and worker $i$ produce together, their output is

(3.3) $k_f^{\alpha} h_i^{\nu}$,

where $\alpha < 1$, $\nu \leq 1 - \alpha$. 
Since it is costly for the worker-firm pair to separate and find new partners in this economy, employment relationships generate quasi-rents. Wages will therefore be determined by rent-sharing. Here, simply assume that the worker receives a share $\beta$ of this output as a result of bargaining, while the firm receives the remaining $1 - \beta$ share.

An equilibrium in this economy is a set of schooling choices for workers and a set of physical capital investments for firms. Firm $f$ maximizes the following expected profit function:

(3.4) $(1 - \beta) k_f^{\alpha} E[h_i^{\nu}] - R k_f$,

with respect to $k_f$. Since firms do not know which worker they will be matched with, their expected profit is an average of profits from different skill levels. The function (3.4) is strictly concave, so all firms choose the same level of capital investment, $k_f = k$, given by

(3.5) $k = \left( \frac{(1 - \beta) \alpha H}{R} \right)^{1/(1-\alpha)}$,

where

$H \equiv E[h_i^{\nu}]$

is the measure of aggregate human capital. Substituting (3.5) into (3.3), and using the fact that wages are equal to a fraction $\beta$ of output, the wage income of individual $i$ is given by

$W_i = \beta \left( (1 - \beta) \alpha H \right)^{\alpha/(1-\alpha)} R^{-\alpha/(1-\alpha)} (h_i)^{\nu}$.

Taking logs, this is:

(3.6) $\ln W_i = c + \frac{\alpha}{1 - \alpha} \ln H + \nu \ln h_i$,

where $c$ is a constant and $\alpha/(1 - \alpha)$ and $\nu$ are positive coefficients.

Human capital externalities arise here because firms choose their physical capital in anticipation of the average human capital of the workers they will employ in the future. Since physical and human capital are complements in this setup, a more educated labor force encourages greater investment in physical capital and leads to higher wages. In the absence of the need for search and matching, firms would immediately hire workers with skills appropriate to their investments, and there would be no human capital externalities.

Nonpecuniary and pecuniary theories of human capital externalities lead to similar empirical relationships since equation (3.6) is identical to equation (3.2), with $c = \ln B$ and $\delta = \alpha/(1 - \alpha)$. 
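The step from (3.4) to (3.6) can be verified numerically. The sketch below is illustrative only: the function names and parameter values are assumptions, and the explicit expression used for the constant $c$ is obtained here by taking logs of the wage equation above rather than being stated in the notes.

```python
import math


def expected_profit(k, alpha, beta, H, R):
    """Expected profit (3.4): (1 - beta) * k^alpha * H - R * k."""
    return (1 - beta) * k ** alpha * H - R * k


def optimal_capital(alpha, beta, H, R):
    """Capital choice (3.5), from the first-order condition of (3.4)."""
    return ((1 - beta) * alpha * H / R) ** (1 / (1 - alpha))


def wage(alpha, beta, nu, H, R, h_i):
    """Worker i receives a share beta of output k^alpha * h_i^nu."""
    k = optimal_capital(alpha, beta, H, R)
    return beta * k ** alpha * h_i ** nu


def log_wage_eq_3_6(alpha, beta, nu, H, R, h_i):
    """Right-hand side of (3.6). The constant c is derived here (assumption):
    c = ln(beta) + alpha/(1-alpha) * ln((1-beta)*alpha/R)."""
    slope = alpha / (1 - alpha)
    c = math.log(beta) + slope * math.log((1 - beta) * alpha / R)
    return c + slope * math.log(H) + nu * math.log(h_i)


# Hypothetical parameter values satisfying alpha < 1 and nu <= 1 - alpha.
p = dict(alpha=0.3, beta=0.5, nu=0.6, R=0.1)
direct = math.log(wage(H=2.0, h_i=1.5, **p))
from_formula = log_wage_eq_3_6(H=2.0, h_i=1.5, **p)
print(abs(direct - from_formula) < 1e-9)

# External return: doubling H raises the log wage by alpha/(1-alpha) * ln 2.
gain = math.log(wage(H=4.0, h_i=1.5, **p)) - direct
print(abs(gain - p["alpha"] / (1 - p["alpha"]) * math.log(2)) < 1e-9)
```

The second check makes the externality concrete: holding the worker's own $h_i$ fixed, a higher aggregate $H$ raises the wage through the firm's capital choice, with elasticity $\alpha/(1-\alpha)$.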
Again presuming that these interactions exist in local labor markets, we can estimate a version of (3.2) using differences in schooling across labor markets (cities, states, or even countries).

1.3. Signaling and negative externalities. The above models focused on positive externalities to education. However, in a world where education plays a signaling role, we might also expect significant negative externalities. To see this, consider the most extreme world, in which education is only a signal and does not have any productive role.

Contrast two situations: in the first, all individuals have 12 years of schooling and in the second all individuals have 16 years of schooling. Since education has no productive role, and all individuals have the same level of schooling, in both allocations they will earn exactly the same wage (equal to average productivity). Therefore, here the increase in aggregate schooling does not translate into aggregate increases in wages. But in the same world, if one individual obtains more education than the rest, there will be a private return to him, because he would signal that he is of higher ability. Therefore, in a world where signaling is important, we might also want to estimate an equation of the form (3.2), but we would expect $\delta$ to be negative.

The basic idea here is that in this world, what determines an individual’s wages is his “ranking” in the signaling distribution. When others invest more in their education, a given individual’s rank in the distribution declines, hence others are creating a negative externality on this individual via their human capital investment.

2. Evidence

Ordinary Least Squares (OLS) estimation of equations like (3.2) using city or state-level data yields very significant and positive estimates of $\delta$, indicating substantial positive human capital externalities. The leading example is the paper by Jim Rauch. 
There are at least two problems with this type of OLS estimate. First, it may be precisely high-wage cities or states that either attract a large number of high education workers or give strong support to education. Rauch’s estimates used a cross-section of cities. Including city or state fixed effects ameliorates this problem, but does not solve it, since states’ attitudes towards education and the demand for labor may comove. The ideal approach would be to find a source of quasi-exogenous variation in average schooling across labor markets (variation unlikely to be correlated with other sources of variation in the demand for labor in the state).

Acemoglu and Angrist try to accomplish this using differences in compulsory schooling laws. The advantage is that these laws affect not only individual schooling but also average schooling in a given area.

There is an additional econometric problem in estimating externalities, which remains even if we have an instrument for average schooling in the aggregate. This is that if individual schooling is measured with error (or for some other reason OLS returns to individual schooling are not the causal effect), some of this discrepancy between the OLS returns and the causal return may load on average schooling, even when average schooling is instrumented. This suggests that we may need to instrument for individual schooling as well (so as to get to the correct return to individual schooling).

More explicitly, let $Y_{ijt}$ be the log weekly wage; then the estimating equation is

(3.7) $Y_{ijt} = X_i' \mu + \delta_j + \delta_t + \gamma_1 S_{jt} + \gamma_{2i} s_i + u_{jt} + \varepsilon_i$,

that in the aggregate signaling considerations are unlikely to be very important (at the very least, they do not dominate positive externalities).

3. School Quality

Differences in school quality could be a crucial factor in differences in human capital. 
Two individuals with the same years of schooling might have very different skills and very different earnings because one went to a much better school, with better teachers, instruction and resources. Differences in school quality would add to the unobserved component of human capital.

A natural conjecture is that school quality as measured by teacher-pupil ratios, spending per pupil, length of school year, and educational qualifications of teachers would be a major determinant of human capital. If school quality indeed matters a lot, an effective way of increasing human capital might be to increase the quality of instruction in schools.

This view was however challenged by a number of economists, most notably Hanushek. Hanushek noted that the substantial increase in spending per student and teacher-pupil ratios, as well as the increase in the qualifications of teachers, was not associated with improved student outcomes, but on the contrary with a deterioration in many measures of high school students’ performance. In addition, Hanushek conducted a meta-analysis of the large number of papers in the education literature, and concluded that there was no overwhelming case for a strong effect of resources and class size on student outcomes.

Although this research has received substantial attention, a number of careful papers show that exogenous variation in class size and other resources is in fact associated with sizable improvements in student outcomes. Most notable:
(1) Krueger analyzes the data from the Tennessee STAR experiment, where students were randomly allocated to classes of different sizes.
(2) Angrist and Lavy analyze the effect of class size on test scores using a unique characteristic of Israeli schools, which caps class size at 40, thus creating a natural regression discontinuity as a function of the total number of students in the school. 
(3) Card and Krueger look at the effects of pupil-teacher ratio, term length and relative teacher wage by comparing the earnings of individuals working in the same state but educated in different states with different school resources.
(4) Another paper by Card and Krueger looks at the effect of the “exogenously” forced narrowing of the resource gap between black and white schools in South Carolina on the gap between black and white pupils’ education and subsequent earnings.

All of these papers find sizable effects of school quality on student outcomes. Moreover, a recent paper by Krueger shows that there were many questionable decisions in the meta-analysis by Hanushek, casting doubt on the usefulness of this analysis. On the basis of these various pieces of evidence, it is safe to conclude that school quality appears to matter for human capital.

4. Peer Group Effects

Issues of school quality are also intimately linked to those of externalities. An important type of externality, different from the external returns to education discussed above, arises in the context of education: peer group effects, or more generally social effects in the process of education. The fact that children growing up in different areas may choose different role models will lead to this type of externality. More simply, to the extent that schooling and learning are group activities, there could be peer group effects of this type.

There are a number of theoretical issues that need to be clarified, as well as important work that needs to be done in understanding where peer group effects are coming from. Moreover, empirical investigation of peer group effects is in its
Since there is little research in understanding the nature of peer group effects, here we will simply take peer group effects as given, and briefly discuss some of their efficiency implications, especially for community structure and school quality, and then very briefly mention some work on estimating peer group effects.

4.1. Implications of peer group effects for mixing and segregation. An important question is whether the presence of peer group effects has any particular implications for the organization of schools, and in particular, whether children who provide positive externalities on other children should be put together in a separate school or classroom.

The basic issue here is equivalent to an assignment problem. The general principle in assignment problems, such as Becker’s famous model of marriage, is that if inputs from the two parties are complementary, there should be assortative matching, that is, the highest quality individuals should be matched together. In the context of schooling, this implies that children with better characteristics, who are likely to create more positive externalities and be better role models, should be segregated in their own schools, and children with worse characteristics, who will tend to create negative externalities, should go to separate schools. This practically means segregation along income lines, since often children with “better characteristics” are those from better parental backgrounds, while children with worse characteristics are often from lower socioeconomic backgrounds.

So much is well-known and well understood. 
The problem is that there is an important confusion in the literature, which involves deducing complementarity from the fact that in equilibrium we do observe segregation (e.g., rich parents sending their children to private schools with other children from rich parents, or living in suburbs and sending their children to suburban schools, while poor parents live in ghettos and children from disadvantaged backgrounds go to school with other disadvantaged children in inner cities). This reasoning is often used in discussions of Tiebout competition, together with the argument that allowing parents with

where $\phi > 1$ and $\lambda > 0$ but small, so that human capital is increasing in parental background. With this production function, we again have $\partial h_1/\partial e_2 > 0$ and $\partial h_2/\partial e_1 > 0$, but in contrast to (3.15), we now have

$\frac{\partial^2 h_1}{\partial e_2 \partial e_1} < 0$ and $\frac{\partial^2 h_2}{\partial e_1 \partial e_2} < 0$.

This can be thought of as corresponding to the “good apple” theory of the classroom, where the kids with the best characteristics and attitudes bring the rest of the class up.

In this case, because the cross-partial derivative is negative, the marginal willingness to pay of low-background parents to have their kid together with children of high-background parents is higher than that of high-background parents. With perfect markets, we will observe mixing, and in equilibrium schools will consist of a mixture of children from high- and low-background parents.

Now combining the outcomes of these two models, many people jump to the conclusion that since we do observe segregation of schooling in practice, parental backgrounds must be complementary, so segregation is in fact efficient. Again the conclusion is that allowing Tiebout competition and parental sorting will most likely achieve efficient outcomes.

However, this conclusion is not correct, since even if the correct production function was (3.17), segregation would arise in the presence of credit market problems. 
In particular, the way that mixing is supposed to occur with (3.17) is that low-background parents make a payment to high-background parents so that the latter send their children to a mixed school. To see why such payments are necessary, recall that even with (3.17) the first derivatives are positive, that is

$\frac{\partial h_1}{\partial e_2} > 0$ and $\frac{\partial h_2}{\partial e_1} > 0$.

This means that, everything else being equal, all children benefit from being in the same class with other children with good backgrounds. With (3.17), however, children from better backgrounds benefit less than children from less good backgrounds. This implies that there have to be payments from parents of less good backgrounds to high-background parents.

Such payments are both difficult to implement in practice, and practically impossible taking into account the credit market problems facing parents of poor socioeconomic status.

This implies that, if the true production function is (3.17) but there are credit market problems, we will observe segregation in equilibrium, and the segregation will be inefficient. Therefore we cannot simply appeal to Tiebout competition, or deduce efficiency from the equilibrium patterns of sorting.

Another implication of this analysis is that in the absence of credit market problems (and with complete markets), cross-partials determine the allocation of students to schools. With credit market problems, first derivatives become important. This is a general result, with a range of implications for empirical work.

4.2. The Benabou model. A similar point is developed by Benabou even in the absence of credit market problems, but relying on other missing markets. His model has competitive labor markets, and local externalities (externalities in schooling in the local area). All agents are assumed to be ex ante homogeneous, and will ultimately end up either low skill or high skill. 
Utility of agent $i$ is assumed to be

$U^i = w^i - c^i - r^i$

where $w$ is the wage, $c$ is the cost of education, which is necessary to become either low skill or high skill, and $r$ is rent.

The cost of education is assumed to depend on the fraction of the agents in the neighborhood, denoted by $x$, who become high skill. In particular, we have $c_H(x)$ and $c_L(x)$ as the costs of becoming high skill and low skill. Both costs are decreasing in $x$, meaning that when there are more individuals acquiring high skill, acquiring skills is cheaper (positive peer group effects). In addition, we have

$c_H(x) > c_L(x)$

so that becoming high skill is always more expensive, and as shown in Figure 3.1

$c_H'(x) < c_L'(x)$,

so that the effect of an increase in the fraction of high skill individuals in the neighborhood is bigger on the cost of becoming high skill.

Figure 3.1

Since all agents are ex ante identical, in equilibrium we must have

$U(L) = U(H)$

that is, the utility of becoming high skill and low skill must be the same.

Assume that the labor market in the economy is global, and takes the constant returns to scale form $F(H, L)$. The important implication here is that irrespective of where the worker obtains his education, he will receive the same wage as a function of his skill level. Also assume that there are two neighborhoods of fixed size, and individuals will compete in the housing market to locate in one neighborhood or the other.

As shown in Figures 3.2 and 3.3, there can be two types of equilibria:

It turns out that it may or may not. To see this consider the problem of a utilitarian social planner maximizing total output minus costs of education for workers. 
This implies that the social planner will maximize

$F(H, L) - H_1 c_H(x_1) - H_2 c_H(x_2) - L_1 c_L(x_1) - L_2 c_L(x_2)$

where

$x_1 = \frac{H_1}{L_1 + H_1}$ and $x_2 = \frac{H_2}{L_2 + H_2}$.

This problem can be broken into two parts: first, the planner will choose the aggregate amount of skilled individuals, and then she will choose how to actually allocate them between the two neighborhoods. The second part is simply one of cost minimization, and the solution depends on whether

$\Phi(x) = x c_H(x) + (1 - x) c_L(x)$

is concave or convex. This function is simply the cost of giving high skills to a fraction $x$ of the population. When it is convex, it is best to choose the same level of $x$ in both neighborhoods, and when it is concave, the social planner minimizes costs by choosing two extreme values of $x$ in the two neighborhoods.

It turns out that this function can be convex, i.e. $\Phi''(x) > 0$. More specifically, we have:

$\Phi''(x) = 2 \left( c_H'(x) - c_L'(x) \right) + x \left( c_H''(x) - c_L''(x) \right) + c_L''(x)$

We can have $\Phi''(x) > 0$ when the second and third terms are large. Intuitively, this can happen because although a high skill individual benefits more from being together with other high skill individuals, he is also creating a positive externality on low skill individuals when he mixes with them. This externality is not internalized, potentially leading to inefficiency.

This model gives another example of why equilibrium segregation does not imply efficient segregation.

4.3. Empirical issues and evidence. Peer group effects are generally difficult to identify. In addition, we can think of two alternative formulations where one is practically impossible to identify satisfactorily. To discuss these issues, let us go back to the previous discussion, and recall that the two “structural” formulations, (3.15) and (3.16), have very similar reduced forms, but the peer group effects work quite differently, and have different interpretations. 
In (3.15), it is the (predetermined) characteristics of my peers that determine my outcomes, whereas in (3.16), it is the outcomes of my peers that matter. Above we saw how to identify externalities in human capital, which is in essence similar to the structural form in (3.15). More explicitly, the equation of interest is

(3.18) $y_{ij} = \theta x_{ij} + \alpha \bar{X}_j + \varepsilon_{ij}$

where $\bar{X}_j$ is the average characteristic (e.g., average schooling) and $y_{ij}$ is the outcome of the $i$th individual in group $j$. Here, for identification all we need is exogenous variation in $\bar{X}_j$.

The alternative is

(3.19) $y_{ij} = \theta x_{ij} + \alpha \bar{Y}_j + \varepsilon_{ij}$

where $\bar{Y}_j$ is the average of the outcomes. Some reflection will reveal why the parameter $\alpha$ is now practically impossible to identify. Since $\bar{Y}_j$ does not vary by individual, this regression amounts to one of $\bar{Y}_j$ on itself at the group level. This is a serious econometric problem. One imperfect way to solve this problem is to replace $\bar{Y}_j$ on the right hand side by $\bar{Y}_j^{-i}$, which is the average excluding individual $i$. Another approach is to impose some timing structure. For example:

$y_{ijt} = \theta x_{ijt} + \alpha \bar{Y}_{j,t-1} + \varepsilon_{ijt}$

There are still some serious problems irrespective of the approach taken. First, the timing structure is arbitrary, and second, there is no way of distinguishing peer group effects from “common shocks”.

As an example consider the paper by Sacerdote, which uses random assignment of roommates at Dartmouth. He finds that the GPAs of randomly assigned roommates are correlated, and interprets this as evidence for peer group effects. The next table summarizes some of the key results.

Figure 3.4

Despite the very nice nature of the experiment, the conclusion is problematic, because Sacerdote attempts to identify (3.19) rather than (3.18). For example, to the extent that there are common shocks to both roommates (e.g., they are in a noisier dorm), the correlation in GPAs may not reflect peer group effects. 
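The common-shock concern can be illustrated with a small simulation. This is a hedged sketch rather than a reproduction of Sacerdote's data or method: the data-generating process, the function names and all numerical values are assumptions made for illustration. Roommates are paired at random and there is no peer effect at all ($\alpha = 0$), yet a shared room-level shock makes their outcomes correlated.

```python
import random


def simulate_roommate_gpas(n_rooms, shock_sd, seed=0):
    """Draw GPA pairs for randomly assigned roommates with NO peer effect.

    Each GPA is a baseline plus an idiosyncratic term plus a room-level
    common shock (e.g., a noisy dorm) shared by both roommates.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_rooms):
        shock = rng.gauss(0.0, shock_sd)  # common to both roommates
        gpa_1 = 3.0 + rng.gauss(0.0, 0.5) + shock
        gpa_2 = 3.0 + rng.gauss(0.0, 0.5) + shock
        pairs.append((gpa_1, gpa_2))
    return pairs


def correlation(pairs):
    """Sample correlation between the two roommates' outcomes."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5


with_shock = correlation(simulate_roommate_gpas(5000, shock_sd=0.5, seed=1))
without_shock = correlation(simulate_roommate_gpas(5000, shock_sd=0.0, seed=1))
print(f"correlation with common shocks:    {with_shock:.2f}")
print(f"correlation without common shocks: {without_shock:.2f}")
```

In this setup the population correlation equals the share of variance due to the common shock, which is 0.5 when the shock and the idiosyncratic term have the same standard deviation, and zero when the shock is switched off. A regression of one roommate's outcome on the other's would therefore return a positive coefficient even though outcomes do not affect each other.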
The problem would not have arisen if the right-hand side regressor were some predetermined characteristic of the

Part 2

Incentives, Agency and Efficiency Wages

A key issue in all organizations is how to give the right incentives to employees. This topic is central to contract theory and organizational economics, but it also needs to be taken into account in labor economics, especially in order to better understand the employment relationship. Here we give a quick overview of the main issues.

CHAPTER 4

Moral Hazard: Basic Models

Moral hazard refers to a situation where an individual (the agent) takes a “hidden action” that affects the payoffs to his employer (the principal). We generally think of this as the level of “effort”, but other actions, such as the composition of effort, the allocation of time, or even stealing, are potential examples of moral hazard-type behavior. Although effort is not observed, some of the outcomes that the principal cares about, such as output or performance, are observed.

Because the action is hidden, the principal cannot simply dictate the level of effort. She has to provide incentives through some other means. The simplest way to approach the problem is to think of the principal as providing “high-powered” incentives, and rewarding success. This will work to some degree, but will run into two sorts of problems:

(1) Limited liability
(2) Risk

More explicitly, high-powered incentives require the principal to punish the agent as well as to reward him, but limited liability (i.e., the fact that the agent cannot be paid a negative wage in many situations) implies that this is not possible. Therefore high-powered incentives come at the expense of a high average level of payments.

The risk problem is that rewarding the agent as a function of performance conflicts with optimal risk sharing between the principal and the agent.
Generally, we think of the agent as earning most of his living from this wage income, whereas the principal employs a number of similar agents, or is a corporation with diffuse ownership. In that case, we can think of the firm as risk neutral and the employment contract should not only provide incentives to the agent, but also insure him

Digression: what is the difference between observable and contractible? What happens if something is observable only by the principal and the agent, but by nobody else?

What we have here is a dynamic game, so the timing is important. It is:

Timing:
(1) The principal offers a contract $s : \Omega \to \mathbb{R}$ to the agent.
(2) The agent accepts or rejects the contract. If he rejects the contract, he receives his outside utility $\bar{H}$.
(3) If the agent accepts the contract $s : \Omega \to \mathbb{R}$, then he chooses effort $a$.
(4) Nature draws $\theta$, determining $x(a, \theta)$.
(5) Agent receives the payment specified by contract $s$.

This is a game of incomplete information and as in signaling games, we will look for a Perfect Bayesian Equilibrium. However, in this context, the concept of Perfect Bayesian Equilibrium will be strong enough.

2. Incentives without Asymmetric Information

Let us start with the case of full information. Then the problem is straightforward. The principal chooses the contract $s(x, a)$ (why is it a function of both $x$ and $a$?), and the agent chooses $a$.

The Perfect Bayesian Equilibrium can be characterized by backward induction. The first interesting action is at step 3, where the agent chooses the effort level given the contract, and then at step 2, where the agent decides whether to accept contract $s$. Given what types of contracts will be accepted by the agent and what the corresponding effort level will be, at step 1 the principal chooses the contract that maximizes her utility.

By analogy with oligopoly games, we can think of the principal, who moves first, as a Stackelberg leader.
As usual with Stackelberg leaders, when choosing the contract the principal anticipates the action that the agent will choose. Thus, we should think of the principal as choosing the effort level as well, and the optimization condition of the agent will be a constraint for the principal. This is what we refer to as the incentive compatibility constraint (IC).

Thus the problem is

\[ \max_{s(x,a),\, a} \; E\left[V(x - s(x, a))\right] \]

\[ \text{s.t.} \quad E\left[H(s(x, a), a)\right] \ge \bar{H} \qquad \text{Participation Constraint (PC)} \]

\[ \text{and} \quad a \in \arg\max_{a'} E\left[H(s(x, a'), a')\right] \qquad \text{Incentive Constraint (IC)} \]

where expectations are taken over the distribution of $\theta$.

This problem has exactly the same structure as the canonical moral hazard problem, but is much simpler, because the principal is choosing $s(x, a)$. In particular, she can choose $s$ such that $s(x, a) = -\infty$ for all $a \ne a^*$, thus effectively implementing $a^*$. This is because there is no moral hazard problem here given that there is no hidden action. Therefore, presuming that the level of effort $a^*$ is the optimum from the point of view of the principal, the problem collapses to

\[ \max_{s(x)} \; E\left[V(x - s(x))\right] \quad \text{subject to} \quad E\left[U(s(x))\right] \ge \bar{H} + c(a^*) \]

where we have already imposed that the agent will choose $a^*$ and the expectation is conditional on effort level $a^*$. We have also dropped the incentive compatibility constraint, and rewritten the participation constraint to take account of the equilibrium level of effort by the agent.

This is simply a risk-sharing problem, and the solution is straightforward. It can be found by setting up a simple Lagrangean:

\[ \min_{\lambda} \max_{s(x)} \; \mathcal{L} = E\left[V(x - s(x))\right] - \lambda \left[ \bar{H} + c(a^*) - E\left[U(s(x))\right] \right] \]

Now this might appear as a complicated problem, because we are choosing a function $s(x)$, but this specific case is not difficult because there is no constraint on the form of the function, so the maximization can be carried out pointwise (think, for example, that $x$ only took discrete values).
We might then be tempted to write:

\[ E\left[V'(x - s(x))\right] = \lambda E\left[U'(s(x))\right]. \]

However, this is not quite right, and somewhat misleading. Recall that $x = x(a, \theta)$, so once we fix $a = a^*$, and conditional on $x$, there is no more uncertainty. In other words, the right way to think about the problem is that for a given level of $a$, the variation in $\theta$ induces a distribution of $x$, which typically we will refer to as $F(x \mid a)$ in what follows. For now, since $a = a^*$ and we can choose $s(x)$ separately for each $x$, there is no more uncertainty conditional on $x$. Hence, the right first-order conditions are:

\[ \frac{V'(x - s(x))}{U'(s(x))} = \lambda \quad \text{for all } x, \tag{4.1} \]

i.e., perfect risk sharing. In all states, represented by $x$, the marginal value of one more dollar to the principal divided by the marginal value of one more dollar to the agent must be constant.

3. Incentives-Insurance Trade-off

Next, let us move to the real principal-agent model where $\Omega$ only includes the output performance, $x$, so feasible contracts are of the form $s(x)$, and are not conditioned on $a$. The effort is chosen by the agent to maximize his utility, and the incentive compatibility constraint will play an important role.

The problem can be written in a similar form to before as

\[ \max_{s(x),\, a} \; E\left[V(x - s(x))\right] \]

\[ \text{s.t.} \quad E\left[H(s(x), a)\right] \ge \bar{H} \qquad \text{Participation Constraint (PC)} \]

\[ \text{and} \quad a \in \arg\max_{a'} E\left[H(s(x), a')\right] \qquad \text{Incentive Constraint (IC)} \]

with the major difference that $s(x)$ instead of $s(x, a)$ is used.

As already hinted above, the analysis is more tractable when we suppress $\theta$, and instead directly work with the distribution function of outcomes as a function of the effort level, $a$:

\[ F(x \mid a) \]

is no constraints on the level of payments; we will see below how this will change with limited liability constraints).

We can also use (4.2) to derive further insights about the trade-off between insurance and incentives.
To do this, let us assume that $V'$ is constant, so that the principal is risk neutral. Let us ask what it would take to make sure that we have full insurance, i.e., $V'(x - s(x)) / U'(s(x)) = \text{constant}$. Since $V'$ is constant, this is only possible if $U'$ is constant. Suppose that the agent is risk-averse, so that $U$ is strictly concave or $U'$ is strictly decreasing. Therefore, full insurance (or full risk sharing) is only possible if $s(x)$ is constant. But in turn, if $s(x)$ is constant, the incentive compatibility constraint will be typically violated (unless the optimal contract asks for $a = 0$), and the agent will choose $a = 0$.

Next, consider another extreme case, where the principal simply sells the firm to the agent for a fixed amount, so $s(x) = x - s_0$. In this case, the agent’s first-order condition will give a high level of effort (we can think of this as the “first-best” level of effort, though this is not literally true, since this level of effort potentially depends on $s_0$):

\[ \int U(x - s_0)\, f_a(x \mid a)\, dx = c'(a). \]

This higher level of effort comes at the expense of no insurance for the agent.

Instead of these two extremes, the optimal contract will be “second-best”, trading off incentives and insurance.

We can interpret the solution (4.2) further. But first, note that, as the optimization problem already makes clear, as long as the IC constraint of the agent has a unique solution, once the agent signs contract $s(x)$, there is no uncertainty about action choice $a$. Nevertheless, lack of full insurance means that the agent is being punished for low realizations of $x$. Why is this?

At some intuitive level, this is because had it not been so, ex ante the agent would have had no incentive to exert high effort. What supports high effort here is the threat of punishment ex post.

This interpretation suggests that there is no need for the principal to draw inferences about the effort choice $a$ from the realizations of $x$.
However, it turns out that the optimal way of incentivizing the agent has many similarities to an optimal signal extraction problem.

To develop this intuition, consider the following maximum likelihood estimation problem: we know the distribution of $x$ conditional on $a$, we observe $x$, and we want to estimate $a$. This is a solution to the following maximization problem

\[ \max_{a'} \; \ln f(x \mid a'), \]

for given $x$, which has the first-order condition

\[ \frac{f_a(x \mid a')}{f(x \mid a')} = 0 \]

which can be solved for $a(x)$. Let the level of effort that the principal wants to implement be $\bar{a}$; then when $a(x) = \bar{a}$, this first-order condition is satisfied at $a' = \bar{a}$.

Now going back to (4.2), we can write this as:

\[ \frac{V'(x - s(x))}{U'(s(x))} = \lambda + \mu \frac{f_a(x \mid \bar{a})}{f(x \mid \bar{a})}. \]

If $a(x) > \bar{a}$, then $f_a(x \mid \bar{a}) / f(x \mid \bar{a}) > 0$. Since $\mu > 0$, this implies that $V'/U'$ must be greater and therefore $U'$ must be lower. This is in turn possible only when $s(x)$ is increasing in $x$. Therefore, when the realization of output is good news relative to what was expected, the agent is rewarded; when it is bad news, he is punished.

Thus in a way, the principal is acting as if she’s trying to infer what the agent did, even though of course the principal knows the agent’s action along the equilibrium path.

4. The Form of Performance Contracts

Can we say anything else on the form of $s(x)$? At a minimum, we would like to say that $s(x)$ is increasing, so that greater output leads to greater remuneration for the agent, which seems to be a feature of real world contracts for managers, workers etc.

Unfortunately, this is not true without putting more structure on technology.
Consider the following example where the agent chooses between two effort levels, high and low: $a \in \{a_H, a_L\}$, and the distribution function of output conditional on effort is as follows:

\[ F(x \mid a_H) = \begin{cases} 4 & \text{with probability } \tfrac{1}{2} \\ 2 & \text{with probability } \tfrac{1}{2} \end{cases} \qquad F(x \mid a_L) = \begin{cases} 3 & \text{with probability } \tfrac{1}{2} \\ 1 & \text{with probability } \tfrac{1}{2} \end{cases} \]

The agent has an arbitrary strictly concave utility function.

It is quite clear that in this case full risk-sharing can be achieved (what does this mean in terms of the multipliers in our formulation above?). In particular, full risk sharing is possible if the principal punishes the agent whenever 1 or 3 is observed. In fact, the following contract would do the trick:

\[ s(2) = s(4) = \bar{H} + c(a_H), \qquad s(1) = s(3) = -K \]

where $K$ is a very large number. Thus the agent is punished severely for the outcomes 1 or 3, since these occur only when he chooses low effort. When the outcome is 2 or 4, he gets a payment consistent with his participation constraint.

Clearly this contract is not increasing in $x$; in particular, $s(3) < s(2)$.

You might wonder whether there is something special here because of the discrete distribution of $x$. This is not the case. For example, a continuous distribution with peaks at $\{2, 4\}$ for $a = a_H$ and $\{1, 3\}$ for $a = a_L$ would do the same job.

So how can we ensure that $s(x)$ is increasing in $x$?

Milgrom, Bell Journal, 1981, “Good News, Bad News” shows the following result: a sufficient condition for $s(x)$ to be increasing is that higher values of $x$ are “good news” about $a$, i.e.,

\[ \frac{f_a(x \mid a)}{f(x \mid a)} \text{ is increasing in } x. \]

CHAPTER 5

Moral Hazard with Limited Liability, Multitasking, Career Concerns, and Applications

1. Limited Liability

Let us modify the baseline moral hazard model by adding a limited liability constraint, so that $s(x) \ge 0$.
The problem becomes:

\[ \max_{s(x),\, a} \int V(x - s(x))\, dF(x \mid a) \]

subject to

\[ \int \left[U(s(x)) - c(a)\right] dF(x \mid a) \ge \bar{H} \]

\[ a \in \arg\max_{a'} \int \left[U(s(x)) - c(a')\right] dF(x \mid a') \]

\[ s(x) \ge 0 \quad \text{for all } x \]

Again taking the first-order approach, and assigning a multiplier $\eta(x)$ to the last set of constraints, the first-order conditions become:

\[ V'(x - s(x)) = \left[ \lambda + \mu \frac{f_a(x \mid a)}{f(x \mid a)} \right] U'(s(x)) + \eta(x). \]

If $s(x)$ were going to be positive for all $x$ in any case, the multiplier for the last set of constraints, $\eta(x)$, would be equal to zero, and the problem would have an identical solution to before. However, if, previously, $s(x) < 0$ for some $x$, the structure of the solution has to change.

In particular, to obtain the intuition, suppose that we shift up the entire function $s(x)$ to $\tilde{s}(x)$ so that $\tilde{s}(x) \ge 0$. Since the participation constraint was binding at $s(x)$, it must be slack at $\tilde{s}(x)$. Clearly this will not be optimal and in fact because of income effects, this shifted-up schedule may no longer lead to the same optimal choice of effort for the agent. In particular, as we increase the level of payments at low realizations of $x$, the entire payment schedule has to change in a more complex way. Nevertheless, this “shifting-up” intuition makes it clear that the participation constraint will no longer be binding, thus $\lambda = 0$.

This informally is the basis of the intuition that without limited liability constraints, there are no rents; but with limited liability there will be rents, making the agent’s participation constraint slack.

Let us now illustrate this with a simple example. Suppose that effort takes two values $a \in \{a_L, a_H\}$. Assume that output also takes only two values: $x \in \{0, 1\}$; moreover,

\[ F(x \mid a_H) = 1 \text{ with probability } 1, \qquad F(x \mid a_L) = \begin{cases} 1 & \text{with probability } q \\ 0 & \text{with probability } 1 - q \end{cases} \]

Normalize $\bar{H}$ and $c(a_L)$ to zero, and assume $c(a_H) = c_H < 1 - q$.

Finally, to make things even simpler, assume that both the agent and the principal are risk neutral.
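As a numerical preview of this example, the sketch below computes the cheapest payments $(s(0), s(1))$ that induce high effort, with and without the constraint $s(x) \ge 0$. It is a sketch under the assumptions just stated (risk neutrality, $\bar{H} = c(a_L) = 0$); the function name and the values $q = 0.5$, $c_H = 0.2$ are hypothetical and chosen only for illustration. Under high effort output equals 1 with certainty, so the principal's expected payment is $s(1)$; the agent exerts high effort if $s(1) - c_H \ge q\, s(1) + (1 - q)\, s(0)$ and participates if $s(1) - c_H \ge 0$.

```python
def cheapest_contract_for_high_effort(q, c_high, limited_liability):
    """Return (s0, s1, rent): least-cost payments that induce high effort.

    Assumes a risk-neutral agent with outside utility and low-effort cost
    normalized to zero (illustrative helper, not from the text).
      IC: s1 - c_high >= q * s1 + (1 - q) * s0
      PC: s1 - c_high >= 0
    """
    if not (0.0 <= q < 1.0 and 0.0 < c_high < 1.0 - q):
        raise ValueError("need 0 <= q < 1 and 0 < c_high < 1 - q")
    if limited_liability:
        s0 = 0.0                                    # cannot pay less than zero
        s1 = max(c_high, s0 + c_high / (1.0 - q))   # IC binds, PC is slack
    else:
        s1 = c_high                                 # PC binds
        s0 = s1 - c_high / (1.0 - q)                # largest s0 satisfying IC (negative)
    rent = s1 - c_high                              # agent's utility above the outside option
    return s0, s1, rent


if __name__ == "__main__":
    for ll in (False, True):
        s0, s1, rent = cheapest_contract_for_high_effort(0.5, 0.2, ll)
        print(f"limited liability={ll}: s(0)={s0:.2f}, s(1)={s1:.2f}, rent={rent:.2f}")
```

Without limited liability the agent is punished after failure and earns no rent; once $s(0)$ cannot fall below zero, incentives have to come from a higher $s(1)$, which leaves the participation constraint slack.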
Let us first look at the problem without the limited liability constraint. The assumption that $c(a_H) = c_H < 1 - q$ implies that high effort is optimal, so in an ideal world this would be the effort level. Let us start by assuming that the principal would like to implement this. In this case, the problem of the principal can be written as

\[ \min_{s(0),\, s(1)} \; s(1) \]

subject to

\[ s(1) - c_H \ge q\, s(1) + (1 - q)\, s(0) \]

\[ s(1) - c_H \ge 0 \]

where $s(0)$ and $s(1)$ are the payments to the agent conditional on the outcome. (Why are these the only two control variables?)

2. Linear Contracts

One problem with the baseline model developed above is that, despite a number of useful insights, it is quite difficult to work with. Moreover, the exact shape of the density functions can lead to very different forms of contracts, some with very nonlinear features.

One approach in the literature has been to look for “robust” contracts that are both intuitively simpler and easier to work with to derive some first-order predictions. But why should optimal contracts be “robust”? And how do we model “robust” contracts?

A potentially promising answer to this question is developed in an important paper by Holmstrom and Milgrom. They established the optimality of linear contracts under certain conditions, which is interesting both because linear contracts can be viewed as more robust than highly nonlinear contracts, and also because the intuition of their result stems from robustness considerations.

Providing a detailed exposition of Holmstrom and Milgrom’s model would take us too far afield from our main focus. Nevertheless, it is useful to outline the environment and the main intuition. Holmstrom and Milgrom consider a dynamic principal-agent problem in continuous time. The interaction between the principal and the agent takes place over an interval normalized to $[0, 1]$.
The agent chooses an effort level $a_t \in A$ at each instant after observing the realization of output up to that instant. More formally, the output process is given by the continuous time random walk, that is, the following Brownian motion process:

\[ dx_t = a_t\, dt + \sigma\, dW_t \]

where $W$ is a standard Brownian motion (Wiener process). This implies that its increments are independent and normally distributed, that is, $W_{t+\tau} - W_t$ for any $t$ and $\tau$ is distributed normally with variance equal to $\tau$. Let $X^t = (x_\tau;\ 0 \le \tau < t)$ be the entire history of the realization of the increments of output $x$ up until time $t$ (or alternatively a “sample path” of the random variable $x$). The assumption that the individual chooses $a_t$ after observing past realizations implies that $a_t$ can be represented by a mapping $a_t : X^t \to A$. Similarly, the principal also observes the realizations of the increments (though obviously not the effort levels and the realizations of $W_t$), so a contract for the agent is given by a mapping $s_t : X^t \to \mathbb{R}$, specifying what the individual will be paid at time $t$ as a function of the entire realization of output levels up to that point.

Holmstrom and Milgrom assume that the utility function of the agent is

\[ u\left( C_1 - \int_0^1 a_t\, dt \right) \]

where $C_1$ is the agent’s consumption at time $t = 1$. This utility function makes two special assumptions: first, the individual only derives utility from consumption at the end (at time $t = 1$), and second, the concave utility function applies to consumption minus the total cost of effort between 0 and 1. In addition, Holmstrom and Milgrom assume that $u$ takes the special constant absolute risk aversion, CARA, form

\[ u(z) = -\exp(-rz) \tag{5.1} \]

with the degree of absolute risk aversion equal to $r$, and that the principal is risk neutral, so that she only cares about her net revenue at time $t = 1$, given by $x_1 - C_1$ (since consumption of the agent at time $t = 1$ is equal to total payments from the principal to the agent).
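The output process above can be made concrete with a discrete-time simulation. The following is a minimal sketch, assuming a constant effort level and illustrative parameter values (effort 0.5, $\sigma = 1$, 100 time steps); the function names are hypothetical. It approximates $dx_t = a_t\, dt + \sigma\, dW_t$ on $[0, 1]$ by summing increments with mean $a\, \Delta t$ and standard deviation $\sigma \sqrt{\Delta t}$.

```python
import math
import random


def simulate_final_output(effort, sigma, n_steps=100, rng=None):
    """One draw of cumulative output x_1 from dx = effort*dt + sigma*dW on [0, 1].

    Effort is held constant over time (an assumption of this sketch).
    """
    rng = rng or random.Random()
    dt = 1.0 / n_steps
    x = 0.0
    for _ in range(n_steps):
        # Brownian increment over dt is normal with variance dt
        x += effort * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def sample_moments(effort=0.5, sigma=1.0, n_paths=2000, seed=1):
    """Sample mean and variance of x_1 across simulated paths."""
    rng = random.Random(seed)
    draws = [simulate_final_output(effort, sigma, rng=rng) for _ in range(n_paths)]
    mean = sum(draws) / n_paths
    var = sum((d - mean) ** 2 for d in draws) / (n_paths - 1)
    return mean, var


if __name__ == "__main__":
    mean, var = sample_moments()
    print(f"mean of x_1 = {mean:.3f}, variance of x_1 = {var:.3f}")
```

With constant effort $a$, cumulative output $x_1$ is normally distributed with mean $a$ and variance $\sigma^2$, so the sample moments should be close to these values.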
The key result of Holmstrom and Milgrom is that in this model, the optimal contract is linear in final (cumulative) output $x_1$. In particular, it does not depend on the exact sample path leading to this cumulative output. Moreover, in response to this contract the optimal behavior of the agent is to choose a constant level of effort, which is also independent of the history of past realizations of the stochastic shock (can you see why the utility function (5.1) is important here?).

The loose intuition is that with any nonlinear contract there will exist an event, i.e., a sample path, after which the incentives of the agent will be distorted, whereas the linear contract achieves a degree of “robustness”. A more formal intuition is that we can think of a discrete approximation to the Brownian motion, which will be a binomial process specifying success or failure for the agent at each instant. The agent should be rewarded for success and punished for failure, and this will amount to the individual being remunerated according to total cumulative output. Moreover, generally this remuneration should depend on the wealth level of the agent, but with CARA, the wealth level does not matter, so the reward is constant. A linear reward schedule is the limit of this process corresponding to the continuous time limit of the binomial process, which is the Brownian motion.

Now motivated by this result, many applied papers look at the following static problem:

(1) The principal chooses a linear contract, of the form $s = \alpha + \beta x$ (note that this implies there is no limited liability; and we have also switched from $S$ to $s$ to simplify notation).
(2) The agent chooses $a \in A \equiv [0, \infty)$.
(3) $x = a + \varepsilon$ where $\varepsilon \sim N(0, \sigma^2)$.

In addition, the principal is risk neutral, while the utility function of the agent is

\[ U(s, a) = -\exp\left(-r\,(s - c(a))\right) \]

with $c(a) = c a^2 / 2$ corresponding to the cost of effort for some $c > 0$.
The argument is that a linear contract is approximately optimal here. It turns out that the results of this framework are very intuitive and consistent with the baseline model. However, it is important to emphasize that a linear contract is not optimal in this case. It is only optimal in the Holmstrom-Milgrom model with continuous time and the other assumptions. In fact, it is a well-known result in agency theory that a static problem with normally distributed outcomes has sufficiently unlikely events that the first-best level of effort, which here is $a^{fb} = 1/c$, can be approximated by highly nonlinear contracts. Thus the linear contracts studied here are very different from the optimal contracts that would arise if the actual model had been the static model with normally distributed shocks.
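Restricting attention to linear contracts, the static problem can be worked out numerically. The sketch below relies on the standard CARA-normal property, which is not derived in the text above: with $s = \alpha + \beta x$ and normal $\varepsilon$, the agent's expected utility is a monotone transformation of the certainty equivalent $\alpha + \beta a - c a^2/2 - r \beta^2 \sigma^2 / 2$. The function names, the normalization of the agent's reservation certainty equivalent to zero, and the parameter values are assumptions made for illustration.

```python
def agent_effort(beta, c):
    """Effort maximizing the certainty equivalent: beta = c * a, so a = beta / c."""
    return beta / c


def certainty_equivalent(alpha, beta, a, c, r, sigma):
    """Agent's certainty equivalent under s = alpha + beta * x with x = a + eps."""
    return alpha + beta * a - 0.5 * c * a ** 2 - 0.5 * r * beta ** 2 * sigma ** 2


def joint_surplus(beta, c, r, sigma):
    """Expected output minus effort cost and risk premium, given the agent's best response.

    With alpha set so the agent's certainty equivalent equals zero (an assumed
    reservation level), this is the principal's expected profit.
    """
    a = agent_effort(beta, c)
    return a - 0.5 * c * a ** 2 - 0.5 * r * beta ** 2 * sigma ** 2


def optimal_slope(c, r, sigma):
    """Closed-form maximizer of joint_surplus: beta = 1 / (1 + r * c * sigma^2)."""
    return 1.0 / (1.0 + r * c * sigma ** 2)


def grid_search_slope(c, r, sigma, n_points=10001):
    """Numerical check of the closed form on a grid of slopes in [0, 1]."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    return max(grid, key=lambda b: joint_surplus(b, c, r, sigma))


if __name__ == "__main__":
    c, r, sigma = 1.0, 2.0, 1.0
    beta = optimal_slope(c, r, sigma)
    print("closed-form slope:", beta)
    print("grid-search slope:", grid_search_slope(c, r, sigma))
    print("implied effort:", agent_effort(beta, c), "vs first best", 1.0 / c)
```

The slope falls as risk aversion $r$ or noise $\sigma^2$ rises, which is the incentive-insurance trade-off in its simplest form; with $r = 0$ the slope is 1 and effort equals the first-best level $1/c$.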