Reviewing and Analyzing Racial Salary Discrimination in MLB, Study notes of Research Methodology

An in-depth analysis of racial salary discrimination in Major League Baseball (MLB) through a review of previous studies and an updated estimation equation. The author examines various forms of discrimination, including monopsonistic, prejudice-based, and consumer discrimination. Using performance and salary data from the 2003 MLB season, the study tests for racial salary discrimination. The document also discusses the role of competition and of the availability of player data in eliminating discrimination.

Typology: Study notes

2012/2013

Uploaded on 02/07/2013

kiran
Partial preview of the text

Measuring Racial Salary Discrimination in MLB

I. Introduction

Over the last thirty years, Major League Baseball has served as the foundation of numerous studies that have examined the economics of labor markets. According to Kahn (1991), “Baseball provides a whole series of precise quantitative measures of performance which can be standardized across teams and players.” One strain of research has focused on racial discrimination in the workplace. The results of these studies have led economists to believe that racial discrimination in Major League Baseball (MLB) was a real phenomenon in the past but may have moderated more recently.

The intent of this paper is to examine the existence of racial salary discrimination in MLB today. I first review several studies from the last thirty years that have dealt with some form of discrimination in Major League Baseball. I then develop an estimated equation to test for racial salary discrimination and explain each variable included in it. Next, I describe the composition of the data, which breaks down the players in the sample by race. I then test the estimated equation for multicollinearity and heteroskedasticity, discuss the results of the regression model, and give a conclusion. My hypothesis is that racial salary discrimination does exist in Major League Baseball today.

II. Review of the Literature

In the United States, Canada, and a number of other countries there has been a strong commitment to providing equal opportunity in all aspects of employment and in the workplace (Schuler 1990).
The past thirty years have seen dramatic steps toward opening the workplace to many minority groups that had long been excluded. One of the factors that has helped this progress toward eliminating salary discrimination in the workplace is increased competition for human capital. Companies in a competitive market that discriminate not only run a large risk of legal action but also lose minority employees to their competitors. In this case, if a Major League Baseball team practices discrimination, it could lose very talented minority players to other teams. “From an economic perspective, increased competition in the labor market will lower employment discrimination in wages” (Becker 1971).

According to Scully (1973), one of the biggest steps toward eliminating racial salary discrimination in Major League Baseball was the elimination of the reserve clause. The reserve clause prevented Major League players from bargaining with teams other than the one they originally signed with. A player was therefore tied for life to the team he first signed with, unless the team decided to trade him. The player had little choice but to endure a form of monopsonistic salary discrimination. That changed in 1976 under the Basic Agreement between the players’ union and management. Under this agreement, players with at least 6 years in the Major Leagues could become free agents and offer their abilities to other Major League Baseball teams. With this agreement Major League Baseball was opened up to a

[...]

racial salary discrimination did exist in his 1983 sample study as blacks and Latinos tended to fare worse than whites. He observed, “if blacks and Latin’s were paid as if they were white players, their current mean salary would be much higher and if whites were paid as if they were blacks or Latin’s, they would receive less than their current mean salary” (p. 11).

III. Empirical Model and Description of Data

The intent of this study is to update Reid (1983) with the most recent performance and salary data in order to test for racial salary discrimination in Major League Baseball. Using the EViews statistical software, I estimate the following equation:

Equation 1: SALARY = β0 + β1 SLUG + β2 STAR + β3 GOLD + β4 FIELD + β5 REV + β6 YEAR + β7 NL + β8 BLACK + β9 LATIN + ε

The dependent variable (SALARY) is measured in thousands of U.S. dollars according to the opening day player salaries of the 2003 Major League Baseball season. This variable does not include pro-rated signing bonuses and other guaranteed income. The explanatory variables, their definitions and their expected effects on salary are listed in Table 1.

Table 1
Definitions of the Independent Variables Included in Equation 1 and the Expected Signs of Their Coefficients

Variable  Expected sign  Definition
SLUG      Positive       Total number of bases acquired divided by the total number of at bats in 2002
STAR      Positive       Total number of career All-Star games voted to, up to 2003
GOLD      Positive       Total number of defensive Gold Glove awards won up to 2003
FIELD     Positive       (Putouts plus assists) divided by (putouts plus assists plus errors) in the 2002 season
REV       Positive       A team's income minus its expenses in the 2002 season
YEAR      Positive       Total number of years played in the Major Leagues up to 2003
NL        Negative       Dummy variable equal to 1 if the player plays in the National League and 0 otherwise
BLACK     Negative       Dummy variable equal to 1 if the player is black and 0 otherwise
LATIN     Negative       Dummy variable equal to 1 if the player is Latin and 0 otherwise

Unlike in other sports, where player information is closely guarded, baseball player statistics and salaries are readily available to anyone.
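The two ratio variables in Table 1, SLUG and FIELD, follow directly from their definitions. A minimal sketch in Python is given here for illustration; the function names and the season totals are made up and do not come from the study's data set.

```python
def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLUG: total bases divided by at bats (Table 1 definition)."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    # A single is worth 1 base, a double 2, a triple 3, a home run 4.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def fielding_percentage(putouts, assists, errors):
    """FIELD: (putouts + assists) / (putouts + assists + errors)."""
    chances = putouts + assists + errors
    if chances <= 0:
        raise ValueError("player has no fielding chances")
    return (putouts + assists) / chances


# Made-up season line for illustration only (not a real player).
slug = slugging_percentage(singles=100, doubles=30, triples=5,
                           home_runs=25, at_bats=550)
field = fielding_percentage(putouts=300, assists=90, errors=10)
print(f"SLUG = {slug:.3f}, FIELD = {field:.3f}")
```

Because extra-base hits are weighted by the bases they yield, two players with the same batting average can have very different values of SLUG, which is the reason the study prefers it as the offensive measure.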
Although most researchers agree that a player’s salary is positively correlated with his performance, the literature has not produced a unique measure of performance. However, the variables that appear in Table 1 are thought to be the best collective measure of performance for a baseball player (non-pitchers).

The variable SLUG measures the total number of bases reached divided by the total number of at bats in a season. Slugging percentage is similar to batting average, which is a more popular measure of offensive performance. The problem with batting average is that it does not take into account the number of bases gained on a hit. Extra-base hits (doubles, triples and home runs) are very important in scoring runs, which means a better chance of winning baseball games. The expected sign of the variable (SLUG) is positive.

The variable STAR measures how many times a player had been elected to play in the All-Star Game in his career up to 2002. The first All-Star Game was played in 1933. Since then, an All-Star Game has been played in late June or July of every season, and each season the game rotates to a different city. Many factors go into the election of All-Star players. Usually, these are the players at each position who are having the most productive season. On the other hand, players can be voted into the game purely on their popularity, because fans vote for the players they want to see. Unfortunately, this means that the players with the best statistics are not voted into this prestigious game every year. The expected sign of the variable (STAR) is positive.

In both the National League and the American League, nine Gold Glove awards are given out at the end of each season. This award recognizes the best defensive player at every position, including the pitcher.
Many factors go into winning this award; usually the player with the highest fielding percentage has a much better chance of winning it. The award also reflects outstanding defensive plays, also known as “web gems”. The variable GOLD measures the number of times a player has won this award in

[...]

IV. Composition of Data

The study sample consists of 354 major league baseball players (non-pitchers) in the opening day lineup for the 2002 baseball season: 193 from the National League and 161 from the American League. The players in the sample are classified into three racial categories: White, Black, and Latin. Following Reid (1983), anyone who was born in a Latin country or had a Spanish surname was classified as Latin. Table 2 contains the distribution of the study sample by league and race. The sample includes 180 White players, 70 Black players and 104 Latin players, split between the two leagues. The American League accounts for 85 White players, 31 Black players and 45 Latin players; the National League accounts for 95 White players, 39 Black players and 59 Latin players. Each player in the sample had to play in at least 30 games in the 2002 baseball season and have at least 50 at bats.

Table 2
Distribution by Race of Non-pitchers on Opening Day, April 2002

                  White  Black  Latin  Total
American League      85     31     45    161
National League      95     39     59    193
Total               180     70    104    354
Percentage          .51    .20    .29      1

Note: I eliminated a total of 109 major league baseball players because they did not participate in at least 30 games, had fewer than 50 at bats, or had no 2002 salary information available.
The ethnic backgrounds of the omitted players are as follows: American League: 24 Whites, 16 Blacks, 9 Latins and 1 Asian. National League: 23 Whites, 17 Blacks, 18 Latins and 1 Asian.

V. Testing for Multicollinearity

There are two types of multicollinearity: perfect and imperfect. According to Studenmund (2000), perfect multicollinearity is “the violation of the assumption that no independent variable is a perfect linear function of one or more other independent variables” (Studenmund 243). Imperfect multicollinearity occurs when two or more independent variables are highly correlated in the particular data set being studied (Studenmund 243). If the relationship is strong enough then “it can significantly affect the estimation of the coefficients of the variables” (Studenmund 247).

Multicollinearity does not cause bias in the estimated coefficients. Even if a regression equation has a significant multicollinearity problem, the estimates of the coefficients will still be centered on the true population coefficients. The main negative effect of multicollinearity is that it increases the variances and standard errors of the estimated coefficients, which makes it more difficult to identify precisely the separate effects of the correlated variables. Higher standard errors result in lower t-scores, which increases the probability of concluding that a relevant independent variable has no significant effect on the dependent variable.

But how do we know when two variables are highly correlated? Most researchers consider a correlation coefficient of 0.8 or higher to be evidence of high multicollinearity (Studenmund 256). A test for multicollinearity was run and is described in Table 3 below. As Table 3 shows, I found no correlation between the independent variables high enough to indicate a problem.
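The 0.8 rule of thumb amounts to scanning every pair of independent variables and flagging those whose simple correlation coefficient is at least 0.8 in absolute value. The sketch below illustrates that check in plain Python. The function names and the three toy columns are assumptions made for the example; they are not the study's variables or data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Simple (Pearson) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def flag_collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(columns.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up values for three regressors; these are NOT the paper's data.
toy = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10.5],  # nearly a multiple of A, so highly correlated
    "C": [5, 1, 4, 2, 3],     # only weakly related to A and B
}
flagged = flag_collinear_pairs(toy)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.2f}")
```

Only the A and B pair is flagged in the toy data. Under the same rule, a correlation of 0.59 would fall below the threshold and would not be flagged.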
The highest correlation coefficient was 0.59, between STAR and GOLD. As noted earlier, Major League Baseball gives out 18 Gold Glove awards each year. In 2002, only 6 of the 18 players who received Gold Glove awards were voted to play in the All-Star Game. Therefore, multicollinearity was not a problem with these two variables, nor with any of the other variables.

[...]

pure heteroskedasticity, none of the independent variables is correlated with the error term. Pure heteroskedasticity alters the variances of the estimated coefficients, making the results of the t-test of significance unreliable (Studenmund 354). Impure heteroskedasticity causes the same problems but also creates bias in the estimated coefficients.

Since Figure 1 suggested the presence of heteroskedasticity, I ran the White test to test for it formally. The White test was developed by Halbert White. To run the test, I first obtained the residuals of the regression equation (Equation 1). Next, the residuals are squared and used as the dependent variable in an additional (auxiliary) regression equation. The explanatory variables of this equation are Equation 1’s independent variables (X), the square of each independent variable (X²), and the product of each pair of independent variables (Studenmund 361). The number of observations (N) times the R² of this auxiliary equation follows a chi-square distribution with degrees of freedom equal to the number of explanatory variables in the auxiliary equation, here 50. The null hypothesis is that there is no heteroskedasticity. To reject the null hypothesis, N times R² must be greater than the critical chi-square value. If the null hypothesis is rejected, heteroskedasticity is a problem in my regression equation.
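The paper does not show how the figure of 50 degrees of freedom is reached. One accounting that reproduces it is sketched below in Python, assuming that redundant terms are dropped from the auxiliary regression: the square of a 0/1 dummy duplicates the dummy itself, and BLACK times LATIN is always zero because the racial categories are mutually exclusive. The variable names follow Equation 1; everything else is illustrative.

```python
from itertools import combinations

regressors = ["SLUG", "STAR", "GOLD", "FIELD", "REV", "YEAR",
              "NL", "BLACK", "LATIN"]
dummies = {"NL", "BLACK", "LATIN"}
# Assumption: BLACK and LATIN are never both 1 for the same player,
# so their product is identically zero and cannot be a regressor.
always_zero_products = {frozenset(("BLACK", "LATIN"))}

# 1. The original independent variables.
levels = list(regressors)
# 2. Squares; a 0/1 dummy squared equals itself, so it adds no new column.
squares = [f"{v}^2" for v in regressors if v not in dummies]
# 3. Products of each pair of independent variables, minus the zero product.
cross_products = [
    f"{a}*{b}"
    for a, b in combinations(regressors, 2)
    if frozenset((a, b)) not in always_zero_products
]

aux_terms = levels + squares + cross_products
print(len(levels), len(squares), len(cross_products), len(aux_terms))
```

With nine regressors this gives 9 levels, 6 squares and 35 cross products, for 50 explanatory variables in total, which matches the degrees of freedom quoted in the text.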
If the null hypothesis is not rejected, which means that N times R² is below the critical chi-square value, there is no evidence of heteroskedasticity and it is not treated as a problem in the regression equation.

Using the White test, I found that I could reject the null hypothesis at the 5% level of significance, because the critical chi-square value was less than N times R². The critical chi-square value was 67.72 at 50 degrees of freedom, and N times R² was 83.95. I therefore reject the null hypothesis of no heteroskedasticity at the 5% level.

There are three remedies for heteroskedasticity; I chose the more popular approach of heteroskedasticity-corrected standard errors. This approach focuses on improving the estimates of the standard errors: new standard errors are calculated for each estimated coefficient, and these are more reliable than the uncorrected ones when heteroskedasticity is present.

VII. Results of the Regression Model

Table 4 shows the results of the estimation of Equation 1.

Table 4
Estimation Results for Salary Discrimination Model

Variable  Coefficient  Expected Sign  t-Statistic  Significant at 5% level
SLUG      13208702     +               4.633473    Yes
STAR      904172.3     +               5.013695    Yes
GOLD      -110311.2    +              -0.602938    No
FIELD     -274921.1    +              -0.319051    No
REV       -0.006218    +              -0.397513    No
YEAR      149766.1     +               3.142611    Yes
NL        -228267.2    -              -0.664239    No
BLACK     204015.8     -               0.446549    No
LATIN     292738.2     -               0.728807    No

* Critical t-statistic: 1.645

After running the t-tests, I found that the coefficients for the variables GOLD, FIELD, REV, NL, BLACK and LATIN were not significant at the 5% level. Consequently, I cannot support the statement that any of these variables has a significant impact on the dependent variable (SALARY).

The first variable that did not pass the t-test was the variable GOLD.
This means that the total number of Gold Glove awards won by a player up to the 2003 season did not have the expected significant impact on the dependent variable (SALARY). The second variable that did not pass the t-test was the variable FIELD. This means that a player's (putouts plus assists) divided by (putouts plus assists plus errors) in the 2002 season did not have the expected impact on the dependent variable (SALARY). The next variable that did not pass the t-test was the independent variable REV. This means that a team’s income minus its expenses in the 2002 season did not have the expected significant impact on the dependent variable (SALARY). The next variable that did not pass the t-test was the dummy variable NL. The t-statistic for this dummy variable, which takes a value of 1 if the player plays in the National League and 0 otherwise, did not exceed the critical t-value, although its coefficient did have the expected negative sign. The dummy variable NL therefore did not have the expected impact on the dependent variable (SALARY). The next variable that did not pass the t-test was the dummy variable BLACK. This means that the dummy variable that takes a value of 1 if the player is black and 0 otherwise did not have the expected impact on the dependent variable (SALARY). The last variable that did not pass the t-test was the dummy variable LATIN. This means that the dummy variable that
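The significance column of Table 4 can be reproduced from the reported t-statistics and the critical value of 1.645. The sketch below assumes a one-sided decision rule in which a coefficient counts as significant only if its t-statistic has the expected sign and exceeds the critical value in absolute terms. The paper does not spell out its rule, so this is an illustration in Python and not the author's procedure.

```python
CRITICAL_T = 1.645  # critical t-statistic reported under Table 4

# (variable, expected sign, t-statistic) copied from Table 4
table4 = [
    ("SLUG", +1, 4.633473),
    ("STAR", +1, 5.013695),
    ("GOLD", +1, -0.602938),
    ("FIELD", +1, -0.319051),
    ("REV", +1, -0.397513),
    ("YEAR", +1, 3.142611),
    ("NL", -1, -0.664239),
    ("BLACK", -1, 0.446549),
    ("LATIN", -1, 0.728807),
]


def is_significant(expected_sign, t_stat, critical=CRITICAL_T):
    """One-sided test: the sign must match the expectation and |t| must
    exceed the critical value (assumed decision rule)."""
    return expected_sign * t_stat > critical


significant = [name for name, sign, t in table4 if is_significant(sign, t)]
wrong_sign = [name for name, sign, t in table4 if sign * t < 0]
print("Significant:", significant)
print("Unexpected sign:", wrong_sign)
```

Under this rule only SLUG, STAR and YEAR are significant, in line with Table 4. The second list shows that GOLD, FIELD, REV, BLACK and LATIN also carry signs opposite to those expected in Table 1.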