Regression Analysis using Dummy Variables in Sociology: An Example with Stata, Study notes of Sociology

An example of how to conduct a regression analysis using dummy variables in sociology with Stata. It covers the frequency distribution of a qualitative variable called 'class' with four categories, the generation of dummy variables, a one-way analysis of variance, and the regression of annual earnings on the dummy variables. The document also explains how to interpret the results and the significance of the F-statistic.

Typology: Study notes

2011/2012

Uploaded on 11/20/2012

shubnam

sociology dummy variables (k>2)

Below is an example of a regression in which the qualitative independent variable has 4 categories. The data pertain to U.S. workers in 19991. The variable is class, which is coded 1=worker; 2=manager; 3=independent contractor; 4=capitalist. The first command below yields the frequency distribution for class and at the same time generates 4 dummy variables, one for each class.

1. tab class,gen(cl)

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        486       55.61       55.61
          2 |        268       30.66       86.27
          3 |         66        7.55       93.82
          4 |         54        6.18      100.00
------------+-----------------------------------
      Total |        874      100.00

Stata uses the prefix “cl” that I chose and then automatically labels the dummies cl1, cl2, cl3, cl4. I use summ to check it out; it looks ok.

2. summ cl1-cl4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         cl1 |       874    .5560641    .4971314          0          1
         cl2 |       874    .3066362     .461361          0          1
         cl3 |       874    .0755149    .2643716          0          1
         cl4 |       874    .0617849    .2409023          0          1

Before doing a regression of annual earnings (dollars) on class, I do the equivalent: a oneway analysis of variance. Notice I lost some cases because not all respondents supplied a value for annual earnings.

3. oneway dollars class,tab

            |        Summary of dollars
      class |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   18943.669   13968.245         434
          2 |   30651.812   21149.528         251
          3 |   20849.892   19176.965          57
          4 |   45817.386   34171.341          46
------------+------------------------------------
      Total |   24379.697    19974.53         788

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      4.4550e+10      3   1.4850e+10     43.21     0.0000
 Within groups      2.6945e+11    784    343684992
------------------------------------------------------------------------
    Total           3.1400e+11    787    398981861

Now I will do the regression. I should be able to compute the coefficients from the sample category means given in the table above.

4. regress dollars cl1-cl4

      Source |       SS       df       MS              Number of obs =     788
-------------+------------------------------           F(  3,   784) =   43.21
       Model |  4.4550e+10     3  1.4850e+10           Prob > F      =  0.0000
    Residual |  2.6945e+11   784   343684992           R-squared     =  0.1419
-------------+------------------------------           Adj R-squared =  0.1386
       Total |  3.1400e+11   787   398981861           Root MSE      =   18539

------------------------------------------------------------------------------
     dollars |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cl1 |  -26873.72   2874.598    -9.35   0.000    -32516.54    -21230.9
         cl2 |  -15165.57   2973.327    -5.10   0.000     -21002.2   -9328.948
         cl3 |  -24967.49   3674.367    -6.80   0.000    -32180.26   -17754.73
         cl4 |  (dropped)
       _cons |   45817.39   2733.389    16.76   0.000     40451.76    51183.01
------------------------------------------------------------------------------

Notice above that I gave Stata all 4 dummies on the regression line, but it automatically dropped one, the one for the category “capitalist.” This means all the regression coefficients for the remaining class dummies are calculated as deviations of each class’s sample mean earnings from the mean for capitalists.

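The claim that the coefficients can be computed from the category means can be checked by hand. The sketch below subtracts the capitalist mean reported by oneway from each of the other class means; the differences should match the cl1, cl2 and cl3 coefficients in regression 4 up to rounding. The last command is not part of the original notes: it assumes Stata 11 or later and the same data in memory, and uses factor-variable notation to name the omitted category explicitly instead of letting Stata pick one.

```stata
* Coefficients in regression 4 are class means minus the capitalist mean
* (means copied from the oneway table above).
display 18943.669 - 45817.386
display 30651.812 - 45817.386
display 20849.892 - 45817.386

* Assumption: Stata 11+ with dollars and class in memory.
* ib4. makes class 4 (capitalist) the base category, so no
* hand-made dummies are needed.
regress dollars ib4.class
```

Changing ib4 to ib1 would make workers the reference group instead.
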
Notice also that the ANOVA table for the regression and the F-statistic are exactly the same as output by the oneway command. The next regression omits the dummy for class=1, i.e. workers. Hence, all coefficients for the other class dummies are calculated by reference to the sample mean earnings of workers. The ANOVA table and the F-statistic remain unchanged from when capitalist was the omitted category. However, the coefficients and their standard errors and t-ratios are (with one exception) all different.

5. regress dollars cl2-cl4

      Source |       SS       df       MS              Number of obs =     788
-------------+------------------------------           F(  3,   784) =   43.21
       Model |  4.4550e+10     3  1.4850e+10           Prob > F      =  0.0000
    Residual |  2.6945e+11   784   343684992           R-squared     =  0.1419
-------------+------------------------------           Adj R-squared =  0.1386
       Total |  3.1400e+11   787   398981861           Root MSE      =   18539

------------------------------------------------------------------------------
     dollars |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cl2 |   11708.14    1470.09     7.96   0.000     8822.365    14593.92
         cl3 |   1906.223   2611.793     0.73   0.466    -3220.712    7033.158
         cl4 |   26873.72   2874.598     9.35   0.000      21230.9    32516.54
       _cons |   18943.67   889.8881    21.29   0.000     17196.82    20690.51
------------------------------------------------------------------------------

Look at this next regression. What is the problem? Why is it not what we want?

6. regress dollars cl1-cl4, noconstant

      Source |       SS       df       MS              Number of obs =     788
-------------+------------------------------           F(  4,   784) =  373.10
       Model |  5.1291e+11     4  1.2823e+11           Prob > F      =  0.0000
    Residual |  2.6945e+11   784   343684992           R-squared     =  0.6556
-------------+------------------------------           Adj R-squared =  0.6538
       Total |  7.8236e+11   788   992845142           Root MSE      =   18539

------------------------------------------------------------------------------
     dollars |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cl1 |   18943.67   889.8881    21.29   0.000     17196.82    20690.51
         cl2 |   30651.81   1170.155    26.19   0.000      28354.8    32948.82
         cl3 |   20849.89   2455.516     8.49   0.000     16029.73    25670.06
         cl4 |   45817.39   2733.389    16.76   0.000     40451.76    51183.01
------------------------------------------------------------------------------

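As a hint for comparing regressions 5 and 6, the sketch below reproduces a few of the reported numbers with plain arithmetic. It uses only figures printed in the output above and needs no data. The standard-error formulas are the usual ones for a regression on group dummies, and they are an addition to the original notes rather than something the notes state.

```stata
* Regression 5: SE of a dummy coefficient is
*   sqrt( MS_within * (1/n_group + 1/n_reference) )
* Here: capitalists (n=46) against workers (n=434); compare with cl4.
display sqrt(343684992 * (1/46 + 1/434))

* SE of the constant is sqrt( MS_within / n_reference ); compare with _cons.
display sqrt(343684992 / 434)

* Regression 6: with noconstant, Total SS is the raw (uncentered) sum of
* squares of dollars, i.e. the centered Total SS plus n * (grand mean)^2.
display 3.1400e+11 + 788 * 24379.697^2

* R-squared in regression 6 is Model SS over that uncentered Total SS.
display 5.1291e+11 / 7.8236e+11
```

The Residual line is identical in regressions 4, 5 and 6, so any change in R-squared and F comes entirely from how the Model and Total sums of squares are defined.
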