Hypothesis Testing in Multiple Linear Regression | BIOST 515, Study notes of Biostatistics

Material Type: Notes; Class: BIOSTATISTICS II; Subject: Biostatistics; University: University of Washington - Seattle; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

koofers-user-pra-1
Lecture 5
Hypothesis Testing in Multiple Linear Regression
BIOST 515
January 20, 2004

Types of tests

• Overall test
• Test for addition of a single variable
• Test for addition of a group of variables

Test for addition of a group of variables

Does the addition of some group of independent variables of interest add significantly to the prediction of y obtained through other independent variables already in the model?

\[ y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{i,p-1}\beta_{p-1} + x_{ip}\beta_p + \epsilon_i \]

The ANOVA table

    Source of     Sums of squares                         Degrees of     Mean square          E[Mean square]
    variation                                             freedom
    Regression    $SSR = \hat\beta' X' y - n\bar{y}^2$    $p$            $SSR/p$              $\sigma^2 + \beta_R' X_C' X_C \beta_R / p$
    Error         $SSE = y'y - \hat\beta' X' y$           $n - (p+1)$    $SSE/(n - (p+1))$    $\sigma^2$
    Total         $SSTO = y'y - n\bar{y}^2$               $n - 1$

$X_C$ is the matrix of centered predictors:

\[
X_C = \begin{pmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{pmatrix}
\]

and $\beta_R = (\beta_1, \cdots, \beta_p)'$.

The ANOVA table for

\[ y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \epsilon_i \]

is often provided in the output from statistical software as

    Source of variation                          Sums of squares                             Degrees of freedom    F
    Regression
      $x_1$                                      $SSR(x_1)$                                  1
      $x_2 | x_1$                                $SSR(x_2 | x_1)$                            1
      ...
      $x_p | x_{p-1}, x_{p-2}, \cdots, x_1$      $SSR(x_p | x_{p-1}, x_{p-2}, \ldots, x_1)$  1
    Error                                        $SSE$                                       $n - (p+1)$
    Total                                        $SSTO$                                      $n - 1$

where

\[ SSR = SSR(x_1) + SSR(x_2 | x_1) + \cdots + SSR(x_p | x_{p-1}, x_{p-2}, \ldots, x_1) \]

and has p degrees of freedom.

CHS example, cont.

\[ y_i = \beta_0 + \mathrm{weight}_i\beta_1 + \mathrm{height}_i\beta_2 + \epsilon_i \]

    > anova(lmwtht)
    Analysis of Variance Table

    Response: DIABP
               Df Sum Sq Mean Sq F value   Pr(>F)
    WEIGHT      1   1289    1289 10.2240 0.001475 **
    HEIGHT      1    120     120  0.9498 0.330249
    Residuals 495  62426     126
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

\[ F_0 = \frac{(1289 + 120)/2}{62426/495} = 5.59 > F_{2,495,.95} = 3.01 \]

We reject the null hypothesis at α = .05 and conclude that at least one of β1 or β2 is not equal to 0.

The overall F statistic is also available from the output of summary().
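As a check on the arithmetic of the overall test, the following R sketch recomputes F0 from the rounded sums of squares printed by anova(lmwtht). The variable names are illustrative and not taken from the lecture; only the numbers come from the table.

```r
# Rounded values copied from the anova(lmwtht) table
ssr_weight <- 1289     # SSR(weight)
ssr_height <- 120      # SSR(height | weight)
sse        <- 62426    # residual sum of squares
p          <- 2        # number of predictors
df_error   <- 495      # n - (p + 1)

# Overall F statistic: regression mean square over residual mean square
f0 <- ((ssr_weight + ssr_height) / p) / (sse / df_error)

# 5% critical value and p-value from the F(p, df_error) distribution
f_crit <- qf(0.95, df1 = p, df2 = df_error)
p_val  <- pf(f0, df1 = p, df2 = df_error, lower.tail = FALSE)

cat("F0 =", round(f0, 2),
    " critical value =", round(f_crit, 2),
    " p-value =", signif(p_val, 3), "\n")
```

Because the table entries are rounded, f0 should agree with the F-statistic line reported by summary() only up to rounding.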
    > summary(lmwtht)

    Call:
    lm(formula = DIABP ~ WEIGHT + HEIGHT, data = chs)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 55.65777    8.91267   6.245 9.14e-10 ***
    WEIGHT       0.04140    0.01723   2.403   0.0166 *
    HEIGHT       0.05820    0.05972   0.975   0.3302
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 11.23 on 495 degrees of freedom
    Multiple R-Squared: 0.02208,    Adjusted R-squared: 0.01812
    F-statistic: 5.587 on 2 and 495 DF,  p-value: 0.003987

Tests on individual regression coefficients

Once we have determined that at least one of the regressors is important, a natural next question might be: which one(s)?

Important considerations:

• Is the increase in the regression sums of squares sufficient to warrant an additional predictor in the model?
• Additional predictors will increase the variance of ŷ - include only predictors that explain the response (note: we may not know this through hypothesis testing, as confounders may not test significant but would still be necessary in the regression model).
• Adding an unimportant predictor may increase the residual mean square, thereby reducing the usefulness of the model.

Tests for groups of predictors

Often it is of interest to determine whether a group of predictors contributes to predicting y given that another predictor or group of predictors is already in the model.

• In the CHS example, we may want to know if age, height and sex are important predictors given that weight is in the model when predicting blood pressure.
• We may want to know if additional powers of some predictor are important in the model given that the linear term is already in the model.
• Given a predictor of interest, are interactions with other confounders of interest as well?

Using sums of squares to test for groups of predictors

Determine the contribution of a predictor or group of predictors to SSR, given that the other regressors are in the model, using the extra-sums-of-squares method.
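As a rough illustration of the extra-sums-of-squares idea, the R sketch below compares two nested models on simulated data (not the CHS data; the variables x1, x2, x3 and the seed are made up for the example). It assumes the usual partial F statistic: the drop in residual sum of squares divided by the number r of added predictors, over the full-model residual mean square.

```r
set.seed(515)                          # arbitrary seed, for reproducibility only
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)   # x3 has no true effect

reduced <- lm(y ~ x1)                  # model without the group of interest
full    <- lm(y ~ x1 + x2 + x3)        # model with the r = 2 extra predictors

# Extra sum of squares: reduction in SSE when x2 and x3 are added
sse_reduced <- sum(resid(reduced)^2)
sse_full    <- sum(resid(full)^2)
r           <- 2
f_manual    <- ((sse_reduced - sse_full) / r) / (sse_full / df.residual(full))

# anova() on two nested fits reports the same partial F statistic
cmp     <- anova(reduced, full)
f_anova <- cmp$F[2]

cat("manual F =", f_manual, " anova F =", f_anova, "\n")
```

In practice one would usually call anova(reduced, full) directly; the manual computation is included only to show where the statistic comes from.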
Consider the regression model with p predictors

\[ y = X\beta + \epsilon. \]

We would like to determine if some subset of r < p predictors contributes significantly to the regression model.

Partition the vector of regression coefficients as

\[ \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} \]

where $\beta_1$ is $(p + 1 - r) \times 1$ and $\beta_2$ is $r \times 1$. We want to test the hypothesis

\[ H_0 : \beta_2 = 0 \qquad H_1 : \beta_2 \neq 0 \]

Rewrite the model as

\[ y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon, \qquad (1) \]

where $X = [X_1 | X_2]$.

CHS example, cont.

Full model:

\[ y_i = \beta_0 + \mathrm{weight}_i\beta_1 + \mathrm{height}_i\beta_2 + \epsilon_i \]

\[ H_0 : \beta_2 = 0 \]

                Df    Sum Sq   Mean Sq  F value  Pr(>F)
    WEIGHT       1   1289.38   1289.38    10.22  0.0015
    HEIGHT       1    119.78    119.78     0.95  0.3302
    Residuals  495  62425.91    126.11

\[ F_0 = 119.78/126.11 = 0.95 < F_{1,495,0.95} = 3.86 \]

This should look very similar to the t-test for H0: the t statistic for HEIGHT in the summary() output is 0.975, and 0.975² ≈ 0.95 = F0.

\[ \mathrm{BP}_i = \beta_0 + \mathrm{weight}_i\beta_1 + \mathrm{height}_i\beta_2 + \mathrm{age}_i\beta_3 + \mathrm{gender}_i\beta_4 + \epsilon_i \]

    > summary(lm(DIABP~WEIGHT+HEIGHT+AGE+GENDER,data=chs))

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) 90.4481265 15.9317114   5.677 2.34e-08 ***
    WEIGHT       0.0326655  0.0172310   1.896 0.058579 .
    HEIGHT      -0.0009921  0.0852395  -0.012 0.990718
    AGE         -0.3283816  0.0926922  -3.543 0.000434 ***
    GENDER       0.8348105  1.5264106   0.547 0.584687
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 11.11 on 493 degrees of freedom
    Multiple R-Squared: 0.04636,    Adjusted R-squared: 0.03862
    F-statistic: 5.991 on 4 and 493 DF,  p-value: 0.0001031

\[ H_0 : \beta_2 = \beta_3 = \beta_4 = 0 \quad \text{vs} \quad H_1 : \beta_j \neq 0 \text{ for at least one } j = 2, 3, 4 \]

                Df    Sum Sq   Mean Sq  F value  Pr(>F)
    WEIGHT       1   1289.38   1289.38    10.44  0.0013
    HEIGHT       1    119.78    119.78     0.97  0.3252
    AGE          1   1513.06   1513.06    12.25  0.0005
    GENDER       1     36.93     36.93     0.30  0.5847
    Residuals  493  60875.92    123.48

SSR(intercept, weight, height, age, gender) = 2571019 + 1289.38 + 119.78 + 1513.06 + 36.93 = 2573978

SSR(intercept, weight) = 2571019 + 1289.38 = 2572308

SSR(height, age, gender | intercept, weight) = 2573978 − 2572308 = 1670

Notice we can also get this from the ANOVA table above:

SSR(height, age, gender | intercept, weight) = 119.78 + 1513.06 + 36.93 = 1670

What if we had the ANOVA table for the reduced model?

                Df    Sum Sq   Mean Sq  F value  Pr(>F)
    WEIGHT       1   1289.38   1289.38    10.23  0.0015
    Residuals  496  62545.69    126.10

Given that

\[ SSR = SSR(x_2) + SSR(x_3 | x_2) + SSR(x_1 | x_2, x_3) + SSR(x_4 | x_3, x_2, x_1) \]

and

\[ SSR(x_2, x_3, x_4 | x_1) = SSR - SSR(x_1), \]

then

\[ SSR(x_2, x_3, x_4 | x_1) = 680.76 + 1798.91 + 442.55 + 36.93 - 1289.38 = 1670. \]

One other question we might be interested in asking is whether there are any significant interactions in the model.

    lm(DIABP~WEIGHT*HEIGHT*AGE*GENDER,data=chs)

                                Estimate  Std. Error  t value  Pr(>|t|)
    (Intercept)               -1479.5964   1219.6693    -1.21    0.2257
    WEIGHT                       12.8828      8.3636     1.54    0.1241
    HEIGHT                        9.9984      7.7695     1.29    0.1988
    AGE                          20.7270     16.4946     1.26    0.2095
    GENDER                    -1429.3377   1638.6646    -0.87    0.3835
    WEIGHT:HEIGHT                -0.0816      0.0530    -1.54    0.1244
    WEIGHT:AGE                   -0.1713      0.1135    -1.51    0.1319
    HEIGHT:AGE                   -0.1342      0.1052    -1.28    0.2025
    WEIGHT:GENDER                 8.9610     10.7075     0.84    0.4031
    HEIGHT:GENDER                 7.2497     10.0955     0.72    0.4730
    AGE:GENDER                   22.2077     22.8169     0.97    0.3309
    WEIGHT:HEIGHT:AGE             0.0011      0.0007     1.51    0.1312
    WEIGHT:HEIGHT:GENDER         -0.0436      0.0658    -0.66    0.5084
    WEIGHT:AGE:GENDER            -0.1449      0.1498    -0.97    0.3339
    HEIGHT:AGE:GENDER            -0.1146      0.1404    -0.82    0.4148
    WEIGHT:HEIGHT:AGE:GENDER      0.0007      0.0009     0.79    0.4298

ANOVA table

                               Df    Sum Sq   Mean Sq  F value  Pr(>F)
    WEIGHT                      1   1289.38   1289.38    10.65  0.0012
    HEIGHT                      1    119.78    119.78     0.99  0.3204
    AGE                         1   1513.06   1513.06    12.50  0.0004
    GENDER                      1     36.93     36.93     0.31  0.5810
    WEIGHT:HEIGHT               1     19.88     19.88     0.16  0.6855
    WEIGHT:AGE                  1      4.44      4.44     0.04  0.8483
    HEIGHT:AGE                  1     73.22     73.22     0.60  0.4371
    WEIGHT:GENDER               1     21.53     21.53     0.18  0.6734
    HEIGHT:GENDER               1    597.64    597.64     4.94  0.0268
    AGE:GENDER                  1    214.78    214.78     1.77  0.1835
    WEIGHT:HEIGHT:AGE           1    298.24    298.24     2.46  0.1172
    WEIGHT:HEIGHT:GENDER        1    167.07    167.07     1.38  0.2407
    WEIGHT:AGE:GENDER           1   1051.41   1051.41     8.69  0.0034
    HEIGHT:AGE:GENDER           1      5.07      5.07     0.04  0.8379
    WEIGHT:HEIGHT:AGE:GENDER    1     75.58     75.58     0.62  0.4298
    Residuals                 482  58347.07    121.05
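The sequential sums of squares in the CHS ANOVA tables can be combined into group tests. The R sketch below is one way to do that, assuming the usual partial F statistic (extra sum of squares divided by the number of added terms r, over the full-model residual mean square). The helper name partial_f is hypothetical; the numbers are copied from the tables above.

```r
# Hypothetical helper: partial F test from an extra sum of squares
partial_f <- function(ss_extra, r, sse_full, df_full) {
  f0 <- (ss_extra / r) / (sse_full / df_full)
  c(F0 = f0, p_value = pf(f0, r, df_full, lower.tail = FALSE))
}

# Height, age and gender given weight, within the four-predictor model
hag <- partial_f(ss_extra = 119.78 + 1513.06 + 36.93, r = 3,
                 sse_full = 60875.92, df_full = 493)

# All 11 interaction terms given the four main effects
ss_int <- c(19.88, 4.44, 73.22, 21.53, 597.64, 214.78,
            298.24, 167.07, 1051.41, 5.07, 75.58)
int <- partial_f(ss_extra = sum(ss_int), r = length(ss_int),
                 sse_full = 58347.07, df_full = 482)

print(round(rbind(height_age_gender = hag, interactions = int), 4))
```

Each F0 would be compared with the corresponding F critical value, for example qf(0.95, r, df_full), or judged by its p-value. Note that the sum of the interaction sums of squares equals the drop in residual sum of squares from the four-predictor model (60875.92) to the full interaction model (58347.07), up to rounding.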