Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Search for study opportunitiesNEW

Connect with the world's best universities and choose your course of study

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Introduction to Statistics, Lecture notes for Statistics, Lecture notes of Statistics

University of Sydney (US)Statistics

[Week 10] Multiple Linear Regression -- T test, Confidence Intervals, High leverage points

Typology: Lecture notes

2018/2019

On special offer

~~30 Points~~

Limited-time offer

Uploaded on 06/15/2019

kefart 🇺🇸

4.4

(11)

55 documents

1 / 46

On special offer

Partial preview of the text

Download Introduction to Statistics, Lecture notes for Statistics and more Lecture notes Statistics in PDF only on Docsity! Lecture 9: Multiple Linear Regression —CSSCSCSCs Lecture 9 Outline Lecture 9: Multiple Linear Regression Example – Birds of the High Paramo Theory – Fitting a Model to Multivariate Data Example – Codes Theory – Single parameter t tests Theory - Multiple Correlation Coefficient Theory – Confidence Intervals for Regression Parameters Theory – Prediction in Linear Models Theory – High Leverage Points Theory – Dangers of multicollinearity Categorical Variable in Linear Models 2/46 Lecture 9 Theory – Fitting a Model to Multivariate Data I Suppose we have n independent observations with k associated known (explanatory) values. A natural extension of simple linear regression is to consider the model with k predictor variables Yi = β0 + β1xi1 + . . .+ βkxik + i, i = 1, . . . , n, where i ∼ NID(0, σ2). I Note: there are p = k + 1 number of regression parameters in this model. I Again the parameters are estimated by minimising the sum of squares of the residuals S(β) = n∑ i=1 ( Yi − (β0 + β1xi1 + . . .+ βkxik) )2 . Thus, β̂ = (β̂0, β̂1, . . . , β̂k) > = argmin β S(β). 5/46 Lecture 9 Example – Reading in the Data dat = read.table(file = "../datasets/paramo.txt", header = TRUE, row.names = 1) dim(dat) ## [1] 14 5 head(dat) ## N AR EL DEc DNI ## Chiles 36 0.33 1.26 36 14 ## Las Papas-Coconuco 30 0.50 1.17 234 13 ## Sumapaz 37 2.03 1.06 543 83 ## Tolima-Quindio 35 0.99 1.90 551 23 ## Paramillo 11 0.03 0.46 773 45 ## Cocuy 21 2.17 2.00 801 14 6/46 Lecture 9 Example – Numerical Summary summary(dat) ## N AR EL DEc ## Min. : 4.00 Min. :0.0300 Min. :0.460 Min. : 36.0 ## 1st Qu.:13.00 1st Qu.:0.0875 1st Qu.:0.670 1st Qu.: 606.5 ## Median :17.50 Median :0.2750 Median :0.905 Median : 954.0 ## Mean :20.71 Mean :0.6557 Mean :1.117 Mean : 848.1 ## 3rd Qu.:29.75 3rd Qu.:0.8950 3rd Qu.:1.440 3rd Qu.:1141.5 ## Max. :37.00 Max. :2.1700 Max. :2.280 Max. :1380.0 ## DNI ## Min. : 5.00 ## 1st Qu.:14.00 ## Median :32.00 ## Mean :36.79 ## 3rd Qu.:52.50 ## Max. :83.00 7/46 Lecture 9 Example – Fitting a Multiple Linear Regression Model M1 = lm(N ~ 1 + AR + EL + DEc + DNI, data = dat) summary(M1) .... ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 27.889386 6.181843 4.511 0.00146 ** ## AR 5.153864 3.098074 1.664 0.13056 ## EL 3.075136 4.000326 0.769 0.46175 ## DEc -0.017216 0.005243 -3.284 0.00947 ** ## DNI 0.016591 0.077573 0.214 0.83541 ## --- ## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1 ## ## Residual standard error: 6.705 on 9 degrees of freedom ## Multiple R-squared: 0.7301,Adjusted R-squared: 0.6101 .... 10/46 Lecture 9 Example – Fitting a Multiple Linear Regression Model I A full stop on the right hand side of an R model formula represents all variables in the data frame except for the response, the following produce the same output: summary(lm(N ~ 1 + AR + EL + DEc + DNI, data = dat)) summary(lm(N ~ ., data = dat)) .... ## (Intercept) 27.889386 6.181843 4.511 0.00146 ** ## AR 5.153864 3.098074 1.664 0.13056 ## EL 3.075136 4.000326 0.769 0.46175 ## DEc -0.017216 0.005243 -3.284 0.00947 ** ## DNI 0.016591 0.077573 0.214 0.83541 .... 11/46 Lecture 9 Example – Closer look at summary(lm(...)) The fitted model is E(N) = 27.9 + 5.15 · AR+ 3.08 · EL− 0.017 · DEc+ 0.017 · DNI. You also can use M1$coeff for the parameter estimates as before. 12/46 Lecture 9 Example – Closer look at summary(lm(...)) The last columns corresponds to the p-values of the two-sided single parameter t tests where the degrees of freedom is given as n− p. 15/46 Lecture 9 Theory – Single parameter t tests I The summary(lm(...)) command in R provides information for testing the importance of a covariate taking into account all other variables in the model. I Specifically, given the model Yi = β0 + β1xi1 + . . .+ βkxik + i (i = 1, 2, . . . , n) the output provides statistics for performing a t test of H0 : βj = 0 vs. H1 : βj 6= 0 for any of the p = k + 1 given variables xj , j = 0, 1, . . . , k, making no assumptions about the other regression parameters. I To test H0 : βj = 0 we can use t∗ = β̂j − 0 SE(β̂j) under H0∼ tn−p ⇒ p-value = P (|tn−p| ≥ t∗). 16/46 Lecture 9 Example – Closer look at summary(lm(...)) Recall that i ∼ NID(0, σ2). An estimate of σ2 is 6.7052 = σ̂2 = RSS/(n− p) . 17/46 Lecture 9 Example – Closer look at summary(lm(...)) So R2a = 0.6101. 20/46 Lecture 9 Example – R and R2a The slow way (by hand) n = nrow(dat) p = 5 M1 = lm(N ~ 1 + AR + EL + DEc + DNI, data = dat) RSS = deviance(M1) # or sum(M1£res^2) Syy = (n - 1) * var(dat$N); (Syy - RSS) / Syy ## [1] 0.730068 1 - RSS / (n - p) / Syy * (n - 1) ## [1] 0.6100983 The fast way (with summary) summary(M1) ## Multiple R-squared: 0.7301,Adjusted R-squared: 0.6101 21/46 Lecture 9 Theory – Confidence intervals for Regression Parameters I A 100(1− α)% CI for the βj parameters, j = 0, 1, . . . , k = p− 1: β̂j ± t1−α/2,n−p · SE(β̂j) I You can get this from summary(lm(...)), e.g. qt(0.975, 9) ## [1] 2.262157 So a 95% confidence interval for β0 is 27.889± 2.262 · 6.182 or (13.91, 41.87). 22/46 Lecture 9 Example – Birds of the High Paramo par(mfrow = c(2, 2), mai = c(0.3, 0.3, 0.3, 0.3)) boxplot(M1$residuals); plot(M1, which = 1:3) − 10 0 5 10 15 20 25 30 − 10 0 10 Fitted values R es id ua ls ● ● ● ● ● ● ● ● ● ● ● ● ● ● Residuals vs Fitted Cocuy Cende Perija ● ● ●● ● ● ● ● ● ● ● ● ● ● −1 0 1 − 2 0 1 Normal Q−Q Cocuy Cende Perija 10 15 20 25 30 0. 0 1. 0 S ta nd ar di ze d re si du al s ● ● ● ●● ● ● ● ● ● ●● ● ● Scale−Location Cocuy Cende Perija 25/46 Lecture 9 Example – Birds of the High Paramo par(mfrow = c(2, 2), mai = c(0.3, 0.3, 0.3, 0.3)) plot(resid(M1) ~ AR + EL + DEc + DNI, data = dat) ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 0.5 1.0 1.5 2.0 − 10 0 5 AR ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.5 1.0 1.5 2.0 − 10 0 5 EL re si d( M 1) ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 200 400 600 800 1000 1400 − 10 0 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 40 60 80 − 10 0 5 re si d( M 1) 26/46 Lecture 9 Comments – Conclusions for the model for the Paramo Data I The diagnostic plots do not provide any compelling evidence that the model assumptions are invalid (but n small). I The model that we fitted contains four predictor variables. There are a number of questions that we could ask regarding this model: I Overall, is there significant evidence that the model is useful in explaining variation in the number of bird species? I Are all the predictor variables providing useful information about the response? I What would we lose (and gain) if we tried to use a model with less predictor variables? I How would that impact confidence intervals for parameters? 27/46 Lecture 9 Theory – Outlying points in Rp I As with simple linear regression the fitted model should not be used to predict Y values for x combinations that are well away from the set of observed xi values (i.e. be wary of extrapolation). I This is not always easy to detect in multiple linear regression. I For example, when there are several predictor variables and each component of x0 is well within the observed range of corresponding values in x1,x2,x3 . . . ,xn but the point x0 is a long way from the observed points in p-space. 30/46 Lecture 9 Example – Outlying points in R2 I Here, x0 has x1 and x2 coordinates well within their respective ranges I but x0 is not close to the observed sample values in the 2D space. 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1 2 3 4 5 6 7 range of x1 ra ng e of x 2 x0 I In higher dimensions this type of behaviour is even harder to detect but we need to be on guard against extrapolating to extreme values. 31/46 Lecture 9 Example – Outlying Points I There may already be extreme values of x within the data. I Recall leverage is a measure of how far away the independent variable values of the observations are from the other observations. I The detection of such points proves difficult for higher dimensions of data hence using the leverage value helps in detecting outlying values in x (i.e. the high leverage points). I Recall Cook’s distance is a measure of outliers under a particular model and can be used to detect unusual Y -values under a particular model. 32/46 Lecture 9 pairs(dat) H 20 40 60 80 ● ● ● ● ● ● ● ● ● ● ● ● 30 50 ● ● ● ● ● ● ● ● ● ● ● 20 60 ● ● ● ● ● ● ● ● ● ● ● W ● ● ● ● ● ● ● ● ● ● ● 30 40 50 60 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 25 30 35 40 45 50 20 30 40 50 L 35/46 Lecture 9 Example – Catheter data summary(lm(L ~ H + W, data=dat)) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 21.0084 8.7512 2.401 0.0399 * ## H 0.1964 0.3606 0.545 0.5993 ## W 0.1908 0.1652 1.155 0.2777 ## --- ## Residual standard error: 3.943 on 9 degrees of freedom ## Multiple R-squared: 0.8053,Adjusted R-squared: 0.7621 ## F-statistic: 18.62 on 2 and 9 DF, p-value: 0.0006336 summary(lm(L ~ W, data=dat)) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 25.63746 2.00421 12.792 1.60e-07 *** ## W 0.27727 0.04399 6.303 8.87e-05 *** ## --- ## Residual standard error: 3.802 on 10 degrees of freedom ## Multiple R-squared: 0.7989,Adjusted R-squared: 0.7788 ## F-statistic: 39.73 on 1 and 10 DF, p-value: 8.871e-05 36/46 Lecture 9 Example – Catheter data cor(dat) ## H W L ## H 1.0000000 0.9610936 0.8811691 ## W 0.9610936 1.0000000 0.8938226 ## L 0.8811691 0.8938226 1.0000000 37/46 Lecture 9 Example – Caffeine Dose I Does caffeine stimulation affect the rate at which individuals can tap their fingers? I n = 30 male students randomly allocated to t = 3 groups of 10 for students each: I Group 1: zero caffeine dose I Group 2: low caffeine dose I Group 3: high caffeine dose I Allocation was blind (subjects did not know their caffeine dosage). I Two hours after treatment, each subject tapped fingers as quickly as possible for a minute. Number of taps recorded. 40/46 Lecture 9 Example – Caffeine Dose caffeine = read.table(file="caffeine.txt",header=TRUE) str(caffeine) boxplot(Taps ~ Dose, ylab="Taps", data=caffeine) High Low Zero 24 2 24 6 25 0 41/46 Lecture 9 Theory – Categorical Variable in Linear Models The model is Yij = µ+ αi + ij , j = 1, . . . , ni, i = 1, . . . , t, where I µ is the overall mean or grand mean I t is the number of levels for the categorical variable, I ni is the number of observations in category i, I αi is the effect due to category i, and I ij ∼ NID(0, σ2). 42/46 Lecture 9 Theory – Categorical Variable in Linear Models R applies the constraint β1 = α1 = 0, i.e. sets category 1 as the reference category. So the model fitted is in fact Yi′ = β0 + β2xi′2 + . . .+ βtxi′t + i′ , i ′ = 1, . . . , n So βk is the effect of category k relative to category 1. Note: I R organises categories in alphabetical order so category 1 for levels Zero, Low, High is High I Some categories are coded as numerical variables and needs to be explicitly coded as a factor using factor. E.g. x = c(1, 2, 3) str(x) ## num [1:3] 1 2 3 x = factor(x); str(x) ## Factor w/ 3 levels "1","2","3": 1 2 3 45/46 Lecture 9 Example – Categorical Variable in Linear Models summary(lm(Taps ~ Dose, data=caffeine)) ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 248.3000 0.7047 352.326 < 2e-16 *** ## DoseLow -1.9000 0.9967 -1.906 0.06730 . ## DoseZero -3.5000 0.9967 -3.512 0.00158 ** I On average, male students who have low dosage of caffeine tap at a rate less than 1.9 compared to male students who have high caffeine dosage. I Likewise, male students who have zero dosage of caffeine tap at a rate less than 3.5 compared to male students who have high caffeine dosage. I And, male students who have low dosage of caffeine tap at a rate more than −1.9− (−3.5) = 1.6 compared to male students who have no caffeine. 46/46

Documents

questions

Introduction to Statistics, Lecture notes for Statistics, Lecture notes of Statistics

Related documents

Partial preview of the text