Properties, Confidence, and Prediction Intervals in Simple Linear Regression, Study notes of Mathematical Statistics

Statistical Inference · Data Analysis · Regression Analysis

This document, authored by James H. Steiger from Vanderbilt University, provides an in-depth exploration of simple linear regression. Topics covered include statistical notation, ordinary least squares estimation, properties of least squares estimators, confidence intervals, and prediction intervals. The document also includes R code examples.

What you will learn

  • What are the properties of least squares estimators in simple linear regression?
  • How are confidence intervals and prediction intervals calculated in simple linear regression?
  • How are fitted values and residuals calculated in simple linear regression?
  • What is the role of the intercept (β0) and slope (β1) in simple linear regression?
  • What is the role of the Scheffe correction in confidence intervals for simple linear regression?

Partial preview of the text

The Simple Linear Regression Model
James H. Steiger
Department of Psychology and Human Development
Vanderbilt University

Outline

1 Introduction
2 The Simple Linear Regression Model
3 Statistical Notation in ALR
4 Ordinary Least Squares Estimation
    Fitted Values and Residuals
    The Least Squares Criterion
    Analyzing the Forbes Data
5 Properties of Least Squares Estimators
6 Comparing Models: The Analysis of Variance
    Interpreting p-values
    Power Calculations
    The Coefficient of Determination R²
    Revisiting Power Calculation
7 Confidence Intervals and Tests
    Introduction
    The Intercept β0
    The Slope β1
    A Predicted Value from a New Data Point
    A Fitted Value (Conditional Mean)
    Plotting Confidence Intervals
8 Residuals

The Simple Linear Regression Model

We make two important assumptions concerning the errors:

1 We assume that E(ei | xi) = 0, so if we could draw a scatterplot of the ei versus the xi, we would have a null scatterplot, with no patterns.
2 We assume the errors are all independent, meaning that the value of the error for one case gives no information about the value of the error for another case.

Under these assumptions, if the population is bivariate normal, the errors will be normally distributed.

One way of thinking about any regression model is that it involves a systematic component and an error component.

1 If the simple regression model is correct about the systematic component, then the errors will appear to be random as a function of x.
2 However, if the simple regression model is incorrect about the systematic component, then the errors will show a systematic component and be somewhat predictable as a function of x.
3 This is shown graphically in Figure 2.2 from the third edition of ALR.

[Figure 2.2 from ALR3 appears here in the slides.]

Ordinary Least Squares Estimation

Fitted Values

The sample-based estimates of β0 and β1 are denoted β̂0 and β̂1, respectively. The fitted value for case i is given by Ê(Y | X = xi), for which we use the shorthand notation ŷi:

    ŷi = Ê(Y | X = xi) = β̂0 + β̂1 xi    (6)

In other words, the fitted values are obtained by applying the sample regression equation to the sample data.

Residuals

In a similar vein, we define the sample residuals: for the ith case, we have

    êi = yi − ŷi    (7)

The Least Squares Criterion

Residuals are the distances of the points from the sample-based regression line in the up-down direction, as shown in ALR4 Figure 2.2 (Figure 2.3 in ALR3).
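The definitions in (6) and (7) are simple enough to compute by hand in R. Here is a minimal sketch using a small made-up data set (the vectors x and y below are illustrative and do not come from the slides):

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

## OLS slope and intercept
beta.hat.1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta.hat.0 <- mean(y) - beta.hat.1 * mean(x)

y.hat <- beta.hat.0 + beta.hat.1 * x   # fitted values, equation (6)
e.hat <- y - y.hat                     # residuals, equation (7)

## These agree with the fitted() and residuals() extractors for lm()
fit.toy <- lm(y ~ x)
all.equal(as.numeric(fitted(fit.toy)), y.hat)
all.equal(as.numeric(residuals(fit.toy)), e.hat)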
Analyzing the Forbes Data

We can easily fit a simple linear regression for the Forbes data. Let's predict Lpres from Temp. The easy way to get the regression coefficients is to use the linear model function in R.

> LogPressure <- log(forbes$pres)
> BoilingPoint <- forbes$bp
> fit <- lm(LogPressure ~ BoilingPoint)
> summary(fit)

Call:
lm(formula = LogPressure ~ BoilingPoint)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0073622 -0.0033863 -0.0015865  0.0004322  0.0313139

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.9708662  0.0769377  -12.62 2.17e-09 ***
BoilingPoint  0.0206224  0.0003789   54.42  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.00873 on 15 degrees of freedom
Multiple R-squared: 0.995,  Adjusted R-squared: 0.9946
F-statistic: 2962 on 1 and 15 DF,  p-value: < 2.2e-16

Detailed Regression Computations in R

We can use R to carry out the computations from the formulas given in ALR.

> X <- BoilingPoint
> Y <- LogPressure
> SXY <- sum((X - mean(X)) * (Y - mean(Y)))
> SXX <- sum((X - mean(X))^2)
> SYY <- sum((Y - mean(Y))^2)
> beta.hat.1 <- SXY / SXX
> beta.hat.0 <- mean(Y) - beta.hat.1 * mean(X)
> e.hat <- Y - (beta.hat.0 + beta.hat.1 * X)
> RSS <- sum(e.hat^2)
> n <- length(Y)
> sigma.hat.squared <- RSS / (n - 2)
> sigma.hat <- sqrt(sigma.hat.squared)

Detailed Calculations

Here are the results:

> SYY
[1] 0.2268754
> SXX
[1] 530.7824
> SXY
[1] 10.94599
> beta.hat.1
[1] 0.02062236
> beta.hat.0
[1] -0.9708662
> RSS
[1] 0.001143315
> sigma.hat.squared
[1] 7.622099e-05
> sigma.hat
[1] 0.008730463

Properties of Least Squares Estimators

Variance of Estimators

If we assume that the errors have constant variance and are uncorrelated, then

    Var(β̂0) = σ²(1/n + x̄²/SXX)    (17)

    Var(β̂1) = σ²/SXX    (18)

    Var(σ̂²) = 2σ⁴/(n − 2)    (19)

    Cov(β̂0, β̂1) = −σ² x̄/SXX    (20)

Why do we care about those formulas? Because, as we shall see later, we use them for constructing confidence intervals and hypothesis tests.

Optimality Properties

Weisberg discusses optimality properties of OLS estimators on page 27 of ALR:

    The Gauss-Markov theorem provides an optimality result for OLS estimates. Among all estimates that are linear combinations of the ys and unbiased, the OLS estimates have the smallest variance. If one believes the assumptions and is interested in using linear unbiased estimates, the OLS estimates are the ones to use. When the errors are normally distributed, the ols estimates can be justified using a completely different argument, since they are then also maximum likelihood estimates, as discussed in many mathematical statistics texts, for example, Casella and Berger (1990).
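The variance formulas (17) and (18) can be checked numerically against the standard errors that summary(fit) reported for the Forbes data. Here is a minimal sketch; it reuses sigma.hat, SXX, X, and n from the transcript above and substitutes sigma.hat for σ:

## Estimated standard errors from formulas (17) and (18)
se.beta.hat.0 <- sigma.hat * sqrt(1/n + mean(X)^2 / SXX)
se.beta.hat.1 <- sigma.hat / sqrt(SXX)

## These should reproduce the "Std. Error" column of summary(fit),
## roughly 0.0769 for the intercept and 0.00038 for the slope
c(se.beta.hat.0, se.beta.hat.1)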
Estimated Standard Errors

Frankly, I don't agree with this notational digression, although I should be clear that many authors use it. In a more consistent (if somewhat messier) notation, one should use se( ) to stand for the population quantity and ŝe( ) to stand for the estimated standard error. I suspect the convention of dispensing with the "hat" in the standard error notation was adopted for typographical convenience in the "old days" of painstaking mathematical typing. In any case, remember that when regression textbooks talk about "standard errors," they are actually talking about estimated standard errors. Asymptotically, it doesn't matter, but in small samples it can. Ultimately, of course, notation is a matter of personal preference. However, in this case, a deliberate notational inconsistency has been introduced.

Comparing Models: The Analysis of Variance

Interpreting p-values

As you learned in Psychology 310, p-values are interpreted in such a way that if the p-value is less than α, then the null hypothesis is rejected at the α significance level. ALR has an extensive discussion revisiting this topic.

Power Calculation

When the null hypothesis is true, the F-statistic has a central F distribution. When it is false, and the assumption of fixed X holds, the F-statistic has a noncentral F distribution with 1 and n − 2 degrees of freedom, and a noncentrality parameter λ given by

    λ = β1² SXX / σ²    (24)

The above equation for λ is not very useful in the context of regression analysis as we normally think about it.

Revisiting Power Calculation

It is fairly easy to show, in the case of a single predictor, that for a population correlation ρ,

    λ = n ρ² / (1 − ρ²)    (26)

Proof. Substituting some well-known identities (i.e., β1 = ρ σy/σx, σ² = (1 − ρ²) σy², and n σx² = SXX), we get

    λ = β1² SXX / σ²    (27)
      = ρ² (σy²/σx²) n σx² / ((1 − ρ²) σy²)    (28)
      = n ρ² / (1 − ρ²)    (29)

Note in the above that the X scores are considered fixed, and so the population variance of X is σx² = SXX/n.

The preceding expression allows one to calculate power in a linear regression in terms of the population ρ² value, a much more natural metric for most users than SXX and β1². However, careful consideration of the typical application of this formula reveals once again the artificiality of the "fixed X" model, which treats the X scores as if they were fixed and known (in a sense, the entire population). In general, the X scores are random variates just like the Y scores, SXX will vary from sample to sample, the fixed-scores model is not really appropriate, and the power value is an approximation. In the case of multiple regression, the approximation can be off by a substantial amount, but it is usually adequate.
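Equation (26) makes a power calculation easy to carry out with R's noncentral F functions. Here is a minimal sketch; the sample size, ρ², and α values are made-up illustration values, not numbers from the slides:

## Power of the F test for the slope, using lambda = n * rho^2 / (1 - rho^2)
n.power <- 50       # hypothetical sample size
rho2    <- 0.10     # hypothetical population squared correlation
alpha   <- 0.05

lambda <- n.power * rho2 / (1 - rho2)                   # equation (26)
f.crit <- qf(1 - alpha, df1 = 1, df2 = n.power - 2)     # null critical value
power  <- 1 - pf(f.crit, df1 = 1, df2 = n.power - 2, ncp = lambda)
power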
Confidence Intervals and Tests

Introduction

In Section 2.6 of ALR, Weisberg introduces several of the classic parametric hypothesis tests and confidence intervals calculated in connection with simple linear regression:

1 The intercept β0.
2 The slope β1.
3 Predicted values from a new data point.
4 Fitted values (conditional mean estimates) on the regression line.
5 Residuals.

We shall now consider each of these in turn, demonstrating calculations as we go.

The Slope β1

The (estimated) standard error of β̂1 is

    se(β̂1) = σ̂ / √SXX    (32)

A confidence interval for β1 may be constructed in the standard manner, with endpoints given by

    β̂1 ± t* se(β̂1)    (33)

Two Kinds of Intervals around a Regression Line

One often sees confidence regions plotted in connection with a regression line. There are actually two distinctly different kinds of plots:

1 A regression line has been calculated from a data set, and then a new value x∗ becomes available, prior to the availability of the associated y∗. What is an appropriate confidence interval for the predicted value?
2 A regression line involves an (infinite) set of "fitted values" that represent conditional means for Y | X = x. What is a confidence interval for such a fitted value?

A Predicted Value from a New Data Point

The first kind of interval is calculated as follows. The estimated value of y∗ is obtained by substituting x∗ into the estimated regression line, i.e.,

    ỹ∗ = β̂0 + β̂1 x∗    (34)

Under the assumptions of fixed-predictor regression, the conditional sampling variance of ỹ∗ given x∗ is a function of x∗ itself, i.e.,

    Var(ỹ∗ | x∗) = σ² + σ²(1/n + (x∗ − x̄)²/SXX)    (35)

Recalling that SXX = (n − 1) Sx², we can, after a little reduction, write a somewhat more revealing version of the formula as

    Var(ỹ∗ | x∗) = σ²((n + 1)/n + (1/(n − 1))((x∗ − x̄)/Sx)²)    (36)

A Fitted Value (Conditional Mean)

In some situations one may be interested in obtaining an estimate of E(Y | X = x). For example, in the heights data, one might estimate the population mean height of all daughters of mothers with a particular height x∗. This quantity is estimated by the fitted value ŷ = β̂0 + β̂1 x∗.

Regardless of whether or not you consider the fitted value itself an "estimate," you can estimate it with the quantity ỹ∗ = β̂0 + β̂1 x∗. This estimate has an estimated standard error of

    sefit(ỹ∗ | x∗) = σ̂ (1/n + (1/(n − 1))((x∗ − x̄)/Sx)²)^(1/2)    (38)
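Formula (38) can be verified against predict(), which returns the same estimated standard error for a fitted value when se.fit = TRUE. A minimal sketch using the Forbes fit from the transcripts above (fit, X, n, and sigma.hat are reused from there; the new boiling point x.star = 200 is just an illustrative value):

## Standard error of a fitted value at x.star, via equation (38)
x.star <- 200                 # hypothetical new boiling point
Sx <- sd(X)                   # recall SXX = (n - 1) * Sx^2
se.fit.hand <- sigma.hat * sqrt(1/n + (1/(n - 1)) * ((x.star - mean(X)) / Sx)^2)

## The same quantity as reported by predict()
se.fit.R <- predict(fit, newdata = data.frame(BoilingPoint = x.star),
                    se.fit = TRUE)$se.fit

c(se.fit.hand, se.fit.R)      # the two values should agree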
Again, for a single value, a confidence interval for such an estimated conditional mean can be calculated with the standard approach, e.g.,

    ỹ∗ ± t* sefit(ỹ∗ | x∗)    (39)

where t* is the 1 − α/2 critical value from the t distribution with n − 2 degrees of freedom.

Plotting Confidence Intervals

We can plot the prediction intervals and the confidence intervals for fitted values using R. Note that ALR recommends the Scheffe correction for the latter, but not for the former. One might ask, "Why?" Ostensibly, this is because in the former case we are graphing what the confidence interval would be if we had observed a value x∗, while in the latter case we are asking what the theoretical confidence intervals would be for the entire run of the regression line, based on the current data.

Here is some commented code:

> ## fit the simple regression model
> attach(Heights)
> m1 <- lm(dheight ~ mheight)
> ## create a run of 50 points across the x-axis
> new <- data.frame(mheight = seq(55.4, 70.8, length = 50))
> ## create the confidence intervals
> ## first for the prediction intervals
> pred.w.plim <- predict(m1, new, interval = "prediction")
> ## next for the fitted value (conditional mean)
> pred.w.clim <- scheffe.rescaled.ci(m1, 0.95, new)
> ## then we use matplot --
> ## cbind takes all 3 columns of pred.w.clim
> ## and the last two of pred.w.plim
> matplot(new$mheight, cbind(pred.w.clim, pred.w.plim[,-1]),
+         col = c("black", "red", "red", "blue", "blue"), bty = "l",
+         lty = c(2, 1, 1, 1, 1), type = "l", ylab = "Daughter's Height",
+         xlab = "Mother's Height")
> legend("bottomright", c("Prediction Interval", "Fitted Value C.I."),
+        lty = c(1, 1), col = c("blue", "red"))

Here is the plot:

[Plot: fitted regression line for Daughter's Height versus Mother's Height, with the prediction interval bands and the fitted-value (Scheffe) confidence interval bands overlaid.]
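The scheffe.rescaled.ci() function used above is defined earlier in the full slide deck and does not appear in this preview. As a rough sketch of the idea (my own reconstruction, not the slides' code): a Scheffe-type simultaneous band for the entire regression line replaces the pointwise t critical value in (39) with the multiplier √(2 F(1 − α; 2, n − 2)), which can be wrapped around predict() as follows:

## Illustrative reconstruction of a Scheffe-type confidence band for the
## fitted line; not the scheffe.rescaled.ci() defined in the full slides.
scheffe.band <- function(model, level, newdata) {
  p    <- predict(model, newdata, se.fit = TRUE)
  mult <- sqrt(2 * qf(level, 2, p$df))      # Scheffe multiplier replaces t*
  cbind(fit = p$fit,
        lwr = p$fit - mult * p$se.fit,
        upr = p$fit + mult * p$se.fit)
}

## Used the same way as scheffe.rescaled.ci() in the plotting code above:
## pred.w.clim <- scheffe.band(m1, 0.95, new)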