Linear Regression Model: Estimation and Hypothesis Testing

This excerpt covers the linear regression model, its assumptions, and the estimation of its parameters by the method of least squares. It also covers hypothesis testing using the F-statistic and t-statistic, as well as variable selection in regression analysis.

the factor is held constant during the current experiment but may be varied in a future experiment. An illustration is given in Figure 1.3 from the injection molding experiment discussed in Section 1.2. Other designations of factors can be considered. For example, experimental factors can be further divided into two types (control factors and noise factors), as in the discussion on the choice of factors in Section 1.2. For the implementation of experiments, we may also designate an experimental factor as "hard-to-change" or "easy-to-change." These designations will be considered later as they arise.

1.4 Simple Linear Regression

We use the following data to illustrate the simplest case of regression analysis, i.e., simple linear regression with one regression variable. Lea (1965) discussed the relationship between mean annual temperature and the mortality rate for a type of breast cancer in women. The data (shown in Table 1.1), taken from certain regions of Great Britain, Norway, and Sweden, contain the mean annual temperature (in degrees F) and the mortality index for neoplasms of the female breast.

Table 1.1: Breast Cancer Mortality Data

  Mortality rate (M)  102.5  104.5  100.4   95.9   87.0   95.0   88.6   89.2
  Temperature (T)      51.3   49.9   50.0   49.2   48.5   47.8   47.3   45.1

  Mortality rate (M)   78.9   84.6   81.7   72.2   65.1   68.1   67.3   52.5
  Temperature (T)      46.3   42.1   44.2   43.5   42.3   40.2   31.8   34.0

The first step in any regression analysis is to obtain a scatter plot. A scatter plot of mortality against temperature (Figure 1.4) reveals an increasing linear relationship between the two variables. Such a linear relationship between a response y and a predictor variable x can be expressed in terms of the following model:

\[ y = \beta_0 + \beta_1 x + \epsilon, \tag{1.1} \]

where ε is the random part of the model, which is assumed to be normally distributed with mean 0 and variance σ², i.e., ε ~ N(0, σ²); because ε is normally distributed, so is y, and var(y) = σ². If N observations are collected in an experiment, the model for them takes the form

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \ldots, N, \tag{1.2} \]

where yi is the ith value of the response and xi is the corresponding value of the covariate. The unknown parameters in the model are the regression coefficients β0 and β1 and the error variance σ². Thus, the purpose of collecting the data is to estimate and make inferences about these parameters.

[Figure 1.4: Scatter Plot of Temperature versus Mortality, Breast Cancer Data.]

For estimating β0 and β1, the least squares criterion is used; i.e., the least squares estimators (LSEs), denoted by β̂0 and β̂1 respectively, minimize the quantity

\[ L(\beta_0, \beta_1) = \sum_{i=1}^{N} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2. \tag{1.3} \]

Taking partial derivatives of the above with respect to β0 and β1 and equating them to zero gives

\[ \frac{\partial L}{\partial \beta_0} = 2 \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_i)(-1) = 0, \qquad \frac{\partial L}{\partial \beta_1} = 2 \sum_{i=1}^{N} (y_i - \beta_0 - \beta_1 x_i)(-x_i) = 0. \]

From the above, the following two equations are obtained:

\[ \sum_{i=1}^{N} y_i = N \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{N} x_i, \tag{1.4} \]

\[ \sum_{i=1}^{N} x_i y_i = \hat\beta_0 \sum_{i=1}^{N} x_i + \hat\beta_1 \sum_{i=1}^{N} x_i^2. \tag{1.5} \]

These are called the normal equations. Solving them, the estimators of β1 and β0 are obtained as

\[ \hat\beta_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \tag{1.6} \]

\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}. \tag{1.7} \]
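As a quick numerical check, the closed-form estimates in (1.6) and (1.7) can be computed directly from the Table 1.1 data; a minimal Python sketch (the array and variable names are arbitrary choices, not part of the text):

```python
# Minimal sketch: least squares estimates (1.6)-(1.7) for the Table 1.1 data.
import numpy as np

# Temperature (T) and mortality rate (M) from Table 1.1
T = np.array([51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
              46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0])
M = np.array([102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
              78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5])

x_bar, y_bar = T.mean(), M.mean()
Sxx = np.sum((T - x_bar) ** 2)              # sum of (x_i - x_bar)^2
Sxy = np.sum((T - x_bar) * (M - y_bar))     # sum of (x_i - x_bar)(y_i - y_bar)

beta1_hat = Sxy / Sxx                       # slope, equation (1.6)
beta0_hat = y_bar - beta1_hat * x_bar       # intercept, equation (1.7)
print(beta0_hat, beta1_hat)
```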
1.5 Testing of Hypothesis, Interval Estimation and Diagnostics

Using (1.9)-(1.12), the following expressions can be obtained for the mean, variance, and covariance of β̂1 and β̂0:

\[ E(\hat\beta_1) = \beta_1, \tag{1.13} \]

\[ E(\hat\beta_0) = \beta_0, \tag{1.14} \]

\[ \mathrm{var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \tag{1.15} \]

\[ \mathrm{var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{N} + \frac{\bar{x}^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2} \right), \tag{1.16} \]

\[ \mathrm{cov}(\hat\beta_0, \hat\beta_1) = -\frac{\bar{x}}{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sigma^2. \tag{1.17} \]

The formulas in (1.13)-(1.17) are special cases of the general mean and variance-covariance formulas for the least squares estimator in multiple linear regression; see (1.43) and (1.44). An elementary proof without the use of matrices is given at the end of the section. From (1.13) and (1.14), we observe that β̂1 and β̂0 are unbiased estimators of β1 and β0, respectively.

Clearly, to estimate var(β̂0), var(β̂1), and cov(β̂0, β̂1), it is necessary to obtain an estimate of σ². This estimate can be obtained from the residuals ei. The sum of squares of the residuals, also called the residual (or error) sum of squares (RSS), is given by

\[ \mathrm{RSS} = \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2, \tag{1.18} \]

which, after some straightforward algebraic manipulation, reduces to

\[ \mathrm{RSS} = \sum_{i=1}^{N} (y_i - \bar{y})^2 - \hat\beta_1^2 \sum_{i=1}^{N} (x_i - \bar{x})^2. \tag{1.19} \]

The degrees of freedom associated with the RSS is N − 2. Thus, the mean square error is given by MSE = RSS/(N − 2). It can be shown that E(RSS) = (N − 2)σ². Consequently, E(MSE) = σ², and MSE is an unbiased estimator of σ².

In order to know whether the predictor variable x has explanatory power, it is necessary to test the null hypothesis H0: β1 = 0. Under the assumption of normality of the error term εi in (1.2), each yi is a normally distributed random variable. Since β̂1 can be expressed as a linear combination of the yi's and its mean and variance are given by (1.13) and (1.15), it follows that

\[ \hat\beta_1 \sim N\!\left( \beta_1,\ \sigma^2 \Big/ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right). \tag{1.20} \]

The estimated standard error of β̂1 is thus given by

\[ \widehat{\mathrm{S.E.}}(\hat\beta_1) = \sqrt{ \frac{\mathrm{MSE}}{\sum_{i=1}^{N} (x_i - \bar{x})^2} }. \tag{1.21} \]

For testing H0, the following statistic should be used:

\[ t = \frac{\hat\beta_1}{\widehat{\mathrm{S.E.}}(\hat\beta_1)}. \tag{1.22} \]

This statistic follows a t distribution with N − 2 degrees of freedom. The higher the value of t, the more significant is the coefficient β1. For the two-sided alternative H1: β1 ≠ 0,

\[ p\ \text{value} = \mathrm{Prob}\bigl( |t_{N-2}| > |t_{\mathrm{obs}}| \bigr), \]

where Prob(·) denotes the probability of an event, tν is a random variable that has a t distribution with ν degrees of freedom, and tobs denotes the observed or computed value of the t statistic. The t critical values can be found in Appendix C. A very small p value indicates that either we have observed something which rarely happens, or H0 is not true. In practice, H0 is rejected at level of significance α if the p value is less than α. Common values of α are 0.1, 0.05, and 0.01.

Generally, the p value gives the probability, under the null hypothesis, that the t statistic for an experiment conducted under comparable conditions will exceed the observed value |tobs|. The smaller the p value, the stronger the evidence that the null hypothesis does not hold. It therefore provides a quantitative measure of the significance of effects in the experiment under study. The same interpretation applies when other test statistics and null hypotheses are considered.

The 100(1 − α)% confidence interval for β1 is given by

\[ \hat\beta_1 \pm t_{N-2,\,\alpha/2}\, \widehat{\mathrm{S.E.}}(\hat\beta_1), \]

where t_{N−2, α/2} is the upper α/2 point of the t distribution with N − 2 degrees of freedom. If the confidence interval does not contain 0, H0 is rejected at level α.
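To make (1.21) and (1.22) concrete for the breast cancer data, the quantities can be computed as in the sketch below, which assumes the arrays T and M and the estimates beta0_hat and beta1_hat from the previous sketch and uses SciPy for the t distribution:

```python
# Minimal sketch: t test of H0: beta1 = 0 and a 95% confidence interval,
# reusing T, M, beta0_hat, beta1_hat from the previous sketch.
import numpy as np
from scipy import stats

N = len(M)
fitted = beta0_hat + beta1_hat * T
RSS = np.sum((M - fitted) ** 2)                           # equation (1.18)
MSE = RSS / (N - 2)                                       # mean square error
se_beta1 = np.sqrt(MSE / np.sum((T - T.mean()) ** 2))     # equation (1.21)

t_obs = beta1_hat / se_beta1                              # equation (1.22)
p_value = 2 * stats.t.sf(abs(t_obs), df=N - 2)            # two-sided p value
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 2)              # upper 2.5% point of t_{N-2}
ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
print(t_obs, p_value, ci)
```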
Another way of judging the explanatory power of the predictor variable is to split the total variation associated with the response into two components. The quantity ∑(yi − ȳ)² measures the total variation in the data and is called the corrected total sum of squares (CTSS). From (1.19), we observe that

\[ \mathrm{CTSS} = \mathrm{RegrSS} + \mathrm{RSS}, \tag{1.23} \]

where RegrSS = β̂1² ∑(xi − x̄)² is called the corrected regression sum of squares. Thus, the variation in the data is split into the variation explained by the regression model plus the residual variation. This relationship is given in a table called the ANalysis Of VAriance (ANOVA) table, displayed in Table 1.3. Based on (1.23), we can define

\[ R^2 = \frac{\mathrm{RegrSS}}{\mathrm{CTSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{CTSS}}. \tag{1.24} \]

Table 1.3: ANOVA Table for Simple Linear Regression

  Source             Degrees of Freedom   Sum of Squares         Mean Squares
  regression         1                    β̂1² ∑(xi − x̄)²        β̂1² ∑(xi − x̄)²
  residual           N − 2                ∑(yi − ŷi)²            ∑(yi − ŷi)²/(N − 2)
  total (corrected)  N − 1                ∑(yi − ȳ)²

Because the R² value measures the "proportion of total variation explained by the fitted regression model β̂0 + β̂1x," a higher R² value indicates a better fit of the regression model. It can be shown that R² is the square of the product-moment correlation r between y = (yi) and x = (xi), which is given by

\[ r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}. \tag{1.25} \]

The mean square is the sum of squares divided by the corresponding degrees of freedom, where the degrees of freedom are those associated with each sum of squares. As explained earlier, the mean square error, or residual mean square, is an unbiased estimator of σ². If the null hypothesis H0: β1 = 0 holds, the F statistic

\[ F = \frac{\hat\beta_1^2 \sum_{i=1}^{N} (x_i - \bar{x})^2}{\sum_{i=1}^{N} e_i^2 / (N - 2)} \tag{1.26} \]

(the regression mean square divided by the residual mean square) has an F distribution with parameters 1 and N − 2, which are the degrees of freedom of its numerator and denominator, respectively. The p value is calculated by evaluating

\[ \mathrm{Prob}\bigl( F_{1,N-2} > F_{\mathrm{obs}} \bigr), \tag{1.27} \]

where F_{1,N−2} has an F distribution with parameters 1 and N − 2, and F_obs is the observed value of the F statistic. The F critical values can be found in Appendix D. The p value in (1.27) can be obtained from certain pocket calculators or by interpolating the values given in Appendix D. An example of an F distribution is given in Figure 2.1 (in Chapter 2) along with its critical values.

Let us now complete the analysis of the breast cancer mortality data. From Table 1.2, we obtain RSS = ∑ ei² = 796.91 (with N = 16). Consequently, σ̂² = MSE = RSS/14 = 56.92. From the computations in the previous section, CTSS = 3396.44 and RegrSS = CTSS − RSS = 2599.53. Table 1.4 shows the ANOVA. The R² is obtained as 2599.53/3396.44 = 0.7654, which means that 76.54% of the variation in mortality is explained by the fitted model.
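These ANOVA quantities for the breast cancer data can be reproduced numerically; a sketch, reusing T, M, beta1_hat, RSS, and N from the previous sketches:

```python
# Minimal sketch: ANOVA decomposition, R^2, and F test for the simple linear
# regression, reusing T, M, beta1_hat, RSS, and N from the previous sketches.
import numpy as np
from scipy import stats

CTSS = np.sum((M - M.mean()) ** 2)                        # corrected total sum of squares
RegrSS = beta1_hat ** 2 * np.sum((T - T.mean()) ** 2)     # corrected regression sum of squares
R2 = RegrSS / CTSS                                        # equation (1.24)

F_obs = (RegrSS / 1) / (RSS / (N - 2))                    # equation (1.26)
p_value = stats.f.sf(F_obs, 1, N - 2)                     # equation (1.27)
print(R2, F_obs, p_value)
```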
Proof of (1.13)-(1.17): From (1.9),

\[ E(\hat\beta_1) = E\Bigl( \sum_{i=1}^{N} w_i y_i \Bigr) = \sum_{i=1}^{N} w_i E(y_i) = \sum_{i=1}^{N} w_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum_{i=1}^{N} w_i + \beta_1 \sum_{i=1}^{N} w_i x_i = \beta_1. \tag{1.32} \]

The last step follows from the facts that ∑ wi = 0 and ∑ wi xi = 1, which follow easily from (1.10). Next, we have

\[ \mathrm{var}(\hat\beta_1) = \mathrm{var}\Bigl( \sum_{i=1}^{N} w_i y_i \Bigr) = \sum_{i=1}^{N} w_i^2\, \mathrm{var}(y_i) = \sigma^2 \sum_{i=1}^{N} w_i^2 = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}. \]

The last step here follows from the relation

\[ \sum_{i=1}^{N} w_i^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{\bigl\{ \sum_{i=1}^{N} (x_i - \bar{x})^2 \bigr\}^2} = \frac{1}{\sum_{i=1}^{N} (x_i - \bar{x})^2}. \]

From (1.7) and (1.32), we have that E(β̂0) = E(ȳ) − x̄ E(β̂1) = β0 + β1x̄ − β1x̄ = β0. Next, from (1.11) and (1.12),

\[ \mathrm{var}(\hat\beta_0) = \sum_{i=1}^{N} \Bigl( \frac{1}{N} - \bar{x} w_i \Bigr)^2 \mathrm{var}(y_i) = \sigma^2 \Bigl( \frac{N}{N^2} + \bar{x}^2 \sum_{i=1}^{N} w_i^2 - \frac{2\bar{x}}{N} \sum_{i=1}^{N} w_i \Bigr) = \sigma^2 \Bigl( \frac{1}{N} + \frac{\bar{x}^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2} \Bigr). \]

The last step again follows from ∑ wi = 0 and ∑ wi² = 1/∑(xi − x̄)². Lastly, the covariance between β̂0 and β̂1 can be obtained as

\[ \mathrm{cov}(\hat\beta_0, \hat\beta_1) = \mathrm{cov}(\bar{y} - \hat\beta_1 \bar{x},\ \hat\beta_1) = \mathrm{cov}(\bar{y}, \hat\beta_1) - \bar{x}\, \mathrm{var}(\hat\beta_1). \]

Now,

\[ \mathrm{cov}(\bar{y}, \hat\beta_1) = \mathrm{cov}\Bigl( \frac{1}{N} \sum_{i=1}^{N} y_i,\ \sum_{i=1}^{N} w_i y_i \Bigr) = \frac{1}{N} \sum_{i=1}^{N} w_i\, \mathrm{var}(y_i) = \frac{\sigma^2}{N} \sum_{i=1}^{N} w_i = 0, \]

where the second equality follows from the independence of y1, ..., yN and the last from ∑ wi = 0. Thus,

\[ \mathrm{cov}(\hat\beta_0, \hat\beta_1) = -\bar{x}\, \mathrm{var}(\hat\beta_1) = -\frac{\bar{x}}{\sum_{i=1}^{N} (x_i - \bar{x})^2}\, \sigma^2. \]

1.6 Multiple Linear Regression

Experimental data can often be modeled by the general linear model (also called the multiple regression model). Suppose that the response y is related to p covariates (also called explanatory variables, regressors, or predictors) x1, x2, ..., xp as follows:

\[ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon, \tag{1.33} \]

where ε is the random part of the model, which is assumed to be normally distributed with mean 0 and variance σ², i.e., ε ~ N(0, σ²); because ε is normally distributed, so is y, and var(y) = σ². The structural part of the model is

\[ E(y) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + E(\epsilon) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p. \]

Here, E(y) is linear in the β's, the regression coefficients, which explains the term linear model. If N observations are collected in an experiment, the model for them takes the form

\[ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i, \qquad i = 1, \ldots, N, \tag{1.34} \]

where yi is the ith value of the response and xi1, ..., xip are the corresponding values of the p covariates. These N equations can be written in matrix notation as

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}, \tag{1.35} \]

where y = (y1, ..., yN)ᵀ is the N × 1 vector of responses, β = (β0, β1, ..., βp)ᵀ is the (p + 1) × 1 vector of regression coefficients, ε = (ε1, ..., εN)ᵀ is the N × 1 vector of errors, and X, the N × (p + 1) model matrix, is given by

\[ \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Np} \end{pmatrix}. \tag{1.36} \]

The unknown parameters in the model are the regression coefficients β0, β1, ..., βp and the error variance σ². As in Section 1.4, the least squares criterion is used; i.e., the least squares estimator (LSE), denoted by β̂, minimizes the quantity

\[ \sum_{i=1}^{N} \bigl( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \bigr)^2, \tag{1.37} \]

which in matrix notation is

\[ (\mathbf{y} - \mathbf{X}\boldsymbol\beta)^{T} (\mathbf{y} - \mathbf{X}\boldsymbol\beta). \tag{1.38} \]

In other words, the squared distance between the response vector y and the vector of fitted values Xβ̂ is minimized. In order to minimize the sum of squared residuals, the vector of residuals

\[ \mathbf{r} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta} \tag{1.39} \]

needs to be perpendicular to the vector of fitted values

\[ \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta}; \tag{1.40} \]

that is, the cross product between these two vectors should be zero: rᵀŷ = rᵀXβ̂ = 0. An equivalent way of stating this is that the columns of the model matrix X need to be perpendicular to r, the vector of residuals, and thus satisfy

\[ \mathbf{X}^{T} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) = \mathbf{X}^{T}\mathbf{y} - \mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol\beta} = \mathbf{0}. \tag{1.41} \]

The solution to this equation is the least squares estimate

\[ \hat{\boldsymbol\beta} = (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} \mathbf{y}. \tag{1.42} \]

From (1.42), since E(y) = Xβ,

\[ E(\hat{\boldsymbol\beta}) = (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T} E(\mathbf{y}) = (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol\beta = \boldsymbol\beta. \tag{1.43} \]

The variance of β̂ is, using var(y) = σ²I,

\[ \mathrm{var}(\hat{\boldsymbol\beta}) = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\, \mathrm{var}(\mathbf{y})\, \bigl( (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T} \bigr)^{T} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\sigma^2 = (\mathbf{X}^{T}\mathbf{X})^{-1}\sigma^2. \tag{1.44} \]
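The matrix computations in (1.42) and (1.44) translate directly into code; a sketch with a made-up design matrix and coefficients (in practice one would prefer a numerically stabler least squares solver over forming the inverse explicitly):

```python
# Minimal sketch: least squares estimate (1.42) and estimated covariance (1.44)
# for a multiple regression with a made-up design.
import numpy as np

rng = np.random.default_rng(0)
N, p = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # model matrix as in (1.36)
beta_true = np.array([2.0, 1.0, -0.5, 0.0])                 # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.5, size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                                # equation (1.42)

residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (N - p - 1)            # unbiased estimate of sigma^2
cov_beta_hat = sigma2_hat * XtX_inv                         # plug-in version of (1.44)
print(beta_hat, np.sqrt(np.diag(cov_beta_hat)))
```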
For testing an individual coefficient, H0: βj = 0, the t statistic β̂j / √(σ̂² (XᵀX)⁻¹_jj) is used; under H0, it has a t distribution with N − p − 1 degrees of freedom. This can also be used to construct confidence intervals, since the denominator of the t statistic is the standard error of its numerator β̂j:

\[ \hat\beta_j \pm t_{N-p-1,\,\alpha/2}\, \sqrt{ \hat\sigma^2 \bigl( \mathbf{X}^{T}\mathbf{X} \bigr)^{-1}_{jj} }, \tag{1.54} \]

where t_{N−p−1, α/2} is the upper α/2 quantile of the t distribution with N − p − 1 degrees of freedom. See Appendix C for t critical values.

Besides testing the individual βj's, testing linear combinations of the βj's can be useful. For testing aᵀβ = ∑ aj βj, where a is a (p + 1) × 1 vector, it can be shown that

\[ \mathbf{a}^{T}\hat{\boldsymbol\beta} \sim N\bigl( \mathbf{a}^{T}\boldsymbol\beta,\ \sigma^2 \mathbf{a}^{T} (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{a} \bigr). \tag{1.55} \]

This suggests using the test statistic

\[ \frac{\mathbf{a}^{T}\hat{\boldsymbol\beta}}{\sqrt{ \hat\sigma^2 \mathbf{a}^{T} (\mathbf{X}^{T}\mathbf{X})^{-1} \mathbf{a} }}, \tag{1.56} \]

which has a t distribution with N − p − 1 degrees of freedom.

Extra Sum of Squares Principle

The extra sum of squares principle will be useful later for developing test statistics in a number of situations. Suppose that there are two models, say Model I and Model II, where Model I is a special case of Model II, denoted by Model I ⊂ Model II. Let

\[ \text{Model I:} \quad y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_q x_{iq} + \epsilon_i \tag{1.57} \]

and

\[ \text{Model II:} \quad y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_q x_{iq} + \beta_{q+1} x_{i,q+1} + \cdots + \beta_p x_{ip} + \epsilon_i'. \tag{1.58} \]

Model I ⊂ Model II since βq+1 = · · · = βp = 0 in Model I. Then, for testing the null hypothesis that Model I is adequate, i.e., that

\[ H_0: \beta_{q+1} = \cdots = \beta_p = 0 \tag{1.59} \]

holds, the extra sum of squares principle employs the F statistic

\[ \frac{ \bigl( \mathrm{RSS}(\text{Model I}) - \mathrm{RSS}(\text{Model II}) \bigr) / (p - q) }{ \mathrm{RSS}(\text{Model II}) / (N - p - 1) }, \tag{1.60} \]

where RSS stands for the residual sum of squares. It follows that

\[ \mathrm{RSS}(\text{Model I}) - \mathrm{RSS}(\text{Model II}) = \mathrm{RegrSS}(\text{Model II}) - \mathrm{RegrSS}(\text{Model I}), \tag{1.61} \]

where RegrSS denotes the regression sum of squares; thus, the numerator of the F statistic in (1.60) is the gain in regression sum of squares from fitting the more general Model II relative to Model I, i.e., the extra sum of squares. When (1.59) holds, the F statistic has an F distribution with parameters p − q (the difference in the number of estimated parameters between Models I and II) and N − p − 1. The extra sum of squares technique can be implemented by fitting Models I and II separately, obtaining their respective residual sums of squares, calculating the F statistic above, and then computing its p value.
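As an illustration of the extra sum of squares principle, the sketch below compares a reduced model (the first q covariates) with the full model, reusing the X, y, N, p objects from the earlier sketch; the split into q and p covariates is arbitrary here:

```python
# Minimal sketch: extra sum of squares F test (1.60) for nested models,
# reusing X, y, N, p from the previous sketch.
import numpy as np
from scipy import stats

def rss(X_sub, y):
    """Residual sum of squares of the least squares fit of y on X_sub."""
    beta_hat, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    r = y - X_sub @ beta_hat
    return r @ r

q = p - 1                                   # Model I keeps the first q covariates
rss_I = rss(X[:, :q + 1], y)                # intercept column plus first q covariates
rss_II = rss(X, y)                          # full Model II with all p covariates

F_obs = ((rss_I - rss_II) / (p - q)) / (rss_II / (N - p - 1))   # equation (1.60)
p_value = stats.f.sf(F_obs, p - q, N - p - 1)
print(F_obs, p_value)
```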
1.7 Variable Selection in Regression Analysis

In the regression fitting of the linear model (1.34), covariates whose regression coefficients are not significant may be removed from the full model. A more parsimonious model (i.e., one with fewer covariates) is preferred as long as it can explain the data well. This follows from the principle of parsimony (or Occam's razor), attributed to the 14th-century English philosopher William of Occam, which states that "entities should not be multiplied beyond necessity." It is also known that a model that fits the data too well may give poor predictions. The goal of variable selection in regression analysis is to identify the smallest subset of the covariates that explains the data well; one hopes to capture the true model, or at least the covariates of the true model with the largest regression coefficients.

One class of strategies is to use a model selection criterion to evaluate all possible subsets of the covariates and select the subset (which corresponds to a model) with the best value of the criterion. This is referred to as best subset regression. To maintain a balance between data fitting and prediction, a good model selection criterion should reward good model fitting as well as penalize model complexity.

The R² in (1.47) is not a suitable criterion because it increases as the number of covariates increases; that is, it does not penalize excessively large models. An alternative criterion is the adjusted R² (Wherry, 1931), which takes into account the reduction in degrees of freedom for estimating the residual variance as predictor variables are added to the model. For a model containing p covariates, the adjusted R² is given by

\[ R_a^2 = 1 - \frac{ \mathrm{RSS}/(N - p - 1) }{ \mathrm{CTSS}/(N - 1) }. \tag{1.62} \]

Note that the difference between R_a² and the expression for R² in (1.47) lies in the degrees of freedom in the numerator and denominator of (1.62). If an insignificant variable is added to a model, the R² will increase, but the adjusted R_a² may decrease.

Another commonly used criterion is the Cp statistic (Mallows, 1973). Suppose there are a total of q covariates. For a model that contains p regression coefficients, corresponding to p − 1 covariates and an intercept term β0, define its Cp value as

\[ C_p = \frac{\mathrm{RSS}}{s^2} - (N - 2p), \tag{1.63} \]

where RSS is the residual sum of squares for the model, s² is the mean squared error (see (1.48)) for the model containing all q covariates and β0, and N is the total number of observations. As the model gets more complicated, the RSS term in (1.63) decreases while the value p in the second term increases. The counteracting effect of these two terms prevents the selection of extremely large or small models. If the model is true, E(RSS) = (N − p)σ². Assuming that E(s²) = σ², it is then approximately true that

\[ E(C_p) \approx \frac{(N - p)\sigma^2}{\sigma^2} - (N - 2p) = p. \]

Thus one should expect the best fitting models to be those with Cp ≈ p. Further theoretical and empirical studies suggest that models whose Cp values are low and close to p should be chosen.
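Both criteria are easy to compute once a submodel's residual sum of squares is available; a sketch, reusing X, y, N, p and the rss helper from the previous sketches (the candidate subsets are arbitrary):

```python
# Minimal sketch: adjusted R^2 (1.62) and Mallows Cp (1.63) for a few candidate
# submodels, reusing X, y, N, p and rss() from the previous sketches.
import numpy as np

CTSS = np.sum((y - y.mean()) ** 2)
s2 = rss(X, y) / (N - p - 1)          # mean squared error of the full model (all covariates)

def adjusted_r2(cols):
    k = len(cols)                     # number of covariates in the submodel
    Xs = X[:, [0] + list(cols)]       # intercept column plus the chosen covariates
    return 1 - (rss(Xs, y) / (N - k - 1)) / (CTSS / (N - 1))

def mallows_cp(cols):
    k = len(cols) + 1                 # regression coefficients, including the intercept
    Xs = X[:, [0] + list(cols)]
    return rss(Xs, y) / s2 - (N - 2 * k)

for cols in [(1,), (1, 2), (1, 2, 3)]:
    print(cols, adjusted_r2(cols), mallows_cp(cols))
```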
For moderate to large q, fitting all subsets is computationally infeasible. An alternative strategy is based on adding or dropping one covariate at a time from a given model, which requires fewer model fits but can still identify good-fitting models. It need not identify the best-fitting models, as with any optimization that proceeds sequentially (and locally) rather than globally. The main idea is to compare the current model with a new model obtained by adding or deleting a covariate. Call the smaller and bigger models Model I and Model II, respectively. Based on the extra sum of squares principle in Section 1.6, one can compute the F statistic in (1.60), also known as a partial F, to determine whether the covariate should be added or deleted. The partial F statistic takes the form

\[ \frac{ \mathrm{RSS}(\text{Model I}) - \mathrm{RSS}(\text{Model II}) }{ \mathrm{RSS}(\text{Model II}) / \nu }, \tag{1.64} \]

where ν is the degrees of freedom of the RSS (residual sum of squares) for Model II. Three versions of the strategy are considered next.

One version is known as backward elimination. It starts with the full model containing all q covariates and computes partial F's for all models with q − 1 covariates. At the kth step, Model II has q − k + 1 covariates and Model I has q − k covariates, so that ν = N − (q − k + 1) − 1 = N − q + k − 2 in the partial F in (1.64). At each step, compute the partial F value for each covariate being considered for removal. The one with the lowest partial F, provided it is smaller than a preselected value, is dropped. The procedure continues until no more covariates can be dropped. The preselected value is often chosen to be F_{1,ν,α}, the upper α critical value of the F distribution with 1 and ν degrees of freedom. Choice of the α level determines the stringency level for eliminating covariates. Typical α's range from 0.1 to 0.2. A conservative approach would be to choose a smaller F (i.e., a larger α) value so that important covariates are not eliminated. Note that the statistic in (1.64) does not have a proper F distribution, so the F critical values serve only as guidelines. The literature often refers to them as F-to-remove values to make this distinction.

Another version is known as forward selection, which starts with the model containing an intercept and then adds one covariate at a time. The covariate with the largest partial F, provided it exceeds a preselected value (the F-to-enter value), is added to the current model; the procedure continues until no remaining covariate qualifies for entry. A third version, stepwise regression, combines the two by allowing a variable that entered at an earlier step to be dropped at a later step if its F-to-remove value falls below the threshold. A sketch of the forward selection version is given below.
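A sketch of forward selection with an F-to-enter rule, reusing the design matrix X, response y, and the rss helper from the earlier sketches; actual software implementations differ in details such as the default entry threshold:

```python
# Minimal sketch: forward selection using the partial F statistic (1.64) with an
# F-to-enter rule; reuses X, y, and rss() defined above.
from scipy import stats

def forward_select(X, y, alpha=0.15):
    n_obs = X.shape[0]
    n_cov = X.shape[1] - 1                 # candidate covariates (column 0 is the intercept)
    selected = []                          # indices of covariates currently in the model
    while True:
        remaining = [j for j in range(1, n_cov + 1) if j not in selected]
        if not remaining:
            break
        rss_current = rss(X[:, [0] + selected], y)
        best_j, best_F = None, -1.0
        for j in remaining:
            cols = [0] + selected + [j]
            rss_new = rss(X[:, cols], y)
            nu = n_obs - len(cols)         # residual df of the bigger model
            F_j = (rss_current - rss_new) / (rss_new / nu)   # partial F, equation (1.64)
            if F_j > best_F:
                best_j, best_F = j, F_j
        nu = n_obs - len(selected) - 2     # same residual df as in the loop above
        if best_F > stats.f.ppf(1 - alpha, 1, nu):           # F-to-enter threshold
            selected.append(best_j)
        else:
            break
    return selected

print(forward_select(X, y))
```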
1.8 Analysis of the Air Pollution Data

Table 1.6: Multiple Regression Output for Air Pollution Data

  Predictor    Coefficient    Standard Error      t      p Value
  Constant      1332.7           291.7           4.57     0.000
  JanTemp         -2.3052          0.8795       -2.62     0.012
  JulyTemp        -1.657           2.051        -0.81     0.424
  RelHum           0.407           1.070         0.38     0.706
  Rain             1.4436          0.5847        2.47     0.018
  Education       -9.458           9.080        -1.04     0.303
  PopDensi         0.004509        0.004311      1.05     0.301
  %NonWhit         5.194           1.005         5.17     0.000
  %WC             -1.852           1.210        -1.53     0.133
  pop              0.00000109      0.00000401    0.27     0.788
  pop/hous       -45.95           39.78         -1.16     0.254
  income          -0.000549        0.001309     -0.42     0.677
  logHC          -53.47           35.39         -1.51     0.138
  logNOx          80.22           32.66          2.46     0.018
  logSO2          -6.91           16.72         -0.41     0.681

Table 1.7: ANOVA Table for Air Pollution Data

  Source           Degrees of Freedom   Sum of Squares   Mean Squares      F      p Value
  Regression              14                173383           12384       10.36     0.000
  Residual Error          44                 52610            1196
  Total                   58                225993

Table 1.8: Best Subsets Regression Using the Cp Statistic

  Subset Size     R²      R²a     Cp       s       Variables
       4         69.7    67.4     8.3    35.624    1,4,7,13
       5         72.9    70.3     4.3    34.019    1,4,5,7,13
       6         74.2    71.3     3.7    33.456    1,4,6,7,8,13
       7         75.0    71.6     4.3    33.290    1,4,6,7,8,12,13
       8         75.4    71.5     5.4    33.322    1,4,5,6,7,8,10,12,13

Table 1.9: Stepwise Regression Output for Air Pollution Data

  Step            1        2        3        4        5        6        7
  Constant      887.9   1208.5   1112.7   1135.4   1008.7   1029.5   1028.7
  %NonWhit       4.49     3.92     3.92     4.73     4.36     4.15     4.15
    t value      6.40     6.26     6.81     7.32     6.73     6.60     6.66
    p value      0.000    0.000    0.000    0.000    0.000    0.000    0.000
  Education             -28.6    -23.5    -21.1    -14.1    -15.6    -15.5
    t value              -4.32    -3.74    -3.47    -2.10    -2.40    -2.49
    p value               0.000    0.000    0.001    0.041    0.020    0.016
  logSO2                          28.0     21.0     26.8     -0.4
    t value                       3.37     2.48     3.11     -0.02
    p value                       0.001    0.016    0.003     0.980
  JanTemp                                 -1.42    -1.29    -2.15    -2.14
    t value                               -2.41    -2.26    -3.25    -4.17
    p value                                0.019    0.028    0.002    0.000
  Rain                                              1.08     1.66     1.65
    t value                                         2.15     3.07     3.16
    p value                                         0.036    0.003    0.003
  logNOx                                                     42       42
    t value                                                  2.35     4.04
    p value                                                  0.023    0.000
  s              48.0     42.0     38.5     37.0     35.8     34.3     34.0
  R²            41.80    56.35    63.84    67.36    69.99    72.86    72.86
  R²(adj)       40.78    54.80    61.86    64.94    67.16    69.73    70.30
  Cp             55.0     29.5     17.4     12.7      9.7      6.3      4.3

Keeping in mind the principle of parsimony explained in Section 1.7, the objective is now to fit a reduced model containing fewer variables. Table 1.6 shows that four variables (JanTemp, Rain, %NonWhit, and logNOx) are significant at the 5% level. However, the strategy of retaining the significant predictors in the fitted multiple regression model may not work well if some variables are strongly correlated. We therefore employ the two variable selection techniques described in Section 1.7.

Table 1.8 shows the output for the best subsets regression using Mallows' Cp statistic (only 5 of the 14 rows are shown). Each row of the table corresponds to the "best" model (the one with minimum Cp) for a fixed number of predictor variables. For example, among the 14-choose-4 subsets of predictors containing 4 variables, the one containing, coincidentally, the predictors JanTemp, Rain, %NonWhit, and logNOx has the lowest Cp value of 8.3 and is considered the best. Although the best model with six predictors has the lowest Cp value of 3.7, it does not satisfy the criterion Cp ≈ p (note that p is one more than the number of predictors because it also includes the intercept term). Observe that the best model containing five variables (p = 6) has its Cp value (4.3) closer to p. Therefore, on the basis of the Cp criterion and the principle of parsimony, one would be inclined to choose a model containing the five variables JanTemp, Rain, %NonWhit, Education, and logNOx.

Let us now use a stepwise regression approach to find the best model. Table 1.9 summarizes the stepwise regression output. The α values corresponding to F-to-remove and F-to-enter were both taken as 15% (the default value in standard statistical software). The output does not show the F-to-enter or F-to-remove values; rather, it shows the t statistic corresponding to each coefficient in the multiple regression model after inclusion or exclusion of a variable at each step. After 7 steps, the method chooses a model with the five predictors JanTemp, Rain, %NonWhit, Education, and logNOx. Observe that although logSO2 entered the model at the third step, it was dropped at the 7th step. This means that, at the third step, when the model consisted of two variables (%NonWhit and Education), inclusion of logSO2 increased the partial F by the maximum amount among the remaining 12 predictor variables. Obviously, this partial F (not shown in the table) was more than the cut-off value. At step 6, logNOx was included in the model following exactly the same logic. However, after running a multiple regression with six variables following the inclusion of logNOx, the t value for logSO2 drops drastically, with a corresponding p value of 0.98 (see Table 1.9). This is due to a strong positive correlation between logNOx and logSO2, referred to as multicollinearity in the regression literature. Consequently, at step 7, the F-to-remove value for logSO2 becomes very small (again, this is not shown in the output) and results in the dropping of this variable. Eventually, the final model selected by stepwise regression is exactly the same as the one selected using the Cp statistic.

The final model can be obtained either directly from the last stage of the stepwise regression output, or by running a multiple regression of mortality on the five significant variables. The coefficients of these five variables in the final model will be different from the corresponding coefficients in Table 1.6, as the model now has fewer variables. The fitted model is

  MORTALITY = 1028.7 − 2.14 JanTemp + 1.65 Rain − 15.5 Education + 4.15 %NonWhit + 42 logNOx,   (1.65)

with an R² of 72.9% and an adjusted R_a² of 70.3%. From (1.65) one can conclude that, after adjusting for the effects of JanTemp, Rain, Education, and %NonWhit, the pollutant NOx has a significant effect on mortality, while the other two pollutants, HC and SO2, do not.

1.9 Practical Summary

1. Experimental problems can be divided into five broad categories: