Regression Analysis: Relationship between Number of Vehicles and Congestion Time, Assignments of Statistics

An analysis of the relationship between the number of vehicles and congestion time using regression analysis. It covers the least squares line equation, the interpretation of the results, and exercises that test understanding of the concepts. The data are shown in a scatterplot of the observed data points together with the least squares line.

Typology: Assignments

Pre 2010

Uploaded on 10/12/2009

koofers-user-ngb

STAT 4230/6230 Homework #1 Solutions
January 28, 2008

1. Exercise 3.10 (10 Points)

a. Yes. For the men, as the year increases, the winning time tends to decrease. The straight-line model is given by $y = \beta_0 + \beta_1 x + \varepsilon$. Because of the falling trend, we would expect the slope to be negative.

b. Yes. For the women, as the year increases, we see that the winning time also tends to decrease. The straight-line model is given by $y = \beta_0 + \beta_1 x + \varepsilon$. Once again, we observe a falling trend, so we would expect the slope to be negative.

c. Since the women's line is steeper than the men's, the slope of the women's line will be greater in absolute value.

d. No. The data were gathered from 1880 to 2000. Using these data to predict the time for the year 2020 (20 years beyond the data) would be very risky. We have no idea what the relationship between time and year will be outside the observed range. Thus, we would not recommend using this model.

2. Exercise 3.11 (10 Points)

a. The scatterplot (with least squares line):

[Figure: Congestion Time vs. Number of Vehicles, with the least squares line; x-axis: Number of Vehicles]

b. The least squares line equation is $\hat{y} = -0.00105 + 0.00321x$, as determined from the SAS output below:

    Parameter Estimates

                      Parameter     Standard
    Variable    DF     Estimate        Error    t Value    Pr > |t|
    Intercept    1     -0.00105      0.00393      -0.27      0.7940
    vehicles     1      0.00321   0.00043237       7.43      <.0001

c. See part (a).

d. For $\hat{\beta}_0 = -0.00105$: $x = 0$ is outside the range of observed $x$ values, so $\hat{\beta}_0$ does not have any practical meaning other than being the y-intercept. For $\hat{\beta}_1 = 0.00321$: for every additional vehicle, the mean congestion time is estimated to rise by 0.00321 minute.

3. Exercise 3.14 (10 Points)

a. The scatterplot:

[Figure: Number of Flycatchers Killed vs. Nest Box Tit Occupancy; x-axis: Nest Box Tit Occupancy (x)]

Examining the plot, it appears that as nest box tit occupancy increases, the number of flycatchers killed also increases (a positive linear relationship).

b. From the SAS output, we see that $\hat{\beta}_0 = -3.04686$ and $\hat{\beta}_1 = 0.10766$. Because $x = 0$ is outside the range of observed $x$ values, $\hat{\beta}_0$ does not have any practical meaning other than being the y-intercept. For each additional percentage point of nest box tit occupancy, we estimate that the mean number of flycatchers killed rises by 0.10766.

4. Exercise 3.19 (10 Points)

a. The scatterplot:

[Figure: scatterplot]

To determine if country credit risk contributes information for the prediction of market volatility, we must test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$.

From the MINITAB output, we see that the test statistic $t = -4.37$ corresponds to a p-value of 0.000 (that is, less than 0.0005 after rounding). For any level of significance $\alpha$ greater than this p-value, we reject the null hypothesis. Sufficient evidence exists to indicate that country credit risk contributes information for the prediction of market volatility.

Looking at the plot, possible outliers include the points (31.8, 87.0) for Argentina and (32.6, 74.1) for Turkey. These two points seem to stray from the general linear trend, so we will observe the impact on the least squares line caused by removing them.

In this case, I removed the two points in item (d), but your answer here depends on which point(s) you removed. With the observations for Argentina and Turkey removed, the output from SAS is

    The REG Procedure
    Model: MODEL1
    Dependent Variable: risk

    Number of Observations Read    38
    Number of Observations Used    38

    Analysis of Variance

                                 Sum of          Mean
    Source             DF       Squares        Square    F Value    Pr > F
    Model               1    1189.71548    1189.71548      14.12    0.0006
    Error              36    3032.29531      84.23043
    Corrected Total    37    4222.01079

    Root MSE           9.17771    R-Square    0.2818
    Dependent Mean    29.93947    Adj R-Sq    0.2618
    Coeff Var         30.65422

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     47.03236     4.78557       9.83      <.0001
    credit       1     -0.26333     0.07007      -3.76      0.0006

The new regression line is $\hat{y} = 47.03236 - 0.26333x$. We see that the parameter estimates have not changed dramatically. The test statistic for the predictor, $t = -3.76$, corresponds to a small p-value of 0.0006. Thus, at any level $\alpha > 0.0006$, there is sufficient evidence to indicate that the average credit rating contributes information for the prediction of annualized risk (%). We have not lost this significant linear relationship by removing these two points.

7. Exercise 3.49 (10 Points)

a. The 95% prediction interval for $y$ when $x = 64$ is (1.0516, 6.6347). With 95% confidence, we predict that the number of flycatchers killed when the nest box tit occupancy is 64% is between 1.0516 and 6.6347.

b. The 95% confidence interval for $E(y)$ is narrower than the prediction interval for $y$. In the confidence interval, we are looking at a mean value. The error of predicting a particular value of $y$ will always be larger than the error of estimating the mean value of $y$, represented by $E(y)$, for a particular value of $x$. The larger error means a wider interval.

c. If we try to use the model to predict the number of flycatchers killed with a nest box tit occupancy of 15%, we are going outside the range of $x$ values for which we have data. Extrapolation will likely lead to erroneous results, so it is not recommended to use the model for $x = 15$.

8. Exercise 3.52 (10 Points)

a. The scatterplot for the data appears below:

[Figure: Plot of Height versus Breast Height Diameter; x-axis: Breast Height Diameter]

b. The SAS output from fitting a linear model to the data appears below:

    The REG Procedure
    Model: MODEL1
    Dependent Variable: height

    Number of Observations Read    36
    Number of Observations Used    36

    Analysis of Variance

                                Sum of         Mean
    Source             DF      Squares       Square    F Value    Pr > F
    Model               1    183.24469    183.24469      65.10    <.0001
    Error              34     95.70281      2.81479
    Corrected Total    35    278.94750

    Root MSE           1.67773    R-Square    0.6569
    Dependent Mean    17.90833    Adj R-Sq    0.6468
    Coeff Var          9.36845

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1      9.14684     1.12131       8.16      <.0001
    breast       1      0.48147     0.05967       8.07      <.0001

The fitted line is $\hat{y} = 9.14684 + 0.48147x$, where $x$ is the breast height diameter and $y$ is the height.

c. See part (a) for the plot with the least squares line.

d. We now examine the SAS output to determine if the breast height diameter is a useful predictor of tree height. We test the hypotheses

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is $t = 8.07$, which corresponds to a two-tailed p-value of < 0.0001. This p-value is less than $\alpha = 0.05$, which means that we reject the null hypothesis. There is sufficient evidence at $\alpha = 0.05$ to indicate that breast height diameter is a useful predictor of tree height.

We now determine a 90% confidence interval for the average height of a white spruce tree with a breast height diameter of 20 cm. The confidence interval takes the form $\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}$. The 90% corresponds to a confidence coefficient of $0.90 = 1 - \alpha$. Thus, $\alpha = 0.10$ and $\alpha/2 = 0.10/2 = 0.05$. From Table 2 in Appendix C, with df $= n - 2 = 36 - 2 = 34$, $t_{0.05,34} \approx 1.697$.

The predicted $y$ for $x = 20$ is $\hat{y} = 9.14684 + (0.48147 \cdot 20) = 18.7763$. From the formula on p. 131, the interval is

$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$

where $t_{\alpha/2}$ is based on $(n - 2)$ df. Here, $n = 36$, $x_0 = 20$, $\bar{x} = 18.1972222$, and $SS_{xx} = \sum x_i^2 - n(\bar{x})^2 = 12711.47 - 36(18.1972222)^2 = 790.4698$. From the SAS output, $s = \sqrt{MSE} = \sqrt{2.81479} \approx 1.6778$.
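The quantities above can be combined numerically. The following Python sketch plugs the reported summary values into the standard simple linear regression formula for a confidence interval on the mean response. The function and variable names are illustrative assumptions and are not part of the original solution; the critical value 1.697 is the table value quoted in the text rather than one computed here, and results should agree with the hand calculation only up to rounding.

```python
import math

# Summary values taken from the SAS output and the solution text.
n = 36                      # number of observations
x_bar = 18.1972222          # mean breast height diameter
sum_x_sq = 12711.47         # sum of squared diameters
mse = 2.81479               # mean squared error from the ANOVA table
b0, b1 = 9.14684, 0.48147   # fitted intercept and slope
t_crit = 1.697              # table value quoted in the solution (assumed, not computed)
x0 = 20.0                   # diameter at which the mean height is estimated


def mean_response_interval(x0, n, x_bar, sum_x_sq, mse, b0, b1, t_crit):
    """Return (SS_xx, standard error of the mean response, (lower, upper))."""
    ss_xx = sum_x_sq - n * x_bar ** 2            # corrected sum of squares for x
    s = math.sqrt(mse)                           # estimated standard deviation of the errors
    y_hat = b0 + b1 * x0                         # point estimate of the mean response
    se_mean = s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / ss_xx)
    margin = t_crit * se_mean
    return ss_xx, se_mean, (y_hat - margin, y_hat + margin)


ss_xx, se_mean, (lower, upper) = mean_response_interval(
    x0, n, x_bar, sum_x_sq, mse, b0, b1, t_crit
)
print(f"SS_xx = {ss_xx:.4f}")
print(f"standard error of the mean response = {se_mean:.4f}")
print(f"90% CI for the mean height at x = {x0:g}: ({lower:.4f}, {upper:.4f})")
```

Keeping the calculation in one function makes it easy to repeat for another diameter inside the observed range by changing only `x0`.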
The 90% confidence interval for the mean value of $y$ for $x = 20$ is $18.7763 \pm 1.697(0.300)$, which gives (18.2672, 19.2854). Therefore, we are 90% confident that the mean height of trees with a breast height diameter of 20 cm is between 18.2672 and 19.2854 meters.

9. Exercise 3.54 (20 Points)

When carrying out a simple linear regression, it is a good idea to plot the data to see whether a linear trend is plausible at all. The plot below shows a positive linear relationship between $x$ (the depth at which drilling begins) and $y$ (the time to drill 5 feet).

[Figure: Plot of Drilling Depth versus Time to Drill 5 Feet; x-axis: Depth at Which Drilling Begins]