Autocorrelation and Seasonality in Time Series Analysis: A Comprehensive Guide - Prof. Joh (Statistics study notes)

An in-depth exploration of autocorrelation and seasonality in time series analysis. It covers various methods for detecting autocorrelation, such as graphical and numerical diagnostics, and discusses the implications of autocorrelation for stationarity. Additionally, the document introduces seasonal modeling and Box-Jenkins ARIMA models to account for seasonal dependence structures. This resource is essential for students and researchers in statistics, econometrics, and related fields.

Partial preview of the text

5-0
Stat 5100 Notes, Spring 2009
Unit 5: Time Series

Section   Topic
5.0       Summary / Overview
5.1       Autocorrelation (Hamilton pp. 118-124)
5.2       Stationarity (Bowerman pp. 437-441, 450-451)
5.3       AR & MA Models (pp. 467-470, 442-457)
5.4       ARIMA Models (pp. 474-476)
5.5       Forecasting & Goodness of Fit (pp. 462-467, 496-504)
5.6       Seasonal Modeling (Table 12.1)

5-1
5.0 Summary / Overview

Homework 5 intro: http://www.leftbusinessobserver.com/BushNGas.html

Response Y collected in some sequential manner: time, space

Want to make useful forecasts (short-term predictions)

Want to understand what influences Y:
• recurring patterns in Y
• effect of other variables (X1, . . . , Xk−1) on Y
• dependence among observations (due to sequential nature)

5-4
All clear?

                         The REG Procedure
                        Parameter Estimates

                     Parameter     Standard
   Variable    DF     Estimate        Error    t Value    Pr > |t|
   Intercept    1      3.82800      0.10064      38.03      <.0001
   temp         1      0.01286      0.00170       7.57      <.0001
   precip       1     -0.04743      0.02123      -2.23      0.0271
   campaign     1     -0.24698      0.11348      -2.18      0.0313

5-5
Linear regression model:
   Yi = β0 + β1 Xi,1 + . . . + βk−1 Xi,k−1 + εi,   i = 1, . . . , n
Assumption: ε1, . . . , εn iid N(0, σ²)

What if not independent?
1. bj estimates unbiased but not minimum variance (inefficient)
2. MSE can severely underestimate σ² ⇒ var. of bj underestimated
   ⇒ usual inferences not applicable

5-6
When could error terms be dependent?
• observations collected serially in time:
  − closing price of GE stock every day
  − rainfall every month
  − population every census
• observations collected in geographic sequence:
  − air quality at each mile marker of freeway
  − water pH every km along river
  − soil "richness" at points along / throughout a soy field
• others - data collected / observed in some sequence

5-9
For significance level α, sample size n, and k − 1 predictors, get critical values dL and dU
from table (like A4.4 on p. 355-356 of Hamilton text)
   d < dL        ⇒ reject H0 at level α
   d > dU        ⇒ fail to reject H0 at level α
   dL ≤ d ≤ dU   ⇒ test inconclusive at level α

Test for negative autocorrelation (H1 : φ < 0):
   − calculate d as above, then compare 4 − d to critical values

5-10
Back to Concord2 data ... Look at autocorrelation in Durbin-Watson test

        Durbin-Watson D                 0.535
        Pr < DW                        <.0001
        Pr > DW                        1.0000
        Number of Observations            137
        1st Order Autocorrelation       0.730

NOTE: Pr<DW is the p-value for testing positive autocorrelation, and Pr>DW is the
p-value for testing negative autocorrelation.

   d =                         α = 0.01
   φ̂1 =                        Table: n = 137, k − 1 = 3  ⇒  dL =        dU =
   Conclusion here:

5-11
Was the campaign successful? What's different here?

                      Estimates of Autocorrelations
   Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     0       0.1261      1.000000   | |********************|
     1       0.0921      0.730231   | |*************** |

                 Estimates of Autoregressive Parameters
                                 Standard
             Lag   Coefficient      Error    t Value
               1     -0.730231   0.059465     -12.28

                                Standard     Approx
   Variable    DF    Estimate       Error    t Value    Pr > |t|
   Intercept    1      3.8226      0.1222      31.29      <.0001
   temp         1      0.0119    0.002103       5.66      <.0001
   precip       1     -0.0358      0.0113      -3.18      0.0018
   campaign     1     -0.1901      0.1938      -0.98      0.3284

[Figure: Correlogram of the residuals, autocorrelation estimate (times 100) plotted against number of lags, 0-25]
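The Durbin-Watson output on slide 5-10 and the autoregressive-error fit on slide 5-11 are shown without the code that produced them. A minimal sketch is below; the data set name (concord2) and the response name (water) are assumptions, while temp, precip, and campaign match the output above.

   * Hedged sketch: OLS fit with Durbin-Watson test, then an AR(1) error model;
   * data set name (concord2) and response name (water) are assumed, not from the notes;
   proc reg data = concord2;
     model water = temp precip campaign / dw dwprob;   * Durbin-Watson statistic and p-values;
   run;

   proc autoreg data = concord2;
     model water = temp precip campaign / nlag = 1;    * first-order autoregressive error structure;
   run;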
5-15
Remedial measures for autocorrelation
• add predictor variables – trend?
• transform predictors and/or response
• account for error dependence structure:
  – Box-Jenkins (ARIMA) models - iterative process:
    1. identify tentative model
    2. use historical data to fit model
    3. diagnostic checking
    4. forecast future time series values
  – model assumptions: homogeneity, stationarity, invertibility; next section

5-16
5.2 Stationarity

Linear model, revised:   Yt = β0 + β1 Xt,1 + . . . + βk−1 Xt,k−1 + εt
Time series: Y1, Y2, . . . , Yt, . . . , Yn   (n = T sometimes)

First-order stationary:   E[Yt] = µt ≡ µ for all t
Second-order stationary if:   Var[Yt] = σ²t ≡ σ² for all t (homogeneity)

Intuitive diagnostic: "looks" the same (mean and variance) in every time window

5-19
(b) cyclic trends
  i. small # obs. per cycle: add dummy variables
     − quarterly: 4 quarters ⇒ # dummy vars.?
     − monthly: 12 months ⇒ # dummy vars.?
  ii. large # obs. (L) per cycle (too many for dummy vars.):
     − consider trigonometric functions of t as predictors:
         X1 = sin(2πt/L)     X2 = cos(2πt/L)
       What kinds of cycles would these "remove"? (sketch)
         X3 = t sin(2πt/L)   X4 = t cos(2πt/L)
       What kinds of cycles would these "remove"? (sketch)

5-20
3. "Differencing" - for "stubborn" trends
First differences:    Zt = Yt − Yt−1,   t = 2, . . . , n
Second differences:   Wt = Zt − Zt−1 = Yt − 2Yt−1 + Yt−2,   t = 3, . . . , n

Algebraically, what do first differences do to linear effect of time?
   Yt = a + bt  ⇒  Zt = Yt − Yt−1 = . . .

5-21
Do second differences remove quadratic time effect?
   Yt = a + bt + ct²  ⇒  Zt = Yt − Yt−1 = . . .  ⇒  Wt = Zt − Zt−1 = . . .

Higher-order differences (rare in practice) remove higher-order time effects

But - differencing can destroy cyclic behavior
• ⇒ hurts ability to forecast (loss of information)
• a remedial measure of last resort

Trends remain?
[Figure: Plot of residuals for Monthly Hotel Room Averages, log of data, after removing linear time effect; time 0-172]

5-24
[Figures: Plots of residuals for Monthly Hotel Room Averages, log of data with month dummy variables vs. log of data after removing the linear time effect only; how better?]

5-25
[Figure: Predicted values from regression model with dummy variables for months; monthly hotel room averages vs. time]

5-26
Bonus material:
Generalized differencing:   Zt = Yt − ρYt−1
Methods to estimate ρ:
  (a) Differencing; will return to this (ρ ≡ 1)
  (b) Cochrane-Orcutt (primitive Yule-Walker; be cautious with small n or large ρ)
  (c) Hildreth-Lu (primitive ULS; be cautious with small n or large ρ)

5-29
Graphical checks for dependence structure:

                          The ARIMA Procedure
                            Autocorrelations
   Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     0     3415.718       1.00000   | |********************|
     1    -1719.956       -.50354   | **********| . |
     2      416.701       0.12200   | . |** . |
     3     -723.256       -.21174   | . ****| . |
     4      273.522       0.08008   | . |** . |
     5    66.440027       0.01945   | . | . |
     6      396.723       0.11615   | . |** . |
     7     -742.192       -.21729   | . ****| . |
     8      861.426       0.25219   | . |***** . |
     9     -655.675       -.19196   | . ****| . |
    10      192.042       0.05622   | . |* . |

                         Partial Autocorrelations
   Lag   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     1      -0.50354   | **********| . |
     2      -0.17625   | .****| . |
     3      -0.31556   | ******| . |
     4      -0.26367   | *****| . |
     5      -0.15282   | . ***| . |
     6       0.04200   | . |* . |
     7      -0.19396   | .****| . |
     8       0.11850   | . |** . |
     9       0.05068   | . |* . |
    10      -0.06969   | . *| . |
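SACF / SPACF output like the tables above is produced by the IDENTIFY statement of PROC ARIMA. A minimal sketch, assuming the series of interest is stored as a variable y in a data set a1 (both names are placeholders here):

   * Hedged sketch: request the sample ACF / PACF (and IACF) for the first 10 lags;
   proc arima data = a1;
     identify var = y nlag = 10;     * use var = y(1) instead to difference the series first;
   run;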
5-30
Autocorrelation function (ACF or SACF)
• measure linear association between time series observations separated by a lag of m time units:

     r_m = Σ_{t=b}^{n−m} (Zt − Z̄)(Zt+m − Z̄) / Σ_{t=b}^{n} (Zt − Z̄)²,
     Z̄ = ( Σ_{t=b}^{n} Zt ) / (n − b + 1)

  SE of r_m is   S_{r_m} = sqrt( 1 + 2 Σ_{l=1}^{m−1} r_l² ) / sqrt( n − b + 1 )
  (b = 1 unless use differencing)
• call r_m the sample autocorrelation function: SACF(m) or ÂCF(m)
• sometimes used: t_{r_m} = r_m / S_{r_m}

5-31
Autocorrelation plot (or SAC):
• bar-plot r_m vs. m for various lags m (sketch)
• lines often added to represent 2 SE's (sketch)
  – rough 95% confidence intervals
  – if r_m is more than 2 SE's away from zero, consider it "significant" (rough: non-zero)
  – compare |t_{r_m}| to 2 (for lags m ≤ 3, use 1.6 because "low" lags most important to pick up)
• determine stationarity and identify "MA(q)" structure

5-34
Look at ACF: MA(1)
(The ARIMA Procedure output here repeats the SACF and SPACF tables shown under 5-29 above.)

5-35
MA(1) model fit to Overshort data

                          The ARIMA Procedure
                 Unconditional Least Squares Estimation

                               Standard     Approx
   Parameter    Estimate          Error    t Value    Pr > |t|    Lag
   MU           -5.12443        0.35073     -14.61      <.0001      0
   MA1,1         0.99999        0.26992       3.70      0.0005      1

         Constant Estimate        -5.12443
         Variance Estimate        1996.541
         Std Error Estimate       44.68267
         AIC                      600.9357
         SBC                      605.0218
         Number of Residuals            57

                    Autocorrelation Check of Residuals
    To        Chi-           Pr >
   Lag      Square    DF    ChiSq    ---------------Autocorrelations---------------
     6        4.82     5   0.4379     0.119  0.131 -0.054  0.102  0.130  0.123
    12       13.18    11   0.2817    -0.090  0.079 -0.210 -0.161 -0.178 -0.041
    18       29.47    17   0.0304     0.098 -0.141 -0.273 -0.173 -0.207 -0.151
    24       32.94    23   0.0821    -0.084 -0.071  0.068 -0.057 -0.086  0.095

5-36
Partial Autocorrelation Function (PACF or SPACF)
• autocorrelation of time series observations separated by a lag of m, with the effects of the
  intervening observations eliminated

     r_{1,1} = r_1                                                                if m = 1
     r_{m,m} = ( r_m − Σ_{l=1}^{m−1} r_{m−1,l} r_{m−l} ) / ( 1 − Σ_{l=1}^{m−1} r_{m−1,l} r_l )   if m ≥ 2

  where r_m = SACF(m) and r_{m,l} = r_{m−1,l} − r_{m,m} r_{m−1,m−l} for l = 1, . . . , m−1

  SE of r_{m,m} is   S_{r_{m,m}} = 1 / sqrt(n − b + 1)   (b = 1 unless use differencing)
• call r_{m,m} the sample partial autocorrelation function: SPACF(m) or P̂ACF(m)
• sometimes used: t_{r_{m,m}} = r_{m,m} / S_{r_{m,m}}
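As an arithmetic check of the recursion, using the SACF values from the ARIMA output shown above (r_1 = −0.50354, r_2 = 0.12200):

   r_{2,2} = ( r_2 − r_{1,1} r_1 ) / ( 1 − r_{1,1} r_1 )
           = ( 0.12200 − (−0.50354)² ) / ( 1 − (−0.50354)² )
           ≈ −0.1316 / 0.7464 ≈ −0.176,

which agrees with the lag-2 value −0.17625 in the Partial Autocorrelations output.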
5-39
More common representation for AR(p):
• Zt = δ + φ1 Zt−1 + φ2 Zt−2 + . . . + φp Zt−p + at
  – φi are unknown parameters; random shock at iid N(0, σ²)
  – δ = µ(1 − φ1 − . . . − φp);   µ = E[Zt]
    Zt are "residuals" ⇒ µ ≡ 0 ⇒ common to assume δ = 0

Special case: Random Walk Model
• Zt = Zt−1 + at
• an AR(1) process is a discrete-time, continuous-state Markov chain
  (probability at time t depends only on state at time t − 1)

AR(p): value of response (Zt) at time t depends on response values at previous p times

5-40
Example 5.3.2: General Electric's gross investment (in millions of dollars) for years 1935-1954

5-41
                          The ARIMA Procedure
                            Autocorrelations
   Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     0     0.073518       1.00000   | |********************|
     1     0.021289       0.28957   | . |****** . |
     2    -0.038026       -.51723   | **********| . |
     3    -0.039301       -.53458   | .***********| . |
     4   -0.0051560       -.07013   | . *| . |
     5     0.022797       0.31009   | . |****** . |
     6     0.016539       0.22497   | . |**** . |
     7   -0.0022313       -.03035   | . *| . |
     8   -0.0093295       -.12690   | . ***| . |
     9   -0.0029266       -.03981   | . *| . |
    10   -0.0011643       -.01584   | . | . |

                         Partial Autocorrelations
   Lag   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     1       0.28957   | . |****** . |
     2      -0.65610   | *************| . |
     3      -0.18504   | . ****| . |
     4      -0.23526   | . *****| . |
     5      -0.05568   | . *| . |
     6      -0.18820   | . ****| . |
     7      -0.01776   | . | . |
     8      -0.03811   | . *| . |
     9       0.04283   | . |* . |
    10      -0.13331   | . ***| . |

5-44
Inverse Autocorrelation Function (IACF or SIACF):
• similar to PACF, and rarely discussed

"Autoregressive" Process:
• current & future values (of Zt) depend on historical values of same time series (Zt)
• (1 − φ1B − . . . − φp B^p) Zt − δ = at

"Moving" Average Process:
• current & future values (of Zt) depend on past random shocks (at)
• (1 − θ1B − . . . − θq B^q)^−1 (Zt − δ) = at

5-45
Example 5.3.3: gas prices since 1976 adjusted for inflation
[Figures: Gas Price Data, price of unleaded (current dollars) vs. year, and equivalent buying power of the 1976 dollar / annual average price of unleaded vs. year, 1974-2010]

5-46
                        The REG Procedure
                    Dependent Variable: price

         Root MSE       38.81984     R-Square     0.5864

                        Parameter Estimates
                     Parameter     Standard
   Variable    DF     Estimate        Error    t Value    Pr > |t|
   Intercept    1     -1.49939     21.53252      -0.07      0.9449
   infl76       1      0.56355      0.08367       6.74      <.0001

Note "extra" output on next slide; what is H0?

5-49
What about a "composite" (AR and MA) model?
ARMA(1,1) model fit to gas data

                          The ARIMA Procedure
                       Name of Variable = resid
                 Unconditional Least Squares Estimation

                               Standard     Approx
   Parameter    Estimate          Error    t Value    Pr > |t|    Lag
   MU           -2.32029       14.86111      -0.16      0.8769      0
   MA1,1        -0.23685        0.56261      -0.42      0.6767      1
   AR1,1         0.62825        0.28952       2.17      0.0378      1

         Constant Estimate        -0.86256
         Variance Estimate        728.6837
         Std Error Estimate       26.99414
         AIC                      324.2865
         SBC                      328.8656
         Number of Residuals            34

                    Autocorrelation Check of Residuals
    To        Chi-           Pr >
   Lag      Square    DF    ChiSq    ---------------Autocorrelations---------------
     6        0.96     4   0.9151     0.005  0.034  0.022 -0.071 -0.044 -0.118
    12        2.66    10   0.9884     0.009  0.060 -0.142 -0.084  0.047 -0.028
    18        3.68    16   0.9994    -0.074 -0.072 -0.029 -0.007 -0.059 -0.033
    24        7.78    22   0.9977    -0.094 -0.042 -0.010 -0.084  0.140  0.051

5-50
ARMA(p,q): Mixed Autoregressive-Moving Average Model

   Zt = δ + φ1 Zt−1 + . . . + φp Zt−p  +  at − θ1 at−1 − . . . − θq at−q
            (AR(p) part)                  (MA(q) part)

In backshift notation, ARMA(p,q):
   (1 − φ1B − φ2B² − . . . − φp B^p) Zt = δ + (1 − θ1B − θ2B² − . . . − θq B^q) at
   ⇒ (1 − θ1B − . . . − θq B^q)^−1 [ (1 − φ1B − . . . − φp B^p) Zt − δ ] = at
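The ARMA(1,1) output just above (slide 5-49, Name of Variable = resid) comes from an ESTIMATE statement with both p = 1 and q = 1. The notes show only the output; a minimal sketch of the corresponding code is below, assuming the regression residuals were saved under the name resid in a data set called b1 (the data set name is a placeholder).

   * Hedged sketch: ARMA(1,1) fit to the saved regression residuals (data set name b1 is assumed);
   proc arima data = b1;
     identify var = resid nlag = 12;       * SACF / SPACF of the residual series;
     estimate p = 1 q = 1 method = uls;    * uls matches "Unconditional Least Squares Estimation";
   run;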
5-51
Estimation procedures
• need to estimate φl's, θl's, and βj's
• how to deal with initial lag?
• several approaches exist
  – ULS (unconditional least squares): MA(q) & AR(p)
    − also called nonlinear least squares
    − minimize SS error
  – YW (Yule-Walker): AR(p)
    − generalized least squares using OLS residuals to estimate covariance across observations

Invertibility - an underlying assumption here
  − intuitively, "weights" (φl & θl) on past observations decrease as we move further into the past

5-54
After differencing, AR and MA dependence structures may exist

Autoregressive Integrated Moving Average process: ARIMA(p,d,q)
• p : AR(p) (value at time t depends on previous p values)
• d : # of differences (need to take dth difference to make stationary)
• q : MA(q) (value at time t depends on previous q random shocks)

5-55
How to select p and q? How to select d?
• usually look at plots of time series
• choose lowest d to make stationary (also SAC)

ARIMA(p,d,q) is a very flexible family of models ⇒ useful prediction

Recall backshift notation:
• d = 1 :     Zt = Yt − Yt−1 = Yt − BYt = (1 − B)Yt
• general d:  Zt = (1 − B)^d Yt

5-56
Model summary:
• model Y in terms of predictors X1, . . . , Xk−1, with ARIMA(p,d,q) dependence structure
• But in what order does SAS do this?

   (1 − B)^d Yt = β0 + β1 Xt,1 + . . . + βk−1 Xt,k−1
                  + (1 − φ1B − . . . − φp B^p)^−1 (1 − θ1B − . . . − θq B^q) at,   at iid N(0, σ²)

   [ (1 − B)^d : Differencing;   β terms: Linear Model;   (1 − φ1B − . . . − φp B^p)^−1 : Autoregressive;
     (1 − θ1B − . . . − θq B^q) : Moving Average;   at iid N(0, σ²) : Independence ]

(given p, d, and q, SAS estimates βj's, φl's, and θl's)

5-59
5.5 Forecasting & Goodness of Fit

   (1 − B)^d Yt = β0 + β1 Xt,1 + . . . + βk−1 Xt,k−1
                  + (1 − φ1B − . . . − φp B^p)^−1 (1 − θ1B − . . . − θq B^q) at,   at iid N(0, σ²)

ARIMA(p,d,q) model rewritten, with t = 1, . . . , n:
   Yt = g1(Y1, . . . , Yt−1) + g2(Xt,1, . . . , Xt,k−1) + g3(a1, . . . , at)
where
   g1 = linear combination (LC) of previous observations (Differencing)
   g2 = LC of predictors at time t, in terms of parameters βj (Linear Model)
   g3 = function of random shocks in terms of parameters φl & θl (AR & MA dependence structures)

"fit model" → estimates & standard errors for βj's, φl's, & θl's

5-60
Predicted values (point forecast from Box-Jenkins model; even for times t > n):
   Ŷt = g1(Y1, . . . , Yt−1) + ĝ2(Xt,1, . . . , Xt,k−1) + ĝ3(â1, . . . , ât)
   [ in g1, estimate Yl with Ŷl if no obs. at time l (l > n);   in ĝ2, estimate βj with bj;
     in ĝ3, estimate φl & θl with φ̂l & θ̂l ]

Note: ât = 0,   âl = Yl − Ŷl for l < t, and âl = 0 for l > n

Multicollinearity?
• "predictors" Y1, . . . , Yt−1, Xt,1, . . . , Xt,k−1 related?
• need diagnostics for "goodness of fit"

5-61
Measure of "Overall Fit": Standard Error
   S = sqrt( Σ_{t=1}^{n} (Yt − Ŷt)² / (n − np) ),   np = # parameters in model
In SAS: Std Error Estimate; smaller S means ...

Diagnostic Checking: Ljung-Box statistic
• Residuals reflect model assumptions
• Check "adequacy" of overall Box-Jenkins model (for these data)
• In SAS, look at lag 6 χ² for Autocorrelation Check of Residuals
  − what is H0?
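To connect these two diagnostics to output already shown: in the MA(1) fit to the Overshort data (slide 5-35), the Std Error Estimate 44.68267 is just the square root of the Variance Estimate 1996.541, i.e. S = sqrt(1996.541) ≈ 44.68; and the lag-6 line of the Autocorrelation Check of Residuals there gives the Ljung-Box Q∗ = 4.82 on 5 df with p-value 0.4379, so H0 (residuals uncorrelated, i.e. white noise, through lag 6) is not rejected.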
5-64
General SAS code for ARIMA(p,d,q), Y in terms of X1, . . . , Xk−1:

   proc arima data = a1;
     identify var = Y(d) crosscorr = (X1 . . . Xk−1);
     estimate p = p q = q input = (X1 . . . Xk−1) method = uls plot;
     forecast lead = L alpha = a noprint out = fout;
   run;

   option        description
   d, p, q       differencing, AR, & MA settings (as before)
   plot          adds RSAC & RSPAC plots
   L             # times after last observed to forecast
   a             set confidence limit; a = .10 ⇒ 90% conf. limits
   noprint       optional, suppresses output
   out = fout    optional, sends forecast data to fout data set

For a = .10, data set fout will contain columns / variables:
   Y, forecast, std, l90, u90, residual
(what about time or X1 . . . Xk−1?)

5-65
Summary - choosing a "good" model (choice of p, d, & q)
• RSAC & RSPAC die down quickly (should have "nothing" left)
• small standard error (S)
• small Ljung-Box statistic (Q∗)
• "tight" or narrow confidence / prediction / forecasting intervals
  – how far into "future"? (t = n + τ, τ > 0)
  – good summary / comparison plot: overlay forecast and confidence limits (sketch)

5-66
Example 5.3.4: gas prices since 1976, revisited
[Figures: Gas Price Data (price of unleaded vs. year, 1974-2010) and gas prices: first differences vs. year]

5-69
Look at behavior of SAC and SPAC (after removing time effects)

                          The ARIMA Procedure
                       Name of Variable = resid
                         Partial Autocorrelations
   Lag   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     1       0.44423   | . |********* |
     2       0.06458   | . |* . |
     3      -0.13722   | . ***| . |
     4      -0.15015   | . ***| . |
     5      -0.05210   | . *| . |
     6      -0.04653   | . *| . |
     7      -0.00524   | . | . |
     8      -0.09845   | . **| . |
     9      -0.14026   | . ***| . |
    10       0.05924   | . |* . |
    11       0.00153   | . | . |
    12      -0.17891   | . ****| . |

5-70
Tentative model: ARIMA(1,0,0)

                          The ARIMA Procedure
                 Unconditional Least Squares Estimation

                               Standard     Approx
   Parameter    Estimate          Error    t Value    Pr > |t|    Lag    Variable    Shift
   MU           88.38384       32.51518       2.72      0.0108      0    price           0
   AR1,1         0.60015        0.16145       3.72      0.0008      1    price           0
   NUM1         -0.82421        4.34859      -0.19      0.8510      0    year1           0
   NUM2          0.14764        0.12170       1.21      0.2345      0    year2           0

         Std Error Estimate       28.80927

                    Autocorrelation Check of Residuals
    To        Chi-           Pr >
   Lag      Square    DF    ChiSq    ---------------Autocorrelations---------------
     6        2.01     5   0.8475     0.042  0.023 -0.050 -0.134 -0.102 -0.123
    12        3.35    11   0.9853     0.012  0.020 -0.143 -0.022  0.072 -0.014
    18        4.10    17   0.9994    -0.069 -0.062  0.003  0.019 -0.035 -0.041
    24        8.18    23   0.9981    -0.095 -0.034 -0.005 -0.020  0.154  0.068

So what is the "fitted" model equation?

5-71
Tentative model: ARIMA(1,0,0)

                  Autocorrelation Plot of Residuals
   Lag   Covariance   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     0      829.974       1.00000   | |********************|
     1    34.933641       0.04209   | . |* . |
     2    19.057756       0.02296   | . | . |
     3   -41.169483       -.04960   | . *| . |
     4     -111.339       -.13415   | . ***| . |
     5   -84.280608       -.10155   | . **| . |
     6     -101.921       -.12280   | . **| . |
     7     9.634734       0.01161   | . | . |
     8    16.400616       0.01976   | . | . |
     9     -118.340       -.14258   | . ***| . |
    10   -17.889676       -.02155   | . | . |
    11    60.020632       0.07232   | . |* . |
    12   -11.942645       -.01439   | . | . |

                         Partial Autocorrelations
   Lag   Correlation   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
     1       0.04209   | . |* . |
     2       0.02123   | . | . |
     3      -0.05156   | . *| . |
     4      -0.13106   | . ***| . |
     5      -0.09089   | . **| . |
     6      -0.11632   | . **| . |
     7       0.00850   | . | . |
     8      -0.00319   | . | . |
     9      -0.18734   | . ****| . |
    10      -0.06088   | . *| . |
    11       0.06114   | . |* . |
    12      -0.04792   | . *| . |

5-74
Forecasting equation for ARIMA(1,1,1) with one covariate:
   (1 − B)Yt = (β0 + β1 year1t) + (1 − φ1B)^−1 (1 − θ1B) at
   (1 − φ1B)(1 − B)Yt = (1 − φ1B)(β0 + β1 year1t) + (1 − θ1B) at
   (1 − (1 + φ1)B + φ1B²) Yt = β0(1 − φ1) + β1(1 − φ1B) year1t + (1 − θ1B) at
   Yt − (1 + φ1)Yt−1 + φ1Yt−2 = β0(1 − φ1) + β1(year1t − φ1 year1t−1) + at − θ1 at−1

   Ŷt = β̂0(1 − φ̂1) + β̂1(year1t − φ̂1 year1t−1) + ât − θ̂1 ât−1 + (1 + φ̂1)Yt−1 − φ̂1 Yt−2

   Ŷ2010 = . . .
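Slide 5-74's forecasting equation comes from an ARIMA(1,1,1) fit with year1 as a covariate. The notes do not show the corresponding code; a minimal sketch following the slide 5-64 template is below. The data set name (gas) and the forecast horizon are placeholders, and method = uls is assumed to match the earlier output.

   * Hedged sketch: ARIMA(1,1,1) for price with covariate year1 (names gas / lead = 3 are assumed);
   proc arima data = gas;
     identify var = price(1) crosscorr = (year1);       * (1) requests first differences of price;
     estimate p = 1 q = 1 input = (year1) method = uls plot;
     forecast lead = 3 alpha = .10 out = fout;           * 90% forecast limits, saved to fout;
   run;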
5-75
5.6 Seasonal Modeling

Recall "estimating out" cyclic trends
  − add time-related variables as predictors
  − make time series stationary

Occasionally, even after using these regression methods, a "seasonal effect" remains
  − correlation / dependence among residuals at seasonal level
  − detectable using SAC and SPAC plots
  − determine appropriate error structure based on how plots "die down"

What if SAC & SPAC plots don't die down, but have a recurring pattern?
  − e.g., spikes at lags L, 2L, 3L, . . .
  − seasonal time series - "seasons" of length L observations

5-76
What to do? First, try using L − 1 dummy predictors (most interpretable model)

Otherwise, consider Box-Jenkins seasonal models:
1. Seasonal moving average model of order q:
   − SAC spikes (and SPAC dies down) at lags L, 2L, . . . , qL
     Zt = δ + at − θ1,L at−L − θ2,L at−2L − . . . − θq,L at−qL
   SAS:  estimate q = (L, 2L, ..., qL);
2. Seasonal autoregressive model of order p:
   − SAC dies down (and SPAC spikes) at lags L, 2L, . . . , pL
     Zt = δ + φ1,L Zt−L + φ2,L Zt−2L + . . . + φp,L Zt−pL + at
   SAS:  estimate p = (L, 2L, ..., pL);

5-79
General Box-Jenkins model of order (p, P, q, Q):
   φp(B) φP(B^L) Zt = δ + θq(B) θQ(B^L) at
where
   φp(B)   = (1 − φ1B − φ2B² − . . . − φp B^p)
   φP(B^L) = (1 − φ1,L B^L − φ2,L B^{2L} − . . . − φP,L B^{PL})
   Zt = Δ_L^D Δ^d Y*t
   δ = µ φp(B) φP(B^L),   µ = E[Zt]
   θq(B)   = (1 − θ1B − θ2B² − . . . − θq B^q)
   θQ(B^L) = (1 − θ1,L B^L − θ2,L B^{2L} − . . . − θQ,L B^{QL})
   Zt is stationary time series
   φ1, . . . , φp, φ1,L, . . . , φP,L, δ, θ1, . . . , θq, θ1,L, . . . , θQ,L are unknown parameters
     to be estimated from the data
   at, at−1, . . . are iid N(0, σ²) (independent and identically distributed)

5-80
5.0 Summary, revisited

Response Y collected in some sequential manner: time, space
Want to make useful forecasts (short-term predictions)
Want to understand what influences Y:
• the "obvious" effects
  – recurring patterns in Y
  – effect of other variables (X1, . . . , Xk−1) on Y
• the less "obvious": dependence among observations
  – previous values (autoregressive, AR(p))
  – previous errors (moving average, MA(q))
  – both (ARMA(p, q))

5-81
Box-Jenkins (ARIMA) models:
• account for dependence structures
• for useful forecasts, meet model assumptions (stationarity)
  – add dummy vars., transform response, differencing
• graphical diagnostics (SAC & SPAC) to tentatively identify appropriate model (ARIMA) structure
• graphical (RSAC & RSPAC) and numerical (Q∗ & S) diagnostics to assess model adequacy
• make forecasts (point & interval) with "adequate" model
• may need to consider seasonal models (based on SAC & SPAC, or RSAC & RSPAC)

Now – a case study
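Supplement to Section 5.6: the seasonal estimate statements above can be placed in a full PROC ARIMA run. The sketch below assumes a hypothetical monthly series (L = 12) stored as logY in a data set named hotel (both names are placeholders, loosely echoing the monthly hotel room averages example); the particular orders shown are illustrative, not prescribed by the notes.

   * Hedged sketch: seasonal Box-Jenkins models for a hypothetical monthly series (L = 12);
   proc arima data = hotel;
     identify var = logY(1,12) nlag = 36;       * regular (d = 1) and seasonal (D = 1, span 12) differencing;
     estimate q = (12 24) method = uls;         * seasonal MA of order 2: SAC spikes at lags 12 and 24;
     estimate p = (1)(12) q = (1)(12) method = uls;   * multiplicative general model of order (1, 1, 1, 1);
     forecast lead = 12 alpha = .05 out = fout;
   run;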