This is the overview of the chapter 4. I will add info from the index file here. I will filter from there to what will go here. My goal is to first have the website have a map of the concepts in time series. I want to first avoid doing it by chapters since it seperates topics and limit my learning to that lesson. Bigger picture first.
Adding this from homework_4_0 chapter notes.qmd file in old repo. This are notes intended to connect time series concepts
Code
# Load necessary library# Define the checklist tablechecklist_data <-data.frame(Topic =c("Trend", "Seasonality", "Stationarity", "White Noise", "Serial Correlation", "Drift"),Description =c("Presence of an upward or downward movement over time.","Repeating patterns at regular intervals.","Mean, variance, and autocorrelation are constant over time.","Residuals are independent, identically distributed with mean zero.","Past values are correlated with current values.","Consistent positive or negative change over time." ),White_Noise =c("", "", "✔", "✔", "", ""),Random_Walk =c("", "", "", "", "✔", ""),RW_with_Drift =c("", "", "", "", "✔", "✔"),AR_p =c("✔", "", "✔", "", "✔", ""),ARIMA =c("", "", "✔", "", "✔", ""),Holt_Winters =c("✔", "✔", "✔", "", "✔", ""))# Render the tablekable(checklist_data, format ="html", escape = F) %>%kable_styling(full_width =FALSE, bootstrap_options =c("striped", "hover"))
Topic
Description
White_Noise
Random_Walk
RW_with_Drift
AR_p
ARIMA
Holt_Winters
Trend
Presence of an upward or downward movement over time.
✔
✔
Seasonality
Repeating patterns at regular intervals.
✔
Stationarity
Mean, variance, and autocorrelation are constant over time.
✔
✔
✔
✔
White Noise
Residuals are independent, identically distributed with mean zero.
✔
Serial Correlation
Past values are correlated with current values.
✔
✔
✔
✔
✔
Drift
Consistent positive or negative change over time.
✔
Time Series Model Checklist
Topic
Description
White Noise
Random Walk
RW with Drift
AR(p)
ARIMA
Holt-Winters
Trend
Presence of an upward or downward movement over time.
✔
✔
Seasonality
Repeating patterns at regular intervals.
✔
Stationarity
Mean, variance, and autocorrelation are constant over time.
✔
✔
✔
✔
White Noise
Residuals are independent, identically distributed with mean zero.
✔
Serial Correlation
Past values are correlated with current values.
✔
✔
✔
✔
✔
Drift
Consistent positive or negative change over time.
✔
spacer
Code
temps_ts <- rio::import("https://byuistats.github.io/timeseries/data/global_temparature.csv") |>as_tsibble(index = year)temps_ts |>autoplot(.vars = change) +labs(x ="Year",y ="Temperature Change (Celsius)",title =paste0("Change in Mean Annual Global Temperature (", min(temps_ts$year), "-", max(temps_ts$year), ")") ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5) )
Code
# second r chunkpacf(temps_ts$change)
Code
# 3rd r chunkglobal_ar <- temps_ts |>model(AR(change ~order(1:9)))tidy(global_ar)
# 5th r chunktemps_forecast <- global_ar |>forecast(h ="50 years")temps_forecast |>autoplot(temps_ts, level =95) +geom_line(aes(y = .fitted, color ="Fitted"),data =augment(global_ar)) +scale_color_discrete(name ="") +labs(x ="Year",y ="Temperature Change (Celsius)",title =paste0("Change in Mean Annual Global Temperature (", min(temps_ts$year), "-", max(temps_ts$year), ")"),subtitle =paste0("50-Year Forecast Based on our AR(", tidy(global_ar) |>as_tibble() |> dplyr::select(term) |>tail(1) |> stringr::str_sub(1), ") Model") ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5),plot.subtitle =element_text(hjust =0.5) )
Code
# Import and prepare the datatemps_ts <- rio::import("https://byuistats.github.io/timeseries/data/global_temparature.csv") |>as_tsibble(index = year)# Fit an AR model to the 'change' variablear_fit <-ar(temps_ts$change, method ="mle", na.action = na.omit)# Extract and display the order and coefficientsorder <- ar_fit$ordercoefficients <- ar_fit$arcat("Order of the fitted AR model: ", order, "\n")
Order of the fitted AR model: 4
Code
cat("Coefficients of the AR model: ", coefficients, "\n")
Coefficients of the AR model: 0.6770191 -0.03582926 0.1560533 0.1929323
Code
# Visualization of the original datatemps_ts |>autoplot(.vars = change) +labs(x ="Year",y ="Temperature Change (Celsius)",title =paste0("Change in Mean Annual Global Temperature (", min(temps_ts$year), "-", max(temps_ts$year), ")") ) +theme_minimal() +theme(plot.title =element_text(hjust =0.5) )
Book section
4.5 Autoregressive models
4.5.1 defition
The series \(\{x_t\}\) is an autoregressive process of order \(p\), abbreviated to AR(\(p\)), if
where \(\{w_t\}\) is white noise and the \(\alpha_i\) are the model parameters with \(\alpha_p \neq 0\) for an order \(p\) process.Equation (4.15) can be expressed as a polynomial of order \(p\) in terms of the backward shift operator (e4.16):
\[
\theta_p(B)x_t = (1 - \alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p)x_t = w_t \tag{e4.16}
\] The following points should be noted:
The random walk is the special case AR(1) with \(\alpha_1 = 1\) (see Equation (4.4)).
The exponential smoothing model is the special case \(\alpha_i = \alpha(1 - \alpha)^i\) for \(i = 1, 2, \dots\) and \(p \to \infty\).
The model is a regression of \(x_t\) on past terms from the same series; hence the use of the term “autoregressive.”
The model parameters can be estimated by minimizing the sum of squared errors.
4.5.2 Stationary and non-stationary AR processes
The equation \(\theta_p(B) = 0\), where \(B\) is formally treated as a number (real or complex), is called the characteristic equation. The roots of the characteristic equation (i.e., the polynomial \(\theta_p(B)\) from Equation (4.16)) must all exceed unity in absolute value for the process to be stationary. Notice that the random walk has \(\theta = 1 - B\) with root \(B = 1\) and is non-stationary. The following four examples illustrate the procedure for determining whether an AR process is stationary or non-stationary:
The AR(1) model \(x_t = \frac{1}{2}x_{t-1} + w_t\) is stationary because the root of \(1 - \frac{1}{2}B = 0\) is \(B = 2\), which is greater than 1.
The AR(2) model \(x_t = x_{t-1} - \frac{1}{4}x_{t-2} + w_t\) is stationary. The proof of this result is obtained by first expressing the model in terms of the backward shift operator: \(\frac{1}{4}(B^2 - 4B + 4)x_t = w_t\) i.e., \(\frac{1}{4}(B - 2)^2x_t = w_t\). The roots of the polynomial are given by solving \(\theta(B) = \frac{1}{4}(B - 2)^2 = 0\) and are therefore obtained as \(B = 2\). As the roots are greater than unity, this AR(2) model is stationary.
The model \(x_t = \frac{1}{2}x_{t-1} + \frac{1}{2}x_{t-2} + w_t\) is non-stationary because one of the roots is unity. To prove this, first express the model in terms of the backward shift operator: \(-\frac{1}{2}(B^2 + B - 2)x_t = w_t\); i.e., \(-\frac{1}{2}(B - 1)(B + 2)x_t = w_t.\) The polynomial \(\theta(B) = -\frac{1}{2}(B - 1)(B + 2)\) has roots \(B = 1, -2\). As there is a unit root (\(B = 1\)), the model is non-stationary. Note that the other root (\(B = -2\)) exceeds unity in absolute value, so only the presence of the unit root makes this process non-stationary.
The AR(2) model \(x_t = -\frac{1}{4}x_{t-2} + w_t\) is stationary because the roots of \(1 + \frac{1}{4}B^2 = 0\) are \(B = \pm 2i\), which are complex numbers with \(i = \sqrt{-1}\), each having an absolute value of 2, exceeding unity.
The R function polyroot finds zeros of polynomials and can be used to find the roots of the characteristic equation to check for stationarity.
4.5.3 Second-order properties of an AR(1) model
From Equation (4.15), the AR(1) process is given by
\[
x_t = \alpha x_{t-1} + w_t
\tag{e4.18}
\]
where \(\{w_t\}\) is a white noise series with mean zero and variance \(\sigma^2\). It can be shown (§4.5.4) that the second-order properties follow as:
where \(|\alpha| < 1\). Thus, the correlogram decays to zero more rapidly for small \(\alpha\). The following example gives two correlograms for positive and negative values of \(\alpha\), respectively (Fig. 4.11):
Fig. 4.11. Example correlograms for two autoregressive models:
Try experimenting using other values for \(\alpha\). For example, use a small value of \(\alpha\) to observe a more rapid decay to zero in the correlogram.
4.5.6 Partial Autocorrelation
From Equation (4.21), the autocorrelations are non-zero for all lags even though in the underlying model \(x_t\) only depends on the previous value \(x_{t-1}\) (Equation (4.18)). The partial autocorrelation at lag \(k\) is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. For example, the partial autocorrelation of an AR(1) process will be zero for all lags greater than 1. In general, the partial autocorrelation at lag \(k\) is the \(k\)th coefficient of a fitted AR(\(k\)) model; if the underlying process is AR(\(p\)), then the coefficients \(\alpha_k\) will be zero for all \(k > p\). Thus, an AR(\(p\)) process has a correlogram of partial autocorrelations that is zero after lag \(p\). Hence, a plot of the estimated partial autocorrelations can be useful when determining the order of a suitable AR process for a time series. In R, the function pacf can be used to calculate the partial autocorrelations of a time series and produce a plot of the partial autocorrelations against lag (the “partial correlogram”).
4.5.7 Simulation
An AR(1) process can be simulated in R as follows:
> set.seed(1)
> x <- w <- rnorm(100)
> for (t in 2:100) x[t] <- 0.7 * x[t- 1] + w[t]
> plot(x, type = "l")
> acf(x)
> pacf(x)
Fig. 4.12. A simulated AR(1) process, \(x_t = 0.7x_{t-1} + w_t\). Note that in the partial correlogram (c) only the first lag is significant, which is usually the case when the underlying process is AR(1).
Code
set.seed(1)x <- w <-rnorm(100)for (t in2:100) x[t] <-0.7* x[t -1] + w[t]# Custom titlesplot_title <-"fig. 12a Time Series Plot"x_axis_title <-"Time"# Plot with custom titlesplot(x, type ="l", main = plot_title, xlab = x_axis_title)
Code
acf(x, main ="fig 12b Correlogram: Sample correlation against lag")
Code
pacf(x, main ="fig 12c Partial correlogram: Sample partial correlation against lag")
The resulting plots of the simulated data are shown in Figure 4.12 and give one possible realisation of the model. The partial correlogram has no significant correlations except the value at lag 1, as expected (Fig. 4.12c – note that the pacf starts at lag 1, whilst the acf starts at lag 0).The difference between the correlogram of the underlying model (Fig. 4.11a) and the sample correlogram of the simulated series (Fig. 4.12b) shows discrepancies that have arisen due to sampling variation. Try repeating the commands above several times to obtain a range of possible sample correlograms for an AR(1) process with underlying parameter \(\alpha = 0.7\).
You are asked to investigate an AR(2) process in Exercise 4.
4.6 Fitted models
4.6.1 Model fitted to simulated series
An AR(p) model can be fitted to data in R using the ar function. In the code below, the autoregressive model x.ar is fitted to the simulated series of the last section and an approximate 95% confidence interval for the underlying parameter is given, where the (asymptotic) variance of the parameter estimate is extracted using x.ar$asy.var:
The method “mle” used in the fitting procedure above is based on maximizing the likelihood function (the probability of obtaining the data given the model) with respect to the unknown parameters. The order \(p\) of the process is chosen using the Akaike Information Criterion (AIC; Akaike, 1974), which penalizes models with too many parameters:
In the function ar, the model with the smallest AIC is selected as the best-fitting AR model. Note that, in the code above, the correct order (\(p = 1\)) of the underlying process is recovered.
The parameter estimate for the fitted AR(1) model is \(\hat{\alpha} = 0.60\). Whilst this is smaller than the underlying model value of \(\alpha = 0.7\), the approximate 95% confidence interval does contain the value of the model parameter as expected, giving us no reason to doubt the implementation of the model.
4.6.2 Exchange rate series: Fitted AR model
An AR(1) model is fitted to the exchange rate series, and the upper bound of the confidence interval for the parameter includes 1. This indicates that there would not be sufficient evidence to reject the hypothesis \(\alpha = 1\), which is consistent with the earlier conclusion that a random walk provides a good approximation for this series.
However, simulated data from models with values of \(\alpha > 1\), formally included in the confidence interval below, exhibit exponentially unstable behavior and are not credible models for the New Zealand exchange rate.
Figure 4.13 Fig. 4.13. The correlogram of residual series for the AR(1) model fitted to the exchange rate data.
Code
Z <-read.table("../data/pounds_nz.dat", header = T)Z.ts <-ts(Z, st =1991, fr =4)Z.ar <-ar(Z.ts)mean(Z.ts)
[1] 2.823251
Code
Z.ar$order
[1] 1
Code
Z.ar$ar
[1] 0.890261
Code
Z.ar$ar +c(-2, 2) *sqrt(Z.ar$asy.var)[1]
[1] 0.7405097 1.0400123
Code
acf(Z.ar$res[-1])
In the code above, a “−1” is used in the vector of residuals to remove the first item from the residual series (Fig. 4.13). For a fitted AR(1) model, the first item has no predicted value because there is no observation at \(t = 0\); in general, the first \(p\) values will be “not available” (NA) in the residual series of a fitted AR(\(p\)) model.
By default, the mean is subtracted before the parameters are estimated, so a predicted value \(\hat{z}_t\) at time \(t\) based on the output above is given by
The global temperature series was introduced in §1.4.5, where it was apparent that the data exhibited an increasing trend after 1970, which may be due to the “greenhouse effect.” Sceptics may claim that the apparent increasing trend can be dismissed as a transient stochastic phenomenon. For their claim to be consistent with the time series data, it should be possible to model the trend without the use of deterministic functions.
Consider the following AR model fitted to the mean annual temperature series (figure 4.14):
Code
# resource code 4.6.3Global =scan("../data/global.dat")Global.ts =ts(Global, st =c(1856, 1), end =c(2005, 12),fr =12)Global.ar <-ar(aggregate(Global.ts, FUN = mean), method ="mle")mean(aggregate(Global.ts, FUN = mean))
[1] -0.1382628
Code
Global.ar$order
[1] 4
Code
Global.ar$ar
[1] 0.58762026 0.01260254 0.11116731 0.26763656
Code
acf(Global.ar$res[-(1:Global.ar$order)], lag =50)
figure Fig. 4.14. The correlogram of the residual series for the AR(4) model fitted to the annual global temperature series. The correlogram is approximately white noise so that, in the absence of further information, a simple stochastic model can ‘explain’ the correlation and trends in the series.
Based on the output above, a predicted mean annual temperature \(\hat{x}_t\) at time \(t\) is given by
The correlogram of the residuals has only one (marginally) significant value at lag 27, so the underlying residual series could be white noise (Fig. 4.14). Thus, the fitted AR(4) model (Equation (e4.24)) provides a good fit to the data.
As the AR model has no deterministic trend component, the trends in the data can be explained by serial correlation and random variation, implying that it is possible that these trends are stochastic (or could arise from a purely stochastic process).
Again, we emphasize that this does not imply that there is no underlying reason for the trends. If a valid scientific explanation is known, such as a link with the increased use of fossil fuels, then this information would clearly need to be included in any future forecasts of the series.