Functions & Formulas
Using Simple Math
Chapter 1
e1.1
\[ \bar{x} = \sum x_i / n \]
e1.2 Additive Decomposition Model
\[ x_t = m_t + s_t + z_t \]
where, at time \(t\), \(x_t\) is the observed series, \(m_t\) is the trend, \(s_t\) is the seasonal effect, and \(z_t\) (or \(e_t\)) is an error term that is, in general, a sequence of correlated random variables with mean zero.
e1.3 Multiplicative Decomposition Model
\[ x_t = m_t \cdot s_t + z_t \]
e1.4 log Additive Decomposition Model
If the random variation is modelled by a multiplicative factor and the variable is positive, an additive decomposition model for \(log(x_t)\) can be used (To be consistent with R, we use log for the natural logarithm, which is often written ln.):
\[ log(x_t) = m_t + s_t + z_t \] Some care is required when the exponential function is applied to the predicted mean of \(log(x_t)\) to obtain a prediction for the mean value \(x_t\), as the effect is usually to bias the predictions. If the random series \(z_t\) are normally distributed with mean 0 and variance \(\sigma^2\), then the predicted mean value at time \(t\) based on Equation (1.4) is given by (e1.5)
e1.5
\[ \hat{x}_t = e^{m_t + s_t} e^{\frac{1}{2}\sigma^2} \]
However, if the error series is not normally distributed and is negatively skewed, as is often the case after taking logarithms, the bias correction factor will be an overcorrection (Exercise 4) and it is preferable to apply an empirical adjustment (which is discussed further in Chapter 5).
e1.6 Centred Moving Average
\[ \hat{m}_t = \frac{\frac{1}{2}x_{t-6} + x_{t-5} + \cdots + x_{t-1} + x_t + x_{t+1} + \cdots + x_{t+5} + \frac{1}{2}x_{t+6}}{12} \]
where \(t = 7, \dots, n-6\). The coefficients in Equation (1.6) for each month are 1/12 (or sum to 1/12 in the case of the first and last coefficients), so that equal weight is given to each month and the coefficients sum to 1. By using the seasonal frequency for the coefficients in the moving average, the procedure generalises to any seasonal frequency (e.g., quarterly series), provided the condition that the coefficients sum to unity is still met.
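As a quick illustration, the centred moving average can be computed in R with stats::filter(); this is a minimal sketch assuming a monthly ts object, using the built-in AirPassengers series (ma12 is just an illustrative helper name).

```r
# Sketch of the centred moving average in Equation (1.6)
ma12 <- function(x) {
  wts <- c(0.5, rep(1, 11), 0.5) / 12          # half weights on the two end months
  stats::filter(x, filter = wts, sides = 2)    # centred (two-sided) moving average
}
trend <- ma12(AirPassengers)                   # NA for the first and last 6 months
```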
e1.7 Estimate of Monthly Additive Effect (\(s_t\))
at time \(t\) can be obtained by subtracting \(\hat{m}_t\):
\[ \hat{s}_t = x_t - \hat{m}_t \]
By averaging these estimates of the monthly effects for each month, we obtain a single estimate of the effect for each month. If the period of the time series is a whole number of years, the number of monthly effects averaged for each month is one less than the number of years of record. At this stage, the twelve monthly additive components should have an average value close to, but not usually exactly equal to, zero. It is usual to adjust them by subtracting this mean so that they do average zero.
e1.7.5 Estimate of Monthly Multiplicative Effect (\(s_t\))
If the monthly effect is multiplicative, the estimate is given by division:
\[ \hat{s}_t = \frac {x_t}{\hat{m}_t} \]
It is usual to adjust monthly multiplicative factors so that they average unity. The procedure generalises, using the same principle, to any seasonal frequency.
e1.7.6 Seasonally Adjusted Series (Additive & Multiplicative)
It is common to present economic indicators, such as unemployment percentages, as seasonally adjusted series. This highlights any trend that might otherwise be masked by seasonal variation attributable, for instance, to the end of the academic year, when school and university leavers are seeking work. If the seasonal effect is additive, a seasonally adjusted series is given by
\[ additive = x_t - \bar{s}_t \]
whilst if it is multiplicative, an adjusted series is obtained from
\[ multiplicative = x_t / \bar{s}_t \]
where \(\bar{s}_t\) is the adjusted mean seasonal effect for the month corresponding to time \(t\).
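For reference, R's decompose() carries out these moving-average and seasonal-effect calculations; a minimal sketch, using the built-in AirPassengers series:

```r
# Sketch: additive and multiplicative decomposition with decompose(),
# then the seasonally adjusted series from the formulas above.
add  <- decompose(AirPassengers, type = "additive")
mult <- decompose(AirPassengers, type = "multiplicative")
adj_add  <- AirPassengers - add$seasonal    # additive adjustment
adj_mult <- AirPassengers / mult$seasonal   # multiplicative adjustment
plot(adj_mult)
```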
Chapter 2
e2.1
\[ \gamma(x, y) = E[(x - \mu_x)(y - \mu_y)] \]
e2.2
\[ Cov(x, y) = \sum(x_i - \bar{x})(y_i - \bar{y}) / (n - 1) \]
e2.3
In R, the expectation in Equation (e2.1) can be estimated by mean((x - mean(x))*(y - mean(y))) \(\to\) \(E[(x - \mu_x)(y - \mu_y)]\)
e2.4
\[ \rho(x, y) = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x\sigma_y} = \frac{\gamma(x, y)}{\sigma_x\sigma_y} \]
e2.5
\[ Cor(x, y) = \frac{Cov(x, y)}{sd(x)sd(y)} \]
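A minimal check in R, comparing the definitional formulas with the built-in cov() and cor(); x and y here are arbitrary illustrative vectors.

```r
# Sketch: sample covariance (e2.2) and correlation (e2.5) from the
# definitions and from the built-in functions.
x <- c(2, 4, 6, 8); y <- c(1, 3, 2, 5)
sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)  # Equation (e2.2)
cov(x, y)                                             # same value
cov(x, y) / (sd(x) * sd(y))                           # Equation (e2.5)
cor(x, y)                                             # same value
```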
e2.6
\[ \mu(t) = E[x_t] \]
e2.7
\[ \bar{x} = \sum_{t=1}^n \frac{x_t}{n} \]
e2.8
\[ \lim_{n \to \infty} \frac{\sum_{t=1}^{n} x_t}{n} = \mu \]
e2.9
\[ \sigma^2(t) = E[(x_t - \mu)^2] \]
e2.10
\[ Var(x) = \frac{\sum (x_t - \bar{x})^2}{n - 1} \]
e2.11 Second-Order Stationary Autocovariance Function (acvf)
If a time series model is second-order stationary, we can define an autocovariance function (acvf), \(\gamma_k\), as a function of the lag \(k\):
\[ \gamma_k = E[(x_t - \mu)(x_{t+k} - \mu)] \]
- Definition:
- \(\gamma_k\): The second-order stationary autocovariance function (acvf) at lag \(k\).
- Quantifies the covariance between \(x_t\) and \(x_{t+k}\), assuming the time series is second-order stationary.
- Components:
- \(x_t, x_{t+k}\): Observations of the time series at time \(t\) and \(t+k\).
- \(\mu\): The mean of the time series, which remains constant for second-order stationary series.
- \(E[\cdot]\): The expectation operator, averaging across all realizations of the process.
- Conditions:
- The time series must be second-order stationary, meaning:
- The mean \(\mu\) is constant over time.
- The variance \(\sigma^2\) is finite and constant.
- The covariance between \(x_t\) and \(x_{t+k}\) depends only on the lag \(k\), not on \(t\).
- Purpose:
- \(\gamma_k\) describes the strength and direction of the linear relationship between \(x_t\) and \(x_{t+k}\) for a stationary series.
- It captures patterns such as trends, cycles, and dependencies across time lags.
- Interpretation:
- \(\gamma_k > 0\): Positive covariance, indicating similar deviations from the mean for \(x_t\) and \(x_{t+k}\).
- \(\gamma_k < 0\): Negative covariance, indicating opposite deviations from the mean for \(x_t\) and \(x_{t+k}\).
- \(\gamma_k = 0\): No linear relationship between \(x_t\) and \(x_{t+k}\) at lag \(k\).
e2.12 The lag \(k\) autocorrelation function (acf), \(\rho_k\), is defined by
\[ \rho_k = \frac{\gamma_k}{\sigma^2} \]
- Definition:
- \(\rho_k\): The lag \(k\) autocorrelation function (acf).
- Normalizes the autocovariance \(\gamma_k\) by the variance \(\sigma^2\) of the time series.
- Components:
- Numerator: \(\gamma_k\) is the autocovariance at lag \(k\), which measures the linear dependency between \(x_t\) and \(x_{t+k}\).
- Denominator: \(\sigma^2\) is the constant variance of the time series under the assumption of second-order stationarity.
- Conditions:
- The time series must be second-order stationary, meaning:
- The mean \(\mu\) is constant.
- The variance \(\sigma^2\) is constant.
- The autocovariance \(\gamma_k\) depends only on the lag \(k\), not on \(t\).
- Purpose:
- \(\rho_k\) standardizes \(\gamma_k\) to provide a dimensionless measure of the strength and direction of the relationship between \(x_t\) and \(x_{t+k}\).
- Values of \(\rho_k\) are constrained to the range \([-1, 1]\).
- Interpretation:
- \(\rho_k > 0\): Positive correlation at lag \(k\), indicating that \(x_t\) and \(x_{t+k}\) move in the same direction.
- \(\rho_k < 0\): Negative correlation at lag \(k\), indicating that \(x_t\) and \(x_{t+k}\) move in opposite directions.
- \(\rho_k = 0\): No linear relationship between \(x_t\) and \(x_{t+k}\) at lag \(k\).
e2.13 The sample autocovariance function (sample acvf), \(c_k\), is calculated as
\[ c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \]
- Definition:
- \(c_k\): The sample autocovariance at lag \(k\).
- Measures the covariance of the time series \(x_t\) with itself, shifted by \(k\) time steps.
- Components:
- Numerator:
- \((x_t - \bar{x})(x_{t+k} - \bar{x})\): Calculates the product of deviations of \(x_t\) and \(x_{t+k}\) from their mean \(\bar{x}\).
- \(\sum_{t=1}^{n-k}\): Sums these products over all valid time indices \(t\) where \(t\) and \(t+k\) are within the range of the data.
- Denominator:
- Divides by the total number of observations, \(n\), to normalize the covariance.
- Purpose:
- \(c_k\) quantifies the degree of linear relationship (covariance) between observations in the series that are separated by a lag of \(k\) time steps.
- Helps identify patterns like periodicity or trends in autocorrelation.
- Interpretation:
- \(c_k > 0\): Positive covariance, indicating similar deviations at lag \(k\).
- \(c_k < 0\): Negative covariance, indicating opposite deviations at lag \(k\).
- \(c_k = 0\): No linear relationship between \(x_t\) and \(x_{t+k}\) at lag \(k\).
e2.14 The sample autocorrelation function (acf) is defined as
\[ r_k = \frac{c_k}{c_0} \]
- Definition:
- \(r_k\): The sample autocorrelation at lag \(k\).
- Normalizes the sample autocovariance \(c_k\) using \(c_0\) (the variance of the series).
- Components:
- Numerator: \(c_k\) is the sample autocovariance at lag \(k\), representing the covariance between \(x_t\) and \(x_{t+k}\).
- Denominator: \(c_0\) is the sample autocovariance at lag \(0\), equivalent to the variance of the series (calculated with \(n\) in the denominator).
- Purpose:
- \(r_k\) measures the strength and direction of the linear relationship between observations in the series, separated by lag \(k\).
- Normalization ensures \(r_k\) values lie between \(-1\) and \(1\).
- Interpretation:
- \(r_k > 0\): Positive correlation at lag \(k\), indicating similar trends in \(x_t\) and \(x_{t+k}\).
- \(r_k < 0\): Negative correlation at lag \(k\), indicating opposing trends in \(x_t\) and \(x_{t+k}\).
- \(r_k = 0\): No linear relationship between \(x_t\) and \(x_{t+k}\) at lag \(k\).
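In R, acf() computes both quantities; a small sketch on a simulated series (note that acf() divides by \(n\), matching Equations (e2.13) and (e2.14)):

```r
# Sketch: sample acvf c_k and acf r_k for a simulated series.
set.seed(1)
x <- rnorm(100)
acvf <- acf(x, type = "covariance", plot = FALSE)  # c_k
racf <- acf(x, plot = FALSE)                       # r_k = c_k / c_0
racf$acf[2]                 # r_1 (element 1 is lag 0)
acvf$acf[2] / acvf$acf[1]   # the same value, computed as c_1 / c_0
```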
e2.15
In subsequent chapters, second-order properties for several time series models are derived using the result shown in Equation (2.15). Let \(x_1, x_2, \dots, x_n\) and \(y_1, y_2, \dots, y_m\) be random variables. Then
\[ Cov( \sum_{i=1}^{n} x_i, \sum_{j=1}^{m} y_j ) = \sum_{i=1}^{n} \sum_{j=1}^{m} Cov(x_i , y_j) \]
where \(Cov(x, y)\) is the covariance between a pair of random variables \(x\) and \(y\). The result tells us that the covariance of two sums of variables is the sum of all possible covariance pairs of the variables. Note that the special case of \(n = m\) and \(x_i = y_i\) \((i = 1, \dots, n)\) occurs in subsequent chapters for a time series \(\{x_t\}\). The proof of Equation (2.15) is left to Exercise 5a.
Chapter 3
e3.1 Cross-covariance function (\(ccvf\)), \(\gamma_k(x, y)\), as a function of the lag \(k\):
\[ \gamma_k (x, y) = E[(x_{t + k} - \mu_x)(y_t - \mu_y)] \]
e3.2
Some textbooks define the ccvf with the variable \(y\) lagging when \(k\) is positive, but we have used the definition that is consistent with R. Whichever way you choose to define the ccvf,
\[ \gamma_k (x, y) = \gamma_{- k} (y, x) \]
e3.3 cross-correlation function (\(ccf\))
When we have several variables and wish to refer to the acvf of one rather than the ccvf of a pair, we can write it as, for example, \(\gamma_k(x, x)\). The lag \(k\) cross-correlation function (\(ccf\)), \(\rho_k(x, y)\), is defined by
\[ \rho_k(x, y) = \frac{\gamma_k(x, y)}{\sigma_x\sigma_y} \]
The ccvf and ccf can be estimated from a time series by their sample equivalents.
e3.4 The sample cross-covariance function (sample ccvf or sccvf), \(c_k(x, y)\), is calculated as
\[ c_k(x, y) = \frac{1}{n} \sum_{t = 1}^{n - k} (x_{t + k} - \bar{x})(y_t - \bar{y}) \]
The formula for the sample cross-covariance function (SCCVF) quantifies the relationship between two time series \(x_t\) and \(y_t\) at different lags:
- Definition:
- \(c_k(x, y)\): The sample cross-covariance at lag \(k\).
- \(k\): The lag (i.e., the amount by which one time series is shifted relative to the other).
- \(\bar{x}\): The mean of the \(x_t\) time series.
- \(\bar{y}\): The mean of the \(y_t\) time series.
- Components:
- \(x_{t + k}\): The value of the \(x_t\) time series shifted by \(k\) time steps.
- \(y_t\): The value of the \(y_t\) time series at time \(t\).
- \((x_{t + k} - \bar{x})\): The deviation of \(x_{t + k}\) from its mean.
- \((y_t - \bar{y})\): The deviation of \(y_t\) from its mean.
- Summation:
- \(\sum_{t=1}^{n-k}\): The summation is over all overlapping time points, up to \(n-k\) to ensure there are sufficient data points for \(x_{t+k}\).
- Normalization:
- \(\frac{1}{n}\): The normalization factor divides by \(n\), the total number of observations in the time series.
- Purpose:
- Measures how much \(x_t\) (shifted by \(k\)) and \(y_t\) deviate from their respective means in a synchronized way.
- Positive \(c_k(x, y)\): Indicates that \(x_{t+k}\) and \(y_t\) move in the same direction (positive relationship).
- Negative \(c_k(x, y)\): Indicates an inverse relationship.
This function is essential for identifying lagged dependencies between two time series.
e3.5 The sample cross-correlation function (\(ccf\)) is defined as
\[ r_k (x, y) = \frac{c_k(x, y)}{\sqrt{c_0(x, x)c_0(y, y)}} \]
The sample cross-correlation function (ccf) quantifies the normalized relationship between two time series \(x_t\) and \(y_t\) at lag \(k\):
- Definition:
- \(r_k(x, y)\): The sample cross-correlation at lag \(k\).
- \(c_k(x, y)\): The sample cross-covariance at lag \(k\).
- \(c_0(x, x)\): The variance (zero-lag covariance) of the \(x_t\) time series.
- \(c_0(y, y)\): The variance (zero-lag covariance) of the \(y_t\) time series.
- Normalization:
- The denominator \(\sqrt{c_0(x, x)c_0(y, y)}\) ensures that \(r_k(x, y)\) is normalized, with values between \(-1\) and \(1\).
- This normalization allows \(r_k(x, y)\) to measure the strength and direction of the linear relationship between \(x_t\) and \(y_t\) at lag \(k\).
- Interpretation:
- \(r_k(x, y) > 0\): Indicates a positive correlation at lag \(k\).
- \(r_k(x, y) < 0\): Indicates a negative correlation at lag \(k\).
- \(r_k(x, y) = 0\): Indicates no linear relationship at lag \(k\).
This function is widely used to identify and interpret lagged dependencies in time series data.
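In R, the sample cross-correlations are produced by ccf(); the sketch below uses two simulated series in which x roughly follows y two steps later (the series and the lag are purely illustrative).

```r
# Sketch: sample cross-correlation function with ccf().
set.seed(1)
y <- rnorm(100)
x <- c(0, 0, y[1:98]) + rnorm(100, sd = 0.5)   # x roughly follows y two steps later
cc <- ccf(x, y, plot = FALSE)
cc$lag[which.max(abs(cc$acf))]   # lag with the largest cross-correlation
cc$acf[which.max(abs(cc$acf))]   # its value
```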
e3.6 Bass Formula
The Bass formula for the number of people, \(N_t\), who have bought a product at time \(t\) depends on three parameters: the total number of people who eventually buy the product, \(m\); the coefficient of innovation, \(p\); and the coefficient of imitation, \(q\). The Bass formula is
\[ N_{t +1} = N_t + p(m - N_t) + qN_t (m - N_t) / m \]
According to the model, the increase in sales, \(N_{t+1} - N_t\), over the next time period is equal to the sum of a fixed proportion \(p\) and a time-varying proportion \(q\frac{N_t}{m}\) of the people who will eventually buy the product but have not yet done so. The rationale for the model is that initial sales will be to people who are interested in the novelty of the product, whereas later sales will be to people who are drawn to the product after seeing their friends and acquaintances use it. Equation (3.6) is a difference equation and its solution is (e3.7)
e3.7 Bass Formula Solution
\[ N_t = m\frac{1 - e^{-(p + q)t}}{1 + (q / p)e^{-(p + q)t}} \]
It is easier to verify this result for the continuous-time version of the model.
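A short sketch that iterates the difference equation (3.6) and compares it with the closed form (3.7); the parameter values p, q, and m below are purely illustrative, not from the book.

```r
# Sketch: Bass model by direct iteration of Equation (3.6) versus the
# closed-form solution (3.7).
p <- 0.01; q <- 0.3; m <- 1000
Tmax <- 30
N <- numeric(Tmax)                     # N[1] = 0: no purchases at launch
for (t in 1:(Tmax - 1)) {
  N[t + 1] <- N[t] + p * (m - N[t]) + q * N[t] * (m - N[t]) / m
}
tt <- 0:(Tmax - 1)
N_closed <- m * (1 - exp(-(p + q) * tt)) / (1 + (q / p) * exp(-(p + q) * tt))
# The iteration and the closed form agree only approximately, since (3.7)
# solves the continuous-time version of the model exactly.
```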
e3.8 Hazard
One interpretation of the Bass model is that the time from product launch until purchase is assumed to have a probability distribution that can be parametrised in terms of \(p\) and \(q\). A plot of sales per time unit against time is obtained by multiplying the probability density by the number of people, \(m\), who eventually buy the product. Let \(f(t)\), \(F(t)\), and \(h(t)\) be the density, cumulative distribution function (\(cdf\)), and hazard, respectively, of the distribution of time until purchase. The definition of the hazard is
\[ h(t) = \frac{f(t)}{1 - F(t)} \]
The interpretation of the hazard is that if it is multiplied by a small time increment it gives the probability that a random purchaser who has not yet made the purchase will do so in the next small time increment (Exercise 2). Then the continuous time model of the Bass formula can be expressed in terms of the hazard (e3.9):
e3.9 The continuous-time model of the Bass formula, expressed in terms of the hazard
\[ h(t) = p + qF(t) \]
Equation (3.6) is the discrete form of Equation (3.9) (Exercise 2). The solution of Equation (3.8), with \(h(t)\) given by Equation (3.9), for \(F(t)\) is (e3.10)
e3.10
\[ F(t) = \frac {1 - e^{-(p+q)t}}{1 + (q/p) e^{-(p+q)t}} \]
Two special cases of the distribution are the exponential distribution and logistic distribution, which arise when \(q = 0\) and \(p = 0\), respectively. The logistic distribution closely resembles the normal distribution (Exercise 3). Cumulative sales are given by the product of \(m\) and \(F(t)\). The pdf is the derivative of Equation (3.10):
e3.11
\[ f(t) = \frac{(p + q)^2 e^{-(p+q)t}}{p[1 + (q/p)e^{-(p+q)t}]^2} \]
Sales per unit time at time \(t\) are (e3.12)
e3.12
\[ S(t) = mf(t) = \frac{m(p + q)^2 e^{-(p + q)t}} {p [ 1 + (q/p)e^{-(p+q)t}]^2} \]
The time to peak is (e3.13)
e3.13 The time to peak is
\[ t_{peak} = \frac {log(q) - log(p)} {p + q} \]
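A small numeric check of the sales curve and peak time, again with purely illustrative parameter values:

```r
# Sketch: sales per unit time S(t) (Equation 3.12) and the peak time
# (Equation 3.13); p, q, m are illustrative values.
p <- 0.01; q <- 0.3; m <- 1000
S <- function(t) m * (p + q)^2 * exp(-(p + q) * t) /
  (p * (1 + (q / p) * exp(-(p + q) * t))^2)
t_peak <- (log(q) - log(p)) / (p + q)
curve(S(x), from = 0, to = 30, xlab = "t", ylab = "sales per unit time")
abline(v = t_peak, lty = 2)   # vertical line at the predicted peak
```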
e3.14 Forecasting Sales
3.4.1 Exponential smoothing:
Our objective is to predict some future value \(x_{n+k}\) given a past history \(\{x_1, x_2, \dots, x_n\}\) of observations up to time \(n\). In this subsection we assume there is no systematic trend and there are no seasonal effects in the process, or that these have been identified and removed. The mean of the process can change from one time step to the next, but we have no information about the likely direction of these changes. A typical application is forecasting sales of a well-established product in a stable market. The model is
\[ x_t = \mu_t + w_t \]
where \(\mu_t\) is the non-stationary mean of the process at time \(t\) and \(w_t\) are independent random deviations with a mean of 0 and a standard deviation \(\sigma\). We will follow the notation in R and let \(a_t\) be our estimate of \(\mu_t\). Given that there is no systematic trend, an intuitively reasonable estimate of the mean at time \(t\) is given by a weighted average of our observation at time \(t\) and our estimate of the mean at time \(t-1\) (e3.15):
e3.15 exponentially weighted moving average (\(EWMA\))
\[ a_t = \alpha x_t + (1 - \alpha)a_{t-1} \qquad \qquad 0 < \alpha < 1 \]
The \(a_t\) in Equation (3.15) is the exponentially weighted moving average (\(EWMA\)) at time \(t\). The value of \(\alpha\) determines the amount of smoothing, and it is referred to as the smoothing parameter. If \(\alpha\) is near 1, there is little smoothing and \(a_t\) is approximately \(x_t\). This would only be appropriate if the changes in the mean level were expected to be large by comparison with \(\sigma\). At the other extreme, a value of \(\alpha\) near \(0\) gives highly smoothed estimates of the mean level and takes little account of the most recent observation. This would only be appropriate if the changes in the mean level were expected to be small compared with \(\sigma\). A typical compromise figure for \(\alpha\) is 0.2 since in practice we usually expect that the change in the mean between time \(t - 1\) and time \(t\) is likely to be smaller than \(\sigma\). Alternatively, R can provide an estimate for \(\alpha\), and we discuss this option below. Since we have assumed that there is no systematic trend and that there are no seasonal effects, forecasts made at time \(n\) for any lead time are just the estimated mean at time \(n\). The forecasting equation is
\[ \hat{x}_{n+k|n} = a_n, \qquad k = 1, 2, \dots \]
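In R, an EWMA of this form can be fitted with HoltWinters() by switching off the trend and seasonal terms; a minimal sketch using the built-in Nile series (here \(\alpha\) is fixed at 0.2, or it can be estimated by R when left unspecified):

```r
# Sketch: exponential smoothing (Equation 3.15) via HoltWinters() with
# beta and gamma disabled; Nile is a built-in annual series.
ewma <- HoltWinters(Nile, alpha = 0.2, beta = FALSE, gamma = FALSE)
ewma$alpha                   # the smoothing parameter used
head(fitted(ewma))           # columns xhat (one-step forecast) and level (a_t)
predict(ewma, n.ahead = 4)   # forecasts: the final level a_n repeated
```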
Chapter 4
e4.1 Residual error
A residual error is the difference between the observed value and the value predicted by the model at time \(t\). If we suppose the model is defined for the variable \(y_t\) and \(\hat{y}_t\) is the value predicted by the model, the residual error \(x_t\) is
\[ x_t = y_t - \hat{y}_t \tag{e4.1} \] As the residual errors occur in time, they form a time series: \(x_1, x_2, \dots, x_n\).
e4.2 Second-Order properties and the correlogram
The second-order properties of a white noise series {\(w_t\)} are an immediate consequence of the definition in §4.2.2. However, as they are needed so often in the derivation of the second-order properties for more complex models, we explicitly state them here:
If \(\{w_t\}_{t=1}^n\) is a discrete white noise (DWN) time series, then the population has the following properties.
\[ \left.\begin{matrix} \mu_w = 0 \\ Cov(w_t, w_{t+k}) = \begin{cases} \sigma^2 & \text{if } k = 0 \\ 0 & \text{if } k \ne 0 \end{cases} \end{matrix}\right\} \tag{e4.2} \]
The correlation (autocorrelation) function is therefore (e4.3)
e4.3
\[ \rho_k = \begin{cases} 1 & \text{if } k = 0 \\ 0 & \text{if } k \ne 0 \end{cases} \tag{e4.3} \]
e4.4 Random Walk
Let \(\{x_t\}\) be a time series. Then \(\{x_t\}\) is a random walk if
\[ x_t = x_{t-1} + w_t \tag{e4.4} \]
where \(\{w_t\}\) is a white noise series.
e4.5 Back Substitution
Substituting \(x_{t-1} = x_{t-2} + w_{t-1}\) in equation (e4.4) and then substituting for \(x_{t-2}\), followed by \(x_{t-3}\) and so on (a process known as ‘back substitution’) gives (e4.5):
\[ x_t = w_t + w_{t-1} + w_{t-2} + ... \tag{e4.5} \]
In practice, the series above will not be infinite but will start at some time \(t = 1\). Hence (e4.6):
e4.6
\[ x_t = w_1 + w_2 + ...+ w_t \tag{e4.6} \]
e4.7 Backward shift operator \(B\) is defined by
\[ Bx_t = x_{t-1} \tag{e4.7} \]
The backward shift operator is sometimes called the ‘lag operator’. By repeatedly applying B, it follows that
e4.8 Lag Operator
\[ B^n x_t = x_{t-n} \tag{e4.8} \]
Using \(B\), Equation (e4.4) can be rewritten as
\[ x_t = Bx_t + w_t \Rightarrow (1 - B)x_t = w_t \Rightarrow x_t = (1 - B)^{-1} w_t \] \[ \Rightarrow x_t = (1 + B + B^2 + \dots) w_t \Rightarrow x_t = w_t + w_{t-1} + w_{t-2} + \dots \]
and equation e4.5 is recovered.
e4.9 Second-Order properties of a Random Walk follow as
\[ \left.\begin{matrix} \mu_x = 0 \\ \gamma_k(t) = Cov(x_t, x_{t+k}) = t\sigma^2 \\ \end{matrix}\right\} \tag{e4.9} \]
The covariance is a function of time, so the process is non-stationary. In particular, the variance is \(t\sigma^2\) and so it increases without limit as \(t\) increases. It follows that a random walk is only suitable for short term predictions.
e4.10 Time-Varying autocorrelation function
The time-varying autocorrelation function for \(k > 0\) follows from Equation (e4.9) as
\[ \rho_k(t) = \frac{Cov(x_t, x_{t+k})}{\sqrt{Var(x_t)Var(x_{t+k})}} = \frac{t\sigma^2}{\sqrt{t\sigma^2 (t + k)\sigma^2}} = \frac{1}{\sqrt{1 + k/t}} \tag{e4.10} \] so that, for large \(t\) with \(k\) considerably less than \(t\), \(\rho_k(t)\) is nearly 1. Hence, the correlogram for a random walk is characterised by positive autocorrelations that decay very slowly down from unity. This is demonstrated by simulation in §4.3.7.
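The slow decay can be seen directly by simulation (cf. §4.3.7); a quick sketch:

```r
# Sketch: a random walk is the cumulative sum of white noise (Equation e4.6);
# its correlogram decays very slowly from unity.
set.seed(2)
w <- rnorm(1000)   # white noise
x <- cumsum(w)     # random walk
acf(x)             # autocorrelations near 1, decaying slowly
```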
e4.10.5 Derivation of second-order properties
Equation (e4.6) is a finite sum of white noise terms, each with zero mean and variance \(\sigma^2\). Hence, the mean of \(x_t\) is zero (Equation (e4.9)). The autocovariance in Equation (e4.9) can be derived using Equation (e2.15) as follows:
\[ \gamma_k(t) = Cov(x_t, x_{t+k}) = Cov( \sum_{i=1}^{t} w_i, \sum_{j=1}^{t+k} w_j ) = \sum_{i=j} Cov(w_i, w_j) = t\sigma^2 \tag{e4.10.5} \]
4.3.6 The difference operator
Differencing adjacent terms of a series can transform a non-stationary series into a stationary series. For example, if the series \(\{x_t\}\) is a random walk, it is non-stationary. However, from Equation (4.4), the first-order differences of \(\{x_t\}\) produce the stationary white noise series \(\{w_t\}\) given by \(x_t - x_{t-1} = w_t\).
Hence, differencing turns out to be a useful “filtering” procedure in the study of non-stationary time series. The difference operator \(\nabla\) is defined by e4.11
e4.11 The Difference Operator
\[ \nabla x_t = x_t - x_{t-1} \tag{e4.11} \]
Note that \(\nabla x_t = (1 - B)x_t\), so that \(\nabla\) can be expressed in terms of the backward shift operator \(B\). In general, higher-order differencing can be expressed as (e4.12)
e4.12 Higher-Order Differencing
\[ \nabla^n = (1 - B)^n \tag{e4.12} \]
The proof of the last result is left to Exercise 7.
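As a quick illustration, first differencing a simulated random walk recovers an approximately white noise series; in R, diff() implements \(\nabla\).

```r
# Sketch: diff() applies the difference operator (Equation e4.11); the
# differenced random walk shows no significant autocorrelation.
set.seed(2)
x <- cumsum(rnorm(1000))   # simulated random walk
acf(diff(x))               # diff(x) = x_t - x_{t-1}
```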
e4.13 Fitted Holt-Winters Model (Exchange Rate Series)
Fig. 4.7 in the book shows the correlogram of the residuals from the fitted Holt-Winters model for the exchange rate series (UK pounds to NZ dollars, 1991–2000). There are no significant correlations in the residual series, so the model provides a reasonable approximation to the exchange rate data. The fitted model, with the coefficient estimates reported in the book, is
\[ \left.\begin{matrix} x_t = x_{t-1} + b_{t-1} + w_t \\ b_{t-1} = 0.167(x_{t-1} - x_{t-2}) + 0.833b_{t-2} \\ \end{matrix}\right\} \tag{e4.13} \]
where {\(w_t\)} is white noise with zero mean.
e4.14 The Integrated Autoregressive Model
After some algebra, Equations (4.13) can be expressed as one equation in terms of the backward shift operator:
\[ (1 - 0.167B + 0.167B^2)(1 - B)x_t = w_t \tag{e4.14} \]
Equation (4.14) is a special case, the integrated autoregressive model, within the important class of models known as ARIMA models (Chapter 7). The proof of Equation (4.14) is left to Exercise 8.
e4.14.5 Random Walk with Drift
The random walk model can be adapted to allow for a trend by including a drift parameter \(\delta\):
\[ x_t = x_{t-1} + \delta + w_t \]
4.5 Autoregressive models
e4.15 Autoregressive Process
4.5.1 Definition
The series \(\{x_t\}\) is an autoregressive process of order \(p\), abbreviated to AR(\(p\)), if
\[ x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} + w_t \tag{e4.15} \]
where \(\{w_t\}\) is white noise and the \(\alpha_i\) are the model parameters with \(\alpha_p \neq 0\) for an order \(p\) process. Equation (4.15) can be expressed as a polynomial of order \(p\) in terms of the backward shift operator (e4.16):
e4.16 Autoregressive Process in terms of the Backward Shift Operator
\[ \theta_p(B)x_t = (1 - \alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p)x_t = w_t \tag{e4.16} \] The following points should be noted:
- The random walk is the special case AR(1) with \(\alpha_1 = 1\) (see Equation (4.4)).
- The exponential smoothing model is the special case \(\alpha_i = \alpha(1 - \alpha)^i\) for \(i = 1, 2, \dots\) and \(p \to \infty\).
- The model is a regression of \(x_t\) on past terms from the same series; hence the use of the term “autoregressive.”
- A prediction at time \(t\) is given by (e4.17):
e4.17
\[ \hat{x}_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} \tag{e4.17} \]
- The model parameters can be estimated by minimizing the sum of squared errors.
e4.18 Second-order properties of an AR(1) model
From Equation (4.15), the AR(1) process is given by
\[ x_t = \alpha x_{t-1} + w_t \tag{e4.18} \]
where \(\{w_t\}\) is a white noise series with mean zero and variance \(\sigma^2\). It can be shown (§4.5.4) that the second-order properties follow as (e4.19):
e4.19
\[ \left.\begin{matrix} \mu_x = 0 \\ \gamma_k = \frac{\alpha^k \sigma^2}{1 - \alpha^2} \\ \end{matrix}\right\} \tag{e4.19} \]
e4.20 Derivation of second-order properties for an AR(1) process*
Using \(B\), a stable AR(1) process (\(|\alpha| < 1\)) can be written as
\[ \left.\begin{matrix} (1 - \alpha B)x_t = w_t \\ \Rightarrow x_t = (1 - \alpha B)^{-1}w_t \\ \Rightarrow x_t = w_t + \alpha w_{t-1} + \alpha^2 w_{t-2} + \dots = \sum_{i=0}^\infty \alpha^i w_{t-i} \\ \end{matrix}\right\} \tag{e4.20} \]
Hence, the mean is given by
\[ E(x_t) = E\left(\sum_{i=0}^\infty \alpha^i w_{t-i}\right) = \sum_{i=0}^\infty \alpha^i E(w_{t-i}) = 0 \]
and the autocovariance follows as
\[ \gamma_k = \text{Cov}(x_t, x_{t+k}) = \text{Cov} \left(\sum_{i=0}^\infty \alpha^i w_{t-i}, \sum_{j=0}^\infty \alpha^j w_{t+k-j}\right) \]
\[ = \sum\nolimits_{j=k+i} \alpha^i \alpha^j \text{Cov}(w_{t-i}, w_{t+k-j}) \]
\[ = \alpha^k \sigma^2 \sum\nolimits_{i=0}^\infty \alpha^{2i} = \frac{\alpha^k \sigma^2}{1 - \alpha^2} \]
using Equations (e2.15) and (e4.2).
e4.21 Autocorrelation function for an AR(1) process
4.5.5 Correlogram of an AR(1) process
From Equation (4.19), the autocorrelation function follows as
\[ \rho_k = \alpha^k \quad (k \geq 0) \tag{e4.21} \]
where \(|\alpha| < 1\). Thus, the correlogram decays to zero more rapidly for small \(\alpha\). Fig. 4.11 in the book gives two correlograms, for positive and negative values of \(\alpha\) respectively.
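A quick simulation check of Equation (4.21), with an illustrative \(\alpha = 0.7\):

```r
# Sketch: simulate an AR(1) process and compare the sample correlogram
# with the theoretical rho_k = alpha^k.
set.seed(3)
x <- arima.sim(n = 1000, model = list(ar = 0.7))
acf(x, plot = FALSE)$acf[2:4]   # sample r_1, r_2, r_3
0.7^(1:3)                       # theoretical alpha^k for k = 1, 2, 3
```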
e4.22 Akaike Information Criterion
The order \(p\) of the process is chosen using the Akaike Information Criterion (AIC; Akaike, 1974), which penalizes models with too many parameters:
\[ \text{AIC} = -2 \times \text{log-likelihood} + 2 \times \text{number of parameters} \tag{e4.22} \]
In the function ar, the model with the smallest AIC is selected as the best-fitting AR model.
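A minimal sketch of this order selection with ar(), using the built-in lh series:

```r
# Sketch: ar() fits AR models up to order.max and keeps the order with the
# smallest AIC (Equation e4.22).
fit <- ar(lh)
fit$order   # order p chosen by AIC
fit$ar      # estimated alpha coefficients
fit$aic     # AIC differences relative to the best-fitting model
```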
e4.23 Predicted value \(\hat{z}_t\) for an AR(1) model
A predicted value \(\hat{z}_t\) at time \(t\), based on the fitted AR(1) output given in the book, is
\[ \hat{z}_t = 2.8 + 0.89(z_{t-1} - 2.8) \tag{e4.23} \]
e4.24 Predicted mean \(\hat{x}_t\)
Consider the AR model fitted to the mean annual temperature series (Fig. 4.14 in the book). Based on that output, a predicted mean annual temperature \(\hat{x}_t\) at time \(t\) is given by
\[ \hat{x}_t = -0.14 + 0.59(x_{t-1} + 0.14) + 0.013(x_{t-2} + 0.14) \\ + 0.11(x_{t-3} + 0.14) + 0.27(x_{t-4} + 0.14) \tag{e4.24} \]
e4.25 Exercise 4 Model
\[ x_t = \frac{5}{6}x_{t-1} - \frac{1}{6}x_{t-2} + w_t \tag{e4.25} \]