

Autocorrelation Concepts

Chapter 2

This is the overview of Chapter 2. I will add information from the index file here and filter it down to what belongs on this page. My goal is for the website to first provide a map of the concepts in time series. I want to avoid organizing it strictly by chapter, since that separates topics and limits my learning to a single lesson. Bigger picture first.

2.1 Purpose

Once we have identified any trend and seasonal effects, we can deseasonalise the time series and remove the trend. If we use the additive decomposition method of §1.5, we first calculate the seasonally adjusted time series and then remove the trend by subtraction. This leaves the random component, but the random component is not necessarily well modelled by independent random variables.

In many cases, consecutive variables will be correlated. If we identify such correlations, we can improve our forecasts, quite dramatically if the correlations are high. We also need to estimate correlations if we are to generate realistic time series for simulations. The correlation structure of a time series model is defined by the correlation function, and we estimate this from the observed time series.

Plots of serial correlation (the ‘correlogram’, defined later) are also used extensively in signal processing applications. The paradigm is an underlying deterministic signal corrupted by noise. Signals from yachts, ships, aeroplanes, and space exploration vehicles are examples. At the beginning of 2007, NASA’s twin Voyager spacecraft were sending back radio signals from the frontier of our solar system, including evidence of hollows in the turbulent zone near the edge.

2.2 Expectation and the ensemble

2.2.1 Expected value

The expected value, commonly abbreviated to expectation, \(E\), of a variable, or a function of a variable, is its mean value in a population. So \(E(x)\) is the mean of \(x\), denoted \(\mu\),1 and \(E\big[(x - \mu)^2\big]\) is the mean of the squared deviations about \(\mu\), better known as the variance \(\sigma^2\) of \(x\).2 The standard deviation, \(\sigma\), is the square root of the variance. If there are two variables (\(x\), \(y\)), the variance may be generalized to the covariance, \(\gamma(x, y)\). Covariance is defined by

\[ \gamma(x, y) = E[(x - \mu_x)(y - \mu_y)] \tag{e2.1} \]

The covariance is a measure of linear association between two variables (\(x\), \(y\)). In §1.4.3, we emphasized that a linear association between variables does not imply causality.

Sample estimates are obtained by adding the appropriate function of the individual data values and division by \(n\) or, in the case of variance and covariance, \(n − 1\), to give unbiased estimators.3 For example, if we have \(n\) data pairs, \((x_i, y_i)\), the sample covariance is given by

\[ \text{Cov}(x, y) = \frac{\sum (x_i − \bar{x})(y_i − \bar{y})}{n − 1} \tag{e2.2} \]
If the data pairs are plotted, the lines \(x = \bar{x}\) and \(y = \bar{y}\) divide the plot into quadrants. Points in the lower left quadrant have both \((x_i − \bar{x})\) and \((y_i − \bar{y})\) negative, so the product that contributes to the covariance is positive. Points in the upper right quadrant also make a positive contribution. In contrast, points in the upper left and lower right quadrants make a negative contribution to the covariance. Thus, if \(y\) tends to increase when \(x\) increases, most of the points will be in the lower left and upper right quadrants and the covariance will be positive. Conversely, if \(y\) tends to decrease as \(x\) increases, the covariance will be negative. If there is no such linear association, the covariance will be small relative to the standard deviations of \(\{x_i\}\) and \(\{y_i\}\) – always check the plot in case there is a quadratic association or some other pattern.

In R, we can calculate a sample covariance, with denominator \(n−1\), from its definition or by using the function cov. If we use the mean function, we are implicitly dividing by \(n\).

Benzo[a]pyrene is a carcinogenic hydrocarbon that is a product of incomplete combustion. One source of benzo[a]pyrene and carbon monoxide is automobile exhaust. Colucci and Begeman (1971) analyzed sixteen air samples from Herald Square in Manhattan and recorded the carbon monoxide concentration (\(x\), in parts per million) and benzo[a]pyrene concentration (\(y\), in micrograms per thousand cubic metres) for each sample. The data are plotted in Figure 2.1.

figure 2.1 here

We now use R to calculate the covariance for the Herald Square pairs in three different ways:

> x <- CO; y <- Benzoa; n <- length(x)
> sum((x - mean(x)) * (y - mean(y))) / (n - 1)
[1] 5.51
> mean((x - mean(x)) * (y - mean(y)))
[1] 5.17
> cov(x, y)
[1] 5.51

The correspondence between the R code above and the expectation definition of covariance should be noted:

\[ \text{mean}((x - \text{mean}(x)) * (y - \text{mean}(y))) \rightarrow E[(x - \mu_x)(y - \mu_y)] \tag{e2.3} \]

Given this correspondence, the more natural estimate of covariance would be \(\text{mean}((x − \text{mean}(x)) * (y − \text{mean}(y)))\). However, as can be seen above, the values computed using the internal function cov are those obtained using sum with a denominator of \(n − 1\). As \(n\) gets large, the difference in denominators becomes less noticeable, and the more natural estimate asymptotically4 approaches the unbiased estimate.
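
As a quick check of the relationship between the two estimates, multiplying the mean-based value by \(n/(n-1)\) recovers the value returned by cov. A minimal sketch using the rounded output shown above for the sixteen Herald Square samples:

# n = 16 air samples; the mean-based estimate times n/(n - 1)
# recovers the estimate with denominator n - 1 (up to rounding)
n <- 16
5.17 * n / (n - 1)   # approximately 5.51, matching cov(x, y)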

Correlation is a dimensionless measure of the linear association between a pair of variables (\(x\), \(y\)) and is obtained by standardizing the covariance by dividing it by the product of the standard deviations of the variables. Correlation takes a value between \(−1\) and \(+1\), with a value of \(0\) indicating no linear association.

The population correlation, \(\rho\), between a pair of variables (\(x\), \(y\)) is defined by

\[ \rho(x, y) = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\gamma(x, y)}{\sigma_x \sigma_y} \tag{e2.4} \]

The sample correlation, \(Cor\), is an estimate of \(ρ\) and is calculated as

\[ Cor(x, y) = \frac{Cov(x, y)}{sd(x)sd(y)} \tag{e2.5} \]

In R, the sample correlation for pairs \((x_i, y_i)\) stored in vectors x and y is cor(x, y). A value of \(+1\) or \(−1\) indicates an exact linear association, with the \((x, y)\) pairs falling on a straight line of positive or negative slope, respectively. The correlation between the CO and benzo[a]pyrene measurements at Herald Square is now calculated both from the definition and using cor.

> cov(x,y) / (sd(x)*sd(y))
[1] 0.3551
> cor(x,y)
[1] 0.3551

Although the correlation is small, there is nevertheless a physical explanation for the correlation because both products are a result of incomplete combustion. A correlation of \(0.36\) typically corresponds to a slight visual impression that \(y\) tends to increase as \(x\) increases, although the points will be well scattered.

2.2.2 The ensemble and stationarity

The mean function of a time series model is

\[ \mu(t) = E(x_t) \tag{e2.6} \]

and, in general, is a function of \(t\). The expectation in this definition is an average taken across the ensemble of all the possible time series that might have been produced by the time series model (Fig. 2.2). The ensemble constitutes the entire population. If we have a time series model, we can simulate more than one time series (see Chapter 4). However, with historical data, we usually only have a single time series, so all we can do, without assuming a mathematical structure for the trend, is to estimate the mean at each sample point by the corresponding observed value. In practice, we make estimates of any apparent trend and seasonal effects in our data and remove them, using decompose, for example, to obtain time series of the random component. Then time series models with a constant mean will be appropriate.

If the mean function is constant, we say that the time series model is stationary in the mean. The sample estimate of the population mean, \(\mu\), is the sample mean, \(\bar{x}\):

\[ \bar{x} = \sum_{t=1}^{n} x_t / n \tag{e2.7} \]

Equation (2.7) does rely on an assumption that a sufficiently long time series characterizes the hypothetical model. Such models are known as ergodic, and the models in this book are all ergodic.

2.2.3 Ergodic Series*

A time series model that is stationary in the mean is ergodic in the mean if the time average for a single time series tends to the ensemble mean as the length of the time series increases:

\[ \lim_{n \to \infty} \frac{\sum x_t}{n} = \mu \tag{e2.8} \]

This implies that the time average is independent of the starting point. Given that we usually only have a single time series, you may wonder how a time series model can fail to be ergodic, or why we should want a model that is not ergodic. Environmental and economic time series are single realizations of a hypothetical time series model, and we simply define the underlying model as ergodic.

There are, however, cases in which we can have many time series arising from the same time series model. Suppose we investigate the acceleration at the pilot seat of a new design of microlight aircraft in simulated random gusts in a wind tunnel. Even if we have built two prototypes to the same design, we cannot be certain they will have the same average acceleration response because of slight differences in manufacture. In such cases, the number of time series is equal to the number of prototypes. Another example is an experiment investigating turbulent flows in some complex system. It is possible that we will obtain qualitatively different results from different runs because they do depend on initial conditions. It would seem better to run an experiment involving turbulence many times than to run it once for a much longer time. The number of runs is the number of time series. It is straightforward to adapt a stationary time series model to be non-ergodic by defining the means for the individual time series to be from some probability distribution.

2.2.4 Variance function

The variance function of a time series model that is stationary in the mean is

\[ \sigma^2(t) = E[(x_t − \mu)^2] \tag{e2.9} \]

which can, in principle, take a different value at every time \(t\). But we cannot estimate a different variance at each time point from a single time series. To progress, we must make some simplifying assumption. If we assume the model is stationary in the variance, this constant population variance, \(\sigma^2\), can be estimated from the sample variance:

\[ Var(x) = \frac{\sum (x_t − \bar{x})^2}{n − 1} \tag{e2.10} \]

In a time series analysis, sequential observations may be correlated. If the correlation is positive, Var(x) will tend to underestimate the population variance in a short series because successive observations tend to be relatively similar. In most cases, this does not present a problem since the bias decreases rapidly as the length \(n\) of the series increases.
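
The downward bias from positive autocorrelation can be illustrated by simulation. A minimal sketch, assuming an AR(1) model with coefficient 0.8 and unit-variance innovations (so the theoretical variance is \(1/(1 - 0.8^2) \approx 2.78\)); the coefficient and series length are purely illustrative:

set.seed(1)
phi <- 0.8
true_var <- 1 / (1 - phi^2)     # theoretical variance of a stationary AR(1), about 2.78
# sample variances of many short (n = 20), positively autocorrelated series
short_vars <- replicate(10000, var(arima.sim(model = list(ar = phi), n = 20)))
mean(short_vars)                # averages noticeably below true_var
true_var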

2.2.5 Autocorrelation

The mean and variance play an important role in the study of statistical distributions because they summarise two key distributional properties – a central location and the spread. Similarly, in the study of time series models, a key role is played by the second-order properties, which include the mean, variance, and serial correlation (described below).

Consider a time series model that is stationary in the mean and the variance. The variables may be correlated, and the model is second-order stationary if the correlation between variables depends only on the number of time steps separating them. The number of time steps between the variables is known as the lag. A correlation of a variable with itself at different times is known as autocorrelation or serial correlation. If a time series model is second-order stationary, we can define an autocovariance function (\(acvf\)), \(\gamma_k\), as a function of the lag \(k\):

\[ \gamma_k = E[(x_t - \mu)(x_{t+k} - \mu)] \tag{e2.11} \]

The function \(\gamma_k\) does not depend on \(t\) because the expectation, which is across the ensemble, is the same at all times \(t\). This definition follows naturally from Equation (2.1) by replacing \(x\) with \(x_t\) and \(y\) with \(x_{t+k}\) and noting that the mean \(\mu\) is the mean of both \(x_t\) and \(x_{t+k}\). The lag \(k\) autocorrelation function (\(acf\)), \(\rho_k\), is defined by:

\[ \rho_k = \frac{\gamma_k}{\sigma^2} \tag{e2.12} \]

It follows from the definition that \(\rho_0\) is 1.

It is possible to set up a second-order stationary time series model that has, for example, a skewness that depends on time \(t\). Applications for such models are rare, and it is customary to drop the term ‘second-order’ and use ‘stationary’ on its own for a time series model that is at least second-order stationary. The term strictly stationary is reserved for more rigorous conditions.

The acvf and acf can be estimated from a time series by their sample equivalents. The sample acvf, \(c_k\), is calculated as:

\[ c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{2.13} \]

Note that the autocovariance at lag \(0\), \(c_0\), is the variance calculated with a denominator \(n\). Also, a denominator \(n\) is used when calculating \(c_k\), although only \(n - k\) terms are added to form the numerator. Adopting this definition constrains all sample autocorrelations to lie between \(-1\) and \(1\). The sample acf is defined as:

\[ r_k = \frac{c_k}{c_0} \tag{2.14} \]
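
As a check that Equations (2.13) and (2.14) agree with what acf computes, the two quantities can be coded directly. A minimal sketch using a simulated series (any numeric vector would do); c_k and r_k are just illustrative helper names:

set.seed(42)
x <- rnorm(200)
n <- length(x); xbar <- mean(x)

# sample acvf c_k (Equation 2.13) and sample acf r_k (Equation 2.14)
c_k <- function(k) sum((x[1:(n - k)] - xbar) * (x[(1 + k):n] - xbar)) / n
r_k <- function(k) c_k(k) / c_k(0)

c_k(1); acf(x, type = "covariance", plot = FALSE)$acf[2]   # should agree
r_k(1); acf(x, plot = FALSE)$acf[2]                        # should agree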

We will demonstrate the calculations in R using a time series of wave heights (mm relative to still water level) measured at the centre of a wave tank. The sampling interval is 0.1 second and the record length is 39.7 seconds. The waves were generated by a wave maker driven by a pseudo-random signal that was programmed to emulate a rough sea. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.

> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table (www, header=T) ; attach(wave.dat)
> plot(ts(waveht)) ; plot(ts(waveht[1:60]))

The upper plot in Figure 2.3 (Figure 1 (a)) shows the entire time series. There are no outlying values. The lower plot (Figure 1 (b)) is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.

fig. 2.3. Wave height at centre of tank sampled at 0.1 second intervals.

# book resource code
wave.dat <- read.table("../data/wave.dat", header = TRUE)
# attach(wave.dat) is avoided; columns are referenced explicitly instead
plot(ts(wave.dat$waveht))                                # fig 2.3a (upper): full series
plot(ts(wave.dat$waveht[1:60]))                          # fig 2.3b (lower): first 60 observations
acf(wave.dat$waveht)$acf[2]                              # lag 1 autocorrelation
acf(wave.dat$waveht, type = "covariance")$acf[2]         # lag 1 autocovariance
plot(wave.dat$waveht[1:396], wave.dat$waveht[2:397])     # scatterplot of lag 1 pairs

The autocorrelations of \(x\) are stored in the vector acf(x)$acf, with the lag \(k\) autocorrelation located in acf(x)$acf[k+1]. For example, the lag 1 autocorrelation for waveht (Figure 1 (c)) is:

> acf(waveht)$acf[2]
[1] 0.47

The first entry, acf(waveht)$acf[1], is \(r_0\) and always equals 1 (it does not appear in the modern ACF plot, but the book code includes it). A scatterplot, such as Figure 2.1 for the Herald Square data, complements the calculation of the correlation and alerts us to any non-linear patterns. In a similar way, we can draw a scatterplot corresponding to each autocorrelation. For example, for lag 1, we plot waveht[1:396] against waveht[2:397] to obtain Figure 2.4 (Figure 1 (e)). Autocovariances are obtained by adding an argument to acf. The lag 1 autocovariance (Figure 1 (d)) is given by:

> acf(waveht, type = c("covariance"))$acf[2]
[1] 33328

See the first value in the modern ACVF plot and note that it matches 33328. The book code refers to the second element of $acf because acf() includes the lag 0 value as its first element; the modern ACF() output does not include lag 0, so the lag 1 value appears first.

2.3 The Correlogram

2.3.1 General Discussion

By default, the acf function produces a plot of \(r_k\) against \(k\), which is called the correlogram. For example, Figure 2.5 (Figure 1 (c)) gives the correlogram for the wave heights obtained from acf(waveht). In general, correlograms have the following features:

  • The x-axis gives the lag (\(k\)) and the y-axis gives the autocorrelation (\(r_k\)) at each lag. The unit of lag is the sampling interval, 0.1 second. Correlation is dimensionless, so there is no unit for the y-axis.

  • If \(\rho_k = 0\), the sampling distribution of \(r_k\) is approximately normal, with a mean of \(-\frac{1}{n}\) and a variance of \(\frac{1}{n}\). The dotted lines on the correlogram are drawn at \(-\frac{1}{n} \pm \frac{2}{\sqrt{n}}\) (a short sketch of these bounds follows this list). If \(r_k\) falls outside these lines, we have evidence against the null hypothesis that \(\rho_k = 0\) at the 5% level. However, we should be careful about interpreting multiple hypothesis tests. Firstly, if \(\rho_k\) does equal 0 at all lags \(k\), we expect 5% of the estimates, \(r_k\), to fall outside the lines. Secondly, the \(r_k\) are correlated, so if one falls outside the lines, the neighbouring ones are more likely to be statistically significant. This will become clearer when we simulate time series in Chapter 4. In the meantime, it is worth looking for statistically significant values at specific lags that have some practical meaning (for example, the lag that corresponds to the seasonal period, when there is one). For monthly series, a significant autocorrelation at lag 12 might indicate that the seasonal adjustment is not adequate.

  • The lag 0 autocorrelation is always 1 and is shown on the plot. Its inclusion helps us compare values of the other autocorrelations relative to the theoretical maximum of 1. This is useful because, if we have a long time series, small values of \(r_k\) that are of no practical consequence may be statistically significant. However, some discernment is required to decide what constitutes a noteworthy autocorrelation from a practical viewpoint. Squaring the autocorrelation can help, as this gives the percentage of variability explained by a linear relationship between the variables. For example, a lag 1 autocorrelation of 0.1 implies that a linear dependency of \(x_t\) on \(x_{t-1}\) would only explain 1% of the variability of \(x_t\). It is a common fallacy to treat a statistically significant result as important when it has almost no practical consequence.

  • The correlogram for wave heights has a well-defined shape that appears like a sampled damped cosine function. This is typical of correlograms of time series generated by an autoregressive model of order 2. We cover autoregressive models in Chapter 4.
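
As noted in the list above, the dotted significance lines are drawn at \(-1/n \pm 2/\sqrt{n}\). A minimal sketch of these bounds for a given series length; the length 397 is taken from the wave height record (39.7 seconds sampled every 0.1 second), and acf_bounds is just an illustrative helper name:

# approximate 5% significance bounds for r_k under the null hypothesis rho_k = 0
acf_bounds <- function(n) -1 / n + c(-2, 2) / sqrt(n)

acf_bounds(397)   # roughly -0.10 and 0.10 for the wave height series
acf_bounds(100)   # wider bounds for a shorter series, roughly -0.21 and 0.19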

If you look back at the plot of the air passenger bookings, there is a clear seasonal pattern and an increasing trend (Fig. 1.1). It is not reasonable to claim the time series is a realisation of a stationary model. But, whilst the population acf was defined only for a stationary time series model, the sample acf can be calculated for any time series, including deterministic signals. Some results for deterministic signals are helpful for explaining patterns in the acf of time series that we do not consider realisations of some stationary process:

  • If you construct a time series that consists of a trend only, the integers from 1 up to 1000 for example, the acf decreases slowly and almost linearly from 1.
  • If you take a large number of cycles of a discrete sinusoidal wave of any amplitude and phase, the acf is a discrete cosine function of the same period.
  • If you construct a time series that consists of an arbitrary sequence of \(p\) numbers repeated many times, the correlogram has a dominant spike of almost 1 at lag \(p\).

Usually, a trend in the data will show in the correlogram as a slow decay in the autocorrelations, which are large and positive due to similar values in the series occurring close together in time. This can be seen in the correlogram for the air passenger bookings, acf(AirPassengers) (Fig. 2.6; Figure 2 (b)): the gradual decay is typical of a time series containing a trend, and the peak at 1 year indicates seasonal variation. If there is seasonal variation, seasonal spikes will be superimposed on this pattern. The annual cycle appears in the air passenger correlogram as a cycle of the same period superimposed on the gradually decaying ordinates of the acf. This gives a maximum at a lag of 1 year, reflecting a positive linear relationship between pairs of variables \((x_t, x_{t+12})\) separated by 12-month periods. Conversely, because the seasonal trend is approximately sinusoidal, values separated by a period of 6 months will tend to have a negative relationship. For example, higher values tend to occur in the summer months followed by lower values in the winter months, so a dip in the acf occurs at lag 6 months (or 0.5 years). Although this is typical for seasonal variation that is approximated by a sinusoidal curve, other series may have patterns, such as high sales at Christmas, that contribute a single spike to the correlogram.
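
The three deterministic-signal results in the bullet list above are easy to verify numerically. A minimal sketch; the period of 20 and the repeated sequence of \(p = 8\) digits are arbitrary illustrative choices:

# 1. pure trend: the acf decays slowly and almost linearly from 1
acf(1:1000, plot = FALSE)$acf[1:5]

# 2. sinusoid of period 20: the acf is a cosine of the same period
acf(sin(2 * pi * (1:1000) / 20), lag.max = 40, plot = FALSE)$acf[c(1, 11, 21)]   # about 1, -1, 1 at lags 0, 10, 20

# 3. a sequence of p = 8 numbers repeated many times: a spike of almost 1 at lag p
acf(rep(c(3, 1, 4, 1, 5, 9, 2, 6), times = 125), lag.max = 16, plot = FALSE)$acf[9]   # lag 8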

2.3.2 Example based on air passenger series

Although we want to know about trends and seasonal patterns in a time series, we do not necessarily rely on the correlogram to identify them. The main use of the correlogram is to detect autocorrelations in the time series after we have removed an estimate of the trend and seasonal variation. In the code below, the air passenger series is seasonally adjusted and the trend removed using decompose. To plot the random component and draw the correlogram, we need to remember that a consequence of using a centered moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first six and last six terms in the random component cannot be calculated and are thus stored in R as NA. The random component and correlogram are shown in Figures 2.7 and 2.8, respectively.

> data(AirPassengers)
> AP <- AirPassengers
> AP.decom <- decompose(AP, "multiplicative")
> plot(ts(AP.decom$random[7:138]))
> acf(AP.decom$random[7:138])

The correlogram in Figure 2.8 (Figure 2 (c)) suggests either a damped cosine shape that is characteristic of an autoregressive model of order 2 (Chapter 4) or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that the standard deviation of the original series from July until June is 109, the standard deviation of the series after subtracting the trend estimate is 41, and the standard deviation after seasonal adjustment is just 0.03 (see the code below for the plots and these numbers).

> sd(AP[7:138])
[1] 109
> sd(AP[7:138]- AP.decom$trend[7:138])
[1] 41.1
> sd(AP.decom$random[7:138])
[1] 0.0335

The reduction in the standard deviation shows that the seasonal adjustment has been very effective.

2.3.3 Example based on the Font Reservoir series

Monthly effective inflows (\(m^3s^{-1}\)) to the Font Reservoir in Northumberland for the period from January 1909 until December 1980 have been provided by Northumbrian Water PLC. A plot of the data is shown in Figure 2.9. There was a slight decreasing trend over this period and substantial seasonal variation. The trend and seasonal variation have been estimated by regression, as described in Chapter 5, and the residual series (adflow), which we analyze here, can reasonably be considered a realization from a stationary time series model.

The main difference between the regression approach and using decompose is that the former assumes a linear trend, whereas the latter smooths the time series without assuming any particular form for the trend. The correlogram is plotted in Figure 2.10 (Figure 3 (b)).

> www <- "http://www.massey.ac.nz/~pscowper/ts/Fontdsdt.dat"
> Fontdsdt.dat <- read.table(www, header=T)
> attach(Fontdsdt.dat)
> plot(ts(adflow), ylab = 'adflow')
> acf(adflow, xlab = 'lag (months)', main="")

There is a statistically significant correlation at lag 1. The physical interpretation is that the inflow next month is more likely than not to be above average if the inflow this month is above average. Similarly, if the inflow this month is below average, it is more likely than not that next month’s inflow will be below average. The explanation is that the groundwater supply can be thought of as a slowly discharging reservoir. If groundwater is high one month, it will augment inflows and is likely to do so next month as well. Given this explanation, you may be surprised that the lag 1 correlation is not higher. The explanation for this is that most of the inflow is runoff following rainfall, and in Northumberland there is little correlation between seasonally adjusted rainfall in consecutive months. An exponential decay in the correlogram is typical of a first-order autoregressive model (Chapter 4). The correlogram of the adjusted inflows is consistent with an exponential decay. However, given the sampling errors for a time series of this length, estimates of autocorrelation at higher lags are unlikely to be statistically significant.

This is not a practical limitation because such low correlations are inconsequential. When we come to identify suitable models, we should remember that there is no one correct model and that there will often be a choice of suitable models. We may make use of a specific statistical criterion, such as Akaike’s information criterion, introduced in Chapter 5, to choose a model, but this does not imply that the model is correct.
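
Under a first-order autoregressive model (Chapter 4), the theoretical autocorrelations decay exponentially, \(\rho_k = \alpha^k\). A minimal sketch using an illustrative coefficient of 0.2 (a hypothetical value, not the actual estimate for the adflow series) shows how quickly such correlations become negligible:

alpha <- 0.2                  # illustrative lag 1 autocorrelation (hypothetical)
round(alpha^(1:6), 4)         # 0.2, 0.04, 0.008, ... negligible beyond the first few lags

# the same exponential-decay shape appears in the sample acf of a simulated AR(1) series
set.seed(7)
acf(arima.sim(model = list(ar = alpha), n = 500), lag.max = 6, plot = FALSE)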

2.4 Covariance of sums of random variables

In subsequent chapters, second-order properties for several time series models are derived using the result shown in Equation (2.15). Let \(x_1, x_2, \ldots, x_n\) and \(y_1, y_2, \ldots, y_m\) be random variables. Then:

\[ Cov \left( \sum_{i=1}^n x_i, \sum_{j=1}^m y_j \right) = \sum_{i=1}^n \sum_{j=1}^m Cov(x_i, y_j) \tag{2.15} \]

where \(Cov(x, y)\) is the covariance between a pair of random variables \(x\) and \(y\). The result tells us that the covariance of two sums of variables is the sum of all possible covariance pairs of the variables. Note that the special case of \(n = m\) and \(x_i = y_i\) (\(i = 1, \ldots, n\)) occurs in subsequent chapters for a time series \(\{x_t\}\). The proof of Equation (2.15) is left to Exercise 5a.
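
Because the sample covariance is also bilinear, Equation (2.15) can be verified numerically with arbitrary data. A minimal sketch with simulated columns; the sizes (50 observations, \(n = 4\), \(m = 3\)) are arbitrary:

set.seed(3)
X <- matrix(rnorm(50 * 4), ncol = 4)   # n = 4 variables, 50 observations each
Y <- matrix(rnorm(50 * 3), ncol = 3)   # m = 3 variables

cov(rowSums(X), rowSums(Y))            # covariance of the two sums
sum(cov(X, Y))                         # sum of all n * m pairwise covariances; equal up to floating-point rounding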

2.5 Summary of Commands used in examples

  • mean returns the mean (average)
  • var returns the variance with denominator \(n - 1\)
  • sd returns the standard deviation
  • cov returns the covariance with denominator \(n - 1\)
  • cor returns the correlation
  • acf returns the correlogram (or, with the argument type = "covariance", the autocovariance function)

index

plots

Code
# example usage @fig-2.2.5r-3
library(tidyverse)   # readr::read_table, dplyr, ggplot2
library(tsibble)     # as_tsibble
library(feasts)      # ACF

wave_dat <- read_table("../data/wave.dat") |>
    mutate(index = 1:n()) |>
    as_tsibble(index = index)
wave_dat |> # Subfigure 1: Full time series of wave data
    autoplot()

# Subfigure 2: First 60 observations of wave data
wave_dat |>
    slice(1:60) |>
    autoplot()

# Subfigure 3: Autocorrelation function (ACF)
acf_data <- ACF(wave_dat)
ACF(wave_dat) |> autoplot() +
    geom_text(data = acf_data, aes(x = lag, y = acf, label = round(acf, 2)), vjust = -0.5, size = 3)

# Subfigure 4: Autocovariance function (ACVF)
acvf_data <- ACF(wave_dat, type = "covariance")
ACF(wave_dat, type = "covariance") |> autoplot() +
    geom_text(data = acvf_data, aes(x = lag, y = acf, label = round(acf, 2)), vjust = -0.5, size = 3)

# Subfigure 5: Scatterplot of wave heights vs. lagged wave heights
wave_dat |>
    mutate(waveht_lag = lead(waveht)) |>
    ggplot(aes(x = waveht, y = waveht_lag)) + geom_point()
(a) fig 2.3a Full time series of wave data.
(b) fig 2.3b First 60 observations of wave data.
(c) fig 2.5 e2.14 Correlogram for the wave heights obtained from acf(waveht).
(d) fig 2.55 e2.11 or e2.13 ACVF of wave data.
(e) fig 2.4 Wave height pairs separated by a lag of 1.
Figure 1: ‘Autocorrelation’ and ‘Wave heights’ Modern Look
Code
# resource code 2.3.1


data(AirPassengers)
AP <- AirPassengers
AP.decom <- decompose(AP, "multiplicative")# Decompose the data

# Subfigure 1: Time series of random component (7:138)
plot(ts(AP.decom$random[7:138]), main = "Time Series of Random Component")

# Subfigure 2: ACF of the full AirPassengers data
acf_data <- acf(AP, plot = FALSE)
plot(acf_data, main = "fig 2.6a: Full ACF of AirPassengers data")
text(x = acf_data$lag, y = acf_data$acf, labels = round(acf_data$acf, 2), pos = 3, cex = 0.8)

# Subfigure 3: ACF of random component (7:138)
acf_random_data <- acf(AP.decom$random[7:138], plot = FALSE)
plot(acf_random_data, main = "fig 2.6b: ACF of Random Component (7:138)")
text(x = acf_random_data$lag, y = acf_random_data$acf, labels = round(acf_random_data$acf, 2), pos = 3, cex = 0.8)

# Standard deviations
timeseries_sd <- sd(AP[7:138])
trend_sd <- sd(AP[7:138] - AP.decom$trend[7:138])
random_sd <- sd(AP.decom$random[7:138])
(a) fig 2.7: Random component of the air passenger data after removing the trend and the seasonal variation.
(b) fig 2.6: Correlogram for the air passenger bookings over the period 1949–1960. The gradual decay is typical of a time series containing a trend. The peak at 1 year indicates seasonal variation.
(c) fig 2.8: Correlogram for the random component of air passenger bookings over the period 1949–1960.
Figure 2: AirPassengers Decomposition and Autocorrelation

The standard deviation of the original series from July until June is 109.42. The standard deviation of the series after subtracting the trend estimate is 41.11. The standard deviation of the random component is 0.0334.

Code
fontdsdt_dat <- read_table("../data/Fontdsdt.dat")
index_dates <- seq(lubridate::ymd("1909-01-01"), by = "1 months", length.out = nrow(fontdsdt_dat))
f_ts <- fontdsdt_dat |>
    mutate(date = index_dates, month = tsibble::yearmonth(date)) |>
    as_tsibble(index = month)

# Subfigure 1: Time series of adflow
autoplot(f_ts) + labs(y = "adflow", title = "f.ts <- ts(Fontdsdt.dat$adflow, st = 1909, fr = 12)")

# Subfigure 2: ACF of adflow
acf_adflow <- ACF(f_ts)
autoplot(acf_adflow) + labs(x = "lag (months)", title = "f.decom <- decompose(f.ts, multiplicative)") +
    geom_text(data = as_tibble(acf_adflow), aes(x = lag, y = acf, label = round(acf, 2)), vjust = -0.5, size = 3)

# Decompose the time series
f_decom <- f_ts |>
    model(feasts::classical_decomposition(adflow, type = "mult")) |>
    components()

# Subfigure 3: ACF of random component (7:857)
acf_random <- ACF(f_decom, random)
autoplot(acf_random) + labs(title = "acf(f.decom$random[7:857])") +
    geom_text(data = as_tibble(acf_random), aes(x = lag, y = acf, label = round(acf, 2)), vjust = -0.5, size = 3)
(a) Fig. 2.9. Adjusted inflows to the Font Reservoir, 1909–1980.
(b) Fig. 2.10. Correlogram for adjusted inflows to the Font Reservoir, 1909–1980.
(c) ACF of random component (7:857)
Figure 3: Example based on the Font Reservoir series

Footnotes

  1. A more formal definition of the expectation \(E\) of a function \(\phi(x, y)\) of continuous random variables \(x\) and \(y\), with joint probability density function \(f(x, y)\), is the mean value of \(\phi\) obtained by integrating over all possible values of \(x\) and \(y\): \[ E[\phi(x,y)] = \int_y \int_x \phi(x,y) f(x,y) \, dx \, dy \] Note that the mean of \(x\) is obtained as the special case \(\phi(x, y) = x\).↩︎

  2. For more than one variable, subscripts can be used to distinguish between the properties; e.g., for the means, we may write \(\mu_x\) and \(\mu_y\) to distinguish between the mean of \(x\) and the mean of \(y\).↩︎

  3. An estimator is unbiased for a population parameter if its average value, in infinitely repeated samples of size \(n\), equals that population parameter. If an estimator is unbiased, its value in a particular sample is referred to as an unbiased estimate.↩︎

  4. In statistics, asymptotically means as the sample size approaches infinity.↩︎

Eduardo Ramirez © 2024