This is the overview of chapter 4. I will add info from the index file here and filter it down to what belongs in this section. My goal is to first have the website present a map of the concepts in time series. I want to avoid organizing it by chapters at first, since that separates topics and limits my learning to a single lesson. Bigger picture first.
2.2 Expectation and the ensemble
2.2.1 Expected value
The expected value, commonly abbreviated to expectation, \(E\), of a variable, or of a function of a variable, is its mean value in a population. So \(E(x)\) is the mean of \(x\), denoted \(\mu\), and \(E\big[(x - \mu)^2\big]\) is the mean of the squared deviations about \(\mu\), better known as the variance \(\sigma^2\) of \(x\). The standard deviation, \(\sigma\), is the square root of the variance. If there are two variables (\(x\), \(y\)), the variance may be generalized to the covariance, \(\gamma(x, y)\). Covariance is defined by
\[
\gamma(x, y) = E[(x - \mu_x)(y - \mu_y)] \tag{e2.1}
\]
The covariance is a measure of linear association between two variables (\(x\), \(y\)). In §1.4.3, we emphasized that a linear association between variables does not imply causality.
Sample estimates are obtained by summing the appropriate function of the individual data values and dividing by \(n\) or, in the case of the variance and covariance, by \(n - 1\) to give unbiased estimators. For example, if we have \(n\) data pairs \((x_i, y_i)\), the sample covariance is given by
\[
\text{Cov}(x, y) = \frac{\sum (x_i − \bar{x})(y_i − \bar{y})}{n − 1} \tag{e2.2}
\]
If the data pairs are plotted, the lines \(x = \bar{x}\) and \(y = \bar{y}\) divide the plot into quadrants. Points in the lower left quadrant have both \((x_i - \bar{x})\) and \((y_i - \bar{y})\) negative, so the product that contributes to the covariance is positive. Points in the upper right quadrant also make a positive contribution. In contrast, points in the upper left and lower right quadrants make a negative contribution to the covariance. Thus, if \(y\) tends to increase when \(x\) increases, most of the points will be in the lower left and upper right quadrants and the covariance will be positive. Conversely, if \(y\) tends to decrease as \(x\) increases, the covariance will be negative. If there is no such linear association, the covariance will be small relative to the standard deviations of \(\{x_i\}\) and \(\{y_i\}\) – always check the plot in case there is a quadratic association or some other pattern.
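As a quick numerical illustration of this quadrant argument, here is a minimal sketch with simulated data (the variable names and coefficients are made up for illustration):
# simulate positively associated pairs and tally the sign of each
# contribution (x_i - xbar)(y_i - ybar) to the covariance
set.seed(1)
xs <- rnorm(200)
ys <- 0.7 * xs + rnorm(200)
contrib <- (xs - mean(xs)) * (ys - mean(ys))
table(sign(contrib))             # mostly positive terms (lower-left and upper-right quadrants)
sum(contrib) / (length(xs) - 1)  # sample covariance, positive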
In R, we can calculate a sample covariance, with denominator \(n - 1\), from its definition or by using the function cov. If instead we use the mean function, we are implicitly dividing by \(n\).
Benzo[a]pyrene is a carcinogenic hydrocarbon that is a product of incomplete combustion. One source of benzo[a]pyrene and carbon monoxide is automobile exhaust. Colucci and Begeman (1971) analyzed sixteen air samples from Herald Square in Manhattan and recorded the carbon monoxide concentration (\(x\), in parts per million) and benzo[a]pyrene concentration (\(y\), in micrograms per thousand cubic metres) for each sample. The data are plotted in Figure 2.1.
[Figure 2.1 here: scatterplot of the Herald Square CO and benzo[a]pyrene measurements.]
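The interactive session below assumes that the carbon monoxide and benzo[a]pyrene measurements are already available as vectors CO and Benzoa. A minimal loading sketch, assuming a local Herald.dat file with those two column names (the file name and path are assumptions, mirroring the wave.dat example later in this section):
# assumed loading step: a local Herald.dat file with columns CO and Benzoa
Herald.dat <- read.table("../data/Herald.dat", header = TRUE)
attach(Herald.dat)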
We now use R to calculate the covariance for the Herald Square pairs in three different ways:
> x <- CO; y <- Benzoa; n <- length(x)
> sum((x - mean(x)) * (y - mean(y))) / (n - 1)
[1] 5.51
> mean((x - mean(x)) * (y - mean(y)))
[1] 5.17
> cov(x, y)
[1] 5.51
The correspondence between the R code above and the expectation definition of covariance should be noted:
\[
\text{mean}((x - \text{mean}(x)) * (y - \text{mean}(y))) \rightarrow E[(x - \mu_x)(y - \mu_y)] \tag{e2.3}
\]
Given this correspondence, the more natural estimate of covariance would be \(\text{mean}((x - \text{mean}(x)) * (y - \text{mean}(y)))\). However, as can be seen above, the values computed using the internal function cov are those obtained using sum with a denominator of \(n - 1\). As \(n\) gets large, the difference in denominators becomes less noticeable, and the more natural estimate asymptotically approaches the unbiased estimate.
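A quick numerical check of this convergence, as a sketch with simulated data (nothing here is specific to the Herald Square example):
# the ratio of the two covariance estimates is (nn - 1)/nn, which tends to 1
set.seed(42)
for (nn in c(10, 100, 1000)) {
  xx <- rnorm(nn); yy <- rnorm(nn)
  biased   <- mean((xx - mean(xx)) * (yy - mean(yy)))   # denominator nn
  unbiased <- cov(xx, yy)                               # denominator nn - 1
  cat(nn, biased / unbiased, "\n")                      # approaches 1 as nn grows
}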
Correlation is a dimensionless measure of the linear association between a pair of variables (\(x\), \(y\)) and is obtained by standardizing the covariance by dividing it by the product of the standard deviations of the variables. Correlation takes a value between \(−1\) and \(+1\), with a value of \(0\) indicating no linear association.
The population correlation, \(\rho\), between a pair of variables (\(x\), \(y\)) is defined by
\[
\rho(x, y) = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} = \frac{\gamma(x, y)}{\sigma_x \sigma_y} \tag{e2.4}
\]
The sample correlation, \(\text{Cor}\), is an estimate of \(\rho\) and is calculated as
\[
\text{Cor}(x, y) = \frac{\text{Cov}(x, y)}{\text{sd}(x)\,\text{sd}(y)} \tag{e2.5}
\]
In R, the sample correlation for pairs \((x_i, y_i)\) stored in vectors x and y is cor(x, y). A value of \(+1\) or \(-1\) indicates an exact linear association, with the \((x, y)\) pairs falling on a straight line of positive or negative slope, respectively. The correlation between the CO and benzo[a]pyrene measurements at Herald Square is now calculated both from the definition and using cor.
> cov(x,y) / (sd(x)*sd(y))
[1] 0.3551
> cor(x,y)
[1] 0.3551
Although the correlation is small, there is nevertheless a physical explanation for the correlation because both products are a result of incomplete combustion. A correlation of \(0.36\) typically corresponds to a slight visual impression that \(y\) tends to increase as \(x\) increases, although the points will be well scattered.
2.2.2 The ensemble and stationarity
The mean function of a time series model is
\[
\mu(t) = E(x_t) \tag{e2.6}
\]
and, in general, is a function of \(t\). The expectation in this definition is an average taken across the ensemble of all the possible time series that might have been produced by the time series model (Fig. 2.2). The ensemble constitutes the entire population. If we have a time series model, we can simulate more than one time series (see Chapter 4). However, with historical data, we usually have only a single time series, so all we can do, without assuming a mathematical structure for the trend, is to estimate the mean at each sample point by the corresponding observed value. In practice, we make estimates of any apparent trend and seasonal effects in our data and remove them, using decompose, for example, to obtain a time series of the random component. Then time series models with a constant mean will be appropriate.
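As an illustration of this last step, a minimal sketch using the built-in AirPassengers series (not one of this chapter's datasets) to strip out trend and seasonal effects with decompose:
# remove the trend and seasonal effects, leaving the random component,
# for which a constant-mean model is more plausible
AP.decom <- decompose(AirPassengers, type = "multiplicative")
plot(AP.decom$random)
mean(AP.decom$random, na.rm = TRUE)   # decompose leaves NAs at the ends of the series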
If the mean function is constant, we say that the time series model is stationary in the mean. The sample estimate of the population mean, \(µ\), is the sample mean, \(\bar{x}\):
\[
\bar{x} = \sum_{t=1}^{n} x_t / n \tag{e2.7}
\]
Equation (2.7) does rely on an assumption that a sufficiently long time series characterizes the hypothetical model. Such models are known as ergodic, and the models in this book are all ergodic.
2.2.3 Ergodic Series*
A time series model that is stationary in the mean is ergodic in the mean if the time average for a single time series tends to the ensemble mean as the length of the time series increases:
\[
\lim_{n \to \infty} \frac{\sum x_t}{n} = \mu \tag{e2.8}
\]
This implies that the time average is independent of the starting point. Given that we usually have only a single time series, you may wonder how a time series model can fail to be ergodic, or why we should want a model that is not ergodic. Environmental and economic time series are single realizations of a hypothetical time series model, and we simply define the underlying model as ergodic.
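A sketch of Equation (2.8) for a simple stationary model (white noise about an assumed mean of 2): the running time average of a single realization settles at the ensemble mean as the series gets longer.
# running time average of one realization converges to mu (ergodicity in the mean)
set.seed(1)
mu <- 2
z <- mu + rnorm(10000)
running.mean <- cumsum(z) / seq_along(z)
plot(running.mean, type = "l")
abline(h = mu, lty = 2)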
There are, however, cases in which we can have many time series arising from the same time series model. Suppose we investigate the acceleration at the pilot seat of a new design of microlight aircraft in simulated random gusts in a wind tunnel. Even if we have built two prototypes to the same design, we cannot be certain they will have the same average acceleration response because of slight differences in manufacture. In such cases, the number of time series is equal to the number of prototypes. Another example is an experiment investigating turbulent flows in some complex system. It is possible that we will obtain qualitatively different results from different runs because they depend on initial conditions. It would seem better to run an experiment involving turbulence many times than to run it once for a much longer time. The number of runs is the number of time series. It is straightforward to adapt a stationary time series model to be non-ergodic by defining the means for the individual time series to be drawn from some probability distribution.
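A minimal sketch of such a non-ergodic model, with each simulated series given its own mean drawn from a normal distribution (all parameter values are illustrative):
# each of m series has its own mean drawn from N(0, 5^2); the time average of
# any single series converges to that series' own mean rather than to the
# ensemble mean of 0, so the model is not ergodic in the mean
set.seed(2)
m <- 4; len <- 1000
series.means <- rnorm(m, mean = 0, sd = 5)
xmat <- sapply(series.means, function(mu) mu + rnorm(len))
colMeans(xmat)    # close to the individual series means, not to 0
series.means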
2.2.4 Variance function
The variance function of a time series model that is stationary in the mean is
\[
\sigma^2(t) = E[(x_t − \mu)^2] \tag{e2.9}
\]
which can, in principle, take a different value at every time \(t\). But we cannot estimate a different variance at each time point from a single time series. To progress, we must make some simplifying assumption. If we assume the model is stationary in the variance, this constant population variance, \(\sigma^2\), can be estimated from the sample variance:
\[
\text{Var}(x) = \frac{\sum (x_t - \bar{x})^2}{n - 1} \tag{e2.10}
\]
In a time series analysis, sequential observations may be correlated. If the correlation is positive, Var(x) will tend to underestimate the population variance in a short series because successive observations tend to be relatively similar. In most cases, this does not present a problem since the bias decreases rapidly as the length \(n\) of the series increases.
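A sketch of this bias, using a strongly positively autocorrelated AR(1) model simulated with arima.sim (the model and its parameter values are chosen only for illustration):
# for an AR(1) model with phi = 0.9 and unit innovation variance, the
# population variance is 1 / (1 - phi^2), about 5.3; the average sample
# variance over many short series of length 20 falls well below this
set.seed(3)
phi <- 0.9
pop.var <- 1 / (1 - phi^2)
short.vars <- replicate(1000, var(arima.sim(list(ar = phi), n = 20)))
mean(short.vars)
pop.var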
2.2.5 Autocorrelation
The mean and variance play an important role in the study of statistical distributions because they summarise two key distributional properties – a central location and the spread. Similarly, in the study of time series models, a key role is played by the second-order properties, which include the mean, variance, and serial correlation (described below).
Consider a time series model that is stationary in the mean and the variance. The variables may be correlated, and the model is second-order stationary if the correlation between variables depends only on the number of time steps separating them. The number of time steps between the variables is known as the lag. A correlation of a variable with itself at different times is known as autocorrelation or serial correlation. If a time series model is second-order stationary, we can define an autocovariance function (\(acvf\)), \(\gamma_k\), as a function of the lag \(k\):
\[
\gamma_k = E[(x_t - \mu)(x_{t+k} - \mu)] \tag{e2.11}
\]
The function \(\gamma_k\) does not depend on \(t\) because the expectation, which is taken across the ensemble, is the same at all times \(t\). This definition follows naturally from Equation (2.1) by replacing \(x\) with \(x_t\) and \(y\) with \(x_{t+k}\) and noting that the mean \(\mu\) is the mean of both \(x_t\) and \(x_{t+k}\). The lag \(k\) autocorrelation function (\(acf\)), \(\rho_k\), is defined by:
\[
\rho_k = \frac{\gamma_k}{\sigma^2} \tag{e2.12}
\]
It follows from the definition that \(\rho_0\) is 1.
It is possible to set up a second-order stationary time series model that has, for example, a skewness that depends on the time \(t\). Applications for such models are rare, and it is customary to drop the term ‘second-order’ and use ‘stationary’ on its own for a time series model that is at least second-order stationary. The term strictly stationary is reserved for more rigorous conditions.
The acvf and acf can be estimated from a time series by their sample equivalents. The sample acvf, \(c_k\), is calculated as:
\[
c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) \tag{e2.13}
\]
Note that the autocovariance at lag \(0\), \(c_0\), is the variance calculated with a denominator \(n\). Also, a denominator \(n\) is used when calculating \(c_k\), although only \(n - k\) terms are added to form the numerator. Adopting this definition constrains all sample autocorrelations to lie between \(-1\) and \(1\). The sample acf is defined as:
\[
r_k = \frac{c_k}{c_0} \tag{e2.14}
\]
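These sample quantities can be computed directly from the definitions and checked against R's acf function, which uses the same denominator \(n\); a sketch with a simulated series (the series itself is arbitrary):
# lag 1 sample autocovariance and autocorrelation from the definitions,
# compared with the values returned by acf
set.seed(4)
z <- as.numeric(arima.sim(list(ar = 0.6), n = 200))
nz <- length(z); zbar <- mean(z)
c0 <- sum((z - zbar)^2) / nz
c1 <- sum((z[1:(nz - 1)] - zbar) * (z[2:nz] - zbar)) / nz
c1 / c0                                            # r_1 from the definition
acf(z, plot = FALSE)$acf[2]                        # r_1 from acf
acf(z, type = "covariance", plot = FALSE)$acf[2]   # c_1 from acf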
We will demonstrate the calculations in R using a time series of wave heights (mm relative to still water level) measured at the centre of a wave tank. The sampling interval is 0.1 second and the record length is 39.7 seconds. The waves were generated by a wave maker driven by a pseudo-random signal that was programmed to emulate a rough sea. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.
> www <- "http://www.massey.ac.nz/~pscowper/ts/wave.dat"
> wave.dat <- read.table (www, header=T) ; attach(wave.dat)
> plot(ts(waveht)) ; plot(ts(waveht[1:60]))
The upper plot in Figure 2.3, shown here as Figure 1(a), is of the entire time series. There are no outlying values. The lower plot, Figure 1(b), is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.
[Figure 2.3: Wave height at centre of tank sampled at 0.1 second intervals.]
# book resource code
wave.dat <- read.table("../data/wave.dat", header = TRUE)  # local copy of the book's wave.dat
plot(ts(wave.dat$waveht))        # Fig 2.3 (upper): the full series
plot(ts(wave.dat$waveht[1:60]))  # Fig 2.3 (lower): the first 60 wave heights
acf(wave.dat$waveht)$acf[2]                            # lag 1 autocorrelation
acf(wave.dat$waveht, type = "covariance")$acf[2]       # lag 1 autocovariance
plot(wave.dat$waveht[1:396], wave.dat$waveht[2:397])   # lag 1 scatterplot (Fig 2.4)
The autocorrelations of x are stored in the vector acf(x)$acf, with the lag \(k\) autocorrelation located in acf(x)$acf[k+1]. For example, the lag 1 autocorrelation for waveht (shown in Figure 1(c)) is:
> acf(waveht)$acf[2]
[1] 0.47
The first entry, acf(waveht)$acf[1], is \(r_0\) and equals 1 (it does not appear in the correlogram plot, but the book's code retains it). A scatterplot, such as Figure 2.1 for the Herald Square data, complements the calculation of the correlation and alerts us to any non-linear patterns. In a similar way, we can draw a scatterplot corresponding to each autocorrelation. For example, for lag 1, we plot waveht[1:396] against waveht[2:397] to obtain Figure 2.4, shown here as Figure 1(e). Autocovariances are obtained by adding an argument to acf. The lag 1 autocovariance (Figure 1(d)) is given by:
> acf(waveht, type = c("covariance"))$acf[2]
[1] 33328
See the first value in the autocovariance plot and note that it matches 33328. The code refers to the second element of $acf because the vector's first element is the lag 0 autocovariance; the book's code and plot include that lag 0 value, whereas a plot that omits it shows the lag 1 value 33328 first.
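As a rough cross-check on the lag 1 autocorrelation, the correlation of the lag 1 scatterplot pairs can be computed with cor (a sketch; cor uses denominator \(n - 1\) and the means of the two sub-vectors, so it only approximates \(r_1\)):
# correlation of the lag 1 pairs, approximately the 0.47 reported by acf
cor(wave.dat$waveht[1:396], wave.dat$waveht[2:397])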