ind_prod <- rio::import("https://byuistats.github.io/timeseries/data/ind_prod_us.csv")
Eduardo Ramirez
The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, even the simplest features of the data can be obscure and inscrutable. This homework assignment will center around the series below.
Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.
https://fred.stlouisfed.org/series/IPB50001N
The Federal Reserve’s monthly index of industrial production measures the real output of the U.S. industrial sector, which includes manufacturing, mining, and utilities, relative to a 2017 base of 100. The data is reported monthly in index form (not seasonally adjusted) and tracks changes in production activity over time, capturing fluctuations in economic performance. Each observation reflects the current level of industrial output compared to the base year, indicating growth or contraction in these sectors.
   Lag       ACF
1    0 1.0000000
2    1 0.9977744
3    2 0.9958954
4    3 0.9940528
5    4 0.9921186
6    5 0.9902334
7    6 0.9883675
8    7 0.9863674
9    8 0.9843885
10   9 0.9825565
Based on the correlogram of the US Industrial Production Index, there is evidence of a trend: the autocorrelations start near 1 and decline very slowly across many lags (from about 0.998 at lag 1 to about 0.983 at lag 9), which indicates persistent dependence over time and is typical of a trending series. However, there is no clear evidence of a seasonal pattern, as the autocorrelation function (ACF) does not show repeating spikes at regular intervals.
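As a rough illustration of why a trend produces this slow decay, the sketch below compares the sample ACF of a trending series with that of pure noise. It uses simulated data rather than the industrial production series; the names `x_trend` and `x_noise`, the slope, and the noise level are assumptions chosen only for illustration.

```r
# Simulated monthly series with a linear trend plus noise (hypothetical data)
set.seed(1)
n <- 240
x_trend <- 0.5 * seq_len(n) + rnorm(n, sd = 2)
x_noise <- rnorm(n, sd = 2)   # same noise level, no trend

# Sample autocorrelations at lags 0 through 9, without drawing the plot
acf_trend <- acf(x_trend, lag.max = 9, plot = FALSE)$acf[, 1, 1]
acf_noise <- acf(x_noise, lag.max = 9, plot = FALSE)$acf[, 1, 1]

# Side-by-side comparison of the two correlograms
acf_table <- data.frame(Lag = 0:9,
                        trend = round(acf_trend, 3),
                        noise = round(acf_noise, 3))
print(acf_table)
```

In the trending column the values should stay close to 1 at every lag, while the noise column should hover near 0 after lag 0, which is the contrast the interpretation above relies on.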
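# Packages used below (tsibble for yearmonth/as_tsibble, fabletools for model/components)
library(tsibble)
library(fabletools)
library(ggplot2)
# Parse the date column and convert it to a monthly index
ind_prod$date <- lubridate::mdy(ind_prod$date)
ind_prod$date <- yearmonth(ind_prod$date)
ind_prod$ind_prod_indx <- as.numeric(ind_prod$ind_prod_indx)
ind_tsbl <- as_tsibble(ind_prod, key = NULL, index = date, regular = TRUE)
# interval(ind_tsbl)
# has_gaps(ind_tsbl)
# Classical additive decomposition
ind_decom_add <- ind_tsbl |>
  model(feasts::classical_decomposition(ind_prod_indx, type = "add")) |>
  components()
autoplot(ind_decom_add)
# Multiplicative counterpart used in the comparison below (assumed to mirror the additive call)
ind_decom_mult <- ind_tsbl |>
  model(feasts::classical_decomposition(ind_prod_indx, type = "mult")) |>
  components()
autoplot(ind_decom_mult)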
The series increases over time and its seasonal swings grow with the level, so based on what I have learned so far, we need to use the multiplicative decomposition model. The seasonally adjusted series removes the seasonality within the year relative to the overall average, so it should do a good job of portraying the trend if there is one. The trend is rising over time, so a trend is present. The variation in the seasonal component is larger in the additive decomposition than in the multiplicative decomposition, which we can see from the small bar to the left of the plot. Thus, the multiplicative decomposition does a better job of capturing the seasonal component, and the same holds for the other components. Overall, the multiplicative model does a better job of capturing the trend, seasonal, and random components.
In the multiplicative model, the random component ranges between 0.8 and 1.25 and, for the most part, stays between 0.95 and 1.05, so this model does a great job of isolating the ups and downs in this time series. The larger random values in the multiplicative decomposition can be linked to major world and economic events, such as major U.S. and world wars, the financial crisis of 2008, and COVID.
The additive decomposition model has random values between -15 and 5. The random component gets larger and larger over time, so it doesn’t isolate anything that can be linked to abnormal situations. The additive random component fails to capture the world and economic events that plausibly affected this time series. It does not make sense because it starts to go haywire in the late 1990s, and its swings continue to increase thereafter.
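The contrast between the two models can be sketched on simulated data. The example below is a minimal illustration, not the analysis in this report: it uses base R's `decompose()` for the moving-average trend instead of `feasts::classical_decomposition()`, and the series `x`, its level, seasonal factors, and noise are all hypothetical. It measures the size of the within-year swing after subtracting the trend (additive view) and after dividing by it (multiplicative view).

```r
# Hypothetical monthly series: seasonal swings proportional to a rising level
set.seed(42)
n_years <- 20
n <- 12 * n_years
level <- seq(20, 120, length.out = n)                         # rising trend
season <- rep(1 + 0.1 * sin(2 * pi * (1:12) / 12), n_years)   # seasonal factors
noise <- exp(rnorm(n, sd = 0.01))                             # proportional noise
x <- ts(level * season * noise, frequency = 12)

# decompose() estimates the trend with a centred moving average
trend <- decompose(x)$trend

# Size of the within-year swing after removing the trend two ways
swing <- function(v) diff(range(v))
early <- 13:24     # second year (the first and last six trend values are NA)
late  <- 217:228   # nineteenth year

add_early  <- swing((x - trend)[early])   # additive view: subtract the trend
add_late   <- swing((x - trend)[late])
mult_early <- swing((x / trend)[early])   # multiplicative view: divide by the trend
mult_late  <- swing((x / trend)[late])

print(round(c(add_early = add_early, add_late = add_late,
              mult_early = mult_early, mult_late = mult_late), 2))
```

When the seasonal effect scales with the level, the additive swing should be several times larger late in the series than early on, while the multiplicative swing should stay roughly constant. A single fixed additive seasonal pattern cannot match both periods, so the mismatch ends up in the random component.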
The seasonal component charts below show only the first 12 months of the seasonal component for both classical decompositions, which gives a clearer picture of the seasonal pattern. The additive model assumes a constant seasonal effect that oscillates around zero in the units of the index, hence the large seasonal values. In the multiplicative model, the seasonal effect varies proportionally with the level of the trend, which is increasing, and the additive model fails to capture this.
# Packages for filtering, plotting, and stacking the plots
library(dplyr)
library(ggplot2)
library(patchwork)
# Filter to retain only the first 12 months of the seasonal component
ind_decom_add_12_seasonal <- ind_decom_add |> filter(row_number() <= 12) |> select(date, seasonal)
ind_decom_mult_12_seasonal <- ind_decom_mult |> filter(row_number() <= 12) |> select(date, seasonal)
# Plot additive seasonal component with only the first 12 months
plot_add_seasonal_12 <- ggplot(ind_decom_add_12_seasonal, aes(x = date, y = seasonal)) +
  geom_line() +
  ggtitle("Add Decomposition - Seasonal Component (First 12 Months)") +
  ylab("Seasonal Component") +
  xlab("Month")
# Plot multiplicative seasonal component with only the first 12 months
plot_mult_seasonal_12 <- ggplot(ind_decom_mult_12_seasonal, aes(x = date, y = seasonal)) +
  geom_line() +
  ggtitle("Mult Decomposition - Seasonal Component (First 12 Months)") +
  ylab("Seasonal Component") +
  xlab("Month")
# Combine both plots with patchwork: Add on top, Mult on bottom
plot_add_seasonal_12 / plot_mult_seasonal_12
The correlogram of the random component of the US Industrial Production Index indicates some statistically significant autocorrelations at certain lags, but these may not hold practical significance, as the random component should ideally exhibit minimal pattern. Although the presence of statistically significant correlations could suggest non-random influences, it is important to carefully assess whether these correlations correspond to meaningful economic fluctuations or are simply due to noise. Correlation doesn’t always equal causation, so I need to check with Brother Moncayo on how to properly answer this question.
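One way to think about statistical versus practical significance is to look at what pure noise does. The sketch below is an illustration on simulated white noise, not on the random component from this report; the series `z`, its length, and the number of lags are assumptions. It computes the approximate 95% bounds drawn on a correlogram, about ±1.96/√n, and lists the lags that fall outside them.

```r
# Hypothetical white-noise series standing in for a well-behaved random component
set.seed(123)
n <- 600
z <- rnorm(n)

max_lag <- 40
r <- acf(z, lag.max = max_lag, plot = FALSE)$acf[-1, 1, 1]   # drop lag 0

# Approximate 95% bounds shown as dashed lines on a correlogram
bound <- 1.96 / sqrt(n)

# Lags flagged as statistically significant, and how large they actually are
flagged <- which(abs(r) > bound)
print(data.frame(lag = flagged, acf = round(r[flagged], 3)))
print(c(bound = round(bound, 3),
        n_flagged = length(flagged),
        largest = round(max(abs(r)), 3)))
```

With 95% bounds, roughly 1 lag in 20 is expected to cross the line by chance even when nothing is there, and with a long series the bounds are narrow, so a flagged autocorrelation can still be very small in absolute terms. That is one reason a statistically significant spike need not be practically important.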
Removing trend and seasonal variation is essential to reveal the true autocorrelation structure of the time series, as trends and seasonal patterns can introduce misleadingly high or low correlations at various lags. By eliminating these components, the correlogram becomes a more accurate tool for identifying short-term relationships within the data. This step is crucial for meeting the stationarity assumption needed for reliable time series analysis.
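The effect described above can be sketched by comparing the ACF of a raw series with the ACF of its random component. This is a minimal example on simulated data using base R's `decompose()`; the series `x`, its slope, and its seasonal amplitude are hypothetical, and the report's own analysis uses `feasts::classical_decomposition()` instead.

```r
# Hypothetical monthly series: linear trend + seasonal cycle + white noise
set.seed(7)
n <- 240
t_idx <- seq_len(n)
x <- ts(0.5 * t_idx + 5 * sin(2 * pi * t_idx / 12) + rnorm(n), frequency = 12)

# Random component from a classical additive decomposition
rand <- decompose(x, type = "additive")$random
rand <- rand[!is.na(rand)]   # the moving average leaves NAs at both ends

# Autocorrelations at lags 1 through 12 for the raw series and the remainder
acf_raw  <- acf(x, lag.max = 12, plot = FALSE)$acf[2:13, 1, 1]
acf_rand <- acf(rand, lag.max = 12, plot = FALSE)$acf[2:13, 1, 1]

print(round(rbind(raw = acf_raw, random = acf_rand), 2))
```

The raw series should show large autocorrelations at every lag because of the trend, while the remainder should show values much closer to zero, which is the structure the correlogram is meant to reveal once trend and seasonality are removed.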
Analyzing the autocorrelation of the random component helps to understand the underlying fluctuations and irregularities in the time series, highlighting the small-scale dynamics that may not be apparent from the main trend or seasonal patterns. This analysis can reveal subtle patterns or dependencies within the residuals, indicating if there are any remaining structures, such as minor trends or cyclical behavior, that could impact the model’s accuracy. It provides deeper insights into the data’s behavior, aiding in refining the model and ensuring it adequately captures all significant features.
Criteria | Mastery (10) | Incomplete (0) | |
Question 1: Context and Measurement | The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for the requested time series. Clear and comprehensive explanations are provided. | The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series. | |
Mastery (5) | Incomplete (0) | ||
Question 2a: Correlogram | The student plots a correlogram of the time series requested. The plot accurately displays autocorrelation values at various lags. If code is used, it is well-commented, providing clarity on the plotting process. The labels, title, and legends are appropriate and match the quality of the illustrations in the Time Series notebook. | The student attempts to plot a correlogram of the time series requested but encounters significant errors or lacks clarity in their plot. If code is used, it may lack sufficient commenting or coherence, making it challenging to understand the plotting process. Overall, the plot may lack detail or accuracy, highlighting areas for improvement in time series visualization skills. | |
Mastery (15) | Incomplete (0) | ||
Question 2b: Interpretation | The student effectively interprets the correlogram to identify evidence of trend or seasonal components in the time series. Their description matches the textbook description on page 37. | The student attempts to interpret the correlogram but encounters errors or lacks clarity in their analysis. There may be inaccuracies in interpreting autocorrelation values or misinterpretation of the findings, indicating a limited understanding of correlogram analysis techniques. Overall, the justification for findings may lack depth or accuracy. | |
Mastery (5) | Incomplete (0) | ||
Question 3a: Decomposition | The student plots a decomposition of the US Industrial Production Index series, including the original series, trend, seasonal variation, and random component. The code is well-commented, providing clarity on the decomposition process. The labels, title, and legends are appropriate and enhance the understanding of the plot, matching the quality of illustrations in the Time Series notebook. | The student attempts to plot a decomposition of the US Industrial Production Index series but encounters significant errors or lacks clarity in their plot. The code lacks sufficient commenting or coherence, making it challenging to understand the decomposition process. Overall, the plot may lack detail or accuracy. | |
Mastery (5) | Incomplete (0) | ||
Question 3b: Modeling Justification | Provides a well-reasoned justification for choosing either the additive or multiplicative decomposition model, clearly explaining how the data’s characteristics (e.g., seasonality, trend) influence the choice. | Fails to provide a clear or logical justification, or the explanation is incorrect or unsupported by the data’s characteristics. | |
Mastery (5) | Incomplete (0) | ||
Question 4a: Correlogram of random component | The student plots a correlogram of the time series requested. The plot accurately displays autocorrelation values at various lags. If code is used, it is well-commented, providing clarity on the plotting process. The labels, title, and legends are appropriate and match the quality of the illustrations in the Time Series notebook. | The student attempts to plot a correlogram of the time series requested but encounters significant errors or lacks clarity in their plot. If code is used, it may lack sufficient commenting or coherence, making it challenging to understand the plotting process. Overall, the plot may lack detail or accuracy, highlighting areas for improvement in time series visualization skills. | |
Mastery (15) | Incomplete (0) | ||
Question 4b: Interpretation | Clearly interprets the correlogram, explaining the statistical significance of correlations and addressing practical significance. Provides well-reasoned justification for when statistically significant correlations are not practically important. | Fails to interpret the correlogram accurately, does not explain statistical or practical significance clearly, or provides weak justification for distinguishing between statistical and practical significance. | |
Mastery (10) | Incomplete (0) | ||
Question 5a: Introspection | The student explains the importance of removing trend and seasonal variation before analyzing correlograms, showing an understanding of stationarity assumptions in time series analysis. They recognize that trend and seasonality violate these assumptions, potentially distorting autocorrelation patterns. | The student attempts to explain the importance of removing trend and seasonal variation before analyzing correlograms but may struggle with clarity or accuracy. Their understanding of stationarity assumptions in time series analysis might be limited, leading to inconsistencies or inaccuracies. Overall, their explanation may lack depth, indicating areas for improvement in understanding preprocessing steps and stationarity assumptions. | |
Mastery (10) | Incomplete (0) | ||
Question 5b: Introspection | The student effectively speculates on the importance of autocorrelation analysis of the random component in time series data for modeling, investigation, and forecasting. Their discussion shows understanding of the topics we have already covered in class. The submission shows effort. | Overall, their explanation may lack depth or clarity. | |
Total Points | 80 |