Time Series Homework: Chapter 3 Lesson 2

Eduardo Ramirez

Data

Code

build_dat <- rio::import("https://byuistats.github.io/timeseries/data/mortality_us.xlsx")

Questions

Question 1 - Context and Measurement (10 points)

The first part of any time series analysis is context. You cannot properly analyze data without knowing what the data is measuring. Without context, the most simple features of data can be obscure and inscrutable. This homework assignment will center around the series below.

Please research the time series. In the spaces below, give the data collection process, unit of analysis, and meaning of each observation for the series.

US Mortality

https://wonder.cdc.gov/wonder/help/cmf.html#

Note: The data is self-explanatory, don’t get lost in the documentation page.

Answer

Overview data

Code

pacman::p_load("tsibble", "fable", "feasts",
    "tsibbledata", "fable.prophet", "tidyverse", "patchwork")
build_dat <- rio::import("https://byuistats.github.io/timeseries/data/mortality_us.xlsx") |>
    rename(
        mort = "mort_rate_per_100,000",
        index = year  # rename `year` to `index`
    ) |>
    mutate(
        mort = as.numeric(mort),
        deaths = as.numeric(deaths)
    ) |>
    as_tsibble(index = index)  # set `index` as the tsibble index

Answer

I ask chatgpt or help plotting this do to differences in the y scale.

Chatgpt: In this ggplot, mort and deaths are plotted on dual y-axes to allow simultaneous visualization despite differing scales. deaths is linearly scaled to align with mort values on the primary y-axis, while a secondary y-axis reverses this transformation to reflect deaths on its natural scale. This approach allows for direct visual comparison of trends and seasonalities in both variables over a common time index, facilitating insight into potential covariation and relative dynamics across time.

Code

# Calculate the scaled deaths for the secondary axis
max_mort <- max(build_dat$mort, na.rm = TRUE)
max_deaths <- max(build_dat$deaths, na.rm = TRUE)
build_dat <- build_dat %>%
    mutate(scaled_deaths = deaths / max_deaths * max_mort)

# Plot with dual y-axes
mort_series_plot <- ggplot(build_dat, aes(x = index)) +
    geom_line(aes(y = mort, color = "Mortality")) +
    geom_line(aes(y = scaled_deaths, color = "Deaths")) +  # Use scaled deaths for plotting
    scale_y_continuous(
        name = "Mortality",
        sec.axis = sec_axis(~ . * max_deaths / max_mort, name = "Deaths")  # Secondary axis transformation
    ) +
    scale_color_manual(values = c("Mortality" = "blue", "Deaths" = "red")) +
    theme(legend.position = "bottom") +
    labs(color = "Legend")
mort_series_plot

Answer

next part

Code

acf_appr <- ACF(build_dat, y = mort) |> autoplot() +
    labs(title = "mort")
acf_act <- ACF(build_dat, y = deaths) |> autoplot() +
    labs(title = "deaths")
joint_ccf_plot <- build_dat |>
  CCF(y = mort, x = deaths) |> autoplot() +
  labs(title = "CCF Plot")
(acf_appr + acf_act) / joint_ccf_plot

Code

CCF(build_dat, mort, deaths)

# A tsibble: 29 x 2 [1Y]
        lag    ccf
   <cf_lag>  <dbl>
 1     -14Y 0.0883
 2     -13Y 0.0799
 3     -12Y 0.0586
 4     -11Y 0.0452
 5     -10Y 0.0448
 6      -9Y 0.0457
 7      -8Y 0.0467
 8      -7Y 0.0463
 9      -6Y 0.0409
10      -5Y 0.0253
# ℹ 19 more rows

next part

I ran into a no season error when doing the classical decomposition, this is probably why Bro. Moncayo ask us how we can do a classical decomposition without the tred

here is chatgpt’s notes

The error occurs because the feasts::classical_decomposition() function requires multiple observations per period to detect seasonality, but with only one observation per year, no seasonal pattern can be extracted, resulting in a null_mdl object that causes further processing errors.

Code

# Fit trend models to the data
app_model <- build_dat %>%
    model(trend_model = TSLM(mort ~ trend()))

act_model <- build_dat %>%
    model(trend_model = TSLM(deaths ~ trend()))

# Extract the fitted values and residuals using augment()
app_decompose <- app_model %>%
    augment() %>%
    select(index, mort, trend = .fitted, remainder = .resid)

act_decompose <- act_model %>%
    augment() %>%
    select(index, deaths, trend = .fitted, remainder = .resid)

# Plot the ACF of the remainder components
app_random <- ACF(app_decompose, remainder) %>% autoplot() +
    labs(title = "ACF of Mortality Remainder")

act_random <- ACF(act_decompose, remainder) %>% autoplot() +
    labs(title = "ACF of Deaths Remainder")

# Merge the remainder components for cross-correlation analysis
random_decompose <- app_decompose %>%
    select(index, random_app = remainder) %>%
    left_join(
        act_decompose %>% select(index, random_act = remainder),
        by = "index"
    )

# Plot the cross-correlation function between the remainder components
joint_ccf_random <- random_decompose %>%
    CCF(y = random_app, x = random_act) %>%
    autoplot() +
    labs(title = "CCF between Mortality and Deaths Remainders")

# Display the plots
library(patchwork)
(app_random + act_random) / joint_ccf_random

notes on the error with seasonalities for this dataset:The error arose because the classical decomposition necessitates multiple observations within each seasonal cycle, which isn’t feasible with annual data devoid of seasonality; we resolved this by applying a time series linear model (TSLM) to capture the trend component and utilized augment() to extract residuals for subsequent analysis. By shifting from a decomposition method requiring seasonality to a trend-focused model, we circumvented the limitations imposed by the data’s temporal granularity.

Question 2 - US Mortality Rate: Visualization (30 points)

a) Please plot the US Mortality Rate series (mortality per 100,000).

Answer

Code

# Load required libraries
pacman::p_load("tsibble", "fable", "feasts",
    "tsibbledata", "fable.prophet", "tidyverse", "patchwork")

# Import and rename the data
motor_dat <- rio::import("https://byuistats.github.io/timeseries/data/mortality_us.xlsx") |>
    rename(
        mort = "mort_rate_per_100,000",
        index = year  # rename `year` to `index`
    ) |>
    mutate(
        mort = as.numeric(mort),
        deaths = as.numeric(deaths)
    ) |>
    as_tsibble(index = index)

# Task 1: Plot the US Mortality Rate series (mortality per 100,000)
autoplot(motor_dat, mort) +
    labs(x = "Year", y = "Mortality Rate per 100,000") +
    ggtitle("US Mortality Rate Over Time")

b) Use the exponential smoothing method to model the US Mortality Rate series. Use the smoothing parameter that R calculates by minimizing the SS1PE. Add it to the plot in 2a

Answer

Code

# Task 2: Exponential Smoothing with R's Optimized Smoothing Parameter
# Assuming R’s alpha (0.1429622) was calculated to minimize SS1PE
motor_dat1 <- motor_dat |>
    model(Optimized = ETS(mort ~
        trend("A", alpha = 0.1429622, beta = 0) +
        error("A") +
        season("N"),
        opt_crit = "amse", nmse = 1))

# Plot the optimized model
augment(motor_dat1) |>
    ggplot(aes(x = index, y = mort)) +
    geom_line() +
    geom_line(aes(y = .fitted, color = "Fitted - Optimized Alpha")) +
    labs(color = "") +
    ggtitle("Exponential Smoothing with Optimized Alpha")

Code

# Task 3: Exponential Smoothing with α = 0.2
motor_dat2 <- motor_dat |>
    model(FixedAlpha0_2 = ETS(mort ~
        trend("A", alpha = 0.2, beta = 0) +
        error("A") +
        season("N"),
        opt_crit = "amse", nmse = 1))

# Plot the α = 0.2 model alongside the previous
augment(motor_dat2) |>
    ggplot(aes(x = index, y = mort)) +
    geom_line() +
    geom_line(aes(y = .fitted, color = "Fitted - Alpha = 0.2")) +
    labs(color = "") +
    ggtitle("Exponential Smoothing with Alpha = 0.2")

Code

# Task 4: Exponential Smoothing with α = 1/n (where n is the number of observations)
alpha_value <- 1 / nrow(motor_dat)
motor_dat3 <- motor_dat |>
    model(FixedAlpha1_n = ETS(mort ~
        trend("A", alpha = alpha_value, beta = 0) +
        error("A") +
        season("N"),
        opt_crit = "amse", nmse = 1))

# Plot the α = 1/n model along with previous models
augment(motor_dat3) |>
    ggplot(aes(x = index, y = mort)) +
    geom_line() +
    geom_line(aes(y = .fitted, color = paste("Fitted - Alpha =", round(alpha_value, 4)))) +
    labs(color = "") +
    ggtitle(paste("Exponential Smoothing with Alpha =", round(alpha_value, 4)))

c) Please repeat the modeling above but instead choose a smoothing parameter value \(\alpha=0.2\). Add it to the plot in 2a

Answer

Code

# Task 3: Exponential Smoothing with α = 0.2
motor_dat2 <- motor_dat |>
    model(FixedAlpha0_2 = ETS(mort ~
        trend("A", alpha = 0.2, beta = 0) +
        error("A") +
        season("N"),
        opt_crit = "amse", nmse = 1))

# Plot the α = 0.2 model alongside the previous
augment(motor_dat2) |>
    ggplot(aes(x = index, y = mort)) +
    geom_line() +
    geom_line(aes(y = .fitted, color = "Fitted - Alpha = 0.2")) +
    labs(color = "") +
    ggtitle("Exponential Smoothing with Alpha = 0.2")

d) Please repeat the modeling above but instead choose a smoothing parameter value \(\alpha=\frac{1}{n}\).Add it to the plot in 2a

Answer

Code

# Task 4: Exponential Smoothing with α = 1/n (where n is the number of observations)
alpha_value <- 1 / nrow(motor_dat)
motor_dat3 <- motor_dat |>
    model(FixedAlpha1_n = ETS(mort ~
        trend("A", alpha = alpha_value, beta = 0) +
        error("A") +
        season("N"),
        opt_crit = "amse", nmse = 1))

# Plot the α = 1/n model along with previous models
augment(motor_dat3) |>
    ggplot(aes(x = index, y = mort)) +
    geom_line() +
    geom_line(aes(y = .fitted, color = paste("Fitted - Alpha =", round(alpha_value, 4)))) +
    labs(color = "") +
    ggtitle(paste("Exponential Smoothing with Alpha =", round(alpha_value, 4)))

d) Which of the smoothing parameter values you tried before work best for the series? Justify your answer.

Answer

The smoothing parameter that provided the best fit was the optimized α value (0.1429622), as it minimized the Sum of Squared One-Step Prediction Errors (SS1PE). In comparison, fixed α values like 0.2 and α= 1/n either over-smoothed or under-smoothed the series, making the optimized α most effective for accurate modeling of the mortality rate series.

Question 5 - US Mortality Rate: Excess Mortality (30 points)

The jump at the last two years of the US Mortality Rate series is clearly the effect that Covid-19 had on mortality across the US.

a) Please calculate the excess mortality rate during 2020 and 2021 using the smoothing parameter values you employed in the previous question. Your results should be displayed in a table that is professionally formatted.

Answer

Code

# Task 1: Calculate the Excess Mortality for 2020 and 2021

# Extract observed data for 2020 and 2021
observed_2020_2021 <- motor_dat |> filter(index >= 2020)

# Calculate fitted (expected) values for each model
fitted_optimized <- augment(motor_dat1) |> filter(index >= 2020) |> pull(.fitted)
fitted_alpha_0_2 <- augment(motor_dat2) |> filter(index >= 2020) |> pull(.fitted)
fitted_alpha_1_n <- augment(motor_dat3) |> filter(index >= 2020) |> pull(.fitted)

# Calculate excess mortality by comparing actual rates with fitted (expected) rates
excess_optimized <- observed_2020_2021$mort - fitted_optimized
excess_alpha_0_2 <- observed_2020_2021$mort - fitted_alpha_0_2
excess_alpha_1_n <- observed_2020_2021$mort - fitted_alpha_1_n

# Prepare a formatted table to display the results
excess_mortality_table <- tibble(
    Year = observed_2020_2021$index,
    Observed_Mortality = observed_2020_2021$mort,
    Excess_Optimized = excess_optimized,
    Excess_Alpha_0_2 = excess_alpha_0_2,
    Excess_Alpha_1_n = excess_alpha_1_n
)

# Display the formatted table for review
excess_mortality_table_formatted <- excess_mortality_table |>
    rename(
        "Year" = Year,
        "Observed Mortality Rate" = Observed_Mortality,
        "Excess Mortality (Optimized Alpha)" = Excess_Optimized,
        "Excess Mortality (Alpha = 0.2)" = Excess_Alpha_0_2,
        "Excess Mortality (Alpha = 1/n)" = Excess_Alpha_1_n
    )

# Print the formatted table
excess_mortality_table_formatted

# A tibble: 2 × 5
   Year `Observed Mortality Rate` Excess Mortality (Opt…¹ Excess Mortality (Al…²
  <dbl>                     <dbl>                   <dbl>                  <dbl>
1  2020                     1027                     189.                   182.
2  2021                     1044.                    179.                   162.
# ℹ abbreviated names: ¹`Excess Mortality (Optimized Alpha)`,
#   ²`Excess Mortality (Alpha = 0.2)`
# ℹ 1 more variable: `Excess Mortality (Alpha = 1/n)` <dbl>

b) What is the meaning of excess mortality as you calculated it? Please explain it to a general audience, as if you were being interviewed in a national news show.

Answer

Excess mortality refers to the number of deaths occurring above what would normally be expected in a given period based on historical trends. For example, if we typically expect around 800 deaths per 100,000 people each year, but we observe 1,000 per 100,000 during 2020 and 2021, the excess mortality would be 200 per 100,000. This difference often reflects the impact of unusual events or crises, such as the Covid-19 pandemic, which led to an increase in mortality rates beyond what we would normally see.

c) Which excess mortality rate estimate do you think is correct? Some commentators would suggest you are choosing the parameters to defend your political goals, not to conduct a scientific analysis. Please comment on the difficulty of choosing parameter values to estimate the death toll of Covid-19.

Answer

Estimating excess mortality is challenging because it requires selecting parameters that balance responsiveness to sudden changes (like a pandemic) with stability over time. In our analysis, we used three different parameter values: one optimized based on historical trends, one set at a moderate value (α = 0.2), and one based on a conservative approach (α = 1/n) that assumes slow change.

Each choice reflects different assumptions about how mortality rates are likely to change over time. An optimized parameter aims to minimize errors but may overly fit past data, while a fixed parameter like 0.2 provides moderate sensitivity. The most conservative option, 1/n, smooths the data more aggressively, underemphasizing sudden increases.

Choosing parameters isn’t about political bias but about finding a balance that reflects reality as closely as possible. It is, however, a difficult task: too much responsiveness may exaggerate changes, while too little may underestimate significant shifts.

Rubric

Criteria	Mastery (10)	Incomplete (0)
Question 1: Context and Measurement	The student thoroughly researches the data collection process, unit of analysis, and meaning of each observation for both the requested time series. Clear and comprehensive explanations are provided.	The student does not adequately research or provide information on the data collection process, unit of analysis, and meaning of each observation for the specified series.
	Mastery (5)	Incomplete (0)
Question 2a: Mortality Plot	The student accurately plots the US Mortality Rate series in R, ensuring clear labeling and a descriptive title for easy interpretation. The visualization effectively presents the data with distinguishable points or lines and appropriate formatting. Additionally, the R code is well-commented, providing clear explanations of each step and maintaining readability.	The student encounters challenges in plotting the US Mortality Rate series in R. The plot may lack essential labels or a descriptive title, making it difficult to interpret. Additionally, the visualization might be unclear or cluttered, and the R code may lack sufficient comments, hindering understanding of the process. Overall, improvement is needed in effectively plotting time series data in R.
	Mastery (5)	Incomplete (0)
Question 2b: Smoothing SS1PE	Students use R to compute exponential smoothing for modeling the US Mortality Rate series. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.	Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.
	Mastery (5)	Incomplete (0)
Question 2c: Smoothing a=0.2	Students employ R to repeat the modeling of the US Mortality Rate series using a specified smoothing parameter value of alpha=0.2. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.	Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.
	Mastery (5)	Incomplete (0)
Question 2d: Smoothing a=1/n	Students employ R to repeat the modeling of the US Mortality Rate series using a specified smoothing parameter value of alpha=1/n. Their R code is well-commented, providing clear explanations for each step of the process, ensuring transparency in the computational process.	Students may encounter challenges in implementing exponential smoothing in R, resulting in incomplete or ineffective computations. Their R code might lack sufficient comments, hindering clarity in understanding the computational process.
	Mastery (10)	Incomplete (0)
Question 2d: Evaluation of Parameter Choice	Students justify their choice of parameter in the context of the underlying factors affecting US Mortality Rates evident in the data. The students evidence their understanding of the implications of the values in the smoothing parameter. Students show they have done some background research into the data to answer the question.	Students fail to adequately justify their choice of parameter in relation to the underlying factors affecting US Mortality Rates evident in the data. They may lack evidence of understanding the implications of the values in the smoothing parameter or fail to demonstrate how these implications relate to the context of the data. Additionally, they may show limited evidence of background research into the data to support their justification, indicating a lack of depth in their analysis.
	Mastery (10)	Incomplete (0)
Question 3a: Excess MortalityTable	Students accurately calculate the excess mortality rate during 2020 and 2021 using the smoothing parameter values employed in the previous question. They present their results in table, clearly displaying the excess mortality rate for each year alongside the corresponding smoothing parameter values. The table is well-labeled and easy to interpret.	Students demonstrate inaccuracies in calculating the excess mortality rate during 2020 and 2021. Their presentation of the results in a table may lack clarity and professionalism, with issues such as unclear labeling, inconsistent formatting, or difficulty in interpreting the information provided. Additionally, they may overlook important details or fail to include all necessary information in the table, making it challenging for readers to understand the table.
	Mastery (10)	Incomplete (0)
Question 3b: Excess Mortality	Explanations effectively convey the meaning of excess mortality to a general audience, avoiding technical terms and providing a clear, accessible description. They define excess mortality as the number of deaths observed during a specific period compared to what would be expected based on historical data.	Responses may struggle to explain excess mortality clearly to a general audience, potentially using technical language or lacking coherence. They may fail to provide relatable examples or context, making it difficult for the audience to understand the concept and its significance.
	Mastery (10)	Incomplete (0)
Question 3c: Evaluation of assumptions used for inference	Responses address the challenge of selecting parameter values to make inference in time series. They provide a comprehensive analysis, considering factors like modeling assumptions, and methodological variations that influence parameter selection. Explanations highlight the need of transparent reporting to ensure robust and reliable estimates in professional discourse.	Below expectations, responses may lack depth or clarity in addressing the challenge of selecting parameter values for making inference in time series. They may overlook key factors influencing parameter selection, such as data quality or specific characteristics of the time series data. Additionally, they may not effectively consider the impact of modeling assumptions or methodological variations on parameter selection. Furthermore, they may fail to emphasize the importance of transparent reporting in ensuring the reliability and validity of estimates, potentially resulting in a lack of confidence in the conclusions drawn from the analysis.
Total Points	70