1 Introduction

This assingment is going to look at several different options of expontential smoothing models to determine the best and final model for a time series with trend and seasonality.

2 Materials

2.1 Data Set

This assignment uses a time series data set from the United States Census Bureau. The time series is Annual Rate of New Single-Family Houses Sold. This data set only covers homes sold in the United States over the span of 744 months from January 1963 to December of 2024. However, in this assignment we will only look at the data from January 2010 and December 2024.

2.2 Data Cleaning

The original data set is going to be subsetted into a time series spanning from January 2010 to December 2024. However, since we are currently in December of 2024 the data set has not been full updated, so there are some observations that need to be removed since they are missing, after the missing variables are removed the value variable can be converted to a numeric variable.

url="https://ChloeWinters79.github.io/STA321/Data/USHomeSales.csv"
home.sales = read.csv(url, header = TRUE)
n.row = dim(home.sales)[1]
data.home.sales.0 = home.sales[(n.row-179):n.row, ]
data.home.sales = subset(data.home.sales.0, Value != "N/A")
data.home.sales$Value <- as.numeric(as.character(data.home.sales$Value))

2.3 Trend and Seasonality

This assignment does require that the time series we are working with show both trend and seasonality. To confirm that both of these are present in the time series, some graphs were printed to asses the trend and seasonality of the time series.

homesales.ts = ts(data.home.sales[,2], frequency = 12, start = c(2010, 1))
cls.homesales = decompose(homesales.ts)
par(mar=c(2,2,2,2))
plot(cls.homesales, xlab="")
Classical decomposition of additive time series

Classical decomposition of additive time series

The above graph shows the decomposition of additive time series, it shows a mostly positive increasing trend, however there does appear to be a decrease after 2020. Considering the context that 2020 was the start of the COVID-19 pandemic shut down so seeing a dip in the trend is not shocking considering the context. Additionally, we can see a seasonal pattern present in the graph, which means the time series meets the trend and seasonal requirements for this assignment.

3 Methodolgy & Analysis

For this project, we want to determine the best exponential smoothing model to use for this data set. To determine the best smoothing model we are going to compare the accuracy of several different models, SES, Holt, Holt-Winters and some additive, exponential and damped variations of Holt and Holt-Winters.

test.homes = data.home.sales$Value[166:177]
train.homes = data.home.sales$Value[1:165]
home=ts(data.home.sales$Value[1:168], start=100, frequency = 12)
fit1 = ses(home, h=12)
fit2 = holt(home, initial="optimal", h=12)             ## optimal alpha and beta
fit3 = holt(home,damped=TRUE, h=12 )                   ## additive damping
fit4 = holt(home,exponential=TRUE, damped=TRUE, h =12) ## multiplicative damp
fit5 = hw(home,h=12, seasonal="additive")              ## default h = 10
fit6 = hw(home,h=12, seasonal="multiplicative")
fit7 = hw(home,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(home,h=12, seasonal="multiplicative",damped=TRUE)
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
                             accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
                            "HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models 
      based on the training data")
The accuracy measures of various exponential smoothing models based on the training data
ME RMSE MAE MPE MAPE MASE ACF1
SES 2.2136 48.4549 35.2626 0.0553 6.2339 0.4093 0.0004
Holt Linear -0.1598 48.4565 35.3792 -0.4352 6.2890 0.4106 0.0035
Holt Add. Damped 2.2835 48.4843 35.4168 0.0809 6.2728 0.4111 0.0011
Holt Exp. Damped 2.6742 48.4464 35.3365 0.2005 6.2397 0.4101 -0.0002
HW Add. -0.5234 48.3285 34.9578 -0.4813 6.1911 0.4057 0.0029
HW Exp. 1.1751 53.2488 37.2653 -0.2176 6.6320 0.4325 0.2248
HW Add. Damp 1.4851 48.3832 35.1746 -0.1231 6.2356 0.4082 0.0022
HW Exp. Damp 1.0625 48.5948 35.5074 -0.1541 6.2970 0.4121 0.0241

Looking at the results from the table above, it seems like the Holt-Winters additive model appears to be the most appropriate exponential smoothing model. The Holt-Winters beats out the other models in every situation except on where it gets beat out by Holt Exponential Dampened.

In addition to the accuracy table above, it is also beneficial to see a visual representation of the different exponential smoothing models options and compare them to the original serial plot.

par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 166:177
plot(1:165, train.homes, lwd=2,type="o", ylab="Home Sales", xlab="", 
     xlim=c(1,177), ylim=c(200, 1200), cex=0.3,
     main="Non-Seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"), 
   cex = 0.7, bty="n")
###########
plot(1:165, train.homes, lwd=2,type="o", ylab="Home Sales", xlab="", 
     xlim=c(1,177), ylim=c(200, 1200), cex=0.3,
     main="Holt-Winterd Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"), 
   cex = 0.7, bty="n")
Case study: Comparing various exponential smoothing models.

Case study: Comparing various exponential smoothing models.

Looking at the graphs above the first depiction shows the predictive lines to almost go fully horizontal, or mainly horizontal with minimal slope. However, the second graph does show predictive lines with more of a shape and potential variation that is consistent with the information seen in the existing serial plot. The Holt-Winters Additive line is within the second graph, further cementing it as the most appropriate model.

This assignment is also using a training data set, which will be used to identify the best model, with the assistance of the testing data set. In order to use the model for real-forecast, the model needs to be refit using the entire data to update the final working models smoothing parameters.

acc.fun = function(test.data, mod.obj){
  PE=100*(test.data-mod.obj$mean)/mod.obj$mean
  MAPE = mean(abs(PE))
  ###
  E=test.data-mod.obj$mean
  MSE=mean(E^2)
  ###
  accuracy.metric=c(MSE=MSE, MAPE=MAPE)
  accuracy.metric
}
pred.accuracy = rbind(SES =acc.fun(test.data=test.homes, mod.obj=fit1),
                      Holt.Add =acc.fun(test.data=test.homes, mod.obj=fit2),
                      Holt.Add.Damp =acc.fun(test.data=test.homes, mod.obj=fit3),
                      Holt.Exp =acc.fun(test.data=test.homes, mod.obj=fit4),
                      HW.Add =acc.fun(test.data=test.homes, mod.obj=fit5),
                      HW.Exp =acc.fun(test.data=test.homes, mod.obj=fit6),
                      HW.Add.Damp =acc.fun(test.data=test.homes, mod.obj=fit7),
                      HW.Exp.Damp =acc.fun(test.data=test.homes, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models 
      based on the testing data")
The accuracy measures of various exponential smoothing models based on the testing data
MSE MAPE
SES 2510.071 6.258430
Holt.Add 1554.106 4.657823
Holt.Add.Damp 2512.571 6.262516
Holt.Exp 2511.084 6.260095
HW.Add 1737.009 5.053927
HW.Exp 2306.645 6.038416
HW.Add.Damp 3082.966 7.147686
HW.Exp.Damp 3664.936 8.133387

Looking at the above accuracy table, we actually see that the Holt Additive is the best of the eight smoothing models, with Holt-Winters Additive being a close second. This does come as a surprise considering the Holt-Winters Additive model has been performing better so far. Since the Holt-Winters Additive model was more often the preferred model, over Holt Additive, and since Holt-Winters is a close second here, we are going to move forward using this model. However, it is still important to acknowledge that the above accuracy table does not identify this as the number one model choice.

4 Results & Conclusions

In the previous analysis, the model was trained using the training data set in order to identify the best model for both the testing and training data sets. When using real forecast the models need to be refitted at the end of the analysis using both the training and testing data sets, combined to create the whole data set. When using the training and testing sets together the model can be refit and smoothing parameters in the final model can be updated.

homes=ts(data.home.sales$Value[1:177], start=200, frequency = 12)
final.model = hw(homes,h=12, seasonal="additive") 
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
      Holt-Winters linear trend with additive seasonality")
Estimated values of the smoothing parameters in Holt-Winters linear trend with additive seasonality
x
alpha 0.8013460
beta 0.0001006
gamma 0.0001000

The estimates values for the smoothing parameter are the three updated values, alpha, beta, and gamma that are shown above. The values apply specifically to the Holt-Winters method for linear trends with additive seasonality. Taking these values into account, the real-forecasting is complete since the model was refit with updating smoothing parameters for the selected final model.

5 General Discussions

While the final smoothing model was selected and our analysis was conducted there are some drawbacks of this assignment that need to be addressed. To start, the data set itself is not perfect. Due to the COVID-19 pandemic and shut down, there is a very large spike and then subsequent drop in the new home sales. This large jump and fall is not consistent with the earlier trend and begs the question if that is where the trend would have went if the pandemic had not happened. Additionally, with this trend being so recent, it does make one question if it is having a significant impact of the models and their smoothing, and if it is skewing the results of our analysis.

Additionally, is was not expected that the when looking at the accuracy measures based just on the testing data, that Holt-Winters Additive was not the best model, and it was actually second to Holt Additive. Once the trends start to get back to normal post COVID-19 and there is more post COVID data to look at it would be beneficial to go back and rerun the analysis and see if Holt-Winters Additive still makes the most sense as the final model, or if Holt Additive seems to be the better model. However, that is analysis that we will have to wait a few several years to do.

---
title: "Exponential Smoothing of a Time Series with Trend and Seasonality for New Home Sales"
author: "Chloé Winters"
date: "2024-12-11"
output:
  html_document:
    toc: yes
    toc_float: yes
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
  word_document:
    toc: yes
    toc_depth: '4'
---

```{=html}

<style type="text/css">

/* Cascading Style Sheets (CSS) is a stylesheet language used to describe the presentation of a document written in HTML or XML. it is a simple mechanism for adding style (e.g., fonts, colors, spacing) to Web documents. */

h1.title {  /* Title - font specifications of the report title */
  font-size: 24px;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}
h4.author { /* Header 4 - font specifications for authors  */
  font-size: 20px;
  font-family: system-ui;
  color: DarkRed;
  text-align: center;
}
h4.date { /* Header 4 - font specifications for the date  */
  font-size: 18px;
  font-family: system-ui;
  color: DarkBlue;
  text-align: center;
}
h1 { /* Header 1 - font specifications for level 1 section title  */
    font-size: 22px;
    font-family: system-ui;
    color: navy;
    text-align: left;
}
h2 { /* Header 2 - font specifications for level 2 section title */
    font-size: 20px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - font specifications of level 3 section title  */
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - font specifications of level 4 section title  */
    font-size: 18px;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

</style>
```
```{r setup, include=FALSE}

#
# specifications of outputs of code in code chunks
knitr::opts_chunk$set(echo = TRUE,      
                      warning = FALSE,   
                      message = FALSE,  
                      results  = TRUE     
                      )   

library(knitr)
library(pander)
library(mlbench)
library(MASS)
library(forecast)
library(ISwR)
```


# Introduction

This assingment is going to look at several different options of expontential smoothing models to determine the best and final model for a time series with trend and seasonality. 

# Materials

## Data Set

This assignment uses a time series data set from the United States Census Bureau. The time series is Annual Rate of New Single-Family Houses Sold. This data set only covers homes sold in the United States over the span of 744 months from January 1963 to December of 2024. However, in this assignment we will only look at the data from January 2010 and December 2024.


## Data Cleaning

The original data set is going to be subsetted into a time series spanning from January 2010 to December 2024. However, since we are currently in December of 2024 the data set has not been full updated, so there are some observations that need to be removed since they are missing, after the missing variables are removed the value variable can be converted to a numeric variable. 

```{r}
url="https://ChloeWinters79.github.io/STA321/Data/USHomeSales.csv"
home.sales = read.csv(url, header = TRUE)
n.row = dim(home.sales)[1]
data.home.sales.0 = home.sales[(n.row-179):n.row, ]
data.home.sales = subset(data.home.sales.0, Value != "N/A")
data.home.sales$Value <- as.numeric(as.character(data.home.sales$Value))
```

## Trend and Seasonality

This assignment does require that the time series we are working with show both trend and seasonality. To confirm that both of these are present in the time series, some graphs were printed to asses the trend and seasonality of the time series. 

```{r fig.align='center', fig.cap= "Classical decomposition of additive time series", fig.width=6, fig.height=4}
homesales.ts = ts(data.home.sales[,2], frequency = 12, start = c(2010, 1))
cls.homesales = decompose(homesales.ts)
par(mar=c(2,2,2,2))
plot(cls.homesales, xlab="")
```

The above graph shows the decomposition of additive time series, it shows a mostly positive increasing trend, however there does appear to be a decrease after 2020. Considering the context that 2020 was the start of the COVID-19 pandemic shut down so seeing a dip in the trend is not shocking considering the context. Additionally, we can see a seasonal pattern present in the graph, which means the time series meets the trend and seasonal requirements for this assignment. 

# Methodolgy & Analysis 

For this project, we want to determine the best exponential smoothing model to use for this data set. To determine the best smoothing model we are going to compare the accuracy of several different models, SES, Holt, Holt-Winters and some additive, exponential and damped variations of Holt and Holt-Winters.

```{r fig.align='center',fig.width=6, fig.height=4.5, fig.cap="Case study: Comparing various expoential smoothing models."}
test.homes = data.home.sales$Value[166:177]
train.homes = data.home.sales$Value[1:165]
home=ts(data.home.sales$Value[1:168], start=100, frequency = 12)
fit1 = ses(home, h=12)
fit2 = holt(home, initial="optimal", h=12)             ## optimal alpha and beta
fit3 = holt(home,damped=TRUE, h=12 )                   ## additive damping
fit4 = holt(home,exponential=TRUE, damped=TRUE, h =12) ## multiplicative damp
fit5 = hw(home,h=12, seasonal="additive")              ## default h = 10
fit6 = hw(home,h=12, seasonal="multiplicative")
fit7 = hw(home,h=12, seasonal="additive",damped=TRUE)
fit8 = hw(home,h=12, seasonal="multiplicative",damped=TRUE)
```

```{r}
accuracy.table = round(rbind(accuracy(fit1), accuracy(fit2), accuracy(fit3), accuracy(fit4),
                             accuracy(fit5), accuracy(fit6), accuracy(fit7), accuracy(fit8)),4)
row.names(accuracy.table)=c("SES","Holt Linear","Holt Add. Damped", "Holt Exp. Damped",
                            "HW Add.","HW Exp.","HW Add. Damp", "HW Exp. Damp")
kable(accuracy.table, caption = "The accuracy measures of various exponential smoothing models 
      based on the training data")
```

Looking at the results from the table above, it seems like the Holt-Winters additive model appears to be the most appropriate exponential smoothing model. The Holt-Winters beats out the other models in every situation except on where it gets beat out by Holt Exponential Dampened. 

In addition to the accuracy table above, it is also beneficial to see a visual representation of the different exponential smoothing models options and compare them to the original serial plot. 

```{r fig.align='center',fig.width=6, fig.height=6.5, fig.cap="Case study: Comparing various exponential smoothing models."}
par(mfrow=c(2,1), mar=c(3,4,3,1))
###### plot the original data
pred.id = 166:177
plot(1:165, train.homes, lwd=2,type="o", ylab="Home Sales", xlab="", 
     xlim=c(1,177), ylim=c(200, 1200), cex=0.3,
     main="Non-Seasonal Smoothing Models")
lines(pred.id, fit1$mean, col="red")
lines(pred.id, fit2$mean, col="blue")
lines(pred.id, fit3$mean, col="purple")
lines(pred.id, fit4$mean, col="navy")
##
points(pred.id, fit1$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit2$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit3$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit4$mean, pch=21, col="navy", cex = 0.5)
#points(fit0, col="black", pch=1)
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("SES","Holt Linear","Holt Linear Damped", "Holt Multiplicative Damped"), 
   cex = 0.7, bty="n")
###########
plot(1:165, train.homes, lwd=2,type="o", ylab="Home Sales", xlab="", 
     xlim=c(1,177), ylim=c(200, 1200), cex=0.3,
     main="Holt-Winterd Trend and Seasonal Smoothing Models")
lines(pred.id, fit5$mean, col="red")
lines(pred.id, fit6$mean, col="blue")
lines(pred.id, fit7$mean, col="purple")
lines(pred.id, fit8$mean, col="navy")
##
points(pred.id, fit5$mean, pch=16, col="red", cex = 0.5)
points(pred.id, fit6$mean, pch=17, col="blue", cex = 0.5)
points(pred.id, fit7$mean, pch=19, col="purple", cex = 0.5)
points(pred.id, fit8$mean, pch=21, col="navy", cex = 0.5)
###
legend("bottomright", lty=1, col=c("red","blue","purple", "navy"),pch=c(16,17,19,21),
   c("HW Additive","HW Multiplicative","HW Additive Damped", "HW Multiplicative Damped"), 
   cex = 0.7, bty="n")
```

Looking at the graphs above the first depiction shows the predictive lines to almost go fully horizontal, or mainly horizontal with minimal slope. However, the second graph does show predictive lines with more of a shape and potential variation that is consistent with the information seen in the existing serial plot. The Holt-Winters Additive line is within the second graph, further cementing it as the most appropriate model. 

This assignment is also using a training data set, which will be used to identify the best model, with the assistance of the testing data set. In order to use the model for real-forecast, the model needs to be refit using the entire data to update the final working models smoothing parameters. 


```{r}
acc.fun = function(test.data, mod.obj){
  PE=100*(test.data-mod.obj$mean)/mod.obj$mean
  MAPE = mean(abs(PE))
  ###
  E=test.data-mod.obj$mean
  MSE=mean(E^2)
  ###
  accuracy.metric=c(MSE=MSE, MAPE=MAPE)
  accuracy.metric
}
```

```{r}
pred.accuracy = rbind(SES =acc.fun(test.data=test.homes, mod.obj=fit1),
                      Holt.Add =acc.fun(test.data=test.homes, mod.obj=fit2),
                      Holt.Add.Damp =acc.fun(test.data=test.homes, mod.obj=fit3),
                      Holt.Exp =acc.fun(test.data=test.homes, mod.obj=fit4),
                      HW.Add =acc.fun(test.data=test.homes, mod.obj=fit5),
                      HW.Exp =acc.fun(test.data=test.homes, mod.obj=fit6),
                      HW.Add.Damp =acc.fun(test.data=test.homes, mod.obj=fit7),
                      HW.Exp.Damp =acc.fun(test.data=test.homes, mod.obj=fit8))
kable(pred.accuracy, caption="The accuracy measures of various exponential smoothing models 
      based on the testing data")
```

Looking at the above accuracy table, we actually see that the Holt Additive is the best of the eight smoothing models, with Holt-Winters Additive being a close second. This does come as a surprise considering the Holt-Winters Additive model has been performing better so far. Since the Holt-Winters Additive model was more often the preferred model, over Holt Additive, and since Holt-Winters is a close second here, we are going to move forward using this model. However, it is still important to acknowledge that the above accuracy table does not identify this as the number one model choice. 

# Results & Conclusions 

In the previous analysis, the model was trained using the training data set in order to identify the best model for both the testing and training data sets. When using real forecast the models need to be refitted at the end of the analysis using both the training and testing data sets, combined to create the whole data set. When using the training and testing sets together the model can be refit and smoothing parameters in the final model can be updated.

```{r}
homes=ts(data.home.sales$Value[1:177], start=200, frequency = 12)
final.model = hw(homes,h=12, seasonal="additive") 
smoothing.parameter = final.model$model$par[1:3]
kable(smoothing.parameter, caption="Estimated values of the smoothing parameters in
      Holt-Winters linear trend with additive seasonality")
```

The estimates values for the smoothing parameter are the three updated values, alpha, beta, and gamma that are shown above. The values apply specifically to the Holt-Winters method for linear trends with additive seasonality. Taking these values into account, the real-forecasting is complete since the model was refit with updating smoothing parameters for the selected final model. 


# General Discussions

While the final smoothing model was selected and our analysis was conducted there are some drawbacks of this assignment that need to be addressed. To start, the data set itself is not perfect. Due to the COVID-19 pandemic and shut down, there is a very large spike and then subsequent drop in the new home sales. This large jump and fall is not consistent with the earlier trend and begs the question if that is where the trend would have went if the pandemic had not happened. Additionally, with this trend being so recent, it does make one question if it is having a significant impact of the models and their smoothing, and if it is skewing the results of our analysis. 

Additionally, is was not expected that the when looking at the accuracy measures based just on the testing data, that Holt-Winters Additive was not the best model, and it was actually second to Holt Additive. Once the trends start to get back to normal post COVID-19 and there is more post COVID data to look at it would be beneficial to go back and rerun the analysis and see if Holt-Winters Additive still makes the most sense as the final model, or if Holt Additive seems to be the better model. However, that is analysis that we will have to wait a few several years to do. 
