Firstly, we try to understand what time series data is. When data points are observed in accordance with the occurrence of time, we say we have a time series data. In today’s world data obtained from observations collected sequentially over time are very common. For example, every business has to observe weekly interest rates, daily stock prices, yearly sales, monthly inventory levels to be maintained etc.
A successful analysis of these can help professionals observe patterns and ensure the smooth functioning of the business. The most important part of time series analysis is forecasting or prediction of future values using the historical data. These predictions help in determining the future course of action and give an approximate idea about how the business will look a year from now.
Before we dive into the analysis of temporal data in R, let us understand the different components of time series data. These components are shown below in the figure:
Trend Component: By trend component, we mean that the general tendency of the data to increase or decrease during a long period of time.
Seasonal Component: The variations in the time series that arise due to the rhythmic forces which operate over a span of less than 12 months or a year.
Cyclical Component: The oscillatory movements in a time series that last for more than a year.
Random Component: Random or irregular variations or fluctuations which are not accounted for by trend, seasonality and cyclical components are defined as the random component. These are also called episodic fluctuations.
Before we begin, make sure the “forecast” package is installed on your devices. We can easily observe these components once we use the decompose() function in R: For example, let us consider the ausbeer dataset from the ‘fpp’ package in R. We subset the data and consider a time period of 1992-2006.
Plotting this time series:
> beer2 = window(ausbeer,start=1992,end=2006-.1)
We can clealry see a seasonal pattern in the plot. Now we try to decompose this series in order to extract all the components:
Before we proceed to the analysis of the time series data, we need to make a simplifying assumption in order to maintain the regularity of the time series. This assumption is the stationarity assumption. This simply means that the mean and variance of the time series are constant over time i.e they are invariant over time. Also, the covariance between two time periods depends on the gap between the two periods and not the time at which they are computed.
In order to verify the stationarity of a specific
The null hypothesis under this test is that the time series under study in not stationary and we accept Ho if the p-value>0.05. Carrying out this test in R:
The results obtained are as follows:
Hence, clearly, we reject the null hypothesis and say that the time series considered is stationary. In case it wasn’t stationary we could’ve simply used the
Now, once you have checked for the stationarity, we can go onto the next stage of forecasting future observations. There are many forecasting methods but in this blog, we will be discussing the following:
- The average method
- Naïve method
- Seasonal naïve method
Average Method: Under this method, the forecasted future values are just an average of the historical values. It can be easily executed in R using the following commands:
beerfit1 = meanf(beer2, h=5) #where in h=5 shows that the forecast is made for 5 years.
Where the blue line represents the forecasted values and the dark and light grey areas represent the 95% and 80% confidence intervals.
Naïve Method: As the name suggests, under this method the forecasted values are just equal to the last observation.
beerfit2 = naive(beer2, h=5)
Seasonal Naïve Method: We know that there is some seasonality in our
beerfit3 = snaive(beer2, h=5)
ARIMA: Now coming onto the most important and widely used methods of forecasting. The time series analysis is based on the assumption that the underline time series is stationary or can make stationary by differencing it 1 or more times. This is known as the ARIMA (p, d, q) model where d denotes the number of times a time series has to be differenced to make it stationary. In most applications d = 1, i.e, we take only the first differences of the time series.
Of course, if a time series is already stationary, then an ARIMA (p, d, q) becomes ARMA (p, q) model.
Now since lets break this abbreviation down in other parts:
Auto Regression or Auto Regressive models:
The above model is called an autoregressive model of order p, AR(p), since it involves regressing Y at time t on its lagged p periods into the past, the value of p
The other being Moving average models:
The above model is called a moving average model of order q, MA(q), since we express Xt as a weighted or moving average of the current and past white noise error terms.
Seasonal ARIMA: When it comes to highly seasonal data, like in our case the ausbeer time series data, we use an extension of ARIMA models called SARIMA models wherein the regular non-seasonal ARIMA includes the additional seasonal terms
where m = number of observations per year. We use
Now comes an important segment where the order of ARIMA and Seasonal ARIMA models need to be determined. This can be either done by using the ACF and PACF plots which help in determining the order of MA and AR models respectively or a straightforward way of using the auto.arima() function from the forecast package which will automatically give us the best model with the lowest AIC value.
But since I want this blog to be easily understood by newbies, we only consider the auto.
beerfit4 = auto.arima(beer2,stepwise = F,approximation = F,trace = T) beerfit4
This runs a number of iterations and checks the AIC value for different order values of ARIMA. Finally providing us the best model with the lowest AIC value.
fore <- forecast(beerfit4,h=5)
The last step in this analysis is to measure the forecasting accuracy of the models. This can be done by the accuracy() function in the forecast package. It gives us a list of different accuracy measures that can be used to determine the best model fit. #compare the forecasted value in 2007 with the actual value(beer3)
beer3 = window(ausbeer, start=2007)
From the above result we clearly note that the lowest RMSE value has been obtained for the Seasonal ARIMA model i.e 12.161. Hence, it is the best fit model.