Data analysis: Prophet factor decomposition tool practice

Original link: https://huhuhang.com/post/machine-learning/dm-tutorials/prophet-factor-decomposition-tool-practice

Introduction

In the time series analysis experiment, we learned how to use ARMA and ARIMA to model stationary and non-stationary time series, and covered the stationarity test and the pure randomness (white noise) test used along the way. In this article, we look at another commonly used time series modeling method, and learn to use Prophet, Facebook's open source time series modeling tool.

This article requires special authorization, and the copyright of the content belongs to the author. Without authorization, reprinting is prohibited.

Knowledge points

  • Seasonal trend series

  • Factor decomposition

  • Prophet tool introduction

  • Prophet quick start

  • Trend change points

  • Multiplicative model

Seasonal trend series

Earlier we learned how to model stationary and non-stationary time series with ARMA and ARIMA. In practice, the series we deal with always exhibit some degree of randomness, though rarely so much that they are not worth analyzing. In forecasting, another very common kind of series shows strongly seasonal behavior.

For example, a dataset of surface air temperatures for the City of London from 2000 to 2016 is loaded below. The temperature changes with obvious regularity over time.

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

london = pd.read_csv(
    "http://labfile.oss.aliyuncs.com/courses/1176/surface-temperature-observations.csv", index_col=0)
london.index = pd.to_datetime(london.index)  # convert the index to datetimes
print("Amount of data:", len(london))
london.plot(figsize=(10, 5))  # plot the series

Introduction to factor decomposition

For series like this, we often use a method called factor decomposition. It was first proposed by the statistician W. M. Persons in 1919. Put simply, however varied the fluctuations of a time series are, they can all be attributed to the following four types of factors:

  • Long-term trend (Trend, $T_t$): this factor causes the series to show an obvious long-term tendency (increasing, decreasing, etc.).

  • Cyclic fluctuation (Cycle, $C_t$): this factor causes the series to show repeated cyclical fluctuations from low to high and then from high to low.

  • Seasonal change (Season, $S_t$): this factor causes the series to show stable periodic fluctuations tied to the seasons.

  • Random fluctuation (Irregular, $I_t$): in addition to long-term trend, cyclic fluctuation and seasonal change, the series is also affected by various other factors, which cause it to show a certain amount of random fluctuation.

Therefore, when performing time series analysis, we assume that the sequence will be affected by all or part of these four factors, showing different fluctuation characteristics. Two models are thus derived: the additive model and the multiplicative model.

The additive model, as the name implies, adds the four types of factors together:

$$
x_t = T_t + C_t + S_t + I_t
$$

The multiplicative model instead multiplies the four types of factors together:

$$
x_t = T_t \times C_t \times S_t \times I_t
$$
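
The two forms are closely related. Assuming every factor is strictly positive, taking logarithms of the multiplicative model yields an additive model in the logged factors:

$$
\log x_t = \log T_t + \log C_t + \log S_t + \log I_t
$$

This is one reason a log transform is often tried on series whose fluctuations grow with their level.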

The idea of factor decomposition has two main advantages:

  1. It removes the interference of other factors, so the effect of a single deterministic factor (season, trend, etc.) on the series can be measured on its own.

  2. Based on the deterministic characteristics of the series, it lets us infer how the deterministic factors interact and what their combined effect on the series is.
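
The difference between the two models can be sketched on a toy series. The example below is only an illustration under simplifying assumptions: it keeps just a trend and a seasonal factor (no cyclic or random term), and the function names and constants are hypothetical. It measures the peak-to-trough range within each year, which stays constant for the additive series and widens for the multiplicative one.

```python
import math

def seasonal_series(n_years, mode):
    """Build a monthly toy series from a linear trend and a yearly sine season."""
    values = []
    for t in range(12 * n_years):
        trend = 100.0 + 2.0 * t                          # T_t: steady growth
        season = math.sin(2.0 * math.pi * t / 12.0)      # yearly cycle in [-1, 1]
        if mode == "additive":
            values.append(trend + 20.0 * season)         # x_t = T_t + S_t
        else:
            values.append(trend * (1.0 + 0.2 * season))  # x_t = T_t * S_t
    return values

def yearly_range(values):
    """Peak-to-trough range within each block of 12 months."""
    return [max(values[i:i + 12]) - min(values[i:i + 12])
            for i in range(0, len(values), 12)]

additive_ranges = yearly_range(seasonal_series(5, "additive"))
multiplicative_ranges = yearly_range(seasonal_series(5, "multiplicative"))
print([round(r, 1) for r in additive_ranges])
print([round(r, 1) for r in multiplicative_ranges])
```

In the additive case the seasonal term has a fixed size, so every year spans the same range; in the multiplicative case the seasonal term scales with the trend, so the range grows as the trend does.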

Prophet tool introduction

Prophet is a time series analysis tool open sourced by Facebook in 2017, with interfaces for R and Python. It is built on the additive model from factor decomposition, so it is well suited to seasonal time series data. Prophet also has built-in handling for missing values and outliers, which reduces the burden of preprocessing to some extent.

Prophet was originally an internal tool of Facebook's Core Data Science team. Since being open sourced, it has helped many more people carry out deterministic analysis of seasonal time series. Although the additive model sounds simple, deriving and implementing it is quite involved, and Prophet lowers the barrier to using it.

Additive model

Next, we will use the London city surface air temperature dataset provided above to learn how to use the Prophet tool.

Prophet requires the input data to follow a fixed structure: a DataFrame with two columns, the time ds and the value y. So we first reshape the london dataset.

"""This cell can only be executed once; re-running it requires restarting the kernel
"""
london = london.reset_index()  # turn the datetime index into a column
london.columns = ['ds', 'y']
london.head()

The cell above can only be executed once. After the first run, london already has a default integer index, so calling reset_index() again adds a third column, and assigning two column names then raises an error.
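
One way around this restriction is to guard the conversion. The sketch below is an assumption about how that might be done, not part of the original workflow: to_prophet_frame is a hypothetical helper, and the two-row DataFrame is a made-up stand-in for the london data.

```python
import pandas as pd

def to_prophet_frame(df):
    """Return df in the two-column ds/y layout; calling it again changes nothing."""
    if list(df.columns) == ["ds", "y"]:
        return df  # already converted, so skip reset_index
    out = df.reset_index()
    out.columns = ["ds", "y"]
    return out

# Hypothetical two-row stand-in for the london data
raw = pd.DataFrame({"temperature": [4.5, 5.1]},
                   index=pd.to_datetime(["2000-01-01", "2000-01-02"]))

once = to_prophet_frame(raw)
twice = to_prophet_frame(once)
print(list(twice.columns))

# What the unguarded cell does on a second run: reset_index adds a third column
naive_second_run = once.reset_index()
print(list(naive_second_run.columns))
```

The second printout shows the extra index column that makes the two-name assignment fail when the unguarded cell is run twice.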

Next, we need to check the format of ds. Prophet recommends YYYY-MM-DD or YYYY-MM-DD HH:MM:SS. The y column must be numeric. Since london meets both requirements, no changes are needed.
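
These checks can also be done programmatically. The following is a minimal sketch using only Pandas; check_prophet_frame is a hypothetical helper written for illustration and is not part of Prophet, and the two small DataFrames are made up.

```python
import pandas as pd

def check_prophet_frame(df):
    """Return a list of problems that stop df from matching the ds/y layout."""
    problems = []
    for column in ("ds", "y"):
        if column not in df.columns:
            problems.append(f"missing column: {column}")
    if problems:
        return problems
    # errors="coerce" turns unparseable dates into NaT instead of raising
    if pd.to_datetime(df["ds"], errors="coerce").isna().any():
        problems.append("ds contains values that are not valid dates")
    if not pd.api.types.is_numeric_dtype(df["y"]):
        problems.append("y is not numeric")
    return problems

good = pd.DataFrame({"ds": ["2000-01-01", "2000-01-02"], "y": [4.5, 5.1]})
bad = pd.DataFrame({"ds": ["2000-01-01", "not a date"], "y": ["4.5", "5.1"]})
print(check_prophet_frame(good))
print(check_prophet_frame(bad))
```

An empty list means the frame passed both checks; otherwise each entry names one problem to fix before fitting.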

Prophet provides an API similar to scikit-learn. After creating an instance of the additive model, we call its fit and predict methods to train and forecast.

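try:
    from prophet import Prophet  # package name since the 1.0 release
except ImportError:
    from fbprophet import Prophet  # package name in older releases
import warnings

warnings.filterwarnings('ignore')  # ignore warnings
m = Prophet()  # create an additive model (the default)
m.fit(london)  # train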

Next, we create the sequence of dates to forecast. We can use the make_future_dataframe method directly, or build a DataFrame of the same format with Pandas. make_future_dataframe supports every freq supported by pd.date_range. The original data is daily (freq='D'), so here we create a time index that extends 365 days into the future.

future = m.make_future_dataframe(periods=365, freq='D')  # generate the dates to forecast
future.tail()  # show the last 5 rows
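
The Pandas route mentioned above can be sketched as follows. This is a minimal illustration rather than Prophet's own code: future_frame and its parameters are hypothetical names, and the three-day history is made up. It extends a date column by a given number of steps with pd.date_range and returns a one-column ds DataFrame.

```python
import pandas as pd

def future_frame(history_ds, periods, freq="D", include_history=True):
    """Build a one-column 'ds' DataFrame extending history_ds by `periods` steps."""
    history_ds = pd.to_datetime(pd.Series(history_ds)).reset_index(drop=True)
    last = history_ds.max()
    # date_range starts at the last observed date, so generate one extra step and drop it
    new_dates = pd.Series(pd.date_range(start=last, periods=periods + 1, freq=freq)[1:])
    if include_history:
        all_dates = pd.concat([history_ds, new_dates], ignore_index=True)
    else:
        all_dates = new_dates
    return pd.DataFrame({"ds": all_dates})

# Hypothetical history covering the last three days of 2016
history = pd.date_range("2016-12-29", "2016-12-31", freq="D")
future_demo = future_frame(history, periods=3)
print(future_demo)
```

Passing include_history=False returns only the new dates, which is convenient when the original period should not be forecast again.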

Forecasting is done with the predict method. Note that predict returns a DataFrame with many columns (19 in this run), including the trend and seasonal components and their bounds. Here we keep only three of them, yhat, yhat_lower and yhat_upper, which hold the predicted value $\hat y$ and the lower and upper bounds of its confidence interval.

forecast = m.predict(future)  # forecast
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head()  # keep only the predicted values and their confidence intervals
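
To make the meaning of the three columns concrete, the sketch below counts how often observed values fall between yhat_lower and yhat_upper. interval_coverage is a hypothetical helper and all numbers are invented for illustration; with real output, london and forecast from the cells above would take the place of the demo frames.

```python
import pandas as pd

def interval_coverage(actual, forecast):
    """Fraction of observations lying inside [yhat_lower, yhat_upper]."""
    merged = actual.merge(forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds")
    inside = (merged["y"] >= merged["yhat_lower"]) & (merged["y"] <= merged["yhat_upper"])
    return inside.mean()

# Made-up observations and forecast rows sharing the same dates
dates = pd.date_range("2016-01-01", periods=4, freq="D")
actual_demo = pd.DataFrame({"ds": dates, "y": [5.0, 6.0, 9.5, 7.0]})
forecast_demo = pd.DataFrame({
    "ds": dates,
    "yhat": [5.2, 6.1, 6.8, 7.3],
    "yhat_lower": [4.0, 5.0, 5.5, 6.0],
    "yhat_upper": [6.5, 7.2, 8.0, 8.5],
})
coverage = interval_coverage(actual_demo, forecast_demo)
print(coverage)  # 3 of the 4 points fall inside their interval
```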

Finally, we can plot the predicted data.

fig = m.plot(forecast)  # plot the forecast

As shown above, the black dots are the true values, the blue line is the predicted values, and the blue band is the confidence interval. Because make_future_dataframe also includes the dates of the original series, the original series plus the forecast period is passed to predict. That is why the confidence interval in the figure extends from 2000 to 2018.

In fact, we can also generate the forecast dates and the plot ourselves rather than using the methods Prophet provides. That way the original series is drawn unchanged, and the predicted values and confidence interval are drawn only for the forecast period.

# This cell works on a renamed copy and leaves `london` untouched, so it can be re-run safely
future_ = future[len(london):]  # forecast dates only, without the dates of the original series
forecast_ = m.predict(
    future_)[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]  # forecast

history_ = london.rename(columns={'y': 'yhat'})  # copy of the original data with y renamed to yhat
forecast_ = pd.concat([history_, forecast_], sort=False)  # stack the original data and the forecast
forecast_ = forecast_.reset_index(drop=True)  # renumber the rows of the merged DataFrame

fig, axes = plt.subplots(figsize=(12, 7))
axes.plot(forecast_.index, forecast_['yhat'])  # original data followed by the predicted values
axes.fill_between(forecast_.index, forecast_['yhat_lower'],
                  forecast_['yhat_upper'], color='orange', alpha=0.3)  # confidence interval of the forecast

As the figure shows, the original data is drawn unchanged, and the predicted values and their confidence interval are appended at the end. The predicted data follows the same pattern as the original data, continuing the seasonal variation of the series.

Trend change points

In the section above, we used a real dataset to build an additive model quickly with Prophet. Prophet also provides other useful methods for time series forecasting, one of which draws trend change points.

A trend change point is a moment at which the trend of the series changes abruptly, for example when the overall trend switches from decline to growth. Marking trend change points is a good way to examine how the trend of a series evolves. Prophet can detect and mark these change points automatically through the add_changepoints_to_plot method.

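try:
    from prophet.plot import add_changepoints_to_plot  # package name since the 1.0 release
except ImportError:
    from fbprophet.plot import add_changepoints_to_plot  # package name in older releases

fig_base = m.plot(forecast)  # draw the forecast
fig = add_changepoints_to_plot(fig_base.gca(), m, forecast)  # overlay the change points on the forecast plot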

Note that a change point here does not mark where the series value turns from increasing to decreasing or the reverse; it marks a change in the overall trend. In the figure above, the red line shows the overall trend of the series, and the points where this trend line changes are marked as trend change points.
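
The distinction can be illustrated on a toy trend line. The sketch below only shows what a trend change point is, under the assumption of a noise-free piecewise linear trend; it is not how Prophet detects change points, and slope_changes is a hypothetical helper.

```python
def slope_changes(trend, tolerance=1e-9):
    """Return indices where the step-to-step slope of a trend line changes."""
    slopes = [b - a for a, b in zip(trend, trend[1:])]
    return [i + 1 for i, (s0, s1) in enumerate(zip(slopes, slopes[1:]))
            if abs(s1 - s0) > tolerance]

# Hypothetical trend: falls by 1 per step until t = 10, then rises by 2 per step
trend_demo = ([20.0 - t for t in range(10)]
              + [10.0 + 2.0 * (t - 10) for t in range(10, 20)])
change_points = slope_changes(trend_demo)
print(change_points)
```

Only the position where the slope switches from falling to rising is reported; the individual ups and downs of a seasonal series layered on top of such a trend would not count as trend change points.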

Multiplicative model

In the London temperature example above, an additive model works well because the forecast only needs to add a seasonal pattern of constant size to the trend, much like repeating the same cycle. In some series, however, the additive model cannot capture the pattern accurately, because the size of the seasonal swings grows along with the level of the series. In that case, a multiplicative model is needed.

For example, the following series records an airline's passenger numbers between 1949 and 1960.

air = pd.read_csv(
    'http://labfile.oss.aliyuncs.com/courses/1176/example_air_passengers.csv')
air.plot(figsize=(10, 5))  # plot the series

It is clear that passenger numbers are seasonal and that the seasonal swings grow year by year. Suppose we apply the additive model anyway.

m_additive = Prophet()  # additive model
m_additive.fit(air)  # train
future = m_additive.make_future_dataframe(50, freq='MS')  # generate the dates to forecast
forecast = m_additive.predict(future)  # forecast
fig = m_additive.plot(forecast)  # plot

As the figure above shows, the additive model keeps the seasonal variation the same size in every cycle. Although it captures the overall growth, its values in later cycles clearly do not track the widening swings in the original data. This is where the multiplicative model comes in.

m_multiplicative = Prophet(seasonality_mode='multiplicative')  # multiplicative model
m_multiplicative.fit(air)  # train
future = m_multiplicative.make_future_dataframe(50, freq='MS')  # generate the dates to forecast
forecast = m_multiplicative.predict(future)  # forecast
fig = m_multiplicative.plot(forecast)  # plot

As the figure above shows, the multiplicative model captures the way the seasonal swings in passenger numbers grow from one period to the next. For this series, its forecast should therefore be more accurate than that of the additive model.
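
Whether one model really is more accurate is best checked on data held out from training. The following is a minimal sketch, assuming the last few observations were kept aside and each model's forecast for them is available. The helper names and all numbers are hypothetical, made up purely to show the calculation; they are not results from the airline data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rank_forecasts(actual, forecasts):
    """Sort model names by holdout error, smallest error first."""
    errors = {name: mean_absolute_error(actual, values)
              for name, values in forecasts.items()}
    return sorted(errors.items(), key=lambda item: item[1])

# Invented holdout values and forecasts, for illustration only
holdout_actual = [420.0, 470.0, 550.0, 510.0]
ranking = rank_forecasts(holdout_actual, {
    "additive": [440.0, 460.0, 500.0, 490.0],
    "multiplicative": [425.0, 465.0, 540.0, 505.0],
})
print(ranking)
```

With real models, the yhat values that each model predicts for the held-out dates would replace the invented lists.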

Summary

In this article, we met series data with seasonal characteristics and learned to use Prophet to build additive and multiplicative models. Simply put, if the size of the seasonal variation stays roughly constant over time, an additive model can be used. Conversely, if the seasonal variation grows or shrinks as the level of the series changes, a multiplicative model is preferred.
