Forecasting models are divided into two (overlapping) submodels. In the first, past data (sales) are used to predict the future (demand). This is termed time series analysis and includes moving averages, weighted moving averages, single exponential smoothing, exponential smoothing with trend, trend analysis, linear regression, multiplicative decomposition, and additive decomposition.
The second submodel is for situations in which one variable (demand) is a function of one or more other variables. This is termed (multiple) regression. The two submodels overlap in that simple linear regression (one independent variable) can be performed with either of them.
In addition, this package contains a third model which enables us to create forecasts given a particular regression model.
The input to time series analysis is a series of numbers representing data over the most recent n time periods. While the major result is always the forecast for the next period, additional results presented vary according to the technique that is chosen. For every technique, the output includes the sequence of 'forecasts' that are made on past data and the forecast for the next period. When using trend analysis or seasonal decomposition, forecasts can be made for more than one period into the future.
The summary measures include the traditional error measures of bias (average error), mean squared error, standard error and mean absolute deviation (MAD).
NOTE: Different authors compute the standard error in slightly different ways. That is, the denominator inside the square root is n-2 for some authors and n-1 for others. DS for Windows uses n-2 in the denominator (unless the Taylor text option is chosen) for simple cases and always displays the denominator in the output.
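For reference, the following short Python sketch (ours, not code from the package) shows one way to compute these measures from a list of errors (demand minus forecast), using n-2 inside the square root of the standard error as described above:

def error_measures(errors):
    # errors = list of (demand - forecast) values for the periods that have a forecast
    n = len(errors)
    bias = sum(errors) / n                                     # average error
    mad = sum(abs(e) for e in errors) / n                      # mean absolute deviation
    mse = sum(e * e for e in errors) / n                       # mean squared error
    std_error = (sum(e * e for e in errors) / (n - 2)) ** 0.5  # n-2 in the denominator
    return bias, mad, mse, std_error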
Suppose that we have data as given in the following table and wish to forecast the demand for the week of February 14 (and maybe February 21, February 28, ...).
Week        | Sales
January 3   | 100
January 10  | 120
January 17  | 110
January 24  | 105
January 31  | 110
February 7  | 120
The general framework for time series forecasting is set up by indicating the number of past data points. The preceding example has data for the last six periods (weeks), and we wish to forecast for the next period, period 7 (February 14).
Forecasting method. The drop-down method box contains the eight methods that were named at the top of this module. Of course, the results depend on the forecasting method chosen. A moving average is shown above.
Number of periods in the moving average, n. To use the moving average or weighted moving average the number of periods in the average must be given. This is some integer between 1 and the number of time periods of data. In the preceding example, 2 periods were chosen as seen in the extra data area.
Values for dependent (y) variable. These are the most important numbers as they represent the data. In most cases these will simply be the past sales or demands. The initial data can be seen in the screen in the demand column given by 100, 120,..., 120.
The solution screens are all similar but the exact output depends on the method chosen. For the smoothing techniques of moving averages (weighted or unweighted) and single exponential smoothing, there is one set of output, while for exponential smoothing with trend, there is a slightly different output display. For the regression there is another set of output. We begin with the moving averages as exemplified by the solution for the example shown in the screen below. The screen we show is the screen of details rather than the first screen that contains summary results. (We show the summary screen later in this section where it is more interesting).
Example 1 - Moving averages
We are using a two-week (n = 2) moving average. The output is as follows.
Forecasts. The first column of output data is the set of forecasts that would be made when using the technique. Notice that since this is a two-week moving average, the first forecast cannot be made until the third week. This value is the 110, which appears as the first entry in the "Forecast" column. The 110 is computed as (100+120)/2. The following three numbers - 115, 107.5 and 107.5 - represent the "forecasts" of the old data; the last number in the column, 115, is marked as the forecast for the next period - period number 7.
Next period forecast. As mentioned in the previous paragraph, the last forecast is below the data and is the forecast for the next period; it is marked as such on the screen. In the example it is 115.
Error. This column begins the error analysis. The difference between the forecast and the demand appears in this column. The first row to have an entry is the row in which the first forecast takes place. In this example, the first forecast occurs on Jan. 17 (row 3) and the forecast was for 110, which means that the error was 0. In the next week the forecast was for 115 but the demand was only 105, so the error was -10 (minus 10).
Absolute value of the error. This column contains the absolute value of the error and is used to compute the mean absolute deviation (MAD) and the total absolute deviation. Notice that the -10 in the error column has become a (plain, unsigned, positive) 10 in this column.
Error squared. This column contains the square of each error in order to compute the mean squared error and standard error. The 10 has been squared and is listed as 100. We caution that because we are squaring numbers it is quite possible that the numbers will become large here and that the display will become a little messy. This is especially true when printing.
Totals. The total for the demand and each of the three error columns appears in this row. This row will contain the answers to problems in books that rely on the total absolute deviation rather than the mean absolute deviation. Books using total instead of mean should caution students about unfair comparisons when there are different numbers of periods in the error computation.
Averages. The averages for each of the three errors appear in this row. The average error is termed the bias and many books neglect this very useful error measure. The average absolute error is termed the MAD and appears in nearly every book due to its computational ease. The average squared error is termed the mean squared error (MSE) and is typically associated with regression/least squares. These three names are indicated on the screen as Bias, MAD and MSE underneath their values. In this example, the bias is 1.25, the MAD is 6.25, and the MSE is 65.625.
Standard error. One more error measure is important. This is the standard error. Different books have different formulas for the standard error. That is, some use n-1 in the denominator, and some use n-2. This program uses n-2 (unless it was started with the Taylor option). Check your textbook before checking your answers. In this example, the standard error is 11.4564.
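For readers who want to verify these numbers, here is a minimal Python sketch (ours, not the program's code) that reproduces the two-week moving average example:

demand = [100, 120, 110, 105, 110, 120]
n = 2  # periods in the moving average

# forecasts for periods 3 through 6 plus the next-period (period 7) forecast
forecasts = [sum(demand[i - n:i]) / n for i in range(n, len(demand) + 1)]
# -> [110.0, 115.0, 107.5, 107.5, 115.0]; the last value is the period-7 forecast

errors = [d - f for d, f in zip(demand[n:], forecasts)]            # 0, -10, 2.5, 12.5
bias = sum(errors) / len(errors)                                   # 1.25
mad = sum(abs(e) for e in errors) / len(errors)                    # 6.25
mse = sum(e * e for e in errors) / len(errors)                     # 65.625
std_error = (sum(e * e for e in errors) / (len(errors) - 2)) ** 0.5  # 11.4564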
NOTE: The Normal distribution calculator can be used to find confidence intervals and address other probabilistic questions related to forecasting.
Example 2 - Weighted moving averages
If the weighted moving average method is chosen, two new columns appear in the data table, as shown in the following screen. The weights are placed in the far right column. The weights may be fractions that sum to 1, as in this example (.6 and .4), but they do not have to sum to 1; if they do not, they will be rescaled. For example, weights of 2 and 1 will be converted to 2/3 and 1/3. In this example, weights of .6 and .4 have been used to perform the forecasting. For example, the forecast for week 7 is .6*120 + .4*110 = 116.
A (secondary) solution screen appears below. As before, the errors and the error measures are computed.
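A short Python sketch (again ours, not the program's) of the weighted moving average computation for this example:

demand = [100, 120, 110, 105, 110, 120]
weights = [0.4, 0.6]                   # oldest to newest; rescaled if they do not sum to 1
w = [x / sum(weights) for x in weights]

forecasts = [sum(wi * d for wi, d in zip(w, demand[i - len(w):i]))
             for i in range(len(w), len(demand) + 1)]
# the last entry is the week-7 forecast: 0.6*120 + 0.4*110 = 116.0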
Example 3 - Exponential smoothing
Alpha for exponential smoothing. In order to use exponential smoothing, a value for the smoothing constant, alpha, must be entered. This number is between 0 and 1. At the top of the screen, a scrollbar/text box combination will appear for entering alpha, as shown in the following screen. The smoothing constant alpha is .5 in this example.
NOTE: If you select alpha = 0 the software will find the best value for alpha!
A starting forecast for exponential smoothing. In order to perform exponential smoothing, a starting forecast is necessary. When exponential smoothing is selected, a column labeled "forecast" will appear on the screen, with a blank column underneath. If you want, you may enter one number in this column to serve as the starting forecast. If you enter no number, the first demand is used as the starting forecast.
The results screen has the same columns and appearance as the previous two methods as shown next.
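A minimal Python sketch (ours), assuming the starting forecast defaults to the first demand as described above, illustrates the single exponential smoothing computation with alpha = .5:

demand = [100, 120, 110, 105, 110, 120]
alpha = 0.5
forecasts = [demand[0]]                    # starting forecast = first demand
for d in demand:
    forecasts.append(forecasts[-1] + alpha * (d - forecasts[-1]))
# forecasts[1:] are the 'forecasts' for periods 2 through 6; the last value is the
# next-period (period 7) forecast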
One of the output displays (not shown in this manual) presents error measures as a function of alpha.
Example 4 - Exponential smoothing with trend
Exponential smoothing with trend requires two smoothing constants. A smoothing constant, beta, for the trend is added to the model.
Beta for exponential smoothing with trend. In order to perform exponential smoothing with trend, a smoothing constant, beta, must be given (in addition to alpha). If beta is 0, single exponential smoothing is performed. If beta is positive, exponential smoothing with trend is performed as shown.
Initial trend. In this model, the trend will be set to 0 unless it is initialized. It should be set for the same time period as the initial forecast.
The solution screen for this technique is different from the screens for the previously described techniques. The forecast computations appear in the column labeled "unadjusted forecast." These numbers are the same as in the previous example (because we used the same value for alpha). The trend forecasts appear in the column labeled "trend." The trend is the difference between the doubly smoothed forecasts from period to period (weighted by beta). The forecasts appear in the column marked "adjusted forecast."
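As the note below explains, there are several trend-adjusted smoothing variants, so the following Python sketch shows only one common form and may not match the program's results exactly; the values for alpha and beta are illustrative:

demand = [100, 120, 110, 105, 110, 120]
alpha, beta = 0.5, 0.3                     # illustrative smoothing constants
unadjusted = demand[0]                     # starting forecast = first demand
trend = 0.0                                # initial trend (0 unless initialized)
for d in demand:
    prev = unadjusted
    unadjusted = prev + alpha * (d - prev)                 # unadjusted (smoothed) forecast
    trend = beta * (unadjusted - prev) + (1 - beta) * trend
    adjusted = unadjusted + trend                          # trend-adjusted forecast
# after the loop, 'adjusted' holds this variant's next-period forecast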
Note: Unfortunately, there are several different exponential smoothing with trend methods. While they are all similar, the results will vary. Therefore, it is possible that the results given by DS for Windows will not match the results of your text. This is unfortunate but unavoidable. If you are using a Prentice-Hall text, be certain that the software is registered (Help, User information) for that text in order to get matching results.
Example 5 - Trend analysis
As mentioned previously, the solution screen for regression differs from the solution screens for the other forecasting techniques. A sample output for the same problem appears below.
Values for independent (x) variable. For time-series regression, the default values are 1 through n and cannot be changed. For paired regression, the actual values of the independent variable need to be entered (see Example 6).
The screen is set up so that the computations made for finding the slope and the intercept are apparent. In order to find these values it is necessary to compute the sum of the x^2 values and the sum of the xy values. These two columns are presented. Depending on the book, either the sum of these columns or the average of these columns, as well as the first two columns, will be used to generate the regression line. The line is given by the slope and the intercept, which are listed at the bottom left of the screen. In this example, the line that fits the data best is given by
Y = 104.33 + 1.857*X
which is read as "Sales has a base of 104.33 with an increase of 1.857 per week."
If the data is sequential, the next period forecast is displayed. This is given by inserting one more than the number of periods into the regression line. In the example, we would insert 7 into the preceding equation, yielding 117.33, as shown on the screen at the bottom of the forecast column.
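The intermediate sums and the fitted line can be checked with a short Python sketch (ours, not the program's code):

y = [100, 120, 110, 105, 110, 120]
x = list(range(1, len(y) + 1))                 # 1 through 6 for time-series regression
n = len(y)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # 1.857
intercept = sum_y / n - slope * sum_x / n                          # 104.33
next_period = intercept + slope * (n + 1)                          # 117.33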
The standard error is computed and shown as with all other methods. In this example, it is 8.0696, which is better than any other method seen yet. Also notice that the mean squared error is displayed (43.41 in this example). The bias is, of course, 0, as linear regression is unbiased. We display the summary screen as follows.
Notice that the correlation coefficient and r-squared (r^2) coefficient are displayed as output. In the summary are the forecasts for the next several periods, since this was a trend analysis (time series regression).
The forecasting module can also display a graph for time series analysis. We display this next.
Example 6 - Non time series regression
Regression can be used on data that is causal. In the next screen we present the sales of umbrellas as a function of the number of inches of rain in the last four quarters of the year. The interpretation of the solution screen is that the line that best fits this data is given by sales = 49.93 + 27.43 * number of inches of rain.
Above the data is a textbox that enables us to enter a value for x into the regression equation. The solution appears in the summary table (not displayed). In our example, if x = 10, then the summary table indicates that y = 324.
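Using the rounded coefficients shown on the solution screen, this projection can be checked directly; for example, in Python:

intercept, slope = 49.93, 27.43     # rounded coefficients from the solution screen
rain = 10
sales = intercept + slope * rain    # approximately 324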
Example 7 - Deseasonalization
The following screen displays a problem with seasonal data. As can be seen in the screen there are 12 data points.
You must enter the number of seasons, such as four quarters, twelve months, or five or seven days. In addition, you must enter the basis for smoothing: either the centered moving average (which is common) or the average of all of the data.
The solution screen contains several columns.
Centered moving average. The data is smoothed using a moving average that is as long as one full cycle of seasons - 4 in this case. Because there is an even number of seasons, the weighted moving average uses one-half of each of the two end periods and all of the three middle periods. For example, for summer 1996, the weighted average is:
{.5(96) + 68 + 95 + 94 + .5(93)}/4 = 87.875
This average cannot be taken for the first n/2 periods and begins in period 3.
Demand to moving average ratio. For all of the data points that have moving averages computed, the ratio of the actual data to the moving average is computed. For example, for summer 1996 the ratio is 95/87.875 = 1.08108.
Seasonal factors. The seasonal factors are computed as the average of all of the ratios. For example, the summer seasonal factor is the average of 1.08108 (summer 1996) and .997167 (summer 1997), which yields 1.0391, as shown for summer 1996, summer 1997 and summer 1998.
Smoothed data. The original data is divided by its seasonal factor in order to take out the seasonal effects and compute the smoothed data.
Unadjusted forecast. After smoothing the data the software finds the trend line for the smoothed data. This column represents the 'forecasts' using this trend line. The trend line itself can be found on the summary results screen.
Adjusted forecast. The final column (before the error analysis) takes the forecasts from the trend line and then multiplies them by the appropriate seasonal factors. The errors are based on these adjusted forecasts versus the original data.
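A compact Python sketch (ours, not the program's code) of the multiplicative decomposition steps just described, written for the even-number-of-seasons case covered above:

def multiplicative_decomposition(demand, seasons):
    # demand is a list of data, oldest value first; seasons is the cycle length (even)
    half = seasons // 2

    # 1. centered moving average: half weight on the two end periods of the window
    cma = {}
    for t in range(half, len(demand) - half):
        window = demand[t - half: t + half + 1]
        cma[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / seasons

    # 2. ratio of actual demand to its centered moving average
    ratios = {t: demand[t] / cma[t] for t in cma}

    # 3. seasonal factor for each season = average of that season's ratios
    factors = [0.0] * seasons
    for s in range(seasons):
        vals = [r for t, r in ratios.items() if t % seasons == s]
        factors[s] = sum(vals) / len(vals)

    # 4. smoothed (deseasonalized) data = demand divided by its seasonal factor
    smoothed = [d / factors[t % seasons] for t, d in enumerate(demand)]

    # 5. trend line on the smoothed data, then multiply back by the seasonal factors
    n, x = len(smoothed), list(range(1, len(smoothed) + 1))
    sx, sy = sum(x), sum(smoothed)
    sxy = sum(xi * yi for xi, yi in zip(x, smoothed))
    sx2 = sum(xi * xi for xi in x)
    slope = (sxy - sx * sy / n) / (sx2 - sx * sx / n)
    intercept = sy / n - slope * sx / n
    unadjusted = [intercept + slope * t for t in x]
    adjusted = [f * factors[(t - 1) % seasons] for t, f in zip(x, unadjusted)]
    return factors, smoothed, unadjusted, adjusted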
The summary table contains the forecasts for the coming periods.
We do not display the output for the additive model here. The additive model uses differences rather than ratios to determine the seasonal factors, which are added to, rather than multiplied by, the trend forecasts.
As noted earlier, the forecasting module can perform multiple regression. Two inputs are required to set up the data: the number of periods of data and the number of independent variables. In this example, we extend the regression problem from Example 6. Note that for simple regression (one independent variable) there are two ways to solve the problem. Here we have two independent variables, so multiple regression must be used. We have entered 4 for the number of periods and 2 for the number of independent variables.
We have filled in the data and the solution screen appears next. The input has four columns - one for the name of the time period; one for the dependent variable, umbrellas; one for the independent variable, rain; and one for the independent variable, time (1 through 4). The output display is somewhat different from before. The computations are not shown. The regression equation is not shown explicitly on this screen but can be found by looking at the beta coefficients below the table. That is, the equation is Umbrella sales = 98.2381 + 26.5238*Rain - 11.9381*time. This is shown explicitly on the summary screen, which we do not display.
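As an illustration only, the same coefficients can be obtained with a least-squares sketch in Python/NumPy; the names umbrellas, rain, and time below are hypothetical lists that would hold the three data columns from the screen:

import numpy as np

def multiple_regression(y, x_columns):
    # returns the least-squares coefficients: intercept first, then one beta per column
    X = np.column_stack([np.ones(len(y))] + list(x_columns))
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

# e.g. multiple_regression(umbrellas, [rain, time]) would return the intercept and the
# two beta coefficients shown below the table.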
Projecting
The third model in forecasting allows us to take a regression equation and project it. Consider the example below.
When we set up the problem, we indicated that there were 5 independent variables and that we wanted to create 3 forecasts. The regression line is given in the first column (Y = 80 + 3x1 + 7x2 + 21x3 - 6x4 + 2x5). The next three columns contain the data for x1 through x5 for each of the three forecasts to be made. Row 1 contains a 1, since this row corresponds to the intercept. Finally, the bottom row contains the forecasts, which are 942, 1018 and 1085 for the three scenarios.
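Each forecast is simply the coefficient column multiplied term by term with a scenario column and summed. A short Python sketch (ours) of the computation, where scenario stands for one hypothetical data column:

coefficients = [80, 3, 7, 21, -6, 2]        # intercept followed by the five betas

def project(scenario):
    # scenario = [1, x1, x2, x3, x4, x5]; the leading 1 multiplies the intercept
    return sum(c * x for c, x in zip(coefficients, scenario))

# applying project(...) to each of the three data columns yields the three forecasts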