FORECAST_LINEAR: Using a Linear Regression Equation

How to:

The linear regression equation estimates values by assuming that the dependent variable (the new calculated values) and the independent variable (the sort field values) are related by a function that represents a straight line:

y = mx + b

where:

y
Is the dependent variable.
x
Is the independent variable.
m
Is the slope of the line.
b
Is the y-intercept.

FORECAST_LINEAR uses a technique called Ordinary Least Squares to calculate values for m and b that minimize the sum of the squared differences between the data and the resulting line.

The following formulas show how m and b are calculated.

where:

n
Is the number of data points.
y
Is the data values (dependent variables).
x
Is the sort field values (independent variables).

Trend values, as well as predicted values, are calculated using the regression line equation.

Syntax: How to Calculate a Linear Regression Column

FORECAST_LINEAR(display, infield, interval,
 npredict)

where:

display

Keyword

Specifies which values to display for rows of output that represent existing data. Valid values are:

  • INPUT_FIELD. This displays the original field values for rows that represent existing data.
  • MODEL_DATA. This displays the calculated values for rows that represent existing data.

Note: You can show both types of output for any field by creating two independent COMPUTE commands in the same request, each with a different display option.

infield
Is any numeric field. It can be the same field as the result field, or a different field. It cannot be a date-time field or a numeric field with date display options.
interval
Is the increment to add to each sort field value (after the last data point) to create the next value. This must be a positive integer. To sort in descending order, use the BY HIGHEST phrase. The result of adding this number to the sort field values is converted to the same format as the sort field.

For date fields, the minimal component in the format determines how the number is interpreted. For example, if the format is YMD, MDY, or DMY, an interval value of 2 is interpreted as meaning two days. If the format is YM, the 2 is interpreted as meaning two months.

npredict
Is the number of predictions for FORECAST to calculate. It must be an integer greater than or equal to zero. Zero indicates that you do not want predictions, and is only supported with a non-recursive FORECAST.

Example: Calculating a New Linear Regression Field

The following request calculates a regression line using the VIDEOTRK data source of QUANTITY by TRANSDATE. The interval is one day, and three predicted values are calculated.

TABLE FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON TABLE SET PAGE NOLEAD
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The output is shown in the following image:

Note:

  • Three predicted values of FORTOT are calculated. For values outside the range of the data, new TRANSDATE values are generated by adding the interval value (1) to the prior TRANSDATE value.
  • There are no QUANTITY values for the generated FORTOT values.
  • Each FORTOT value is computed using a regression line, calculated using all of the actual data values for QUANTITY.

    TRANSDATE is the independent variable (x) and QUANTITY is the dependent variable (y). The equation is used to calculate QUANTITY FORECAST trend and predicted values.

The following version of the request charts the data values and the regression line.

GRAPH FILE VIDEOTRK
SUM QUANTITY
COMPUTE FORTOT=FORECAST_LINEAR(MODEL_DATA,QUANTITY,1,3);
BY TRANSDATE
ON GRAPH HOLD FORMAT JSCHART
ON GRAPH SET LOOKGRAPH VLINE
END

The output is shown in the following image.