Calculating Trends and Predicting Values With Multivariate REGRESS

How to:

Reference:

The REGRESS method derives a linear equation that best fits a set of numeric data points, and uses this equation to create a new column in the report output. The equation can be based on one to three independent variables.

This method estimates values by assuming that the dependent variable (y, the new calculated values) and the independent variables (x1, x2, x3) are related by the following linear equation:

y = a1*x1 [+ a2*x2 [+ a3*x3]] + b 

When there is one independent variable, the equation represents a straight line. This produces the same values as FORECAST using the REGRESS method. When there are two independent variables, the equation represents a plane, and with three independent variables, it represents a hyperplane. You should use this technique when you have reason to believe that the dependent variable can be approximated by a linear combination of the independent variables.

REGRESS uses a technique called Ordinary Least Squares to calculate values for the coefficients (a1, a2, a3, and b) that minimize the sum of the squared differences between the data and the resulting line, plane, or hyperplane.


Top of page

x
Syntax: How to Create a Multivariate Linear Regression Column
ON {sortfield} RECAP y[/fmt] = REGRESS(n, x1, [x2, [x3,]] z);

where:

sortfield
Is a field in the data source. It cannot be the same field as any of the parameters to REGRESS. A new linear regression equation is derived each time the sort field value changes.
y
Is the new numeric column calculated by applying the regression equation. You cannot DEFINE or COMPUTE a field with this name.
fmt
Is the display format for y. If it is omitted, the default format is D12.2.
n
Is a whole number from 1 to 3 indicating the number of independent variables.
x1, x2, x3
Are the field names to be used as the independent variables. All of these variables must be numeric and be independent of each other.
z
Is an existing numeric field that is assumed to be approximately linearly dependent on the independent variables and is used to derive the regression equation.

Top of page

x
Reference: Usage Notes for REGRESS


Example: Creating a Multivariate Linear Regression Column

The following request uses the GGSALES data source to calculate an estimated DOLLARS column. The BUDUNITS, UNITS, and BUDDOLLARS fields are the independent variables. The DOLLARS field provides the actual values to be estimated:

DEFINE FILE GGSALES
 YEAR/Y = DATE;
 MONTH/M = DATE;
 PERIOD/I2 = MONTH;
END
TABLE FILE GGSALES
PRINT BUDUNITS UNITS BUDDOLLARS DOLLARS
BY PERIOD
ON PERIOD 
RECAP EST_DOLLARS/F8 = REGRESS(3, BUDUNITS, UNITS, BUDDOLLARS, DOLLARS);
WHERE CATEGORY EQ 'Coffee'
WHERE REGION EQ 'West'
WHERE UNITS GT 1600 AND UNITS LT 1700
END

The output is:

PERIOD Budget Units  Unit Sales  Budget Dollars  Dollar Sales  EST_DOLLARS
------ ------------  ----------  --------------  ------------  -----------
     1         1665        1678           21645         23492            0
               1725        1669           22425         21697            0
     2         1613        1685           22582         18535        13334
               1568        1682           23520         25230        14225
               1847        1668           18470         25020         9607
     3         1646        1656           23044         19872        19872
               1759        1615           17590         17765        17765
               1498        1637           16478         21281        21281
               1653        1694           21489         16940        16940
     4         1457        1671           21855         20052            0
     5         1662        1674           24930         18414            0
     6         1825        1695           23725         25425       -41807
               1870        1620           24310         24300       -50319
               1712        1640           22256         16400       -37004
     7         1727        1623           24178         17853       -25413
               1733        1647           17330         24705        -8127
     8         1830        1652           20130         23128         6021
               1451        1660           17412         19920         7369
               1556        1643           18672         18073         7495
     9         1464        1663           14640         23282        11325
               1463        1663           21945         19956        25036
    10         1464        1667           17568         25005        25005
               1711        1623           20532         22722        22722
               1701        1626           18711         21138        21138
               1473        1616           14730         16160        16160
    11         1403        1601           21045         17611            0
    12         1796        1696           17960         25440            0


Information Builders