Calculating Trends and Predicting Values With Multivariate REGRESS
You can calculate trends and predict values
with multivariate regression. This method derives a linear equation
that best fits a set of numeric data points, and uses this equation to
create a new column in the report output. The equation can be based
on one to three independent variables.
This method estimates values by assuming
that the dependent variable (y, the new calculated values)
and the independent variables (x1, x2, x3) are
related by the following linear equation:
y = a1*x1 [+ a2*x2 [+ a3*x3]] + b
When there is one independent variable, the equation represents
a straight line. This produces the same values as FORECAST using
the REGRESS method. When there are two independent variables, the
equation represents a plane, and with three independent variables,
it represents a hyperplane. You should use this technique when you
have reason to believe that the dependent variable can be approximated
by a linear combination of the independent variables.
REGRESS uses a technique called Ordinary Least Squares to calculate
values for the coefficients (a1, a2, a3, and b)
that minimize the sum of the squared differences between the data
and the resulting line, plane, or hyperplane.
x
Procedure: How to Create a Multivariate Linear Regression Column Using Report Painter
-
In the
Report Painter main window, select a numeric BY field.
-
Click
the Forecast button.
-
From
the Step 1: Choose a Method drop-down list, select the Multivariate
Regression option.
-
From
the Step 2: Choose the First Independent Variable drop-down list, select
a variable. You have the option to choose additional independent
variables in Step 3 and Step 4.
-
From
the Step 5: Choose the Dependent Variable drop-down list, select a
dependent variable.
-
Click OK.
x
Syntax: How to Create a Multivariate Linear Regression Column Using WebFOCUS Language
ON {sortfield} RECAP y[/fmt] = REGRESS(n, x1, [x2, [x3,]] z);
where:
- sortfield
- Is a field in the data source. It cannot be the same field as
any of the parameters to REGRESS. A new linear regression equation
is derived each time the sort field value changes.
- y
- Is the new numeric column calculated by applying the regression
equation. You cannot DEFINE or COMPUTE a field with this name.
- fmt
- Is the display format for y. If it is omitted, the default format
is D12.2.
- n
- Is a whole number from 1 to 3 indicating the number of independent
variables.
- x1, x2, x3
- Are the field names to be used as the independent variables.
All of these variables must be numeric and be independent of each
other.
- z
- Is an existing numeric field that is assumed to be approximately
linearly dependent on the independent variables and is used to derive
the regression equation.
x
Reference: Usage Notes for REGRESS
- The (By) sort field
used with REGRESS must be in a numeric or date format.
- REGRESS cannot operate
on an ACROSS field.
- If any of the independent
variables are also sort fields, they cannot be referenced in the
request prior to the REGRESS sort field.
- FORECAST and REGRESS
cannot be used in the same request, and only one REGRESS is supported
in a request. Non-REGRESS RECAP commands are supported.
- The RECAP command
used with REGRESS can contain only the REGRESS syntax. REGRESS does
not recognize any syntax after the closing semicolon (;).
- Although you pass
parameters to REGRESS using an argument list in parentheses, REGRESS
is not a function. It can coexist with a user-written subroutine
of the same name, as long as the user-written subroutine is not
specified in a RECAP command.
- BY TOTAL is not supported.
- MORE, MATCH, FOR,
and OVER are not supported.
- The process of generating
the REGRESS values creates extra columns that are not printed in
the report output. The number and placement of these additional
columns varies depending on the specific request. Therefore, use
of column notation is not supported in a request that includes REGRESS.
- SUMMARIZE and RECOMPUTE
are not supported for the same sort field used for REGRESS.
- REGRESS is not supported
for the FOCUS GRAPH facility.
- The left side of
a RECAP command used for REGRESS supports the CURR attribute for
creating a currency-denominated field.
- Fields with missing
values cannot be used in the regression.
- Larger amounts of
data produce more useful results.
Example: Creating a Multivariate Linear Regression Column
The following example uses the GGSALES
data source to calculate an estimated DOLLARS column. The BUDUNITS,
UNITS, and BUDDOLLARS fields are the independent variables. The
DOLLARS field provides the actual values to be estimated.
- Create a new
procedure and open the ggsales.mas file in Report Painter.
- Select Report at
the top of the Report Painter window, then select Define from
the drop-down list.
The Define dialog box opens.
- Enter MONTH
in the Field input area, type M (a date format) in the Format input area,
then double-click DATE in the Fields List to add it to the area
below PERIOD, then click New.
- Enter PERIOD
in the Field input area, type I2 (Integer format) in the Format
input area, then type MONTH in the area below PERIOD, then click OK.
The
Define dialog box closes and you return back to Report Painter.
- Add the PERIOD, BUDUNITS, UNITS, BUDDOLLARS,
and DOLLARS fields to the report.
- Select the PERIOD field,
and click the By button.
- Click the Where/If button.
The
Report Options dialog box opens.
- Enter CATEGORY
EQ 'Coffee' in the Expression field, and click New.
- Enter REGION
EQ 'West' in the Expression field, and click New.
- Enter UNITS
GT 1600 AND UNITS LT 1700 in the Expression field, and
click OK.
- Click the Forecast button.
The
Forecast window opens.
- In the Field
Name field, enter EST_DOLLARS.
- From the Step
1: Choose a Method drop-down list, select Multivariate Regression.
- From the Step
2: Choose the First Independent Variable drop-down list, select BUDUNITS.
- From the Step
3: Choose Another Independent Variable drop-down list, select UNITS.
- From the Step
4: Choose Another Independent Variable drop-down list, select BUDDOLLARS.
- From the Step
5: Choose the Dependent Variable drop-down list, select DOLLARS.
- Click OK.
When
the report is run, the multivariate regression values appear in
the EST_DOLLARS column. The output is:
TABLE FILE GGSALES
PRINT BUDUNITS UNITS BUDDOLLARS DOLLARS
ON TABLE RECAP ESTDOLLARS/F8 = REGRESS(3,BUDUNITS, UNITS, BUDDOLLARS,
DOLLARS);
WHERE CATEGORY EQ 'Coffee'
WHERE REGION EQ 'West'
WHERE UNITS GT 1600 AND UNITS LT 1700
END