Missing Rows of Data in Cross-referenced Tables

This topic describes factors that affect report results when a host row has no corresponding cross-referenced row. The discussion applies to both dynamic and embedded joins.

In this section:

The SET ALL Command

Controlling Outer Join Optimization

Missing Rows in Unique Descendants

Missing Rows in Non-unique Descendants

Summary Chart

Normally, when a row from the host table or view is retrieved, a corresponding row from the cross-referenced table can also be retrieved. If a host row lacks a corresponding cross-referenced row, the retrieval path is called a short path. When there are short paths, the processing of the host row and the report results depend on:

Note:


Top of page

The SET ALL Command

You can include short paths in a report with the FOCUS SET ALL command

SET ALL = {OFF|ON}

where:

OFF
Is the default. In a join, omits host rows from the report if they lack a corresponding cross-referenced row.

ON
Includes all host rows in the report. (This is known as a left outer join.)

Note:


Top of page

Controlling Outer Join Optimization

With the SET SQLJOIN OUTER command you can control when the data adapter optimizes outer joins without affecting the optimization of other operations. This parameter provides backward compatibility with prior releases of the data adapter and enables you to fine-tune your applications.

How to:

Control Outer Join Optimization

Example:

Enabling Left Outer Join Optimization

Reference:

Effects of Combinations of Settings on Outer Join Optimization

When join optimization is in effect, the data adapter generates one SQL SELECT statement that includes every table involved in the join. The RDBMS can then process the join. When join optimization is disabled, the data adapter generates a separate SQL SELECT statement for each table, and FOCUS processes the join.

You can use the SQLJOIN OUTER setting to disable outer join optimization while leaving other optimization enhancements in effect.


Top of page

Syntax: How to Control Outer Join Optimization

SQL sqlengine SET SQLJOIN OUTER {ON|OFF}

where:

sqlengine
Indicates the target RDBMS. Valid values are DB2, SQLDBC, or SQLORA. Omit if you issued the SET SQLENGINE command.

ON
Enables outer join optimization. ON is the default value for Teradata and Oracle.

OFF
Disables outer join optimization. OFF is the default value for DB2.

Note:


Top of page

Reference: Effects of Combinations of Settings on Outer Join Optimization

The following table describes how different combinations of OPTIMIZATION and SQLJOIN OUTER settings affect data adapter behavior. It assumes that SET ALL = ON:

Settings

Results

OPTIMIZATION

SQLJOIN OUTER

Outer Join Optimized?

Other Optimization Features

ON

ON

Yes

Enabled

ON

OFF

No

Enabled

OFF

N/A

No

Disabled

SQL

ON

Yes, in all possible cases

Enabled

SQL

OFF

No

Enabled

FOCUS

ON

Yes if results are equivalent to FOCUS-managed request

Enabled

FOCUS

OFF

No

Enabled


Top of page

Example: Enabling Left Outer Join Optimization

The following request specifies a left outer join between the EMPINFO and FUNDTRAN tables. The SQLJOIN OUTER setting specifies that the left outer join should be optimized, and the SQLAGGR and STMTRACE trace components are activated:

SET TRACEUSER = ON
SET TRACEOFF = ALL
SET TRACEON = SQLAGGR//CLIENT
SET TRACEON = STMTRACE//CLIENT

SET ALL = ON
SQL DB2 SET OPTIMIZATION ON
SQL DB2 SET SQLJOIN OUTER ON
JOIN EMP_ID IN EMPINFO TO ALL WHO IN FUNDTRAN AS J1
TABLE FILE EMPINFO
SUM AVE.CURRENT_SALARY ED_HRS BY WHO BY LAST_NAME
IF DEPARTMENT EQ 'MIS'
END

The following trace is generated. One SELECT statement is generated. The LEFT OUTER JOIN phrase in the FROM predicate specifies the left outer join:

AGGREGATION DONE ...
SELECT T2."EID",T1."LN", AVG(T1."CSAL"), SUM(T1."OJT") FROM (
USER1."EMPINFO" T1 LEFT OUTER JOIN "USER1"."FUNDTRAN" T2 ON
T2."EID" = T1."EID") WHERE (T1."DPT" = 'MIS') GROUP BY
T2."EID",T1."LN" ORDER BY T2."EID",T1."LN" FOR FETCH ONLY;

Top of page

Missing Rows in Unique Descendants

In a unique join, the engine that handles the join controls the output:

  • IF FOCUS handles the join, it treats a unique cross-referenced table as a logical extension of the host or parent table. Since FOCUS never considers a unique cross-referenced row to be missing, it displays short paths regardless of whether SET ALL is ON or OFF. FOCUS handles the join in the following two cases:

Example:

FOCUS Unique Join Processing

Reference:

How FOCUS Processes a Unique Join

How the RDBMS Processes a Unique Join

Note:

The following topics discuss each of these settings.


Top of page

Reference: How FOCUS Processes a Unique Join

FOCUS handles the join when either of the following conditions is true:

FOCUS substitutes default values for any missing cross-referenced data (because it does not consider them to be missing in a unique join). FOCUS recognizes that you have defined the join to be unique and:

The Data Adapter Optimizer, discusses factors that determine whether optimization is disabled. To find out if and why the data adapter disabled OPTIMIZATION for a report request, use the SQLAGGR trace component (see Tracing Data Adapter Processing).


Top of page

Example: FOCUS Unique Join Processing

In the following example, a unique join connects the EMPINFO table, containing employee information, to the FUNDTRAN table, containing direct deposit account information for the employees. (File Descriptions and Tables, provides the Master and Access Files.)

join emp_id in empinfo tag file1 to who in fundtran tag file2 as j1
>
sql db2 set optimization off
>
table file empinfo
print bank_name bank_acct
by department by emp_id
end

NUMBER OF RECORDS IN TABLE= 14 LINES= 14

Since the join is unique and the SET OPTIMIZATION OFF command disables optimization, FOCUS displays one row per employee on the report. If there is no data for the cross-referenced table (FUNDTRAN), FOCUS appropriately substitutes zeros or blanks for its fields.

Two new employees, John Royce (333121200, no department) and Donna Lee (455670000, MIS) have no bank accounts. Six other employees also lack bank accounts:

DEPARTMENT      EMP_ID        BANK_NAME           BANK_ACCT
---------- ------ --------- ---------
. 333121200 0
MIS 112847612 0
117593129 STATE 40950036
219984371 0
326179357 ASSOCIATED 122850108
455670000 0
543729165 0
818692173 BANK ASSOCIATION 163800144
PRODUCTION 071382660 0
119265415 0
119329144 BEST BANK 160633
123764317 ASSOCIATED 819000702
126724188 0
451123478 ASSOCIATED 136500120

Note: Employee 333121200 has not yet been assigned a department; its null value is indicated by the NODATA symbol. However, BANK_NAME and BANK_ACCT are not considered missing because the join is unique.


Top of page

Reference: How the RDBMS Processes a Unique Join

The RDBMS processes the join under the following conditions:

The Data Adapter Optimizer, discusses factors that determine whether optimization is enabled.


Top of page

Missing Rows in Non-unique Descendants

In a non-unique join (JOIN...TO ALL) with missing cross-referenced rows, report results depend entirely on the SET ALL command because:

  • If SET ALL=OFF, the RDBMS and FOCUS both perform an inner join. Therefore, the OPTIMIZATION setting has no effect on the report results. Short paths are omitted from the report; multiple matching rows are included.

Example:

Non-Unique Join Processing With SET ALL = OFF

Non-Unique Join Processing With SET ALL = ON

Reference:

SET ALL=OFF With a Non-unique Join

SET ALL=ON With a Non-unique Join

SET ALL=ON With Screening Conditions

Selective ALL. Prefix


Top of page

Reference: SET ALL=OFF With a Non-unique Join

With SET ALL=OFF, optimization may be enabled or disabled. In both cases, however, the report results are the same: an inner join. Host rows that lack corresponding cross-referenced rows are not included in the report; multiple matching rows are included (even if there is only one).


Top of page

Example: Non-Unique Join Processing With SET ALL = OFF

For each of the following examples, the same non-unique join is in effect. It connects the EMPINFO table, containing employee information, to the DEDUCT table, containing salary deduction information.

The examples also execute the same report request. OPTIMIZATION is set ON and only the SET ALL command changes.

Two employees, John Royce (333121200, no department) and Donna Lee (455670000, MIS) have not been paid yet; therefore, they have no deductions.

When SET ALL is OFF, a host row that lacks a corresponding cross-referenced row is rejected (regardless of whether FOCUS or the RDBMS processes the join):

join clear j1
join emp_id in empinfo tag file1 to all dedeid in deduct tag file2 as j2
set all=off

table file empinfo
print ded_code ded_amt
by department by emp_id by deddate
end

NUMBER OF RECORDS IN TABLE= 448 LINES= 448

The report displays multiple deduction records in the cross-referenced table for each of 12 long-time employees in the host table. John Royce (333121200, no department) and Donna Lee (455670000, MIS department) are not listed in the report, because their corresponding cross-referenced rows do not exist in the DEDUCT table:

DEPARTMENT  EMP_ID     DEDDATE   DED_CODE  DED_AMT
---------- ------ ------- -------- -------
MIS 112847612 82/01/29 CITY $1.43
SAVE $54.60
STAT $20.02
LIFE $13.65
HLTH $22.75
FICA $100.10
FED $121.55
82/02/26 STAT $20.02
SAVE $54.60
LIFE $13.65
.
.
.

Top of page

Reference: SET ALL=ON With a Non-unique Join

With a non-unique join, SET ALL=ON produces a left outer join regardless of whether FOCUS or the RDBMS handles the join:

In both cases the report results are the same.

If there are no cross-referenced rows for a host row, the cross-referenced columns display the NODATA value.


Top of page

Example: Non-Unique Join Processing With SET ALL = ON

The following example executes the same report request as in Non-Unique Join Processing With SET ALL = OFF, except that SET ALL is ON:

set all = on
table file empinfo
print ded_code ded_amt
by department by emp_id by deddate
end
NUMBER OF RECORDS IN TABLE= 450 LINES= 450

John Royce (333121200) is now on Page 1 and Donna Lee (455670000) on Page 8. The department column for John Royce (333121200) displays the NODATA value, since the field has MISSING=ON:

PAGE     1

DEPARTMENT EMP_ID DEDDATE DED_CODE DED_AMT
---------- ------ ------- -------- -------
. 333121200 . . .
MIS 112847612 82/01/29 CITY $1.43
FED $121.55
FICA $100.10
HLTH $22.75
LIFE $13.65
.
.
.

Donna Lee (455670000) works for MIS:

PAGE     8

DEPARTMENT EMP_ID DEDDATE DED_CODE DED_AMT
---------- ------ ------- -------- -------
MIS 326179357 82/07/30 STAT $88.93
82/08/31 CITY $6.35
FED $539.96
FICA $444.67
HLTH $45.37
LIFE $27.22
SAVE $108.90
STAT $88.93
455670000 . . .
543729165 82/04/30 CITY $.72
.
.
.

Note: The report that results from passing a request to the RDBMS with SET ALL=ON may be sorted in a slightly different order from the report produced if FOCUS processes the join; however, both results are equally correct.


Top of page

Reference: SET ALL=ON With Screening Conditions

When SET ALL=ON, screening conditions affect report results for a non-unique join. If a screening test specifies a field from the cross-referenced structure, host rows whose cross-referenced rows were screened out are not represented, regardless of the SET ALL command.

The following example includes a WHERE test to screen for health deduction records:

join emp_id in empinfo tag f1 to all dedeid in deduct tag f2 as join1
set all=on
table file empinfo
sum ded_amt
by department by last_name by first_name by ded_code
where ded_code eq 'hlth'
end

Royce and Lee are omitted as the result of the WHERE test, because they have no deduction code records:

   DEPARTMENT     LAST_NAME    FIRST_NAME     DED_CODE    DED_AMT
---------- --------- ---------- -------- -------
MIS BLACKWOOD ROSEMARIE HLTH $226.87
CROSS BARBARA HLTH $330.21
JONES DIANE HLTH $121.16
MCCOY JOHN HLTH $16.50
SMITH MARY HLTH $181.99
PRODUCTION BANNING JOHN HLTH $41.25
IRVING JOAN HLTH $427.35
MCKNIGHT ROGER HLTH $122.43
ROMANS ANTHONY HLTH $105.60
STEVENS ALFRED HLTH $69.44

Top of page

Reference: Selective ALL. Prefix

Even if SET ALL=OFF, you can apply the effect of the ON setting to specific tables. To do this, add the ALL. prefix to one of the host fields in the request. The ALL. prefix causes FOCUS to process host rows even if they have missing cross-referenced rows. Like SET ALL=ON, the report results depend on whether screening tests exist for the cross-referenced fields.

For example, when the ALL. prefix is applied to the field DEPARTMENT, the report results are the same as for SET ALL=ON. (The report is displayed in Non-Unique Join Processing With SET ALL = ON).

set all=off
table file empinfo
print ded_code ded_amt
by all.department by emp_id by deddate
end

Note: The ALL. prefix is only effective when the SET ALL value is OFF; SET ALL=ON overrides the ALL. prefix.


Top of page

Summary Chart

This summary chart lists possible report results for dynamic joins and multi-table Master Files (embedded joins):

 

RDBMS Behavior
(Optimization Enabled)

FOCUS Behavior
(Optimization Disabled)

 

SET ALL =

SET ALL =

Join Type

ON (SQLJOIN OUTER ON)

OFF

ON

OFF

NON-UNIQUE

Dynamic
(JOIN...TO ALL)
or Embedded
(SEGTYPE=S0 or KL)

Outer Join

Inner Join

Outer join

(Also applies if Optimization Enabled and SQLJOIN OUTER OFF)

Inner join

Short paths

Appear with NODATA (.) value for fields of all missing segments

Do not appear

Appear with NODATA (.) value for fields of all missing segments

Do not appear

More than one matching row

All matching rows appear

All matching rows appear

All matching rows appear

All matching rows appear

UNIQUE

Dynamic
(JOIN...TO)
or Embedded
(SEGTYPE=U or KLU)

Outer join

Inner Join

FOCUS Unique

(Also applies if Optimization Enabled and SQLJOIN OUTER OFF)

FOCUS Unique

Short paths

Appear with NODATA (.) value for fields of all missing segments

Do not appear

Appear with blank or zero for missing values

Appear with blank or zero for missing values

More than one matching row

All matching rows appear

All matching rows appear

Only first matching row appears

Only first matching row appears

Note:


Information Builders