In this section: |
This discussion applies to joins invoked either by the FOCUS dynamic JOIN command (see Advanced Reporting Techniques) or by the embedded join facility (see Multi-Table Structures).
For information about using the JOIN command to construct an outer join and about controlling outer join optimization, see Advanced Reporting Techniques.
To explain how optimization logic is applied to joins, first there is a discussion of how the RDBMS and FOCUS differ in their management of joins.
When the RDBMS manages a join, it effectively first generates a Cartesian product of the tables, then applies any screening conditions to the resulting rows (including any join conditions), then applies the SELECT list, and, finally, calculates column function values and expressions.
In contrast, when FOCUS manages a join, it first reads segment instances from the top to the bottom of the file structure.
Because FOCUS views RDBMS tables as segments, in very rare situations the RDBMS could return more instances of a row than FOCUS would if it were processing similar data from a FOCUS or sequential data source. This difference is called the multiplicative effect.
FOCUS implements a dynamic or embedded join between RDBMS tables (segments) based on a set of pairs of "from/to" fields. The "to" fields are called the foreign key.
The KEYS attribute in the Access File defines the number of primary key fields (columns) in a segment. KEYS = n indicates that the first n fields in the segment comprise the primary key. If the foreign key fields for a child segment or cross-referenced file do not completely cover its primary key, multiple rows in the child segment (table) may correspond to a single row in the parent segment. Therefore, the RDBMS-generated Cartesian product may contain multiple instances of a single parent segment row. The parent segment is multiplied. By definition, all ancestors of the multiplied segment are also multiplied.
A unique segment is defined as having exactly one instance corresponding to an instance of its parent segment. If its foreign key fields do not cover its primary key, this is not necessarily the case, even though the join was specified as unique. A segment in this situation is called misjoined.
Even if a unique segment is misjoined, however, the adapter does not consider it as causing its parent's multiplication. The adapter processes a "unique segment misjoined" condition separately from the multiplicative effect. it issues a warning message, and, depending on the current optimization setting, may or may not disable optimization.
In most cases, the adapter can prevent the multiplicative effect, as described in Optimizing Joins and Optimizing Aggregation.
When you invoke optimization, the adapter attempts to build a single SQL statement. In order for the adapter to create one SQL statement that incorporates an RDBMS join on primary and foreign keys:
SUM or COUNT operations at any level other than the lowest level in the join structure cause the RDBMS to duplicate data. In effect, each host row is replicated for each associated row in the cross-referenced table. This duplication of data is known as the multiplicative effect.
When all segments other than the lowest level segment in the request have primary keys, the adapter passes such joins to the RDBMS with an SQL ORDER BY clause on all columns of each segment's primary key, in top to bottom order. The resulting sort on the returned answer set enables the adapter to eliminate duplicate rows before passing the data to FOCUS. This technique is called adapter-managed native join optimization. When adapter-managed native join optimization cannot be used, SUM or COUNT operations at any level other than the lowest level in the join structure cause optimization to be disabled.
In some cases, adapter-managed joins may be less efficient than FOCUS-managed joins from prior releases. To invoke a FOCUS-managed join, set OPTIMIZATION to OFF.
When the adapter manages join optimization, it does not optimize aggregation or sorting, as described in Optimizing Sorts and Optimizing Aggregation. This behavior is consistent with that of prior releases when the join was not passed. Expressions based on a single segment in the structure, however, are passed (see Optimizing Record Selection and Projection).
If these conditions are satisfied, the adapter generates one SQL SELECT statement that joins all the referenced tables in a single request.
If you set adapter optimization OFF, or if any condition is not satisfied, the adapter generates an individual SQL statement for each table. FOCUS performs the required processing on the returned answer sets.
Note:
Join order should not matter in an optimized join. However, performance may degrade when you join more than three tables.
The adapter can pass a single SELECT statement for all tables referenced in a TABLE request to the RDBMS when all active segments from the RDBMS comprise a contiguous single-path subtree. The adapter can optimize the join between relational tables even when the retrieval path includes segments from different types of data sources (for example, several DB2 segments and an IMS segment).
Single path means no non-unique RDBMS segment can have an RDBMS parent and a non-unique sibling (RDBMS or non-RDBMS). Active segments are those containing fields referenced in the request or fields involved in join operations. Contiguous segments are connected RDBMS segments with a common target RDBMS. There may be segments of different file types above or below the contiguous segments. Segments occupying intermediate positions in a file hierarchy between explicitly active segments are themselves also implicitly active.
Note:
The following example illustrates the SQL request passed to DB2 as the result of a dynamic join between two DB2 tables and an IDMS record. For information about JOIN syntax, see Advanced Reporting Techniques:
JOIN EMP_ID IN EMPINFO TO WHO IN FUNDTRAN AS JOIN1 JOIN EMP_ID IN EMPINFO TO EMP_ID IN IDMSFILE AS JOIN2
With optimization enabled, the adapter produces one SQL SELECT statement that joins the two DB2 tables. The RDBMS processes the join:
> SELECT T1.EID,T2.BN FROM "USER1"."EMPINFO" T1, "USER1"."FUNDTRAN" T2 WHERE (T2.EID = T1.EID) FOR FETCH ONLY;
For each DB2 joined row returned, the adapter uses the join field as input to an IDMS OBTAIN record command utilizing an IDMS index or a Calc key.
With optimization disabled, the adapter generates a separate SQL SELECT statement for each DB2 table:
SELECT T1.EID FROM "USER1"."EMPINFO" T1 FOR FETCH ONLY;
After a row is fetched from table1, the join field is used as input to the second select:
SELECT T2.BN FROM "USER1"."FUNDTRAN" T2 WHERE (T2.EID = ?) FOR FETCH ONLY;
For each DB2 joined row returned, the adapter uses the join field as input to an IDMS OBTAIN record command using an IDMS index or a Calc key.
Information Builders |