Optimizing Joins

This discussion applies to joins invoked either by the FOCUS dynamic JOIN command (see Advanced Reporting Techniques) or by the embedded join facility (see Multi-Table Structures).

In this section:

RDBMS and FOCUS Join Management

Join Optimization Logic

Optimization of Joins Between Heterogeneous Data Sources

Example:

Optimizing Joins Between Heterogeneous Data Sources

For information about using the JOIN command to construct an outer join and about controlling outer join optimization, see Advanced Reporting Techniques.

To explain how optimization logic is applied to joins, first there is a discussion of how the RDBMS and FOCUS differ in their management of joins.


Top of page

RDBMS and FOCUS Join Management

When the RDBMS manages a join, it effectively first generates a Cartesian product of the tables, then applies any screening conditions to the resulting rows (including any join conditions), then applies the SELECT list, and, finally, calculates column function values and expressions.

In contrast, when FOCUS manages a join, it first reads segment instances from the top to the bottom of the file structure.

Because FOCUS views RDBMS tables as segments, in very rare situations the RDBMS could return more instances of a row than FOCUS would if it were processing similar data from a FOCUS or sequential data source. This difference is called the multiplicative effect.

FOCUS implements a dynamic or embedded join between RDBMS tables (segments) based on a set of pairs of "from/to" fields. The "to" fields are called the foreign key.

The KEYS attribute in the Access File defines the number of primary key fields (columns) in a segment; KEYS = n indicates that the first n fields in the segment comprise the primary key. If the foreign key fields for a child segment or cross-referenced file do not completely cover its primary key, multiple rows in the child segment (table) may correspond to a single row in the parent segment. Therefore, the RDBMS-generated Cartesian product may contain multiple instances of a single parent segment row; the parent segment is multiplied. By definition, all ancestors of the multiplied segment are also multiplied.

A unique segment is defined as having exactly one instance corresponding to an instance of its parent segment. If its foreign key fields do not cover its primary key, this is not necessarily the case, even though the join was specified as unique. A segment in this situation is called misjoined.

Even if a unique segment is misjoined, however, the data adapter does not consider it as causing its parent's multiplication. The data adapter processes a "unique segment misjoined" condition separately from the multiplicative effect; it issues a warning message, and, depending on the current optimization setting, may or may not disable optimization.

In most cases, the data adapter can prevent the multiplicative effect, as described in Optimizing Joins and Optimizing Aggregation.


Top of page

Join Optimization Logic

When you invoke optimization, the data adapter attempts to build a single SQL statement. In order for the data adapter create one SQL statement that incorporates an RDBMS join on primary and foreign keys:

If these conditions are satisfied, the data adapter generates one SQL SELECT statement that joins all the referenced tables in a single request.

If you set data adapter optimization OFF, or if any condition is not satisfied, the data adapter generates an individual SQL statement for each table; FOCUS performs the required processing on the returned answer sets.

Note:


Top of page

Optimization of Joins Between Heterogeneous Data Sources

The data adapter can pass a single SELECT statement for all tables referenced in a TABLE request to the RDBMS when all active segments from the RDBMS comprise a contiguous single-path subtree. The data adapter can optimize the join between relational tables even when the retrieval path includes segments from different types of data sources (for example, several DB2 segments and an IMS segment).

Single path means no non-unique RDBMS segment can have an RDBMS parent and a non- unique sibling (RDBMS or non-RDBMS). Active segments are those containing fields referenced in the request or fields involved in join operations. Contiguous segments are connected RDBMS segments with a common target RDBMS. There may be segments of different file types above or below the contiguous segments. Segments occupying intermediate positions in a file hierarchy between explicitly active segments are themselves also implicitly active.

Note:


Top of page

Example: Optimizing Joins Between Heterogeneous Data Sources

The following example illustrates the SQL request passed to DB2 as the result of a dynamic join between two DB2 tables and an IDMS record. For information about JOIN syntax, see Advanced Reporting Techniques:

 JOIN EMP_ID IN EMPINFO TO WHO IN FUNDTRAN AS JOIN1
JOIN EMP_ID IN EMPINFO TO EMP_ID IN IDMSFILE AS JOIN2

With optimization enabled, the data adapter produces one SQL SELECT statement that joins the two DB2 tables. The RDBMS processes the join:

>       SELECT T1.EID,T2.BN FROM "USER1"."EMPINFO" T1,
"USER1"."FUNDTRAN" T2 WHERE (T2.EID = T1.EID) FOR FETCH ONLY;

For each DB2 joined row returned, the data adapter uses the join field as input to an IDMS OBTAIN record command utilizing an IDMS index or a Calc key.

With optimization disabled, the data adapter generates a separate SQL SELECT statement for each DB2 table:

      SELECT T1.EID FROM "USER1"."EMPINFO" T1 FOR FETCH ONLY;

After a row is fetched from table1, the join field is used as input to the second select:

      SELECT T2.BN FROM "USER1"."FUNDTRAN" T2 WHERE (T2.EID = ?) FOR
FETCH ONLY;

For each DB2 joined row returned, the data adapter uses the join field as input to an IDMS OBTAIN record command using an IDMS index or a Calc key.


Information Builders