Optimizing Joins

In this section:

This discussion applies to joins invoked either by the FOCUS dynamic JOIN command (see Advanced Reporting Techniques) or by the embedded join facility (see Multi-Table Structures).

For information about using the JOIN command to construct an outer join and about controlling outer join optimization, see Advanced Reporting Techniques.

To explain how optimization logic is applied to joins, first there is a discussion of how the RDBMS and FOCUS differ in their management of joins.


Top of page

x
RDBMS and FOCUS Join Management

When the RDBMS manages a join, it effectively first generates a Cartesian product of the tables, then applies any screening conditions to the resulting rows (including any join conditions), then applies the SELECT list, and, finally, calculates column function values and expressions.

In contrast, when FOCUS manages a join, it first reads segment instances from the top to the bottom of the file structure.

Because FOCUS views RDBMS tables as segments, in very rare situations the RDBMS could return more instances of a row than FOCUS would if it were processing similar data from a FOCUS or sequential data source. This difference is called the multiplicative effect.

FOCUS implements a dynamic or embedded join between RDBMS tables (segments) based on a set of pairs of "from/to" fields. The "to" fields are called the foreign key.

The KEYS attribute in the Access File defines the number of primary key fields (columns) in a segment. KEYS = n indicates that the first n fields in the segment comprise the primary key. If the foreign key fields for a child segment or cross-referenced file do not completely cover its primary key, multiple rows in the child segment (table) may correspond to a single row in the parent segment. Therefore, the RDBMS-generated Cartesian product may contain multiple instances of a single parent segment row. The parent segment is multiplied. By definition, all ancestors of the multiplied segment are also multiplied.

A unique segment is defined as having exactly one instance corresponding to an instance of its parent segment. If its foreign key fields do not cover its primary key, this is not necessarily the case, even though the join was specified as unique. A segment in this situation is called misjoined.

Even if a unique segment is misjoined, however, the adapter does not consider it as causing its parent's multiplication. The adapter processes a "unique segment misjoined" condition separately from the multiplicative effect. it issues a warning message, and, depending on the current optimization setting, may or may not disable optimization.

In most cases, the adapter can prevent the multiplicative effect, as described in Optimizing Joins and Optimizing Aggregation.


Top of page

x
Join Optimization Logic

When you invoke optimization, the adapter attempts to build a single SQL statement. In order for the adapter to create one SQL statement that incorporates an RDBMS join on primary and foreign keys:

If these conditions are satisfied, the adapter generates one SQL SELECT statement that joins all the referenced tables in a single request.

If you set adapter optimization OFF, or if any condition is not satisfied, the adapter generates an individual SQL statement for each table. FOCUS performs the required processing on the returned answer sets.

Note:


Top of page

x
Optimization of Joins Between Heterogeneous Data Sources

The adapter can pass a single SELECT statement for all tables referenced in a TABLE request to the RDBMS when all active segments from the RDBMS comprise a contiguous single-path subtree. The adapter can optimize the join between relational tables even when the retrieval path includes segments from different types of data sources (for example, several DB2 segments and an IMS segment).

Single path means no non-unique RDBMS segment can have an RDBMS parent and a non-unique sibling (RDBMS or non-RDBMS). Active segments are those containing fields referenced in the request or fields involved in join operations. Contiguous segments are connected RDBMS segments with a common target RDBMS. There may be segments of different file types above or below the contiguous segments. Segments occupying intermediate positions in a file hierarchy between explicitly active segments are themselves also implicitly active.

Note:



Example: Optimizing Joins Between Heterogeneous Data Sources

The following example illustrates the SQL request passed to DB2 as the result of a dynamic join between two DB2 tables and an IDMS record. For information about JOIN syntax, see Advanced Reporting Techniques:

 JOIN EMP_ID IN EMPINFO TO WHO IN FUNDTRAN AS JOIN1
 JOIN EMP_ID IN EMPINFO TO EMP_ID IN IDMSFILE AS JOIN2

With optimization enabled, the adapter produces one SQL SELECT statement that joins the two DB2 tables. The RDBMS processes the join:

>       SELECT T1.EID,T2.BN FROM "USER1"."EMPINFO" T1,
    "USER1"."FUNDTRAN" T2 WHERE (T2.EID = T1.EID) FOR FETCH ONLY;

For each DB2 joined row returned, the adapter uses the join field as input to an IDMS OBTAIN record command utilizing an IDMS index or a Calc key.

With optimization disabled, the adapter generates a separate SQL SELECT statement for each DB2 table:

      SELECT T1.EID FROM "USER1"."EMPINFO" T1 FOR FETCH ONLY;

After a row is fetched from table1, the join field is used as input to the second select:

      SELECT T2.BN FROM "USER1"."FUNDTRAN" T2 WHERE (T2.EID = ?) FOR
     FETCH ONLY;

For each DB2 joined row returned, the adapter uses the join field as input to an IDMS OBTAIN record command using an IDMS index or a Calc key.


Information Builders