In this section: |
This section describes how joins can be optimized for DML Generation using SQL Translation and Hub Servers. This is particularly important when joining heterogeneous data sources across servers.
SQL Translation offers two types of join strategy in DML Generation: nested loop and sort/ merge. To set a preferred join strategy, issue the following command in the Hub Server profile.
SQL EDA SET JOINTYPE {NESTEDLOOP|SORTMERGE}
where:
If JOINTYPE is not set or is set to NESTEDLOOP, DML Generation performs a set of optimization functions:
DML Generation uses the following methodology:
If it finds an equijoin (that is, WHERE clauses of the form fld1 = fld2, where fld1 and fld2 are columns from two of the FROM clause tables), DML Generation determines if either fld1 or fld2 is joinable.
COST=1 | An equijoin to a joinable column (the to column is joinable and the columns have compatible formats). |
COST=16 | An equijoin to a non-joinable column (the to column is not joinable or the columns do not have compatible formats). |
COST=256 | A non-equijoin or unrestricted Cartesian product (there is no such relationship present in the WHERE clause). |
After it builds the matrix, DML Generation tries the following join strategies:
DML Generation tries to respect the order of tables in the FROM clause. Therefore, optimal performance is achieved by making the most restricted table the outer table-that is, first in the FROM clause.
In any case, the least expensive join based on these rules is constructed and the rows returned are the same.
All data adapters allow joins of columns of unequal length. With this capability, DML Generation assigns a cost of 1 to alphabetic joins of unequal length and between short-packed (8-byte) and long-packed (16-byte) columns.
DML Generation considers the following columns to be joinable:
Host Column |
Target Column |
A (any length) |
A (any length) |
short P |
short P |
Long P |
long P |
short P |
long P |
Long P |
short P |
I |
I |
D |
D |
F |
F |
new date |
new date |
old date (I) |
old date (I) |
new date |
old date (I) |
old date (I) |
new date |
Note: Old dates of alpha format are not supported. If you have such a column, it should be defined as alpha.
Some data adapters for legacy files allow a generic join, which violates relational join rules by including extra columns that do not strictly match the select list. For those data adapters, DML Generation reapplies the screening criteria after the records are retrieved to ensure strict relational compliance.
For all signed numeric columns, the handling of different values for the + and - is performed by the data adapter, not DML Generation. All values corresponding to + and - should be retrievable in a join.
Join optimization is controlled by the SET JOINOPT setting in the server profile. It is turned ON by default. For more information on SET JOINOPT, see the Server Administration manual.
Example: |
DML Generation also clones WHERE clause conditions, where appropriate, for optimal join performance. This optimization is performed for both nested loop and sort/merge joins when there is explicit value selection criteria on one of the columns.
DML Generation builds a table of equivalent columns, which have the same value for all rows in the answer set. The equivalencies are found by searching the WHERE clause for predicates that equate one database column to another (for example, A = B). Equijoins always fit this pattern. Next, all WHERE clause predicates on columns in the equivalence table are duplicated, and the column name in the predicate is replaced with its equivalent column.
The following example illustrates an optimization procedure using a nested loop strategy. In this example:
For this SELECT statement
SELECT * FROM A,B WHERE B.key=A.key AND B.key='SMITH'
DML Generation clones the selection criteria for 'SMITH' by also applying it to A.key. Logically, the SELECT is transformed as follows:
SELECT * FROM A,B WHERE B.key=A.key AND B.key='SMITH' AND A.key='SMITH'
The subsequent processing involves the following operations:
The outer table SELECT returns 100 rows in one DBMS event. The inner SELECT is executed 100 times and returns one row for 100 DBMS events. The total is 200 reads, 101 DBMS events, and 100 rows returned for the query.
If DML Generation did not clone the WHERE predicate, the outer SELECT would return 100,000 rows, 99,900 of which would be rejected once the inner SELECT was applied using key values not equal to 'SMITH', which would return no rows. The total would have been 100,001 reads, 100,001 DBMS events, and 100 rows returned for the query.
Thus, with WHERE predicate cloning, maximum efficiency is reached: 200 reads instead of 100,001. WHERE clause cloning provides join optimization, and makes this operation almost as efficient, in this case, as the second example.
The following example illustrates an optimization procedure using a sort/merge strategy. In this example:
For this SELECT statement
SELECT * FROM A,B WHERE A.key=B.key AND B.key='SMITH'
DML Generation clones the selection criteria for 'SMITH' by applying it to A.key. Logically, the SELECT is transformed as follows:
SELECT * FROM A,B WHERE B.key=A.key AND B.key='SMITH' AND A.key='SMITH'
The subsequent processing involves the following operations:
The first SELECT would return 100 rows for one DBMS event. The second would return one row for one DBMS event. The total is 101 reads, two DBMS events, and 100 rows returned for the query.
If DML Generation did not clone the WHERE predicate, there would have been 100,001 reads.
In each case, this transformation yields a query semantically equivalent to the original because A.key and B.key must be equal at all times to satisfy the preexisting equijoin criteria. In any retrieval scenario, the number of rows returned from table A will be reduced by the added predicate.
iWay Software |