Full Outer Joins for Relational Data Sources

How to:

The FOCUS join command and conditional join command have a FULL OUTER join option.

A full outer join returns all rows from the source data source and all rows from the target data source. Where values do not exist for the rows in either data source, null values are returned. FOCUS substitutes default values on the report output (blanks for alphanumeric columns, the NODATA symbol for numeric columns).

The full outer join is only supported for use with those relational data sources that support this type of join, in which case the FOCUS join syntax is optimized (translated to the full outer join SQL syntax supported by the RDBMS). Use of this syntax for any data source that does not support a full outer join, or the failure of the request to be optimized to the engine, produces an error message.

Note: The command SET SHORTPATH = SQL must be in effect in order to issue a full outer join.

Syntax: How to Specify a Full Outer Join

The following syntax generates a full outer equijoin based on real fields:

JOIN FULL_OUTER hfld1 [AND hfld2 ...] IN table1 [TAG tag1] TO {UNIQUE|MULTIPLE} cfld [AND cfld2 ...] IN table2 [TAG tag2] [AS joinname]
END

where:

hfld1

Is the name of a field in the host table containing values shared with a field in the cross-referenced table. This field is called the host field.

AND hfld2...

Can be an additional field in the host table. The phrase beginning with AND is required when specifying multiple fields.

  • For relational adapters that support multi-field and concatenated joins, you can specify up to 20 fields. See your adapter documentation for specific information about supported join features.
IN table1

Is the name of the host table.

TAG tag1

Is a tag name of up to 66 characters (usually the name of the Master File), which is used as a unique qualifier for fields and aliases in the host table.

The tag name for the host table must be the same in all the JOIN commands of a joined structure.

TO [UNIQUE|MULTIPLE] crfld1

Is the name of a field in the cross-referenced table containing values that match those of hfld1 (or of concatenated host fields). This field is called the cross-referenced field.

Note: UNIQUE returns only one instance and, if there is no matching instance in the cross-referenced table, it returns null values.

Use the MULTIPLE parameter when crfld1 may have multiple instances in common with one value in hfld1. Note that ALL is a synonym for MULTIPLE, and omitting this parameter entirely is a synonym for UNIQUE.

AND crfld2...

Is the name of a field in the cross-referenced table with values in common with hfld2.

Note: crfld2 may be qualified. This field is only available for adapters that support multi-field joins.

IN crfile

Is the name of the cross-referenced table.

TAG tag2

Is a tag name of up to 66 characters (usually the name of the Master File), which is used as a unique qualifier for fields and aliases in cross-referenced tables. In a recursive join structure, if no tag name is provided, all field names and aliases are prefixed with the first four characters of the join name.

The tag name for the host table must be the same in all the JOIN commands of a joined structure.

AS joinname

Is an optional name of up to eight characters that you may assign to the join structure. You must assign a unique name to a join structure if:

  • You want to ensure that a subsequent JOIN command does not overwrite it.
  • You want to clear it selectively later.
  • The structure is recursive.

Note: If you do not assign a name to the join structure with the AS phrase, the name is assumed to be blank. A join without a name overwrites an existing join without a name.

END

Required when the JOIN command is longer than one line. It terminates the command and must be on a line by itself.

The following syntax generates a DEFINE-based full outer join:

JOIN FULL_OUTER deffld WITH host_field ... 
     IN table1 [TAG tag1]
     TO [UNIQUE|MULTIPLE] 
     cr_field IN table2 [TAG tag2] [AS joinname]
END

where:

deffld

Is the name of a virtual field for the host file (the host field). The virtual field can be defined in the Master File or with a DEFINE command.

WITH host_field

Is the name of any real field in the host segment with which you want to associate the virtual field. This association is required to locate the virtual field.

The WITH phrase is required unless the KEEPDEFINES parameter is set to ON and deffld was defined prior to issuing the JOIN command.

To determine which segment contains the virtual field, use the ? DEFINE query after issuing the DEFINE command.

IN table1

Is the name of the host table.

TAG tag1

Is a tag name of up to 66 characters (usually the name of the Master File), which is used as a unique qualifier for fields and aliases in host tables.

The tag name for the host table must be the same in all JOIN commands of a joined structure.

TO [UNIQUE|MULTIPLE] crfld1

Is the name of a real field in the cross-referenced table whose values match those of the virtual field. This must be a real field declared in the Master File.

Note: UNIQUE returns only one instance and, if there is no matching instance in the cross-referenced table, it returns null values.

Use the MULTIPLE parameter when crfld1 may have multiple instances in common with one value in hfld1. Note that ALL is a synonym for MULTIPLE, and omitting this parameter entirely is a synonym for UNIQUE.

IN crfile

Is the name of the cross-referenced table.

TAG tag2

Is a tag name of up to 66 characters (usually the name of the Master File), which is used as a unique qualifier for fields and aliases in cross-referenced tables. In a recursive joined structure, if no tag name is provided, all field names and aliases are prefixed with the first four characters of the join name.

The tag name for the host file must be the same in all JOIN commands of a joined structure.

AS joinname

Is an optional name of up to eight characters that you may assign to the joined structure. You must assign a unique name to a join structure if:

  • You want to ensure that a subsequent JOIN command does not overwrite it.
  • You want to clear it selectively later.
  • The structure is recursive, and you do not specify tag names.

If you do not assign a name to the joined structure with the AS phrase, the name is assumed to be blank. A join without a name overwrites an existing join without a name.

END

Required when the JOIN command is longer than one line. It terminates the command and must be on a line by itself.

The following syntax generates a full outer conditional join:

JOIN FULL_OUTER FILE table1 AT hfld1 [WITH hfld2] [TAG tag1]
     TO {UNIQUE|MULTIPLE} 
     FILE table2 AT crfld [TAG tag2] [AS joinname]
     [WHERE expression1;
     [WHERE expression2;
     ...]
END

where:

table1

Is the host Master File.

AT

Links the correct parent segment or host to the correct child or cross-referenced segment. The field values used as the AT parameter are not used to cause the link. They are used as segment references.

hfld1

Is the field name in the host Master File whose segment will be joined to the cross-referenced table. The field name must be at the lowest level segment in its data source that is referenced.

tag1

Is the optional tag name that is used as a unique qualifier for fields and aliases in the host table.

hfld2

Is a table column with which to associate a DEFINE-based conditional JOIN. For a DEFINE-based conditional join, the KEEPDEFINES setting must be ON, and you must create the virtual fields before issuing the JOIN command.

MULTIPLE

Specifies a one-to-many relationship between table1 and table2. Note that ALL is a synonym for MULTIPLE.

UNIQUE

Specifies a one-to-one relationship between table1 and table2. Note that ONE is a synonym for UNIQUE.

Note: The join to UNIQUE will return only one instance of the cross-referenced table, and if this instance does not match based on the evaluation of the WHERE expression, null values are returned.

crfile

Is the cross-referenced Master File.

crfld

Is the join field name in the cross-referenced Master File. It can be any field in the segment.

tag2

Is the optional tag name that is used as a unique qualifier for fields and aliases in the cross-referenced table.

joinname

Is the name associated with the joined structure.

expression1, expression2

Are any expressions that are acceptable in a DEFINE FILE command. All fields used in the expressions must lie on a single path.

END

The END command is required to terminate the command and must be on a line by itself.

Example: Optimizing a Full Outer Join of Db2 Tables

The following requests generate two Db2 tables to join, and then issues a request against the join.

The following request generates the WF_SALES table. The field ID_PRODUCT will be used in the full outer join command. The generated table will contain ID_PRODUCT values from 2150 to 4000:

TABLE FILE WFLITE
SUM GROSS_PROFIT_US PRODUCT_CATEGORY PRODUCT_SUBCATEG
BY ID_PRODUCT
WHERE ID_PRODUCT FROM 2150 TO 4000
ON TABLE HOLD AS WF_SALES FORMAT DB2
END

The following request generates the WF_PROD table. The field ID_PRODUCT will be used in the full outer join command. The generated table will contain ID_PRODUCT values from 3000 to 5000:

TABLE FILE WFLITE
SUM PRICE_DOLLARS PRODUCT_CATEGORY PRODUCT_SUBCATEG PRODUCT_NAME
BY ID_PRODUCT
WHERE ID_PRODUCT FROM 3000 TO 5000
ON TABLE HOLD AS WF_PROD FORMAT DB2
END

The following request issues the SET SHORTPATH = SQL and JOIN commands and displays values from the joined tables:

SET SHORTPATH = SQL
SET TRACEUSER=ON
SET TRACESTAMP=OFF
SET TRACEOFF=ALL
SET TRACEON = STMTRACE//CLIENT
JOIN FULL_OUTER ID_PRODUCT IN WF_PROD TAG T1
 TO ALL ID_PRODUCT IN WF_SALES TAG T2
END 
TABLE FILE WF_PROD
PRINT T1.ID_PRODUCT AS 'Product ID' 
PRICE_DOLLARS AS Price
T2.ID_PRODUCT AS 'Sales ID' 
GROSS_PROFIT_US
BY T1.ID_PRODUCT NOPRINT
ON TABLE SET PAGE NOPAGE
ON TABLE SET STYLE *
GRID=OFF,$
ENDSTYLE
END

The trace shows that the full outer join was optimized (translated to SQL) so that SQL Server could process the join:

SELECT   
T1."ID_PRODUCT",  
T1."PRICE_DOLLARS",  
T2."ID_PRODUCT",  
T2."GROSS_PROFIT_US"  
FROM   
( WF_PROD T1  
FULL OUTER JOIN   
WF_SALES T2  
ON T2."ID_PRODUCT" = T1."ID_PRODUCT" )  
ORDER BY   
T1."ID_PRODUCT";  

The output has a row for each ID_PRODUCT value that is in either table. Rows with ID_PRODUCT values from 2150 to 2167 are only in the WF_SALES table, so the columns from WF_PROD display the NODATA symbol. Rows with ID_PRODUCT values above 4000 are only in the WF_PROD table, so the columns from WF_SALES display the NODATA symbol. Rows with ID_PRODUCT values from 2000 to 4000 are in both tables, so all columns have values, as shown in the following image.


Information Builders