Improving Lookup Performance

The function DB_LOOKUP provides an alternative to a SQL JOIN to look up a value in one (or more) columns of a table. When matching values are found, the value of another column is returned.

For example, a lookup table may contain a list of state codes and names. If you look up the state code NY, you get back the state name New York.

This process can be expedited. By loading the pairs of values into memory (called a cache) you can reduce the number of file or database read operations.

If each input value is only looked up once, there is no advantage to using a cache. If the same input values are looked up multiple times, throughput can be improved because the return value is read from the cache instead of from the database.

For database lookup tables, a system setting controls whether the values in the lookup table are loaded into the cache.

Syntax: How to Enable the Lookup Cache

ENGINE INT CACHE {SET {ON [LIMIT  nn] | OFF } | STAT}

where:

ON: Enables caching.
LIMIT nn: Limits the number of rows added from cache to nn.
OFF: Disables caching.
STAT: Displays statistics for cache.

Top of page

Example: Using the Lookup Cache

To enable the lookup cache for a data flow, create a stored procedure with the following line:

ENGINE INT CACHE SET ON

From the Process Flow tab for the flow, drag the stored procedure into the workspace. Connect the Start object to the stored procedure, and connect that to the data flow.

Process Flow Example

To see how the cache was used, create another stored procedure with the following line:

ENGINE INT CACHE STAT

Drag it onto the process flow after the Data Flow. When you submit the flow, you will see lines, (like those shown in the following image), confirming that the cache was used.

Cache Confirmation