DENSE and SPARSE Ranking of Sort Field Values

The WebFOCUS sort phrases RANK BY and BY {HIGHEST|LOWEST} n sort the report output and assign rank numbers to the sequence of data values. When assigning a rank to a data value, by default, WebFOCUS does not skip rank numbers. This means that even when multiple data values are assigned the same rank, the rank number for the next group of values is the next sequential integer. This method of assigning rank numbers is called dense.

Some of the relational engines assign rank numbers using a method called sparse. With sparse ranking, if multiple data values are assigned the same rank number, the next rank number will be the previous rank number plus the number of multiples.

You can use the WebFOCUS RANK parameter to control the type of ranking done by WebFOCUS. In addition, if you are accessing a relational data source, you can set the ranking method to the type of ranking done by your relational engine so that the rank calculation can be optimized. Some relational engines have functions for both dense and sparse ranking. In this case, either setting can be optimized.

Reference: Optimizing Ranking

In order to pass rank processing to a relational engine your request must:

Use the SUM (or WRITE or ADD) command to aggregate values.
Specify the number of rank categories to be displayed. That is, you must specify a value for n:
```
[RANKED] BY [HIGHEST] n
```

Syntax: How to Control the Ranking Method

SET RANK={DENSE|SPARSE}

where:

DENSE: Specifies dense ranking. With this method, each rank number is the next sequential integer, even when the same rank is assigned to multiple data values. DENSE is the default value.
SPARSE: Specifies sparse ranking. With this method, if the same rank number is assigned to multiple data values, the next rank number will be the previous rank number plus the number of multiples.

Then, in your request, use one of the following forms of the BY phrase:

RANKED BY {HIGHEST|LOWEST} [n] sortfield [AS 'text']

BY {HIGHEST|LOWEST} n sortfield [AS 'text']

where:

n: Is the highest rank number to display on the report output when the RANKED BY phrase is used. When RANKED is not used, it is the number of distinct sort field values to display on the report output when SET RANK=DENSE, and the total number of lines of output for the sort field when SET RANK=SPARSE.
sortfield: Is the name of the sort field.
text: Is the column heading to be used for the sort field column on the report output.

Reference: Usage Notes for SET RANK

The RNK. prefix operator is not affected by the RANK parameter.
The rank numbers propagated to a HOLD file depend on the RANK parameter setting.

Top of page

Example: Ranking Values in a FOCUS Data Source

The following request against the EMPDATA data source ranks salaries in descending order by division. The RANK parameter is set to DENSE (the default).

SET RANK = DENSE
TABLE FILE EMPDATA
PRINT LASTNAME FIRSTNAME
RANKED BY HIGHEST 12 SALARY
BY DIV
ON TABLE SET PAGE NOPAGE
END

On the output, six employees are included in rank number 6. With dense ranking, the next rank number is the next highest integer, 7.

RANK           SALARY  DIV   LASTNAME         FIRSTNAME
----           ------  ---   --------         ---------
   1      $115,000.00  CE    LASTRA           KAREN
   2       $83,000.00  CORP  SANCHEZ          EVELYN
   3       $80,500.00  SE    NOZAWA           JIM
   4       $79,000.00  CORP  SOPENA           BEN
   5       $70,000.00  WE    CASSANOVA        LOIS
   6       $62,500.00  CE    ADAMS            RUTH
                       CORP  CVEK             MARCUS
                             WANG             JOHN
                       NE    WHITE            VERONICA
                       SE    BELLA            MICHAEL
                             HIRSCHMAN        ROSE
   7       $58,800.00  WE    GOTLIEB          CHRIS
   8       $55,500.00  CORP  VALINO           DANIEL
                       NE    PATEL            DORINA
   9       $54,100.00  CE    ADDAMS           PETER
                       WE    FERNSTEIN        ERWIN
  10       $52,000.00  NE    LIEBER           JEFF
  11       $50,500.00  SE    LEWIS            CASSANDRA
  12       $49,500.00  CE    ROSENTHAL        KATRINA
                       SE    WANG             KATE

Running the same request with SET RANK=SPARSE produces the following output. Since rank category 6 includes six employees, the next rank number is 6 + 6.

RANK           SALARY  DIV   LASTNAME         FIRSTNAME
----           ------  ---   --------         ---------
   1      $115,000.00  CE    LASTRA           KAREN
   2       $83,000.00  CORP  SANCHEZ          EVELYN
   3       $80,500.00  SE    NOZAWA           JIM
   4       $79,000.00  CORP  SOPENA           BEN
   5       $70,000.00  WE    CASSANOVA        LOIS
   6       $62,500.00  CE    ADAMS            RUTH
                       CORP  CVEK             MARCUS
                             WANG             JOHN
                       NE    WHITE            VERONICA
                       SE    BELLA            MICHAEL
                             HIRSCHMAN        ROSE
  12       $58,800.00  WE    GOTLIEB          CHRIS

Top of page

Example: Limiting the Number of Sort Field Values

The following request against the EMPDATA data source sorts salaries in descending order by division and prints the 12 highest salaries. The RANK parameter is set to DENSE (the default).

SET RANK = DENSE
TABLE FILE EMPDATA
PRINT LASTNAME FIRSTNAME
BY HIGHEST 12 SALARY
BY DIV
ON TABLE SET PAGE NOPAGE
END

On the output, 12 distinct salary values are displayed, even though some of the employees have the same salaries.

     SALARY  DIV   LASTNAME         FIRSTNAME
     ------  ---   --------         ---------
$115,000.00  CE    LASTRA           KAREN
 $83,000.00  CORP  SANCHEZ          EVELYN
 $80,500.00  SE    NOZAWA           JIM
 $79,000.00  CORP  SOPENA           BEN
 $70,000.00  WE    CASSANOVA        LOIS
 $62,500.00  CE    ADAMS            RUTH
             CORP  CVEK             MARCUS
                   WANG             JOHN
             NE    WHITE            VERONICA
             SE    BELLA            MICHAEL
                   HIRSCHMAN        ROSE
 $58,800.00  WE    GOTLIEB          CHRIS
 $55,500.00  CORP  VALINO           DANIEL
             NE    PATEL            DORINA
 $54,100.00  CE    ADDAMS           PETER
             WE    FERNSTEIN        ERWIN
 $52,000.00  NE    LIEBER           JEFF
 $50,500.00  SE    LEWIS            CASSANDRA
 $49,500.00  CE    ROSENTHAL        KATRINA
             SE    WANG             KATE

Running the same request with SET RANK=SPARSE produces the following output. Since six employees have salary $62,500, that value is counted 6 times so that only 12 lines (seven distinct salary values) display on the output.

     SALARY  DIV   LASTNAME         FIRSTNAME
     ------  ---   --------         ---------
$115,000.00  CE    LASTRA           KAREN
 $83,000.00  CORP  SANCHEZ          EVELYN
 $80,500.00  SE    NOZAWA           JIM
 $79,000.00  CORP  SOPENA           BEN
 $70,000.00  WE    CASSANOVA        LOIS
 $62,500.00  CE    ADAMS            RUTH
             CORP  CVEK             MARCUS
                   WANG             JOHN
             NE    WHITE            VERONICA
             SE    BELLA            MICHAEL
                   HIRSCHMAN        ROSE
 $58,800.00  WE    GOTLIEB          CHRIS

Top of page

Example: Optimizing RANK Processing

This example will show that the RANKED BY phrase can be passed to DB2 for processing in a request. It uses the WebFOCUS Retail Demo database. You can create this sample data source for a relational adapter by right-clicking the application in which you want to place this sample, and selecting New and then Samples from the context menu. Then, select WebFOCUS - Retail Demo from the Sample procedures and data for drop-down list and click Create.

The following request contains a RANKED BY phrase and a SET RANK=SPARSE command:

SET TRACEUSER = ON
SET TRACEOFF = ALL
SET TRACEON = STMTRACE//CLIENT
SET TRACESTAMP=OFF
SET XRETRIEVAL = OFF
SET RANK = SPARSE
TABLE FILE WF_RETAIL
SUM REVENUE
RANKED BY HIGHEST 12 TIME_DATE
BY PRODUCT_CATEGORY
END

The trace output shows that the RANKED BY phrase was passed to DB2 as a call to the DB2 RANK function.

SELECT
SK001,
SK002,
VB001
FROM (
SELECT
T4."TIME_DATE" AS SK001,
T2."PRODUCT_CATEGORY" AS SK002,
SUM(T1."Revenue") AS VB001,
RANK() OVER( ORDER BY T4."TIME_DATE" DESC NULLS LAST) AS
RANK001
FROM
wfr_fact_sales T1,
wfr_product T2,
wfr_dim_time T4
WHERE
(T2."ID_PRODUCT" = T1."ID_PRODUCT") AND
(T4."ID_TIME" = T1."ID_TIME")
GROUP BY
T4."TIME_DATE",
T2."PRODUCT_CATEGORY"
) X
WHERE
RANK001 <= 12
FOR FETCH ONLY;

Running the same request with a SET RANK=DENSE command produces the following trace output. Since DB2 has a DENSE RANK function, the RANKED BY phrase is passed as a call to this function.

SELECT
SK001,
SK002,
VB001
FROM (
SELECT
T4."TIME_DATE" AS SK001,
T2."PRODUCT_CATEGORY" AS SK002,
SUM(T1."Revenue") AS VB001,
DENSE_RANK() OVER( ORDER BY T4."TIME_DATE" DESC NULLS LAST) AS
RANK001
FROM
wfr_fact_sales T1,
wfr_product T2,
wfr_dim_time T4
WHERE
(T2."ID_PRODUCT" = T1."ID_PRODUCT") AND
(T4."ID_TIME" = T1."ID_TIME")
GROUP BY
T4."TIME_DATE",
T2."PRODUCT_CATEGORY"
) X
WHERE
RANK001 <= 12
FOR FETCH ONLY;