Estimating PMF Database Size

In this section: |

In nearly every real life situation, the database size needed for PMF is determined by the size of the core MEASURES table in the PMF Data Mart. In PMF, the MEASURES table stores the basic facts about your Measures. The actual formula for determining the size of the MEASURES table is:

SIZE in bytes = (average number of bytes per row) * (count of rows)

Determining Average Number of Bytes Per Row

To determine the size of a row in the Measures table requires some deep knowledge about your particular RDBMS. The most simplistic method is to add up the number of bytes used by each column in the table and any indexes, and then add a few more bytes for overhead.

As of PMF 5.2.2, if you allow 4 bytes per Integer field and 8 bytes per floating point field, the MEASURES table is slightly over 200 bytes per row. Additionally:

- Modern RDBMSs incorporate sophisticated compression techniques for storing data, so your actual size could be less.
- There are many database tuning techniques for optimizing space usage, such as adjusting the block size of your underlying tablespaces. You will need to consult your RDBMS documentation to determine your average expected row size.

How Much Data Is Stored (Number of Rows)

The number of rows in the PMF MEASURES table in the Data Mart is determined by your business requirements. For some requirements, a few thousand rows of data is sufficient; other business requirements may require millions of rows.

If you are unsure of your business requirements at the time you are performing an estimate, the recommended approach would be to:

- Load several time periods of Measure data. If you are not ready to load all of your Measures, you could load a sample cross section of Measures.
- Determine how many Measure entries you are loading per time period.
- Multiply the rows for one time period by the total number of time periods you anticipate storing in your PMF system. If you have loaded a cross section of Measures, you must also multiply to adjust for the total number of Measures.

Example: Estimating Rows

Suppose you are measuring your metrics at the monthly level. You have a total of 100 Measures, you want to store three years of data, and you are loading a sample cross section of 20 Measures. After your initial loads, you can run this query on your PMF Data Mart from a SQL tool or from WebFOCUS:

TABLE FILE PMF_CUBE_VW WRITE CNT.MEASURE_ID BY TIME_LEVEL01_VALUE BY TIME_LEVEL02_VALUE BY TIME_LEVEL03_VALUE END

Your results would look something like this:

MEASURE_ID Year Quarter Month COUNT ---- ------- ----- ---------- 2010 1 03 101817 2 04 111662

This indicates that you are loading an average of 105K rows per month. Since your sample cross section was 20% of the total Measure population, multiply that result by 5 to get the total average number of rows per month.

You would then multiply that by 36 since you are planning to store three years worth of data. This yields 105K * 5 * 36 = 18900K rows of data - or 18 million rows.

Allocating Enough Space for Data Loads

You need to make sure that enough disc storage space is allocated on your WebFOCUS server to allow PMF data loads. Typically, you need to make sure there are about 32GB of free storage space to avoid having any problems. If you want to calculate the amount of storage space needed more precisely, use the following formula:

[MAX Number of rows for Measure] x [MAX Number of linked Dimensions] x [MAX levels] x 30 x 4 bytes

For example, if the largest Measure load is 2 million rows, linked to 16 Dimensions that each have 16 levels:

[2M] x [16] x [16] x 30 x 4 bytes = 62GB bytes of temp space needed

Note: After each load, the WebFOCUS server clears this temporary space so you do not need to multiply by the number of measures.

WebFOCUS |