Cleansing Code Naming Conventions

Write all explanations in uppercase, without spaces. Instead of spaces, always use the underscore (_) character. This convention is important for further analysis of cleansing codes, as the space character is commonly used for their tokenization.

The following is the supported character set:

[A-Z], [0-9], _
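
The character set above can be checked mechanically. The following is a minimal Python sketch, not part of the product: the function names `is_valid_cleansing_code` and `normalize_cleansing_code` are hypothetical, and the sketch assumes a code is a non-empty string made only of uppercase letters, digits, and underscores.

```python
import re

# Assumption: a valid code is one or more characters from A-Z, 0-9, and underscore.
_CODE_PATTERN = re.compile(r"[A-Z0-9_]+")


def is_valid_cleansing_code(code: str) -> bool:
    """Return True if the whole string uses only the supported character set."""
    return _CODE_PATTERN.fullmatch(code) is not None


def normalize_cleansing_code(text: str) -> str:
    """Uppercase the text and replace each run of whitespace with one underscore."""
    return re.sub(r"\s+", "_", text.strip()).upper()
```

For instance, `normalize_cleansing_code("name more patterns")` yields `NAME_MORE_PATTERNS`, which then passes `is_valid_cleansing_code`. Characters outside the supported set (such as hyphens) are not rewritten by this sketch, so they are reported as invalid instead of being silently changed.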

Each step (algorithm) has a list of predefined cleansing codes (CC). For example:

Steps (algorithms) are normally used several times in a Plan. It is a best practice to use the "Explain As" option and define your own cleansing code for each step usage to identify the exact situation. In your Plan, indicate where the problem was detected.

If possible, use the ATTRIBUTE_PROBLEM_DESCRIPTION structure for naming your own cleansing codes. This enables you to sort cleansing codes according to attribute, while examining the statistical results of cleansed data. For example:

If the same situation is detected by different steps, you can distinguish among the situations. Add STEP as a prefix to the cleansing code. Use the ATTRIBUTE_PROBLEM_DESCRIPTION_STEP structure.
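
The naming structures described above can be assembled programmatically. The following Python sketch is illustrative only: `build_cleansing_code` is a hypothetical helper, and the position of the step token follows the ATTRIBUTE_PROBLEM_DESCRIPTION_STEP structure as it is written here, which is an assumption about the intended order.

```python
from typing import Optional


def build_cleansing_code(attribute: str, problem: str,
                         step: Optional[str] = None) -> str:
    """Join the parts into ATTRIBUTE_PROBLEM_DESCRIPTION[_STEP] form."""
    parts = [attribute, problem]
    if step:
        # Assumption: the step token goes last, as in the structure name.
        parts.append(step)
    tokens = []
    for part in parts:
        # Uppercase each part and split on whitespace so spaces become underscores.
        tokens.extend(part.upper().split())
    return "_".join(tokens)
```

Calling `build_cleansing_code("name", "more patterns")` produces `NAME_MORE_PATTERNS`. Because the attribute comes first, sorting a list of such codes alphabetically places codes for the same attribute next to each other.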

For example:

To display the list of CCs used in a Plan, right-click anywhere in the work area and click Show used scores. You can sort the resulting list, which gives an overview of the CCs in use and their scores.

Keep the CC as short as possible while preserving its meaning. For example, if the name has more patterns, use NAME_MORE_PATTERNS instead of NAME_HAS_MORE_PATTERNS or NHMP.

Take into account the following:


iWay Software