Column Naming Conventions

While iWay DQC processes data, the output file can accumulate a large number of columns. As a best practice, group columns according to their content.

The following guidelines apply:

The structure of the attribute name is

prefix_attribute meaning_suffix

where the suffix is optional.

Prefixes and suffixes are described in the following tables.
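The naming structure can be illustrated with a minimal Python sketch that splits a column name into its prefix, attribute meaning, and optional suffix. The function name split_column_name and the two tuples are hypothetical helpers, not part of iWay DQC, and the tuples hold only a subset of the prefixes and suffixes listed in this section. The sketch assumes single-token prefixes, so it does not handle names such as rpl_can_xxx.

```python
# Hypothetical subsets of the prefixes and suffixes listed in this section.
KNOWN_PREFIXES = ("src", "dec", "meta", "pur", "std", "cln", "out", "score", "exp", "tmp")
KNOWN_SUFFIXES = ("rpl", "pat", "id", "orig")


def split_column_name(name):
    """Split 'prefix_meaning[_suffix]' into (prefix, meaning, suffix or None)."""
    prefix, sep, rest = name.partition("_")
    if not sep or not rest or prefix not in KNOWN_PREFIXES:
        raise ValueError(f"no known prefix in column name: {name!r}")
    # Only treat the last token as a suffix if it is a known suffix.
    meaning, sep, last = rest.rpartition("_")
    if sep and last in KNOWN_SUFFIXES:
        return prefix, meaning, last
    return prefix, rest, None


print(split_column_name("pur_first_name_orig"))
print(split_column_name("src_birth_date"))
```

Because the suffix is optional, the last token is treated as a suffix only when it matches a known one; otherwise it stays part of the attribute meaning.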

Prefixes

Attribute prefixes and suffixes that are in bold in the tables are frequently used. Rarely used prefixes and suffixes are included in the tables to avoid their possible misuse.

The order of the prefixes in the table defines the recommended order in all the input steps (for example, Text File Reader, Integration Input, and Alter Format).

| Attribute Prefix | Description | Additional Information |
|---|---|---|
| src_xxx | Source input values | Without any transformation on it |
| dec_xxx | Decoded source input values | Pre-cleansed data with a single form for null values (for example, NULL, N/A, and N/K are transformed into null) |
| meta_xxx | Source input metadata | |
| pur_xxx | Operational columns (pre-cleansed values) | Very often used during cleansing of attributes |
| cyr_xxx | Operational columns (attribute analysis of Cyrillic characters) | Special attributes for different characters |
| lat_xxx | Operational columns (attribute analysis of Latin characters) | |
| pat_xxx | Attribute structure description (patterns) | |
| adr_xxx | Operational columns (address etalon data in general); formerly, operational columns for Czech environment (those are now cpo_xxx) | Used for any environment |
| cpo_xxx | Operational columns (pre-cleansed address data), where cpo represents Czech Post Office etalon | For Czech environment only |
| uir_xxx | Operational columns (address etalon data) | |
| std_xxx | Attribute standardized values | Only structure valid values |
| cln_xxx | Attribute cleansed/normalized values | Value compared against etalon |
| out_xxx | Both standardized/cleansed and non-cleansed values | Given by business rules (will be std, cln, src, or other) |
| score_xxx | Attribute score (highest number means the worst data, 0 means perfect data) | Attribute/instance data quality description |
| score_instance | Instance score (the sum of attribute scores per single record) | |
| exp_xxx | Quality explanation; cleansing codes for each attribute | |
| cleansing_code | Instance-level cleansing code (list of error messages); aggregated attribute explanations | |
| matching_xxx | Attribute matching values | Contains std or cln values (if available), or pur or src data (depending on the business need), all without accents and in uppercase |
| matching_key | Matching key (obsolete) | |
| uni_can_id | Candidate group ID | For match and merge process only |
| uni_can_id_old | Candidate group ID (old, that is, ID assigned within the last unification process) | |
| uni_cli_id | Client group ID | |
| uni_cli_id_old | Client group ID (old, that is, ID assigned within the last unification process) | |
| ins_uni_role | Instance unification role (for example, Master or Slave) | |
| ins_msr_role | Merge surviving instance role | |
| uni_rule | Name of the applied unification rule | |
| grp_can_role | Group unification role (A, C, M, U) for candidate group | |
| grp_cli_role | Group unification role (A, C, M, U) for client group | |
| pri_xxx | Operational columns (primary unification) | Hierarchical match and merge attributes |
| sec_xxx | Operational columns (secondary unification) | |
| len_xxx | Operational columns (attribute length analysis; formerly known as length_xxx) | Attributes for analytical purposes only (mainly used for so-called ABCDX profiling). Can be placed between meta_xxx and pur_xxx |
| char_xxx | Operational columns (attribute char analysis) | |
| word_xxx | Operational columns (attribute word analysis) | |
| qma_xxx | Operational columns (attribute quality mark - ABCDX) | |
| qme_xxx | Operational columns (instance quality mark - ABCDX) | |
| qex_xxx | Operational columns (quality explanation column for the whole instance) | |
| tmp_xxx | Operational columns (temporary columns) | Can be placed anywhere; typically used in cleansing processes after pur_xxx values |
| aux_xxx | Operational columns (auxiliary columns) | |
| cnt_xxx | Operational columns (counters) | |
| rpl_can_xxx | Replacement candidates (incorrect data) | Rarely used attributes |
| cor_xxx | Operational columns (auxiliary pre-cleansed values) | |
| bin_xxx | Operational columns (dust bin for waste text) | |
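The recommended ordering of columns by prefix can be approximated outside the tool, for example when preparing a column list for an input step. The following Python sketch is illustrative only: PREFIX_ORDER is an abbreviated copy of the prefix order in the table above, and prefix_rank and order_columns are hypothetical helper names. Sorting unknown prefixes (and tmp_xxx, which can be placed anywhere) last is an assumption of this sketch, not a product rule.

```python
# Abbreviated prefix order copied from the table above; extend as needed.
PREFIX_ORDER = ["src", "dec", "meta", "pur", "pat", "std", "cln", "out", "score", "exp"]


def prefix_rank(column):
    """Return the position of the column's prefix in PREFIX_ORDER."""
    prefix = column.split("_", 1)[0]
    try:
        return PREFIX_ORDER.index(prefix)
    except ValueError:
        return len(PREFIX_ORDER)  # assumption: unlisted prefixes sort last


def order_columns(columns):
    """Group columns by prefix in table order.

    sorted() is stable, so columns that share a prefix keep their
    original relative order.
    """
    return sorted(columns, key=prefix_rank)


print(order_columns(["std_name", "src_name", "tmp_x", "dec_name", "src_city"]))
```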

Suffixes

| Attribute Suffix | Description | Additional Information |
|---|---|---|
| xxx_rpl | Data prepared for replacement | |
| xxx_pat | Data prepared for parsing | Usually data after replacement |
| xxx_id | Attribute IDs | |
| xxx_orig | Original values found during parsing (for example, pur_first_name_orig) | For example, used by generic parser step |

Using Prefixes

Attributes with the prefix src_xxx (source values) or dec_xxx (decoded source values) are read-only. A dec_xxx attribute is set only once, at the beginning of processing.
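As an illustration of this rule, the following Python sketch derives dec_xxx values from src_xxx values once and leaves the source values untouched. It is not iWay DQC code: decode_source_row is a hypothetical helper, and the null markers are the three examples given in the prefix table. Trimming whitespace, ignoring case, and treating empty strings as null are additional assumptions of this sketch.

```python
# Null markers named in the prefix table; a real project would extend this set.
NULL_MARKERS = {"NULL", "N/A", "N/K"}


def decode_source_row(row):
    """Return a copy of row with a dec_ column added for every src_ column."""
    decoded = dict(row)  # src_ values are copied as-is and never modified
    for name, value in row.items():
        if not name.startswith("src_"):
            continue
        # Assumption: surrounding whitespace is not significant.
        text = value.strip() if isinstance(value, str) else value
        is_null = (
            text is None
            or text == ""
            or (isinstance(text, str) and text.upper() in NULL_MARKERS)
        )
        decoded["dec_" + name[len("src_"):]] = None if is_null else text
    return decoded


row = {"src_name": " N/A ", "src_city": "Prague", "meta_file": "a.csv"}
print(decode_source_row(row))
```

Later cleansing steps would then read from dec_xxx and write to other prefixes such as pur_xxx, std_xxx, or cln_xxx.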

Use columns with the prefix std_xxx or cln_xxx for the standardized or cleansed values only. To store all the values in one column (both cleansed and non-cleansed values), use the out_xxx column prefix.
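A minimal Python sketch of how an out_xxx value might be filled is shown below. The precedence used here (cln, then std, then src) is only an assumed business rule for illustration; the actual choice is given by the business rules of the project. The helper name build_out_value is hypothetical.

```python
def build_out_value(row, attribute, precedence=("cln", "std", "src")):
    """Return the first non-null value among the prefixed columns for attribute.

    precedence is an assumed business rule; adjust it per project.
    """
    for prefix in precedence:
        value = row.get(f"{prefix}_{attribute}")
        if value is not None:
            return value
    return None


row = {"src_city": "praha", "std_city": "Praha", "cln_city": None}
row["out_city"] = build_out_value(row, "city")
print(row["out_city"])
```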

How do you handle std_xxx and cln_xxx? Typically, you store each value in the column that matches the transformation used (standardization or cleansing). If keeping that distinction would cause problems, do not make it: use std_xxx for both standardized and cleansed values.

If the distinction is required or intended by the user, use cln_xxx for values cleansed against the dictionary. In that case, std_xxx stores only standardized values.



Source Value Mapping and Data Flow

For an iWay DQC project, use the common interface between source systems and iWay DQC. The best practice is to use the canonical interface.

When you use the canonical interface, two possible situations can exist:

It is a best practice to use the following structure for the column name:

prefix + attribute_description

The proper names for other processing will be derived from this structure as required.

Examples:

If the meaning of the attribute stays the same during cleansing, do not change the attribute description part of the column name. Change only the prefix.
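The prefix-only renaming can be sketched in Python as follows. The helper with_prefix is hypothetical and assumes a single-token prefix separated from the attribute description by the first underscore.

```python
def with_prefix(column, new_prefix):
    """Replace the prefix of column, keeping the attribute description unchanged."""
    old_prefix, sep, description = column.partition("_")
    if not sep or not description:
        raise ValueError(f"column name has no prefix: {column!r}")
    return f"{new_prefix}_{description}"


print(with_prefix("src_first_name", "std"))
```

Keeping the attribute description stable makes it easy to trace one attribute through the source, standardized, and output columns.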


iWay Software