Column Naming Conventions

During data processing in iWay DQC, the output file can accumulate numerous columns. It is a best practice to group columns according to their content.

Attribute prefixes and suffixes that are in bold in the tables are frequently used. Rarely used prefixes and suffixes are included in the tables to avoid their possible misuse.

The order of the prefixes in the table defines the recommended order in all the input steps (for example, Text File Reader, Integration Input, and Alter Format).

Attribute Prefix	Description	Additional Information
src_xxx	Source input values	Without any transformation on it
dec_xxx	Decoded source input values	Pre-cleansed data with a single form for null values (for example, NULL, N/A, and N/K are transformed into null)
meta_xxx	Source input metadata
pur_xxx	Operational columns (pre-cleansed values)	Very often used during cleansing of attributes
cyr_xxx	Operational columns (attribute analysis of Cyrillic characters)	Special attributes for different characters
lat_xxx	Operational columns (attribute analysis of Latin characters)
pat_xxx	Attribute structure description (patterns)
adr_xxx	Operational columns (address etalon data in general); formerly, operational columns for Czech environment (those are now cpo_xxx)	Used for any environment
cpo_xxx	Operational columns (pre-cleansed address data), where cpo represents Czech Post Office etalon	For Czech environment only
uir_xxx	Operational columns (address etalon data)
std_xxx	Attribute standardized values	Only structure valid values
cln_xxx	Attribute cleansed/normalized values	Value compared against etalon
out_xxx	Both standardized/cleansed and non-cleansed values	Given by business rules (will be std, cln, src, or other)
score_xxx	Attribute score (the higher the number, the worse the data; 0 means perfect data)	Attribute/instance data quality description
score_instance	Instance score (the sum of attribute scores per single record)
exp_xxx	Quality explanation; cleansing codes for each attribute
cleansing_code	Instance-level cleansing code (list of error messages); aggregated attribute explanations
matching_xxx	Attribute matching values	Contains std or cln values (if available), or pur or src data (depending on the business need), all without accents and in uppercase
matching_key	Matching key (obsolete)
uni_can_id	Candidate group ID	For match and merge process only
uni_can_id_old	Candidate group ID (old, that is, ID assigned within the last unification process)
uni_cli_id	Client group ID
uni_cli_id_old	Client group ID (old, that is, ID assigned within the last unification process)
ins_uni_role	Instance unification role (for example, Master or Slave)
ins_msr_role	Merge surviving instance role
uni_rule	Name of the applied unification rule
grp_can_role	Group unification role (A, C, M, U) for candidate group
grp_cli_role	Group unification role (A, C, M, U) for client group
pri_xxx	Operational columns (primary unification)	Hierarchical match and merge attributes
sec_xxx	Operational columns (secondary unification)
len_xxx	Operational columns (attribute length analysis; formerly known as length_xxx)	Attributes for analytical purposes only (mainly used for so-called ABCDX profiling). Can be placed between meta_xxx and pur_xxx
char_xxx	Operational columns (attribute char analysis)
word_xxx	Operational columns (attribute word analysis)
qma_xxx	Operational columns (attribute quality mark - ABCDX)
qme_xxx	Operational columns (instance quality mark - ABCDX)
qex_xxx	Operational columns (quality explanation column for the whole instance)
tmp_xxx	Operational columns (temporary columns)	Can be placed anywhere; typically used in cleansing processes after pur_xxx values
aux_xxx	Operational columns (auxiliary columns)
cnt_xxx	Operational columns (counters)
rpl_can_xxx	Replacement candidates (incorrect data)	Rarely used attributes
cor_xxx	Operational columns (auxiliary pre-cleansed values)
bin_xxx	Operational columns (dust bin for waste text)
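
Because the order of the prefixes in the table is also the recommended column order, the ordering can be checked or produced mechanically. The following Python sketch is only an illustration and is not part of iWay DQC: the names PREFIX_ORDER, prefix_rank, and order_columns are hypothetical, and the list covers only a subset of the prefixes from the table.

```python
# Subset of the prefixes from the table above, kept in table order.
# PREFIX_ORDER and the helper names are hypothetical, not an iWay DQC API.
PREFIX_ORDER = [
    "src", "dec", "meta", "pur", "pat",
    "std", "cln", "out", "score", "exp", "matching",
]


def prefix_rank(column):
    """Return the position of the column's prefix in PREFIX_ORDER.

    Columns whose prefix is not listed are ranked after all listed ones.
    """
    prefix = column.split("_", 1)[0]
    try:
        return PREFIX_ORDER.index(prefix)
    except ValueError:
        return len(PREFIX_ORDER)


def order_columns(columns):
    """Sort column names by prefix rank.

    sorted() is stable, so columns that share a prefix (or have an
    unlisted prefix) keep their original relative order.
    """
    return sorted(columns, key=prefix_rank)


if __name__ == "__main__":
    columns = ["out_first_name", "src_first_name", "std_first_name", "dec_first_name"]
    print(order_columns(columns))
```

Prefixes that the table describes as freely placeable, such as tmp_xxx, are deliberately left out of the list, so this sketch simply moves them to the end.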

Attribute Suffix	Description	Additional Information
xxx_rpl	Data prepared for replacement
xxx_pat	Data prepared for parsing	Usually data after replacement
xxx_id	Attribute IDs
xxx_orig	Original values found during parsing (for example, pur_first_name_orig)	For example, used by generic parser step
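
As a rough illustration of how the suffixes combine with the rest of a column name, the following Python sketch splits a trailing suffix from a name such as pur_first_name_orig. The function name split_suffix and the KNOWN_SUFFIXES tuple are hypothetical helpers, not an iWay DQC feature, and the check is purely textual.

```python
# Suffixes from the table above. KNOWN_SUFFIXES and split_suffix are
# hypothetical helpers used only for illustration.
KNOWN_SUFFIXES = ("rpl", "pat", "id", "orig")


def split_suffix(column):
    """Split a column name into (base, suffix).

    Returns (column, None) when the last underscore-separated part is not
    one of the known suffixes.
    """
    base, separator, last = column.rpartition("_")
    if separator and base and last in KNOWN_SUFFIXES:
        return base, last
    return column, None


if __name__ == "__main__":
    print(split_suffix("pur_first_name_orig"))
    print(split_suffix("pur_first_name"))
```

Because the check looks only at the text of the name, a fixed name such as uni_can_id is also reported as ending in the id suffix; a real check would need to treat such names separately.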

Source Value Mapping and Data Flow

For an iWay DQC project, use a common interface between the source systems and iWay DQC. The best practice is to use the canonical interface.

When you use the canonical interface, two possible situations can exist:

- The canonical interface is defined by a third party, and you cannot change the naming conventions. You must remap all the canonical interface columns to the iWay DQC source columns to retain the given conventions for iWay DQC Plans.
  Remapping may involve adding the correct prefix to the canonical attribute name, or changing the attribute name to comply with the naming conventions for common attributes used in iWay DQC Plans. For example, assume that the project-specific name for the attribute that stores the last name is C27LN. It is better practice to map C27LN to src_last_name than to use src_c27ln throughout the configuration.
  These mappings are defined in the Alter Format step and the various Reader/Writer steps.
- The canonical interface is defined by the people involved in the project, and you can choose the naming conventions. The best practice is to use the same naming conventions as those used for iWay DQC source column names (src_xxx).
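
The remapping described for a third-party canonical interface amounts to a lookup from canonical names to src_xxx names. In iWay DQC, this mapping is configured in the steps named above. The Python sketch below only illustrates the idea outside the product: CANONICAL_TO_SRC and remap_record are hypothetical names, and the sample values are invented.

```python
# Hypothetical mapping from third-party canonical names to iWay DQC
# source column names. C27LN and FNAME are the example names used in the text.
CANONICAL_TO_SRC = {
    "C27LN": "src_last_name",
    "FNAME": "src_first_name",
}


def remap_record(record, mapping=CANONICAL_TO_SRC):
    """Return a copy of the record with canonical names replaced by src_xxx names.

    Raises KeyError if a column has no mapping, so that an unmapped
    canonical name is noticed instead of being passed through silently.
    """
    unmapped = [name for name in record if name not in mapping]
    if unmapped:
        raise KeyError(f"No src_xxx mapping defined for: {unmapped}")
    return {mapping[name]: value for name, value in record.items()}


if __name__ == "__main__":
    # Invented sample record for illustration.
    print(remap_record({"C27LN": "Smith", "FNAME": "Anna"}))
```

Failing on an unmapped name is a design choice of this sketch; it keeps names such as src_c27ln from spreading through later steps.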

It is a best practice to use the following structure for the column name:

prefix + attribute_description

The proper names for other processing will be derived from this structure as required.

Examples:

Canonical interface: src_first_name / third-party column name (for example, firstName or FNAME)
Source column: src_first_name
Decoded column: dec_first_name
“pur” pre-cleansed column: pur_first_name (read/write)
Both standardized and cleansed value: std_first_name
The following are optional:
- Standardized value column: std_first_name
- Cleansed value column: cln_first_name
Output column: out_first_name

If the meaning of the attribute stays the same during cleansing, do not change the attribute part of the column name. Change only the prefix.
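
The "prefix + attribute_description" structure means that the later column names can be derived from the source column name by swapping the prefix. The following Python sketch shows this under that assumption; with_prefix is a hypothetical helper, not an iWay DQC function.

```python
def with_prefix(column, new_prefix):
    """Replace the prefix of a 'prefix_attribute' column name.

    The attribute description (everything after the first underscore)
    is kept unchanged.
    """
    old_prefix, separator, attribute = column.partition("_")
    if not separator or not attribute:
        raise ValueError(f"Expected 'prefix_attribute', got: {column!r}")
    return f"{new_prefix}_{attribute}"


if __name__ == "__main__":
    source = "src_first_name"
    # Prefixes from the example list above, in processing order.
    for prefix in ("dec", "pur", "std", "cln", "out"):
        print(with_prefix(source, prefix))
```

Run on src_first_name, the loop yields the same names as the example list: dec_first_name, pur_first_name, std_first_name, cln_first_name, and out_first_name.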