Detailed Description of Value Replacer

Implements a step that transforms the value of the input property "column" by string replacement in two stages. First, well-known words and phrases are replaced according to the partial translation, and the resulting string is transformed to its canonical form. Second, the string is replaced as a whole if a matching full translation definition exists. The canonical form is obtained by converting all alphanumeric characters to uppercase and terminating each sequence of special (non-alphanumeric) characters with a space character ' '. A configurable tokenizer parses the text into words.
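The canonical form described above can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name canonical_form is hypothetical, and the handling of whitespace (dropped inside a special-character run, with the result trimmed) is an assumption that the description does not spell out.

```python
import re


def canonical_form(text: str) -> str:
    """Uppercase the text and end each run of special characters with one space.

    Assumptions (not confirmed by the documentation): whitespace inside a
    run of special characters is dropped, and the result is trimmed.
    """

    def end_run(match: re.Match) -> str:
        # Keep the non-whitespace special characters, then terminate the run
        # with exactly one space.
        kept = "".join(c for c in match.group(0) if not c.isspace())
        return kept + " "

    # [\W_]+ matches a run of characters that are neither letters nor digits.
    return re.sub(r"[\W_]+", end_run, text.upper()).strip()


print(canonical_form("a#b#c#"))
print(canonical_form("abc   def"))
```

Under these assumptions, 'a#b#c#' becomes 'A# B# C#' and repeated whitespace collapses to a single space.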

Assume the following partial translation:

Value       Translation
'aabbcc'    'abc'
'dfe'       'def'
'14325'     '12345'

and the full translation:

Value          Translation
'abc def'      'abeceda'
'abc 12345'    'alfabeta'
'a# b# c#'     'abc'

The step performs the following replacements:

Input           Partial Translation    Output
'aabbcc dfe'    'abc def'              'abeceda'
'abc 14325'     'abc 12345'            'alfabeta'
'a#b#c#'        'a# b# c#'             'abc'
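The two stages behind these replacements can be traced with a short Python sketch that uses the translation values listed above. It is an illustration under stated assumptions, not the step's actual code: the names canonical_form and replace_value are hypothetical, tokens are assumed to be runs of letters or runs of digits, lookups are assumed to be case-insensitive, and a string without a full translation is assumed to be returned in canonical form.

```python
import re

# Translation values taken from the tables above.
PARTIAL = {"aabbcc": "abc", "dfe": "def", "14325": "12345"}
FULL = {"abc def": "abeceda", "abc 12345": "alfabeta", "a# b# c#": "abc"}


def canonical_form(text: str) -> str:
    """Uppercase the text and end each special-character run with one space."""

    def end_run(match: re.Match) -> str:
        # Assumption: whitespace inside the run is dropped.
        kept = "".join(c for c in match.group(0) if not c.isspace())
        return kept + " "

    return re.sub(r"[\W_]+", end_run, text.upper()).strip()


def replace_value(text: str) -> str:
    # Stage 1: replace each token (a run of letters or a run of digits)
    # that has a partial translation; other tokens stay unchanged.
    def swap(match: re.Match) -> str:
        token = match.group(0)
        return PARTIAL.get(token.lower(), token)

    partial = re.sub(r"[^\W\d_]+|\d+", swap, text)

    # Stage 2: compare canonical forms and replace the whole string on a match.
    canon = canonical_form(partial)
    full_index = {canonical_form(key): value for key, value in FULL.items()}
    # Assumption: without a full translation the canonical form is returned.
    return full_index.get(canon, canon)


for value in ("aabbcc dfe", "abc 14325", "a#b#c#"):
    print(repr(value), "->", repr(replace_value(value)))
```

Comparing canonical forms on both sides is what lets the input 'a#b#c#' match the full translation key 'a# b# c#' in this sketch.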



Example:
<step id='replacer' className='cz.adastra.cif.tasks.clean.ValueReplacer'>
        <properties diaInsensitive='false' caseInsensitive='true'>
                <column>text</column>
                <partialReplacements>part.cif</partialReplacements>
                <fullReplacements>full.cif</fullReplacements>
                <tokenizer whiteSpaceDefinition="[:white:]">
                        <types>
                                <type tokenCharacters="[:letter:]" tokenStartCharacters="[:letter:]" />
                                <type tokenCharacters="[:digit:]" tokenStartCharacters="[:digit:]" />
                        </types>
                </tokenizer>
                <scorer explanationColumn="explanation">
                        <scoringEntries>
                                <scoringEntry key='VR_REPLACEMENT' score='100' explain='true' />
                        </scoringEntries>
                </scorer>
        </properties>
</step>
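The tokenizer in this configuration declares two token types, one for letters and one for digits. The Python sketch below shows one plausible reading of that setup; it is not the product's tokenizer. The name tokenize is hypothetical, and three behaviors are assumptions: a token starts at a character accepted by tokenStartCharacters and continues while tokenCharacters accepts the next character, whitespace separates tokens, and a character that no type accepts becomes a token of its own.

```python
from typing import Callable, List, Tuple

# Each type is a pair (accepts start character, accepts following character),
# mirroring tokenStartCharacters and tokenCharacters in the configuration.
TokenType = Tuple[Callable[[str], bool], Callable[[str], bool]]

TYPES: List[TokenType] = [
    (str.isalpha, str.isalpha),  # [:letter:]
    (str.isdigit, str.isdigit),  # [:digit:]
]


def tokenize(text: str, types: List[TokenType] = TYPES) -> List[str]:
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # Whitespace only separates tokens.
            i += 1
            continue
        for starts, continues in types:
            if starts(ch):
                j = i + 1
                while j < len(text) and continues(text[j]):
                    j += 1
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Assumption: a character no type accepts is its own token.
            tokens.append(ch)
            i += 1
    return tokens


print(tokenize("abc 14325"))
print(tokenize("ab12"))
```

Because the two types are separate, a mixed string such as 'ab12' splits into a letter token and a digit token under this reading.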

iWay Software