Detailed Description of Selective Transliterate

Replaces a specified set of characters in the text of the input column "in" with characters from another set and writes the resulting text to the output column "out". The transformation is applied word by word. A new word starts either at the beginning of the input text (token) or wherever the text switches between a sequence of digits and letters and a sequence of special symbols (or vice versa). A word is thus an unbroken sequence of digits and letters or an unbroken sequence of special symbols. Within a word, each character that occurs in the "from" string is replaced by the character at the corresponding position in the "to" string. Words that are shorter than the "minWordLength" parameter value, or whose transformation would violate the conditions set by "maxChangeRatio" and "maxConsecutiveChanges", stay unchanged.
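The word-by-word behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names are hypothetical, and several details are assumptions that the description does not confirm, namely that whitespace counts as a special symbol, that the change ratio is the number of changed characters divided by the word length, that both limits are inclusive, and that surplus characters in the longer of "from" and "to" are ignored.

```python
from itertools import groupby


def selective_transliterate(text, from_chars, to_chars,
                            min_word_length=1,
                            max_change_ratio=1.0,
                            max_consecutive_changes=None):
    """Illustrative sketch only; names and edge-case rules are assumptions."""
    # Assumption: characters are paired position by position; surplus
    # characters in the longer string are ignored.
    mapping = dict(zip(from_chars, to_chars))

    pieces = []
    # Assumption: a word is a maximal run of letters/digits or a maximal
    # run of anything else (special symbols, including whitespace).
    for _, run in groupby(text, key=str.isalnum):
        word = "".join(run)
        pieces.append(_transform_word(word, mapping, min_word_length,
                                      max_change_ratio,
                                      max_consecutive_changes))
    return "".join(pieces)


def _transform_word(word, mapping, min_len, max_ratio, max_consecutive):
    # Words shorter than the minimum length are never touched.
    if len(word) < min_len:
        return word

    changed = 0        # total number of replaced characters
    current_run = 0    # length of the current run of replaced characters
    longest_run = 0    # longest such run seen in this word
    out = []
    for ch in word:
        new = mapping.get(ch, ch)
        if new != ch:
            changed += 1
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0
        out.append(new)

    # Assumption: ratio = changed characters / word length, limit inclusive.
    if changed / len(word) > max_ratio:
        return word
    # Assumption: a run of exactly max_consecutive changes is still allowed.
    if max_consecutive is not None and longest_run > max_consecutive:
        return word
    return "".join(out)


# Hypothetical input: "0" -> "o" and "1" -> "l".
print(selective_transliterate("he1l0 w0r1d 00 1000!", "01", "ol",
                              min_word_length=3,
                              max_change_ratio=0.5,
                              max_consecutive_changes=2))
# prints "hello world 00 1000!"
```

In this run, "he1l0" and "w0r1d" are transformed because only two of their five characters change. "00" is shorter than the minimum word length, and "1000" would change entirely, which exceeds the change ratio of 0.5, so both stay as they are.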



Example
<step id='alg' className='cz.adastra.cif.tasks.clean.SelectiveTransliterateAlgorithm'>
      <properties>
        <in>popis</in>
        <out>dummy</out>
        <from>ašètm)ó/4q/2áíhc)</from>
        <to>x234567890</to>
        <maxConsecutiveChanges>2</maxConsecutiveChanges>
        <maxChangeRatio>0.5</maxChangeRatio>
        <minWordLength>3</minWordLength>
        <scorer explanationColumn='explanation'>
          <scoringEntries>
            <scoringEntry key='STL_CHANGED' score='100' explain='true' />
          </scoringEntries>
        </scorer>
      </properties>
</step>

The example above executes these transformations:

These words are not transformed:


iWay Software