Detailed Description of Word Analyzer

This step identifies words by parsing the values of an input column using a specified separator, and replaces each word with a symbol: the name of a dictionary specified in the step properties. The resulting strings can then be matched against patterns. If a pattern matches, that pattern is written to the output instead of the created string. If the string is not found, there are three possible actions and three corresponding output columns (plus a trash column).

Unknown strings can be copied into the trash column. The step is repeated for each pair of source and destination columns.
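The behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the step's actual implementation. The function name `analyze`, the dictionary contents, the `full_name` pattern, and the assumption that a pattern matches only when its definition equals the whole created string are all hypothetical and chosen for illustration.

```python
# Illustrative sketch only; not the product's implementation.
# Dictionary contents and the pattern below are invented for this example.
DICTIONARIES = {
    "[title]": {"mr", "mrs", "dr"},
    "[givenname]": {"john", "mary"},
    "[surname]": {"smith", "jones"},
}

# Assumed pattern semantics: definition -> name, matched against the whole string.
PATTERNS = {"[title] [givenname] [surname]": "full_name"}


def analyze(value, dictionaries, patterns, separator=" ", unknown_symbol="[x]"):
    """Return (output, trash) for a single input value."""
    symbols = []
    trash = []
    for word in value.split(separator):
        if not word:
            continue  # skip empty tokens produced by repeated separators
        for symbol, words in dictionaries.items():
            if word.lower() in words:
                symbols.append(symbol)  # first dictionary containing the word wins
                break
        else:
            # No dictionary contains the word: emit the placeholder symbol
            # and copy the original word to the trash output.
            symbols.append(unknown_symbol)
            trash.append(word)
    created = separator.join(symbols)
    # If a pattern definition equals the created string, output the pattern name.
    output = patterns.get(created, created)
    return output, separator.join(trash)


print(analyze("mr john smith", DICTIONARIES, PATTERNS))
print(analyze("mr john qwerty", DICTIONARIES, PATTERNS))
```

In this sketch the first call yields the pattern name because every word is found and the symbol string matches a pattern. The second call yields the symbol string with `[x]` in place of the unknown word, and the unknown word is returned as trash.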



Example: Word Analyzer
<step id='wordanalyzer' className='cz.adastra.cif.tasks.text.WordAnalyzer'>
        <properties>
                <analyzedColumns>
                        <analyzedColumn src='name' dest='filtered_name'>
                                <patterns>
                                        <pattern definition="m f" name="mf"/>
                                </patterns>
                                <scorer explanationColumn="expl">
                                        <scoringEntries>
                                                <scoringEntry key='WM_UNKNOWN_WORD' score='1' explain='true' />
                                                <scoringEntry key='WM_NULL_INPUT' score='100' explain='true' />
                                        </scoringEntries>
                                </scorer>
                        </analyzedColumn>
                </analyzedColumns>
                <mlListOfValues>
                        <mlListOfValue symbol='[givenname]' fileName='data/ciselniky/first_names_ml.cif' />
                        <mlListOfValue symbol='[surname]' fileName='data/ciselniky/last_names_ml.cif' />
                </mlListOfValues>
                <slListOfValues>
                        <slListOfValue symbol='[title]' fileName='data/ciselniky/titles_sl.cif' />
                </slListOfValues>
                <separator> </separator>
                <copySeparators>true</copySeparators>
                <symbolForUnidentifiedWords>[x]</symbolForUnidentifiedWords>
                <trashSeparator> </trashSeparator>
        </properties>
</step>

iWay Software