Detailed Description of Strip Titles

This step removes titles (academic, social, etc.) from person names. If the "titlesOut" output is specified, the removed titles are stored in it, separated by a delimiter. Known titles are kept in a dictionary file whose name is given by the "titleLookupFileName" property.
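The behaviour described above can be illustrated with a minimal Python sketch. It is not the step's implementation: the function name strip_titles, the whitespace tokenisation, the case-insensitive comparison, and the sample titles are all assumptions made for illustration only.

```python
def strip_titles(name, known_titles, separator="-"):
    """Return (name_without_titles, removed_titles_joined_by_separator).

    Hypothetical helper: tokens are split on whitespace and compared
    case-insensitively against the known titles (both are assumptions).
    """
    lookup = {title.lower() for title in known_titles}
    kept, removed = [], []
    for token in name.split():
        if token.lower() in lookup:
            removed.append(token)   # would go to "titlesOut"
        else:
            kept.append(token)      # would go to "out"
    return " ".join(kept), separator.join(removed)


cleaned, titles_out = strip_titles("Dr. Jan Novak PhD", ["Dr.", "PhD"])
print(cleaned)      # Jan Novak
print(titles_out)   # Dr.-PhD
```

Here the first returned value plays the role of the cleaned name and the second the role of "titlesOut", with the removed titles joined by the delimiter.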

Consider the following dictionary (the bullet character is used to emphasize the dot):

title        translation
wd•          with a dot
nd           no dot
abc | def    abc or def

The following table clarifies the meaning of the parameter specialsSensitive. It summarizes both settings (denoted as T and F in parentheses) and their effect on various input strings.

input        translation (F)   translation (T)   output (F)   output (T)
wd•          with a dot        with a dot
nd           no dot            no dot
wd           with a dot                                       wd
nd•          no dot            no dot
abc | def    abc or def        abc or def
abc % def                                        abc % def    abc % def
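The effect of specialsSensitive on the sample inputs can be reproduced with the following Python sketch. The matching rules in it are an assumption chosen only because they agree with the rows above; the step's actual rules are not documented here. The function name lookup_title is hypothetical, and an ordinary "." is used in place of the bullet character.

```python
# Sample dictionary from the text ("." replaces the bullet character).
TITLES = {"wd.": "with a dot", "nd": "no dot", "abc | def": "abc or def"}


def lookup_title(text, specials_sensitive):
    """Return (translation, output) for a single input string."""
    for title, translation in TITLES.items():
        if specials_sensitive:
            # Assumed rule: exact match, tolerating one extra trailing
            # dot in the input ("nd." still matches "nd").
            matched = text == title or (
                text.endswith(".") and text[:-1] == title
            )
        else:
            # Assumed rule: dots are ignored on both sides.
            matched = text.replace(".", "") == title.replace(".", "")
        if matched:
            return translation, ""   # title removed, nothing left over
    return None, text                # no title found, input unchanged


for sample in ["wd.", "nd", "wd", "nd.", "abc | def", "abc % def"]:
    print(sample, lookup_title(sample, False), lookup_title(sample, True))
```

Under these assumed rules, "wd" is recognised only when specialsSensitive is off, while "abc % def" is never recognised because "%" and "|" differ in both settings.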



Example:
<step id='alg' className='cz.adastra.cif.tasks.clean.StripTitlesAlgorithm'>
        <binding name='in' column='text' />
        <binding name='out' column='dummy' />
        <binding name='titlesOut' column='text2' />
        <properties>
                <titleLookupFileName>titles_lookup_cz.cif</titleLookupFileName>
                <separator>-</separator>
                <scorer explanationColumn='explanation'>
                        <scoringEntries>
                                <scoringEntry key='ST_CHANGED' score='100' explain='true' />
                        </scoringEntries>
                </scorer>
                <minWordCount>2</minWordCount>
        </properties>
</step>

iWay Software