This step removes titles (academic, social, etc.) from person names. If the "titlesOut" output is specified, the removed titles are stored there, separated by a delimiter. Known titles are read from a dictionary file defined by the "titleLookupFileName" property.
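The behavior described above can be illustrated with a minimal Python sketch. It is not the step's actual implementation: the function name `strip_titles`, the in-memory `TITLES` set (standing in for the dictionary file), whitespace tokenization, and case-insensitive matching are all assumptions made for illustration.

```python
def strip_titles(name, titles, separator=" "):
    """Remove known titles from a person name.

    Returns a tuple (cleaned_name, removed_titles), where removed_titles
    is the list of stripped titles joined by `separator`.
    `titles` is a hypothetical stand-in for the title dictionary file.
    """
    kept, removed = [], []
    for token in name.split():
        # Assumption: match case-insensitively and ignore a trailing comma.
        key = token.rstrip(",").lower()
        if key in titles:
            removed.append(token.rstrip(","))
        else:
            kept.append(token)
    # Drop a comma left dangling after a removed trailing title.
    return " ".join(kept).rstrip(","), separator.join(removed)


# Hypothetical dictionary content; the real file format is not shown here.
TITLES = {"ing.", "mgr.", "phd.", "dr."}

cleaned, titles_out = strip_titles("Ing. Jan Novak, PhD.", TITLES, separator="-")
print(cleaned)     # Jan Novak
print(titles_out)  # Ing.-PhD.
```

In this sketch, `cleaned` plays the role of the "out" binding and `titles_out` the role of "titlesOut", with `separator` acting as the delimiter between removed titles.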
Consider the following dictionary (the bullet character is used to emphasize the dot):
| title | translation |
|---|---|
| wd• | with a dot |
| nd | no dot |
| abc \| def | abc or def |
The following table clarifies the meaning of the specialsSensitive parameter. It summarizes both settings (denoted as T and F in parentheses) and their effect on various input strings.
| input | translation (F) | translation (T) | output (F) | output (T) |
|---|---|---|---|---|
| wd• | with a dot | with a dot | | |
| nd | no dot | no dot | | |
| wd | with a dot | wd | | |
| nd• | no dot | no dot | • | |
| abc \| def | abc or def | abc or def | | |
| abc % def | abc % def | abc % def | | |
<step id='alg' className='cz.adastra.cif.tasks.clean.StripTitlesAlgorithm'>
  <binding name='in' column='text' />
  <binding name='out' column='dummy' />
  <binding name='titlesOut' column='text2' />
  <properties>
    <titleLookupFileName>titles_lookup_cz.cif</titleLookupFileName>
    <separator>-</separator>
    <scorer explanationColumn='explanation'>
      <scoringEntries>
        <scoringEntry key='ST_CHANGED' score='100' explain='true' />
      </scoringEntries>
    </scorer>
    <minWordCount>2</minWordCount>
  </properties>
</step>
iWay Software