This step removes titles (academic, social, and similar) from person names. If the "titlesOut" output is bound, the removed titles are written to it, separated by a delimiter. The known titles form a dictionary stored in the file given by the "titleLookupFileName" property.
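The core idea can be illustrated with a short Python sketch. It assumes that a name is split on whitespace and that each token is compared exactly against a set of known titles. `strip_titles`, its signature and the sample titles are hypothetical and do not reflect the actual StripTitlesAlgorithm implementation; the "-" separator simply mirrors the sample configuration.

```python
def strip_titles(name: str, titles: set, separator: str = "-") -> tuple:
    """Return (name without titles, removed titles joined by separator).

    Hypothetical helper: whitespace tokenization and exact,
    case-sensitive matching are assumptions made for illustration.
    """
    kept = []
    removed = []
    for token in name.split():
        # Known titles go to the titles output; everything else stays in the name.
        if token in titles:
            removed.append(token)
        else:
            kept.append(token)
    return " ".join(kept), separator.join(removed)


if __name__ == "__main__":
    known_titles = {"Ing.", "Mgr.", "PhD."}  # hypothetical dictionary contents
    print(strip_titles("Ing. Jan Novák PhD.", known_titles))
    # prints ('Jan Novák', 'Ing.-PhD.')
```

Multi-word titles, case handling and the format of the dictionary file are deliberately left out of this sketch.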
Consider the following dictionary (the bullet character • stands for a dot, to make it easier to see):
| title | translation |
|---|---|
| wd• | with a dot |
| nd | no dot |
| abc \| def | abc or def |
The "specialsSensitive" parameter controls whether special characters, such as the dot, are significant when input strings are matched against the dictionary. The following table summarizes both settings, false (F) and true (T), and their effect on several input strings.
| input | translation (F) | translation (T) | output (F) | output (T) |
|---|---|---|---|---|
| wd• | with a dot | with a dot | | |
| nd | no dot | no dot | | |
| wd | with a dot | | | wd |
| nd• | no dot | no dot | | • |
| abc \| def | abc or def | abc or def | | |
| abc % def | | | abc % def | abc % def |
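The single-token rows of this table can be reproduced with the following Python sketch, in which "." plays the role of the bullet. The matching rules are an assumption inferred from the table, not the documented algorithm: with specialsSensitive off, special characters are dropped from both the dictionary entry and the input before comparison; with it on, a token is first matched as written, and otherwise its trailing special characters are split off and left in the output. `lookup`, `SPECIALS` and `DICTIONARY` are hypothetical names.

```python
from typing import Optional, Tuple

# Hypothetical: treat "." as the only special character.
SPECIALS = "."

# The sample dictionary from above, with "." in place of the bullet.
DICTIONARY = {"wd.": "with a dot", "nd": "no dot"}


def strip_specials(text: str) -> str:
    """Remove every special character from text."""
    return "".join(ch for ch in text if ch not in SPECIALS)


def lookup(token: str, specials_sensitive: bool) -> Tuple[Optional[str], str]:
    """Return (translation or None, remaining output) for one input token."""
    if not specials_sensitive:
        # Specials are ignored on both sides of the comparison.
        normalized = {strip_specials(k): v for k, v in DICTIONARY.items()}
        key = strip_specials(token)
        if key in normalized:
            return normalized[key], ""
        return None, token
    # Specials are significant: try the token exactly as written first.
    if token in DICTIONARY:
        return DICTIONARY[token], ""
    # Otherwise split off trailing specials; if the bare word is a title,
    # the title is removed and the specials are left behind.
    word = token.rstrip(SPECIALS)
    if word != token and word in DICTIONARY:
        return DICTIONARY[word], token[len(word):]
    return None, token


if __name__ == "__main__":
    for token in ["wd.", "nd", "wd", "nd."]:
        for sensitive in (False, True):
            print(token, sensitive, lookup(token, sensitive))
```

The multi-word rows ("abc | def" and "abc % def") are not modeled by this sketch.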
Example configuration of the step:

```xml
<step id='alg' className='cz.adastra.cif.tasks.clean.StripTitlesAlgorithm'>
    <binding name='in' column='text' />
    <binding name='out' column='dummy' />
    <binding name='titlesOut' column='text2' />
    <properties>
        <titleLookupFileName>titles_lookup_cz.cif</titleLookupFileName>
        <separator>-</separator>
        <scorer explanationColumn='explanation'>
            <scoringEntries>
                <scoringEntry key='ST_CHANGED' score='100' explain='true' />
            </scoringEntries>
        </scorer>
        <minWordCount>2</minWordCount>
    </properties>
</step>
```
iWay Software