Detailed Description of Guess Name Surname

The step identifies a first name and a last name in the specified input value. Identification and parsing depend on dictionaries of known first names and last names (see the properties).
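The dictionary-driven idea can be illustrated with a minimal Python sketch. It is not the step's implementation: the in-memory sets, the function name guess_name_surname and the simple word-by-word matching are assumptions made for illustration, whereas the real step loads its dictionaries from lookup files and matches the input against parser patterns.

```python
# Hypothetical in-memory dictionaries (assumption); the real step reads lookup files.
FIRST_NAMES = {"john", "mary", "anna"}
LAST_NAMES = {"smith", "novak", "brown"}


def guess_name_surname(value):
    """Return (first_name, last_name); a part is None when no dictionary matches."""
    first = last = None
    for word in value.split():
        key = word.lower()
        if first is None and key in FIRST_NAMES:
            first = word  # first word found in the first-name dictionary
        elif last is None and key in LAST_NAMES:
            last = word   # first word found in the last-name dictionary
    return first, last


print(guess_name_surname("Smith John"))  # ('John', 'Smith')
```

Because each word is checked against the dictionaries rather than taken by position, the sketch recognizes the name parts regardless of their order in the input.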

The step can also be configured to preserve the original diacritics (accents): when the diacritics of the first name or last name found in the dictionary differ from those in the source value, the diacritics of the source value are retained.
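A rough Python sketch of this behaviour follows. It assumes that the dictionary lookup ignores diacritics; the dictionary contents, the function names and the preserve_diacritics flag are hypothetical and do not correspond to actual properties of the step.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining accent marks, e.g. 'José' -> 'Jose'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical dictionary keyed by the accent-free, lower-case form (assumption).
FIRST_NAMES = {"jose": "José", "rene": "René"}


def match_first_name(source, preserve_diacritics):
    """Look up a first name; optionally keep the diacritics of the source value."""
    found = FIRST_NAMES.get(strip_diacritics(source).lower())
    if found is None:
        return None        # not a known first name
    if preserve_diacritics:
        return source      # retain the diacritics exactly as they were entered
    return found           # otherwise use the dictionary spelling


print(match_first_name("Jose", preserve_diacritics=True))   # Jose
print(match_first_name("Jose", preserve_diacritics=False))  # José
```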

This step uses a parser to examine the input string; for details, see the description of Generic Parser. In addition to the standard components, the step provides predefined components. These components are verified against the corresponding dictionaries and can be configured with the wordDefinition, multiWordDefinition and interlacedWordDefinition properties.

The MULTI_FIRST_NAME and MULTI_LAST_NAME components use the following word separators (parameter wordSeparators): -'`"~
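When these separators are written into the XML configuration, the double quote has to be doubled and the quote characters expressed as entities. The following Python sketch shows one way to produce such a value with the standard xml.sax.saxutils.escape function; the helper name multiword_definition is hypothetical and is not part of the product.

```python
from xml.sax.saxutils import escape


def multiword_definition(separators):
    """Build a MULTIWORD definition and escape it for use as XML text."""
    doubled = separators.replace('"', '""')  # a literal " is written as ""
    raw = '{MULTIWORD:wordSeparators="' + doubled + '"}'
    # escape() handles &, < and >; quote characters are added explicitly.
    return escape(raw, {'"': "&quot;", "'": "&apos;"})


print(multiword_definition("-'`\"~"))
# {MULTIWORD:wordSeparators=&quot;-&apos;`&quot;&quot;~&quot;}
```

The result uses &apos; for the single quote; the numeric reference &#39; denotes the same character.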


Example: Guess Name Surname
<step id='alg' className='cz.adastra.cif.tasks.clean.GuessNameSurnameAlgorithm'>
        <properties>
                <in>name</in>
                <firstName>out_name</firstName>
                <lastName>out_surname</lastName>
                <firstNameOrig>out_name_orig</firstNameOrig>
                <lastNameOrig>out_surname_orig</lastNameOrig>
                <patternName>out_ptn</patternName>
                <components>
                        <!-- To use " as a separator, escape it by doubling it (i.e., "") and write it
                             with XML entities to suppress its meaning: &quot;&quot;
                             &apos; is the entity for a single quote: ' -->
                        <component name='RESIDUUM'
                                   definition='{MULTIWORD:wordSeparators="&quot;&quot;&apos;-"}'
                                   storeInto='parsed'>
                                <verifier fileName="dictionaries/some_lookup.cif" type="matchingLookup"/>
                        </component>
                        <component name='RESIDUUM_NC' definition='{MULTIWORD}' storeInto='parsed' />
                        <component name='INFIX' definition='{WORD} {WORD}' storeInto='parsed' />
                </components>
                <patternGroups>
                        <patternGroup>
                                <patterns>
                                        <pattern name='F!L!' definition="{FIRST_NAME!} {LAST_NAME!}" />
                                        <pattern name='MF!Z ML!' definition="{MULTI_FIRST_NAME!} {RESIDUUM} {MULTI_LAST_NAME!}" />
                                        <pattern name='MF!ML!' definition="{MULTI_FIRST_NAME!} {MULTI_LAST_NAME!}" />
                                </patterns>
                        </patternGroup>
                </patternGroups>
                <firstNameLookupFileName>dictionaries/first_names.cif</firstNameLookupFileName>
                <lastNameLookupFileName>dictionaries/last_names.cif</lastNameLookupFileName>
                <multiFirstNameLookupFileName>dictionaries/multi_firstnames_lookup.cif</multiFirstNameLookupFileName>
                <multiLastNameLookupFileName>ciselniky/multi_lastnames_lookup.cif</multiLastNameLookupFileName>
                <wordDefinition>{WORD}</wordDefinition>
                <multiWordDefinition>{MULTIWORD:wordSeparators=&quot;-&#39;`&quot;&quot;~&quot;}</multiWordDefinition>
                <interlacedWordDefinition>{INTERLACED_WORD}</interlacedWordDefinition>
                <scorer explanationColumn='explanation'>
                        <scoringEntries>
                                <scoringEntry key='NM_NO_PATTERN' score='100' explain='true' />
                                <scoringEntry key='NM_PART_PATTERN' score='200' explain='true' />
                                <scoringEntry key='NM_MORE_PATTERNS' score='400' explain='true' />
                                <scoringEntry key='NM_DIFFERENT' score='800' explain='true' />
                                <scoringEntry key='NM_HINT' score='1600' explain='true' />
                        </scoringEntries>
                </scorer>
                <hints>
                        <hint ruleNames='F!L!, MF!ML!' preferedRule='F!L!' />
                        <hint name='MF!ML! x MF!Z ML!' ruleNames='MF!ML!, MF!Z ML!'
                              preferedRule='MF!ML!' />
                </hints>
        </properties>
</step>

iWay Software