Detailed Description of Update Gender

Based on a specified first name and last name, this step determines a gender value and verifies it against any provided input value. The result is reported through the scoring flags shown in the example below.

Determination of the gender value is dictionary based. The dictionaries contain known first and last names together with the ratio in which each name occurs among men and women. The final decision is based on a threshold, which is 51 percent by default (i.e., only first names and last names that each reach a ratio of at least 51 percent are considered to confirm the gender value unambiguously). The default can be changed using the properties nameSurenessLevel and surnameSurenessLevel (percentages; the step accepts values between 51 and 100).
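
As a minimal sketch, the fragment below raises both thresholds above the default. The property names are the ones described above; the values 80 and 90 are illustrative assumptions only, and all other properties of the step are omitted.

```xml
<!-- Illustrative values only: the step accepts percentages from 51 to 100 -->
<properties>
    <!-- other properties omitted for brevity -->
    <!-- threshold applied to the ratio found in the first name dictionary -->
    <nameSurenessLevel>80</nameSurenessLevel>
    <!-- threshold applied to the ratio found in the last name dictionary -->
    <surnameSurenessLevel>90</surnameSurenessLevel>
</properties>
```

With stricter thresholds, fewer names satisfy the dictionary check, so more records can be expected to end up as not conclusive.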

Note: Statistical data on first names and last names for the Czech Republic were used as the source for the dictionaries.

The original gender value is replaced only if the derived gender value is determined identically by both first and last name (both satisfy the defined threshold). In other cases the original (input) value is stored in the output. If the input value is incorrect (and/or not found in the dictionary), it is considered to be empty and the step sets the scoring flag GNDR_GENDER_MISMATCH.

This step also supports verification based on a single component, using the last name only. In this case it only confirms the gender value according to the last name, or it sets a scoring flag and estimatedGender, a string indicating the estimated result. The original value is retained and stored in the output inGender, although a replacement can be forced in this situation using the property overwriteIfGuess.

The same situation occurs, with the same result, when the gender value cannot be determined definitively but one of the components suggests the probable gender value (i.e., only one of the components satisfies the threshold in the appropriate dictionary).
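
A minimal sketch of forcing the replacement in these single-component cases is shown below. The column names ingender and estgender are placeholders borrowed from the sample configuration at the end of this page, and the remaining properties are omitted.

```xml
<properties>
    <!-- other properties omitted for brevity -->
    <!-- column holding the input gender value (placeholder name) -->
    <inGender>ingender</inGender>
    <!-- column receiving the estimated result (placeholder name) -->
    <estimatedGender>estgender</estimatedGender>
    <!-- true: replace the input value even when only one component decides -->
    <overwriteIfGuess>true</overwriteIfGuess>
</properties>
```

Leaving overwriteIfGuess set to false keeps the input value untouched, so the estimate can be reviewed before any data is changed.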

Because the step uses dictionaries with exact forms of first names and last names, the input first name and last name must be cleansed and identified beforehand (e.g., using the Guess Name Surname step).

Comparison of the determined and input gender values is case-insensitive.



Example

The following table shows the scoring flags set by the step for combinations of different input values. If no input gender value is specified, the step has no data to compare the result against; instead, the values from the "F" and "M" rows and columns (blue) are used.

If an input gender value is provided, the "Input gender value opposite" and "Input gender value equals" rows and columns (green) are used. The remaining table values (black) do not depend on the input gender value, i.e., they are the same in both cases. The scoring flags correspond to overwriteIfGuess=false. If this property is set to true, the scoring flag CHANGED applies instead of CHANGE_SUGGESTION.

| Last name \ First name | Not provided | Not found | Not conclusive | F (female) | M (male) | Input gender value opposite | Input gender value equals |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Not provided | UNDECIDABLE | UNDECIDABLE, NAME_UNKNOWN | UNDECIDABLE | CHANGE_SUGGESTION | CHANGE_SUGGESTION | CHANGE_SUGGESTION | CONFIRMED |
| Not found | UNDECIDABLE, SURNAME_UNKNOWN | UNDECIDABLE, NAME_UNKNOWN, SURNAME_UNKNOWN | UNDECIDABLE, SURNAME_UNKNOWN | CHANGE_SUGGESTION, SURNAME_UNKNOWN | CHANGE_SUGGESTION, SURNAME_UNKNOWN | CHANGE_SUGGESTION, SURNAME_UNKNOWN | CONFIRMED, SURNAME_UNKNOWN |
| Not conclusive | UNDECIDABLE | UNDECIDABLE, NAME_UNKNOWN | UNDECIDABLE | CHANGE_SUGGESTION | CHANGE_SUGGESTION | CHANGE_SUGGESTION | CONFIRMED |
| F (female) | CHANGE_SUGGESTION | CHANGE_SUGGESTION, NAME_UNKNOWN | CHANGE_SUGGESTION | CHANGED | MISMATCH | | |
| M (male) | CHANGE_SUGGESTION | CHANGE_SUGGESTION, NAME_UNKNOWN | CHANGE_SUGGESTION | MISMATCH | CHANGED | | |
| Input gender value opposite | CHANGE_SUGGESTION | CHANGE_SUGGESTION, NAME_UNKNOWN | CHANGE_SUGGESTION | | | CHANGED | MISMATCH |
| Input gender value equals | CONFIRMED | CONFIRMED, NAME_UNKNOWN | CONFIRMED | | | MISMATCH | CONFIRMED |

<step id='alg' className='cz.adastra.cif.tasks.clean.UpdateGenderAlgorithm'>
        <properties>
                <inGender>ingender</inGender>
                <inSurname>insurname</inSurname>
                <inName>inname</inName>
                <estimatedGender>estgender</estimatedGender>
                <firstNameRatioLookupFileName>ratio_names.cif</firstNameRatioLookupFileName>
                <surnameRatioLookupFileName>ratio_surnames.cif</surnameRatioLookupFileName>
                <maleDefinition>M</maleDefinition><!-- opt : M -->
                <femaleDefinition>F</femaleDefinition><!-- opt : F -->
                <nameSurenessLevel>99</nameSurenessLevel><!-- opt : 51-->
                <surnameSurenessLevel>99</surnameSurenessLevel><!-- opt : 51 -->
                <overwriteIfGuess>false</overwriteIfGuess>
                <scorer explanationColumn='expl'>
                        <scoringEntries>
                                <scoringEntry key='UG_CONFIRMED' score='0' explain='true' />
                                <scoringEntry key='UG_CHANGED' score='500' explain='true' />
                                <scoringEntry key='UG_MISMATCH' score='300' explain='true' />
                                <scoringEntry key='UG_UNDECIDABLE' score='700' explain='true' />
                                <scoringEntry key='UG_CHANGE_SUGGESTION' score='500' explain='true'/>
                                <scoringEntry key='UG_GENDER_MISMATCH'  score='200' explain='true'/>
                                <scoringEntry key='UG_NAME_UNKNOWN' score='100' explain='true'/>
                                <scoringEntry key='UG_SURNAME_UNKNOWN' score='100' explain='true'/>
                        </scoringEntries>
                </scorer>
        </properties>
</step>

iWay Software