Element Test

Test for comparison of two values (usually strings).

Name | Type | Required | Description

Case Insensitive | Boolean | Yes | The case (upper/lower) of letters is ignored when two strings are compared. Default value: false

Dia Insensitive | Boolean | Yes | Diacritics are ignored when two strings are compared. Default value: false

Expression | String | Yes | Expression passed as the argument to the test function (usually just the name of the column).

Function | String | Yes | Name of the comparison function. See the list of comparison functions.

Limits | String | No | Comma-delimited list of numbers that maps test results from intervals to integers. For example, Limits="3,6,10" defines the intervals [0,3], (3,6], (6,10], and (10,inf). Results 0, 1, 2, 3 are mapped to 0; results 4, 5, 6 are mapped to 1; results 7, 8, 9, 10 are mapped to 2; and results greater than 10 are mapped to 3. Note that the test weight is applied after this mapping.

Relative | Boolean | Yes | The relative version of the test function is used (if such a version exists). Default value: false

Result For Null Arguments | Double | No | Test result value used if at least one of the arguments is null. Test evaluation ends in such cases.

Weight | Double | Yes | Weight by which the test result is multiplied. Default value: 1
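The Limits mapping and the Weight multiplication can be illustrated with a short Python sketch. It is not the product's implementation: the names map_to_interval and weighted_result are hypothetical, and the limits 3, 6, 10 are chosen to match the intervals [0,3], (3,6], (6,10], (10,inf) described above.

```python
from bisect import bisect_left


def map_to_interval(result, limits):
    """Return the index of the interval that contains `result`.

    `limits` is a sorted list of inclusive upper bounds, so a result
    equal to a bound stays in the lower interval.
    """
    return bisect_left(limits, result)


def weighted_result(result, limits_attr, weight=1.0):
    """Parse a comma-delimited Limits string, map the result, then apply the weight."""
    limits = [float(x) for x in limits_attr.split(",")]
    return weight * map_to_interval(result, limits)


print([map_to_interval(r, [3, 6, 10]) for r in (0, 3, 4, 6, 7, 10, 11)])
# prints [0, 0, 1, 1, 2, 2, 3]
```

With a weight of 2, a raw result of 5 falls into interval 1 and yields 2.0, because the weight is applied only after the mapping.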



Detailed Description of Unification Extended

List of comparison functions

hamming

Function returns the Hamming distance between two strings. The function has a relative variant where the result is divided by the length of the longer string.
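A minimal Python sketch of a Hamming-style distance with a relative variant follows. The classic Hamming distance is defined only for strings of equal length; this sketch assumes that surplus characters of the longer string count as mismatches, which is an assumption and not documented behavior. The function name and the relative parameter are hypothetical.

```python
def hamming(a, b, relative=False):
    """Count positions at which the two strings differ."""
    longer = max(len(a), len(b))
    # zip stops at the shorter string; the surplus characters of the
    # longer string are counted as mismatches (assumption).
    distance = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    if relative:
        # Relative variant: divide by the length of the longer string.
        return distance / longer if longer else 0.0
    return distance


print(hamming("JOHN", "JOHM"))                 # prints 1
print(hamming("JOHN", "JOHM", relative=True))  # prints 0.25
```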

levenshtein

Function returns the Levenshtein distance between two strings. The function has a relative variant where the result is divided by the length of the longer string.
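The Levenshtein distance can be sketched in Python with the standard two-row dynamic programming scheme. This is an illustration only; the function name and the relative parameter are hypothetical and do not reflect the product's internals.

```python
def levenshtein(a, b, relative=False):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free on a match
            ))
        prev = curr
    distance = prev[-1]
    if relative:
        # Relative variant: divide by the length of the longer string.
        longer = max(len(a), len(b))
        return distance / longer if longer else 0.0
    return distance


print(levenshtein("kitten", "sitting"))  # prints 3
```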

editDistance

Function returns the edit distance between two strings. The difference between levenshtein and editDistance lies in how a swap of two adjacent characters is counted: levenshtein counts the swap as two changes, whereas editDistance counts it as one change.

For example editDistance("edit", "edti") = 1 versus levenshtein("edit", "edti") = 2. The function has a relative variant where the result is divided by the length of the longer string.
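One common way to count an adjacent swap as a single change is the optimal string alignment variant of the Damerau-Levenshtein distance. The Python sketch below uses that variant as an assumption; the source does not state which algorithm editDistance actually uses, and the function name is hypothetical.

```python
def edit_distance(a, b, relative=False):
    """Levenshtein distance extended with adjacent transpositions (cost 1)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped: count as one change.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    distance = d[-1][-1]
    if relative:
        # Relative variant: divide by the length of the longer string.
        longer = max(len(a), len(b))
        return distance / longer if longer else 0.0
    return distance


print(edit_distance("edit", "edti"))  # prints 1
```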

symmetricDifference

The compared strings are split into two sets of words (space is used as the separator character). Function returns the number of words contained in only one of the sets, i.e. the cardinality of the set (A \ B) U (B \ A). The function has a relative variant where the result is divided by the cardinality of the union of the sets.

Example: for the strings "JOHN SMITH" and "SMITH JOHN GEORGE JOHN MARTIN", the sets are:
A = { JOHN, SMITH }
B = { JOHN, SMITH, GEORGE, MARTIN }
result = |(A \ B) U (B \ A)| = |{ GEORGE, MARTIN }| = 2
relative result = 2 / |A U B| = 2 / |{ JOHN, SMITH, GEORGE, MARTIN }| = 2 / 4 = 0.5
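The example above can be reproduced with Python's built-in set type. This is a sketch with a hypothetical function name; it splits on any run of whitespace, which is slightly more lenient than a single-space separator.

```python
def symmetric_difference(a, b, relative=False):
    """Number of words that occur in exactly one of the two strings."""
    set_a, set_b = set(a.split()), set(b.split())
    diff = len(set_a ^ set_b)  # ^ is the set symmetric difference
    if relative:
        # Relative variant: divide by the cardinality of the union.
        union = len(set_a | set_b)
        return diff / union if union else 0.0
    return diff


print(symmetric_difference("JOHN SMITH", "SMITH JOHN GEORGE JOHN MARTIN"))
# prints 2
print(symmetric_difference("JOHN SMITH", "SMITH JOHN GEORGE JOHN MARTIN", relative=True))
# prints 0.5
```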

symmetricDifferenceExt

The same as the symmetricDifference function, but when the two sets have no word in common (A & B = empty), the function returns a "very big number" (VBN = 1000000). The function has a relative variant.

Example: symmetricDifferenceExt("JOHN SMITH", "GEORGE MARTIN") = VBN versus symmetricDifference("JOHN SMITH", "GEORGE MARTIN") = 4, because all four words occur in only one of the sets.
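The disjoint-set rule can be sketched in Python as a guard in front of the plain set computation. Only the absolute variant is shown, because the source does not say how the relative variant treats the VBN case; the function name is hypothetical.

```python
VBN = 1_000_000  # the "very big number" from the description


def symmetric_difference_ext(a, b):
    """Symmetric difference of word sets, or VBN when no word is shared."""
    set_a, set_b = set(a.split()), set(b.split())
    if not (set_a & set_b):
        # No common word at all: signal a very large distance.
        return VBN
    return len(set_a ^ set_b)


print(symmetric_difference_ext("JOHN SMITH", "GEORGE MARTIN"))  # prints 1000000
```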

symmetricDifferenceMultiSet

Similar to the symmetricDifference function, but each occurrence of a repeated word in a string is treated as a distinct element. The function has a relative variant where the result is divided by the cardinality of the union of the sets (again counting repeated words separately).

Example: for the strings "JOHN SMITH" and "SMITH JOHN GEORGE JOHN MARTIN", the sets are:
A = { JOHN, SMITH }
B = { JOHN, JOHN(second), SMITH, GEORGE, MARTIN }
result = |(A \ B) U (B \ A)| = |{ GEORGE, MARTIN, JOHN(second) }| = 3
relative result = 3 / |A U B| = 3 / |{ JOHN, JOHN(second), SMITH, GEORGE, MARTIN }| = 3 / 5 = 0.6
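The multiset example can be reproduced with collections.Counter from the Python standard library, which supports multiset difference and union directly. The function name is hypothetical and the sketch is not the product's implementation.

```python
from collections import Counter


def symmetric_difference_multiset(a, b, relative=False):
    """Symmetric difference of word multisets; repeated words count separately."""
    bag_a, bag_b = Counter(a.split()), Counter(b.split())
    # Counter subtraction keeps only positive counts, i.e. multiset difference.
    diff = sum(((bag_a - bag_b) + (bag_b - bag_a)).values())
    if relative:
        # Counter | Counter takes the maximum count per word (multiset union).
        union = sum((bag_a | bag_b).values())
        return diff / union if union else 0.0
    return diff


print(symmetric_difference_multiset("JOHN SMITH", "SMITH JOHN GEORGE JOHN MARTIN"))
# prints 3
print(symmetric_difference_multiset("JOHN SMITH", "SMITH JOHN GEORGE JOHN MARTIN", relative=True))
# prints 0.6
```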

symmetricDifferenceMultiSetExt

The same as the symmetricDifferenceMultiSet function, but when the two sets have no word in common (A & B = empty), the function returns a "very big number" (VBN = 1000000). The function has a relative variant.

notSubset

The compared strings are split into two sets of words (space is used as the separator character). Function returns 0 if one of the sets is a subset of the other; otherwise it returns 1.

Example: for the strings "JOHN SMITH" and "JOHN BROWN" it returns 1 (not a subset). For the strings "JOHN SMITH" and "JOHN JOHN" it returns 0.

notSubsetMultiSet

Similar to the notSubset function, but each occurrence of a repeated word in a string is treated as a distinct element.

Example: it returns 1 for the strings "JOHN SMITH" and "JOHN JOHN" because of the two occurrences of JOHN in the second string.
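The contrast between notSubset and notSubsetMultiSet can be sketched in Python by running the same subset check once on sets and once on word counts. Both function names are hypothetical, and the sketch only mirrors the behavior described above.

```python
from collections import Counter


def not_subset(a, b):
    """Return 0 if one word set is a subset of the other, otherwise 1."""
    set_a, set_b = set(a.split()), set(b.split())
    return 0 if set_a <= set_b or set_b <= set_a else 1


def not_subset_multiset(a, b):
    """Same check on multisets: every occurrence of a word must be covered."""
    bag_a, bag_b = Counter(a.split()), Counter(b.split())
    a_in_b = all(bag_b[word] >= n for word, n in bag_a.items())
    b_in_a = all(bag_a[word] >= n for word, n in bag_b.items())
    return 0 if a_in_b or b_in_a else 1


print(not_subset("JOHN SMITH", "JOHN JOHN"))           # prints 0
print(not_subset_multiset("JOHN SMITH", "JOHN JOHN"))  # prints 1
```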

numberDistance

Function returns the absolute difference between two integers, i.e. |a - b|. There is no relative variant for this function.

anyIsTrue

Returns 1 if at least one boolean value is true. There is no relative variant for this function.

masterIsTrue

Returns 1 if the boolean value for the center record is true. The value for the slave record is ignored. There is no relative variant for this function.

slaveIsTrue

Returns 1 if the boolean value for the slave record is true. The value for the center record is ignored. There is no relative variant for this function.

bothAreNull

Returns 1 if both values are null. There is no relative variant for this function.


iWay Software