Test for comparison of two values (usually strings).
Name | Type | Required | Description |
---|---|---|---|
Case Insensitive | Boolean | Yes | Letter case (upper/lower) is ignored when comparing two strings. Default value: false |
Dia Insensitive | Boolean | Yes | Diacritics are ignored when comparing two strings. Default value: false |
Expression | String | Yes | Expression passed as the argument to the test function (usually just the name of a column). |
Function | String | Yes | Name of the comparison function. See the list of comparison functions. |
Limits | String | No | Comma-delimited list of numbers that maps test results from intervals to integers. Example: Limits="3,6,10" defines the intervals [0,3], (3,6], (6,10], and (10,inf). The results of the test function are mapped as follows: results 0, 1, 2, 3 are mapped to 0; results 4, 5, 6 are mapped to 1; results 7, 8, 9, 10 are mapped to 2; and results greater than 10 are mapped to 3. Note that the test weight is applied after this mapping. |
Relative | Boolean | Yes | The relative version of the test function is used (if such a version exists). Default value: false |
Result For Null Arguments | Double | No | Test result value used if at least one of the arguments is null. Test evaluation ends in such cases. |
Weight | Double | Yes | Weight by which the test result is multiplied. Default value: 1 |
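
The interval mapping performed by Limits, followed by the multiplication by Weight, can be illustrated with a minimal sketch. It is not the product's implementation: the function names `map_to_limits` and `weighted_result` are hypothetical, and the limits 3, 6, 10 are chosen so that the code reproduces the intervals [0,3], (3,6], (6,10], (10,inf) from the description.

```python
from bisect import bisect_left


def map_to_limits(result: float, limits: str) -> int:
    """Map a raw test result to the index of the interval it falls into.

    `limits` is a comma-delimited string such as "3,6,10" (hypothetical helper,
    not part of the product).
    """
    bounds = [float(x) for x in limits.split(",")]
    # bisect_left counts the bounds that are strictly smaller than the result,
    # so a result equal to a bound stays in the lower interval
    # (upper bounds are inclusive).
    return bisect_left(bounds, result)


def weighted_result(result: float, limits: str, weight: float = 1.0) -> float:
    """Apply the interval mapping first, then multiply by the test weight."""
    return map_to_limits(result, limits) * weight
```

With these assumptions, `map_to_limits(3, "3,6,10")` gives 0, `map_to_limits(4, "3,6,10")` gives 1, and `weighted_result(11, "3,6,10", weight=2.0)` gives 6.0, because the result 11 is first mapped to 3 and only then multiplied by the weight.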
List of comparison functions
Function | Description |
---|---|
hamming | Returns the Hamming distance between two strings. The relative variant divides the result by the length of the longer string. |
levenshtein | Returns the Levenshtein distance between two strings. The relative variant divides the result by the length of the longer string. |
editDistance | Returns the edit distance between two strings. It differs from levenshtein in how two swapped adjacent characters are counted: levenshtein counts the swap as two changes, whereas editDistance counts it as one. For example, editDistance("edit", "edti") = 1 versus levenshtein("edit", "edti") = 2. The relative variant divides the result by the length of the longer string. |
symmetricDifference | The compared strings are split into two sets of words (space is used as the separator character). Returns the number of words contained in only one of the sets, i.e. the cardinality of the set (A \ B) U (B \ A). The relative variant divides the result by the cardinality of the union of the sets. Example: for the strings "JOHN SMITH" and "SMITH JOHN GEORGE JOHN MARTIN", the sets are A = { JOHN, SMITH } and B = { JOHN, SMITH, GEORGE, MARTIN }; result = \|(A \ B) U (B \ A)\| = \|{ GEORGE, MARTIN }\| = 2; relative result = 2 / \|A U B\| = 2 / \|{ JOHN, SMITH, GEORGE, MARTIN }\| = 2 / 4 = 0.5 |
symmetricDifferenceExt | The same as the symmetricDifference function, but when the two sets have no word in common (A & B is empty), the result is a "very big number" (VBN = 1000000). The function has a relative variant. Example: symmetricDifferenceExt("JOHN SMITH", "GEORGE MARTIN") = VBN versus symmetricDifference("JOHN SMITH", "GEORGE MARTIN") = 4. |
symmetricDifferenceMultiSet | Similar to the symmetricDifference function, but repeated words in each string are treated as distinct. The relative variant divides the result by the cardinality of the union of the sets (again counting repeated words separately). Example: for the strings "JOHN SMITH" and "SMITH JOHN GEORGE JOHN MARTIN", the sets are A = { JOHN, SMITH } and B = { JOHN, JOHN(second), SMITH, GEORGE, MARTIN }; result = \|(A \ B) U (B \ A)\| = \|{ GEORGE, MARTIN, JOHN(second) }\| = 3; relative result = 3 / \|A U B\| = 3 / \|{ JOHN, JOHN(second), SMITH, GEORGE, MARTIN }\| = 3 / 5 = 0.6 |
symmetricDifferenceMultiSetExt | The same as the symmetricDifferenceMultiSet function, but when the two sets have no word in common (A & B is empty), the result is a "very big number" (VBN = 1000000). The function has a relative variant. |
notSubset | The compared strings are split into two sets of words (space is used as the separator character). Returns 0 if one of the sets is a subset of the other; otherwise returns 1. Example: for the strings "JOHN SMITH" and "JOHN BROWN" it returns 1 (not a subset); for the strings "JOHN SMITH" and "JOHN JOHN" it returns 0. |
notSubsetMultiSet | Similar to the notSubset function, but repeated words in each string are treated as distinct. Example: it returns 1 for the strings "JOHN SMITH" and "JOHN JOHN" because of the two occurrences of JOHN in the second string. |
numberDistance | Returns the absolute difference between two integers, i.e. \|a - b\|. There is no relative variant for this function. |
anyIsTrue | Returns 1 if at least one of the boolean values is true. There is no relative variant for this function. |
masterIsTrue | Returns 1 if the boolean value for the center record is true. The value for the slave record is ignored. There is no relative variant for this function. |
slaveIsTrue | Returns 1 if the boolean value for the slave record is true. The value for the center record is ignored. There is no relative variant for this function. |
bothAreNull | Returns 1 if both values are null. There is no relative variant for this function. |
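
The difference between levenshtein and editDistance, and between the set and multiset variants of symmetricDifference, may be easier to see in code. The following is a minimal sketch, not the product's implementation. All Python function names are hypothetical. editDistance is modelled here as the optimal string alignment distance (Levenshtein plus a single-cost swap of adjacent characters), which is an assumption that matches the "edit"/"edti" example. The multiset variant is modelled with word counts.

```python
from collections import Counter

VBN = 1_000_000  # "very big number" returned by the Ext variants


def levenshtein(a: str, b: str) -> int:
    """Insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Like levenshtein, but a swap of two adjacent characters costs 1.

    Assumption: modelled as the optimal string alignment distance.
    """
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            swapped = (i > 1 and j > 1
                       and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1])
            if swapped:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def relative_distance(distance: int, a: str, b: str) -> float:
    """Relative variant: divide by the length of the longer string."""
    longest = max(len(a), len(b))
    return distance / longest if longest else 0.0


def symmetric_difference(a: str, b: str, relative: bool = False) -> float:
    """Number of words that occur in only one of the two word sets."""
    sa, sb = set(a.split()), set(b.split())
    result = len(sa ^ sb)
    if relative:
        union = len(sa | sb)
        return result / union if union else 0.0
    return result


def symmetric_difference_ext(a: str, b: str) -> float:
    """Same as symmetric_difference, but VBN when no word is shared."""
    if not set(a.split()) & set(b.split()):
        return VBN
    return symmetric_difference(a, b)


def symmetric_difference_multiset(a: str, b: str, relative: bool = False) -> float:
    """Repeated words count separately, so word counts replace plain sets."""
    ca, cb = Counter(a.split()), Counter(b.split())
    # (ca - cb) keeps only positive count differences, so the sum covers both sides.
    result = sum(((ca - cb) + (cb - ca)).values())
    if relative:
        union = sum((ca | cb).values())  # multiset union keeps the larger count
        return result / union if union else 0.0
    return result
```

Under these assumptions, `levenshtein("edit", "edti")` returns 2 while `edit_distance("edit", "edti")` returns 1. For "JOHN SMITH" and "SMITH JOHN GEORGE JOHN MARTIN", `symmetric_difference` returns 2 (0.5 with `relative=True`), and `symmetric_difference_multiset` returns 3 (0.6 with `relative=True`), matching the examples in the table.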
iWay Software |