Name | Type | Required | Description |
---|---|---|---|
Digits | Boolean | Yes | Defines whether digits serve as separators. Default value: false |
Lower Case Letters | Boolean | Yes | Defines whether lowercase letters serve as separators. Default value: false |
Upper Case Letters | Boolean | Yes | Defines whether uppercase letters serve as separators. Default value: false |
Include Separators | String | No | Defines characters (exceptions) that serve as separators regardless of the values of the digits, lowerCaseLetters, and upperCaseLetters properties. |
Exclude Separators | String | No | Defines characters (exceptions) that do not serve as separators regardless of the values of the digits, lowerCaseLetters, and upperCaseLetters properties. |
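The properties above amount to a per-character decision: the exception lists are checked first, then the character class (digit, lowercase, uppercase) selects the matching Boolean, and any other character is a separator. The following is a minimal Python sketch of that decision, not the product's implementation. `SeparatorConfig`, `is_separator`, and `split_words` are hypothetical names whose fields mirror the properties; it assumes that Include Separators wins if a character is listed in both exception lists (the table does not say), that Python's `str.isdigit`/`islower`/`isupper` match the product's character classes, that letters without case are never separators, and that consecutive separators produce no empty words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeparatorConfig:
    # Hypothetical model of the separator properties (not the product API).
    digits: bool = False              # Digits
    lower_case_letters: bool = False  # Lower Case Letters
    upper_case_letters: bool = False  # Upper Case Letters
    include_separators: str = ""      # Include Separators
    exclude_separators: str = ""      # Exclude Separators


def is_separator(ch: str, cfg: SeparatorConfig) -> bool:
    """Decide whether a single character acts as a separator under cfg."""
    # Exceptions override the class-based properties.
    # Assumption: a character listed in both lists counts as a separator.
    if ch in cfg.include_separators:
        return True
    if ch in cfg.exclude_separators:
        return False
    if ch.isdigit():
        return cfg.digits
    if ch.islower():
        return cfg.lower_case_letters
    if ch.isupper():
        return cfg.upper_case_letters
    if ch.isalpha():
        # Assumption: letters without case (e.g. CJK) are never separators.
        return False
    # Everything else (spaces, punctuation, ...) is a separator by default.
    return True


def split_words(text: str, cfg: SeparatorConfig) -> list[str]:
    """Return the runs of non-separator characters, skipping empty words."""
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_separator(ch, cfg):
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    cfg = SeparatorConfig(exclude_separators="-")
    print(split_words("Abcdef Ghi2jkl-Mnop", cfg))  # ['Abcdef', 'Ghi2jkl-Mnop']
```

Checking the exception lists before the character classes is what lets them apply "regardless of" the three Boolean properties.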
Consider the following input string:
value |
---|
'Abcdef Ghi2jkl-Mnop' |
Next, consider the following separator configurations:
example | separator configuration | description |
---|---|---|
Separator_1 | empty | All characters except letters and digits serve as separators. |
Separator_2 | `<separatorConfig excludeSeparators='-' />` | All characters except letters, digits, and the hyphen ('-') serve as separators. |
Separator_3 | `<separatorConfig lowerCaseLetters='true' />` | All characters except uppercase letters and digits serve as separators. |
Applying each of these three configurations to the input string produces a different set of words:
separator definition used | output words |
---|---|
Separator_1 | 'Abcdef', 'Ghi2jkl', 'Mnop' |
Separator_2 | 'Abcdef', 'Ghi2jkl-Mnop' |
Separator_3 | 'A', 'G', '2', 'M' |
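Because each configuration only decides which characters may appear inside a word, the three results above can also be reproduced with one regular expression per configuration that matches maximal runs of word characters. This is an ASCII-only approximation for illustration, not how the Splitter step is implemented; the `WORD_PATTERNS` table and `words_for` function are hypothetical names.

```python
import re

TEXT = "Abcdef Ghi2jkl-Mnop"

# Each configuration rewritten as the set of characters allowed inside a word
# (ASCII-only approximation; every other character acts as a separator).
WORD_PATTERNS = {
    "Separator_1": r"[A-Za-z0-9]+",   # letters and digits
    "Separator_2": r"[A-Za-z0-9-]+",  # letters, digits, and '-'
    "Separator_3": r"[A-Z0-9]+",      # uppercase letters and digits only
}


def words_for(pattern: str, text: str) -> list[str]:
    """Return the maximal runs of non-separator characters in text."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for name, pattern in WORD_PATTERNS.items():
        print(name, words_for(pattern, TEXT))
```

Matching the words rather than splitting on separators means runs of adjacent separators never yield empty words, which agrees with the Separator_3 result.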
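The following Splitter step configuration uses `excludeSeparators='-.'`, so hyphens and periods do not serve as separators:

```xml
<step id='splitter' className='cz.adastra.cif.tasks.text.Splitter'>
    <binding name='allSentenceColumn' column='all_value' />
    <binding name='oneWordColumn' column='words' />
    <properties>
        <separatorConfig excludeSeparators='-.' />
    </properties>
</step>
```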
Consider an input with the following columns and records:
id | all_value | words |
---|---|---|
2547 | Jan Novak | |
2548 | Marie P. Dvorakova-Fialova | |
Applying the Splitter step with the configuration above generates the following output:
id | all_value | words |
---|---|---|
2547 | Jan Novak | Jan |
2547 | Jan Novak | Novak |
2548 | Marie P. Dvorakova-Fialova | Marie |
2548 | Marie P. Dvorakova-Fialova | P. |
2548 | Marie P. Dvorakova-Fialova | Dvorakova-Fialova |
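The output above follows a one-record-per-word pattern: each input record is repeated once for every word found in the `allSentenceColumn` column, with the word written to the `oneWordColumn` column and the other columns copied unchanged. The following Python sketch illustrates that row expansion. `split_records` is a hypothetical function, the regular expression is an ASCII-only stand-in for `<separatorConfig excludeSeparators='-.' />`, and it assumes that a record yielding no words produces no output rows, which the example does not show.

```python
import re

# Hypothetical stand-in for <separatorConfig excludeSeparators='-.' />:
# letters, digits, '.' and '-' stay inside words (ASCII-only approximation).
WORD_PATTERN = re.compile(r"[A-Za-z0-9.-]+")


def split_records(records, all_sentence_column="all_value", one_word_column="words"):
    """Emit one output record per word, copying all other columns unchanged."""
    out = []
    for record in records:
        # Assumption: a value with no words yields no output records.
        for word in WORD_PATTERN.findall(record[all_sentence_column]):
            row = dict(record)         # copy id, all_value, ... as-is
            row[one_word_column] = word
            out.append(row)
    return out


if __name__ == "__main__":
    records = [
        {"id": 2547, "all_value": "Jan Novak", "words": None},
        {"id": 2548, "all_value": "Marie P. Dvorakova-Fialova", "words": None},
    ]
    for row in split_records(records):
        print(row["id"], row["all_value"], row["words"], sep=" | ")
```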
iWay Software