| Name | Type | Required | Description |
|---|---|---|---|
| Digits | Boolean | Yes | Defines whether digits serve as separators. Default value: `false`. |
| Lower Case Letters | Boolean | Yes | Defines whether lowercase letters serve as separators. Default value: `false`. |
| Upper Case Letters | Boolean | Yes | Defines whether uppercase letters serve as separators. Default value: `false`. |
| Include Separators | String | No | Defines characters (exceptions) that serve as separators regardless of the values of the properties `digits`, `lowerCaseLetters`, and `upperCaseLetters`. |
| Exclude Separators | String | No | Defines characters (exceptions) that do not serve as separators regardless of the values of the properties `digits`, `lowerCaseLetters`, and `upperCaseLetters`. |

Consider the following input string:
| value |
|---|
| 'Abcdef Ghi2jkl-Mnop' |

and one of the following separator configurations:
| example | separator configuration | description |
|---|---|---|
| Separator_1 | (empty) | All characters except letters and digits serve as separators. |
| Separator_2 | `<separatorConfig excludeSeparators='-' />` | All characters except letters, digits, and the character '-' serve as separators. |
| Separator_3 | `<separatorConfig lowerCaseLetters='true' />` | All characters except uppercase letters and digits serve as separators. |

Applying these three configurations to the input string produces three sets of words:
| separator definition used | output words |
|---|---|
| Separator_1 | 'Abcdef', 'Ghi2jkl', 'Mnop' |
| Separator_2 | 'Abcdef', 'Ghi2jkl-Mnop' |
| Separator_3 | 'A', 'G', '2', 'M' |
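The splitting rule illustrated by these three configurations can be sketched in Python. This is a minimal sketch, not the Splitter implementation. The function `split_words` and its keyword parameters are hypothetical stand-ins for the `separatorConfig` properties. Letters and digits are classified with Python's Unicode-aware `str` methods. A character listed in both Include Separators and Exclude Separators is assumed to be a separator. Empty words between consecutive separators are assumed to be dropped.

```python
from itertools import groupby


def split_words(
    text: str,
    digits: bool = False,
    lower_case_letters: bool = False,
    upper_case_letters: bool = False,
    include_separators: str = "",
    exclude_separators: str = "",
) -> list[str]:
    """Hypothetical model of separatorConfig: split text into words."""

    def is_separator(ch: str) -> bool:
        # Exceptions override the three flags (assumption: include wins over exclude).
        if ch in include_separators:
            return True
        if ch in exclude_separators:
            return False
        if ch.isdigit():
            return digits
        if ch.islower():
            return lower_case_letters
        if ch.isupper():
            return upper_case_letters
        # Caseless letters never separate; every other character does.
        return not ch.isalpha()

    # Group consecutive characters by separator status and keep the non-separator runs.
    return ["".join(run) for sep, run in groupby(text, key=is_separator) if not sep]


text = "Abcdef Ghi2jkl-Mnop"
print(split_words(text))                           # ['Abcdef', 'Ghi2jkl', 'Mnop']
print(split_words(text, exclude_separators="-"))   # ['Abcdef', 'Ghi2jkl-Mnop']
print(split_words(text, lower_case_letters=True))  # ['A', 'G', '2', 'M']
```

Grouping with `itertools.groupby` keeps the sketch short. Each maximal run of non-separator characters becomes one word, which reproduces the three rows of the table above.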
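
The following Splitter step configuration splits the `all_value` column into words written to the `words` column. It excludes '-' and '.' from the separators:

```xml
<step id='splitter' className='cz.adastra.cif.tasks.text.Splitter'>
    <binding name='allSentenceColumn' column='all_value' />
    <binding name='oneWordColumn' column='words' />
    <properties>
        <separatorConfig excludeSeparators='-.' />
    </properties>
</step>
```

Consider an input with the following columns and records: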
| id | all_value | words |
|---|---|---|
| 2547 | Jan Novak | |
| 2548 | Marie P. Dvorakova-Fialova | |

After applying the Splitter step with the configuration above, the following output is generated:

| id | all_value | words |
|---|---|---|
| 2547 | Jan Novak | Jan |
| 2547 | Jan Novak | Novak |
| 2548 | Marie P. Dvorakova-Fialova | Marie |
| 2548 | Marie P. Dvorakova-Fialova | P. |
| 2548 | Marie P. Dvorakova-Fialova | Dvorakova-Fialova |

Each word produces its own output record. Because '.' and '-' are excluded from the separators, 'P.' and 'Dvorakova-Fialova' remain single words.
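The record expansion shown above can be modeled in a few lines. This is a hypothetical sketch, not the step's implementation. It assumes records are Python dicts and hard-codes the configuration above (default flags, Exclude Separators set to `-.`). `split_records` and `is_separator` are illustrative names. In this sketch, a record whose `all_value` contains no words produces no output, which may differ from the actual step.

```python
from itertools import groupby


def is_separator(ch: str, exclude: str = "-.") -> bool:
    # With digits, lowerCaseLetters and upperCaseLetters left at false, only
    # non-alphanumeric characters separate words, minus the excluded ones.
    return not ch.isalnum() and ch not in exclude


def split_records(records, in_col="all_value", out_col="words"):
    """Emit one copy of each input record per word found in in_col."""
    out = []
    for rec in records:
        for sep, run in groupby(rec[in_col], key=is_separator):
            if not sep:
                # Copy every column and fill the output column with this word.
                out.append({**rec, out_col: "".join(run)})
    return out


records = [
    {"id": 2547, "all_value": "Jan Novak", "words": None},
    {"id": 2548, "all_value": "Marie P. Dvorakova-Fialova", "words": None},
]
for row in split_records(records):
    print(row["id"], row["all_value"], row["words"], sep=" | ")
```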

iWay Software