Element Separator Config

Name                Type     Required  Description
Digits              Boolean  Yes       Defines whether digits serve as separators. Default value: false.
Lower Case Letters  Boolean  Yes       Defines whether lowercase letters serve as separators. Default value: false.
Upper Case Letters  Boolean  Yes       Defines whether uppercase letters serve as separators. Default value: false.
Include Separators  String   No        Defines characters (exceptions) that serve as separators regardless of the values of the properties digits, lowerCaseLetters and upperCaseLetters.
Exclude Separators  String   No        Defines characters (exceptions) that do not serve as separators regardless of the values of the properties digits, lowerCaseLetters and upperCaseLetters.



Example

Consider the following input string:

'Abcdef Ghi2jkl-Mnop'

and one of the following separator configurations:

Example       Separator configuration                      Description
Separator_1   (empty, all defaults)                        All characters except letters and digits serve as separators.
Separator_2   <separatorConfig excludeSeparators='-' />    All characters except letters, digits and the character '-' serve as separators.
Separator_3   <separatorConfig lowerCaseLetters='true' />  All characters except uppercase letters and digits serve as separators.

Applying each of the three configurations to the input string produces a different set of words:

Separator configuration   Output words
Separator_1               'Abcdef', 'Ghi2jkl', 'Mnop'
Separator_2               'Abcdef', 'Ghi2jkl-Mnop'
Separator_3               'A', 'G', '2', 'M'
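The separator rules above can be modelled in a few lines of Python to check how the flags and exceptions interact. This is a minimal sketch, not the Splitter implementation: `SeparatorConfig`, `is_separator` and `split_words` are hypothetical names, letters and digits are classified with Python's Unicode-aware `str` methods, and because the precedence for a character listed in both includeSeparators and excludeSeparators is not documented, the sketch assumes exclusion wins. Consecutive separators produce no empty words, which matches the Separator_3 output.

```python
from dataclasses import dataclass


@dataclass
class SeparatorConfig:
    # Hypothetical mirror of the documented properties; all flags default to false.
    digits: bool = False
    lower_case_letters: bool = False
    upper_case_letters: bool = False
    include_separators: str = ""
    exclude_separators: str = ""


def is_separator(ch: str, cfg: SeparatorConfig) -> bool:
    """Return True if the single character ch acts as a word separator."""
    # Explicit exceptions override the category flags.
    # Assumption: a character listed in both strings is NOT a separator.
    if ch in cfg.exclude_separators:
        return False
    if ch in cfg.include_separators:
        return True
    if ch.isdigit():
        return cfg.digits
    if ch.islower():
        return cfg.lower_case_letters
    if ch.isupper():
        return cfg.upper_case_letters
    if ch.isalpha():
        # Assumption: letters without case (e.g. CJK) are never separators.
        return False
    # Spaces, punctuation and all other characters are separators by default.
    return True


def split_words(text: str, cfg: SeparatorConfig) -> list[str]:
    """Split text into maximal runs of non-separator characters."""
    words: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_separator(ch, cfg):
            if current:  # close the word; consecutive separators yield no empty words
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


text = "Abcdef Ghi2jkl-Mnop"
print(split_words(text, SeparatorConfig()))                         # ['Abcdef', 'Ghi2jkl', 'Mnop']
print(split_words(text, SeparatorConfig(exclude_separators="-")))   # ['Abcdef', 'Ghi2jkl-Mnop']
print(split_words(text, SeparatorConfig(lower_case_letters=True)))  # ['A', 'G', '2', 'M']
```

The three calls correspond to Separator_1, Separator_2 and Separator_3.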

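The following Splitter step uses a separator configuration in which '-' and '.' do not serve as separators:

<step id='splitter' className='cz.adastra.cif.tasks.text.Splitter'>
        <!-- input column containing the whole string to be split -->
        <binding name='allSentenceColumn' column='all_value' />
        <!-- output column that receives one word per output record -->
        <binding name='oneWordColumn' column='words' />
        <properties>
                <!-- default flags (letters and digits are not separators);
                     '-' and '.' are additionally excluded from the separators -->
                <separatorConfig
                        excludeSeparators='-.' />
        </properties>
</step>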

Consider an input with the following columns and records:

id     all_value                     words
2547   Jan Novak
2548   Marie P. Dvorakova-Fialova

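After applying the Splitter step with the configuration above, the following output is generated:

id     all_value                     words
2547   Jan Novak                     Jan
2547   Jan Novak                     Novak
2548   Marie P. Dvorakova-Fialova    Marie
2548   Marie P. Dvorakova-Fialova    P.
2548   Marie P. Dvorakova-Fialova    Dvorakova-Fialova

Each word produces one output record, and the remaining columns are copied from the input record. Because '-' and '.' are excluded from the separators, the initial 'P.' and the hyphenated surname 'Dvorakova-Fialova' are kept as single words.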
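The record expansion performed by the step can be sketched in Python, assuming each record is a dict. The function and parameter names are hypothetical: `in_col` and `out_col` play the roles of the allSentenceColumn and oneWordColumn bindings, and the separator test hard-codes the configuration above (default flags plus excludeSeparators='-.'). How the actual step treats an input record that yields no words is not documented; this sketch simply emits no output record for it.

```python
from itertools import groupby


def is_separator(ch: str) -> bool:
    # Default flags (letters and digits are not separators) plus excludeSeparators='-.'
    return not (ch.isalnum() or ch in "-.")


def split_words(text: str) -> list[str]:
    # groupby yields runs of consecutive characters with the same is_separator value;
    # only the non-separator runs are words.
    return ["".join(run) for sep, run in groupby(text, key=is_separator) if not sep]


def splitter_step(records, in_col="all_value", out_col="words"):
    """Emit one output record per word, copying every other column unchanged."""
    output = []
    for rec in records:
        for word in split_words(rec[in_col]):
            row = dict(rec)  # shallow copy keeps id, all_value and any other columns
            row[out_col] = word
            output.append(row)
    return output


records = [
    {"id": 2547, "all_value": "Jan Novak"},
    {"id": 2548, "all_value": "Marie P. Dvorakova-Fialova"},
]
rows = splitter_step(records)
for row in rows:
    print(row["id"], row["all_value"], row["words"], sep=" | ")
```

Running it prints five records whose words are Jan, Novak, Marie, P. and Dvorakova-Fialova, matching the output table.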


iWay Software