Element Separator Config

Name	Type	Required	Description
Digits	Boolean	Yes	Defines whether digits serve as separators. Default value: false
Lower Case Letters	Boolean	Yes	Defines whether lowercase letters serve as separators. Default value: false
Upper Case Letters	Boolean	Yes	Defines whether uppercase letters serve as separators. Default value: false
Include Separators	String	No	Defines characters (exceptions) that always serve as separators, regardless of the values of the digits, lowerCaseLetters, and upperCaseLetters properties.
Exclude Separators	String	No	Defines characters (exceptions) that never serve as separators, regardless of the values of the digits, lowerCaseLetters, and upperCaseLetters properties.
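The rules in the table amount to a single per-character decision. The following is a minimal Python sketch of that decision, not the product's implementation; the function and parameter names are hypothetical, and two details are assumptions not stated in the table: that Include Separators wins if a character appears in both exception lists, and that letters which are neither uppercase nor lowercase are never separators.

```python
def is_separator(
    ch: str,
    digits: bool = False,
    lower_case_letters: bool = False,
    upper_case_letters: bool = False,
    include_separators: str = "",
    exclude_separators: str = "",
) -> bool:
    """Return True if ch acts as a separator (hypothetical sketch)."""
    # Explicit exceptions override the character-class flags.
    # Assumption: include_separators is checked first if a character is in both lists.
    if ch in include_separators:
        return True
    if ch in exclude_separators:
        return False
    # Character classes controlled by the three Boolean properties.
    if ch.isdigit():
        return digits
    if ch.islower():
        return lower_case_letters
    if ch.isupper():
        return upper_case_letters
    # Assumption: uncased letters are never separators.
    if ch.isalpha():
        return False
    # Everything else (spaces, punctuation, ...) separates words.
    return True


# With the default values, only characters that are not letters or digits separate words.
print([c for c in "a1B -." if is_separator(c)])  # [' ', '-', '.']
```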


Example

Consider the following input string:

'Abcdef Ghi2jkl-Mnop'

and one of the following separator configurations:

Example	Separator configuration	Description
Separator_1	(empty)	All characters except letters and digits serve as separators
Separator_2	<separatorConfig excludeSeparators='-' />	All characters except letters, digits, and the character '-' serve as separators
Separator_3	<separatorConfig lowerCaseLetters='true' />	All characters except uppercase letters and digits serve as separators
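Each configuration in the table is a set of XML attributes on a separatorConfig element, with omitted attributes falling back to their defaults. The Python sketch below uses the standard xml.etree.ElementTree module to illustrate that merge; parse_separator_config is a hypothetical helper, the attribute name includeSeparators is assumed by analogy with excludeSeparators, and the empty-string defaults for the two String properties are an assumption.

```python
import xml.etree.ElementTree as ET

# Defaults from the property table; the String properties are assumed empty when omitted.
DEFAULTS = {
    "digits": False,
    "lowerCaseLetters": False,
    "upperCaseLetters": False,
    "includeSeparators": "",  # assumed attribute name, by analogy with excludeSeparators
    "excludeSeparators": "",
}
BOOLEAN_ATTRIBUTES = {"digits", "lowerCaseLetters", "upperCaseLetters"}


def parse_separator_config(xml_text: str = "") -> dict:
    """Merge the attributes of a <separatorConfig> element over DEFAULTS (hypothetical helper)."""
    settings = dict(DEFAULTS)
    if not xml_text.strip():  # empty configuration, as in Separator_1
        return settings
    element = ET.fromstring(xml_text)
    for name, value in element.attrib.items():
        if name in BOOLEAN_ATTRIBUTES:
            # XML attribute values are strings; only "true" (any case) counts as True.
            settings[name] = value.strip().lower() == "true"
        else:
            settings[name] = value
    return settings


for config in ["", "<separatorConfig excludeSeparators='-' />",
               "<separatorConfig lowerCaseLetters='true' />"]:
    print(repr(config), "->", parse_separator_config(config))
```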

Applying each of these configurations to the input string produces a different set of words:

Separator configuration	Output words
Separator_1	'Abcdef', 'Ghi2jkl', 'Mnop'
Separator_2	'Abcdef', 'Ghi2jkl-Mnop'
Separator_3	'A', 'G', '2', 'M'
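The output table can be reproduced by treating each word as a maximal run of non-separator characters, so consecutive separators never produce empty words. This Python sketch is illustrative only; is_separator and split_words are hypothetical names, and the precedence of the exception lists over the character classes follows the property table, with include-before-exclude ordering assumed.

```python
def is_separator(ch, digits=False, lower_case_letters=False,
                 upper_case_letters=False, include_separators="",
                 exclude_separators=""):
    # Assumed precedence: include list, then exclude list, then character class.
    if ch in include_separators:
        return True
    if ch in exclude_separators:
        return False
    if ch.isdigit():
        return digits
    if ch.islower():
        return lower_case_letters
    if ch.isupper():
        return upper_case_letters
    return not ch.isalpha()  # non-letters separate; uncased letters assumed not to


def split_words(text: str, **config) -> list[str]:
    """Split text into maximal runs of non-separator characters (hypothetical sketch)."""
    words, current = [], []
    for ch in text:
        if is_separator(ch, **config):
            if current:  # close the current word; skip empty runs
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


text = "Abcdef Ghi2jkl-Mnop"
print(split_words(text))                           # ['Abcdef', 'Ghi2jkl', 'Mnop']
print(split_words(text, exclude_separators="-"))   # ['Abcdef', 'Ghi2jkl-Mnop']
print(split_words(text, lower_case_letters=True))  # ['A', 'G', '2', 'M']
```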

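The following Splitter step keeps the characters '-' and '.' inside words by excluding them from the separators:

<step id='splitter' className='cz.adastra.cif.tasks.text.Splitter'>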
        <binding name='allSentenceColumn' column='all_value' />
        <binding name='oneWordColumn' column='words' />
        <properties>
                <separatorConfig
                        excludeSeparators='-.' />
        </properties>
</step>
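The step wires two columns: allSentenceColumn names the column to split (all_value in the example below) and oneWordColumn names the column that receives each word (words). The Python sketch below reads those bindings and the separator attributes back with xml.etree.ElementTree; it only illustrates the structure of the XML, not how the product loads its configuration, and read_splitter_step is a hypothetical name.

```python
import xml.etree.ElementTree as ET

STEP_XML = """
<step id='splitter' className='cz.adastra.cif.tasks.text.Splitter'>
        <binding name='allSentenceColumn' column='all_value' />
        <binding name='oneWordColumn' column='words' />
        <properties>
                <separatorConfig
                        excludeSeparators='-.' />
        </properties>
</step>
"""


def read_splitter_step(xml_text: str) -> tuple[dict, dict]:
    """Return (binding name -> column, separatorConfig attributes) (hypothetical helper)."""
    step = ET.fromstring(xml_text.strip())
    bindings = {b.get("name"): b.get("column") for b in step.findall("binding")}
    separator = step.find("properties/separatorConfig")
    separator_attrs = dict(separator.attrib) if separator is not None else {}
    return bindings, separator_attrs


bindings, separator_attrs = read_splitter_step(STEP_XML)
print(bindings)         # {'allSentenceColumn': 'all_value', 'oneWordColumn': 'words'}
print(separator_attrs)  # {'excludeSeparators': '-.'}
```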

Consider an input with the following columns and records:

id	all_value	words
2547	Jan Novak
2548	Marie P. Dvorakova-Fialova

After the Splitter step with the configuration above is applied, each input record is repeated once per word, with that word in the words column:

id	all_value	words
2547	Jan Novak	Jan
2547	Jan Novak	Novak
2548	Marie P. Dvorakova-Fialova	Marie
2548	Marie P. Dvorakova-Fialova	P.
2548	Marie P. Dvorakova-Fialova	Dvorakova-Fialova
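The record-level behavior can be sketched in Python as follows: every input record is copied once per word found in the bound input column, and the word is written to the bound output column. This is an illustration under stated assumptions, not the product's code; run_splitter and its helpers are hypothetical, and the sketch assumes a record whose input column yields no words produces no output rows.

```python
def is_separator(ch, digits=False, lower_case_letters=False,
                 upper_case_letters=False, include_separators="",
                 exclude_separators=""):
    # Assumed precedence: include list, then exclude list, then character class.
    if ch in include_separators:
        return True
    if ch in exclude_separators:
        return False
    if ch.isdigit():
        return digits
    if ch.islower():
        return lower_case_letters
    if ch.isupper():
        return upper_case_letters
    return not ch.isalpha()


def split_words(text, **config):
    """Maximal runs of non-separator characters; empty runs are dropped."""
    words, current = [], ""
    for ch in text:
        if is_separator(ch, **config):
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


def run_splitter(records, input_column, output_column, **config):
    """Emit one copy of each record per word (hypothetical sketch of the Splitter step)."""
    output = []
    for record in records:
        for word in split_words(record[input_column], **config):
            output.append({**record, output_column: word})
    return output


records = [
    {"id": 2547, "all_value": "Jan Novak"},
    {"id": 2548, "all_value": "Marie P. Dvorakova-Fialova"},
]
for row in run_splitter(records, "all_value", "words", exclude_separators="-."):
    print(row["id"], row["all_value"], row["words"], sep="\t")
```

Because '-' and '.' are excluded, "P." and "Dvorakova-Fialova" survive as single words, matching the output table.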

iWay Software