Detailed Description of Data Sampler

This step obtains a representative sample from the input data. It groups records that share the same key and computes the size of each group. Groups of the same size form a cluster, and from each cluster the step selects a number of groups determined by the given percentage. The sampled groups are picked uniformly from the cluster so that the sample is distributed uniformly across the whole input data source. If the percentage is too low to cover at least one whole group, only a number of records from the beginning of the chosen group is selected. More than one grouping rule can be defined and applied to the records. Same-sized groups within a cluster can optionally be sorted.
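The selection logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the step's actual implementation: the function name sample_groups and its parameters are hypothetical, "uniformly" is read as evenly spaced picks within a cluster, the "chosen group" in the low-percentage case is assumed to be the first group of the cluster, and the number of leading records taken is assumed to be the percentage of the cluster's records, rounded up.

```python
from collections import defaultdict


def sample_groups(records, key_fn, percentage, sort_key=None):
    """Illustrative sketch only; not the product's actual implementation.

    percentage is assumed to be an integer between 0 and 100.
    """
    # Group records that share a key, keeping input order within each group.
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)

    # Cluster the groups by size: groups with the same record count go together.
    clusters = defaultdict(list)
    for members in groups.values():
        clusters[len(members)].append(members)

    sample = []
    for size in sorted(clusters):
        cluster = clusters[size]
        if sort_key is not None:
            # Optional sorting of same-sized groups
            # (assumption: ordered by each group's first record).
            cluster = sorted(cluster, key=lambda group: sort_key(group[0]))

        # Number of whole groups covered by the percentage (rounded down).
        n_groups = len(cluster) * percentage // 100
        if n_groups > 0:
            # Evenly spaced picks, so the chosen groups span the whole cluster.
            for i in range(n_groups):
                sample.extend(cluster[i * len(cluster) // n_groups])
        else:
            # Percentage too low for one whole group: take only leading records
            # of one group (assumption: the first group, count rounded up).
            n_records = (len(cluster) * size * percentage + 99) // 100
            sample.extend(cluster[0][:n_records])
    return sample
```

For example, with ten two-record groups and a percentage of 30, this sketch picks three whole groups spread across the cluster. A single five-record group at the same percentage is not covered as a whole, so only its first two records are taken.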


Example

<step id='alg' className='cz.adastra.cif.tasks.experimental.datasampler.DataSamplerAlgorithm'>
        <properties>
                <groups>
                        <dataSamplerGroup name='group_a' when='system="alfa"' percentage="30" >
                                <keyComponents>
                                        <keyComponent expression="key1"></keyComponent>
                                        <keyComponent expression="key2"></keyComponent>
                                </keyComponents>
                        </dataSamplerGroup>
                        <dataSamplerGroup name='group_b' when='system="beta"' percentage="15" >
                                <keyComponents>
                                        <keyComponent expression="key1"></keyComponent>
                                </keyComponents>
                                <sorting>
                                        <orderBy expression="key1"/>
                                        <orderBy expression="key2"/>
                                </sorting>
                        </dataSamplerGroup>
                </groups>
        </properties>
</step>

iWay Software