This step obtains a representative data sample from the input data. It first computes the size of each group of records that share the same key. Groups of the same size form a cluster, and from each cluster the step selects a number of groups according to the given percentage. The sample groups are selected uniformly from the cluster so that the sample keeps a uniform data distribution across the whole input data source. If the percentage is too low to cover at least one whole group, only a number of records from the beginning of the chosen group is selected. Multiple grouping rules can be defined and applied to the records. Same-sized groups within a cluster can optionally be sorted.
<step id='alg' className='cz.adastra.cif.tasks.experimental.datasampler.DataSamplerAlgorithm'>
  <properties>
    <groups>
      <dataSamplerGroup name='group_a' when='system="alfa"' percentage="30">
        <keyComponents>
          <keyComponent expression="key1"></keyComponent>
          <keyComponent expression="key2"></keyComponent>
        </keyComponents>
      </dataSamplerGroup>
      <dataSamplerGroup name='group_b' when='system="beta"' percentage="15">
        <keyComponents>
          <keyComponent expression="key1"></keyComponent>
        </keyComponents>
        <sorting>
          <orderBy expression="key1"/>
          <orderBy expression="key2"/>
        </sorting>
      </dataSamplerGroup>
    </groups>
  </properties>
</step>
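The selection logic described above can be illustrated with a short Python sketch. It is not the step's actual implementation: the function name `sample_groups` and its parameters are hypothetical, "uniformly" is assumed to mean evenly spaced positions within a cluster, and the number of leading records taken when the percentage does not cover a whole group is also an assumption. The `when` condition of a grouping rule is not modeled; the sketch expects records that already match one rule.

```python
from collections import defaultdict
from math import ceil


def sample_groups(records, key_fields, percentage, sort_key=None):
    """Illustrative sampler: group by key, cluster by group size,
    then take a share of the groups in each cluster."""
    # 1. Group records that share the same key values.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)

    # 2. Cluster the groups by their size.
    clusters = defaultdict(list)
    for members in groups.values():
        clusters[len(members)].append(members)

    sample = []
    for size in sorted(clusters):
        cluster = clusters[size]
        if sort_key is not None:
            # Optional ordering of same-sized groups
            # (assumption: ordered by each group's first record).
            cluster = sorted(cluster, key=lambda g: sort_key(g[0]))
        # Number of whole groups the percentage covers in this cluster.
        wanted = int(len(cluster) * percentage // 100)
        if wanted >= 1:
            # Assumption: "uniformly" means evenly spaced positions.
            step = len(cluster) / wanted
            for i in range(wanted):
                sample.extend(cluster[int(i * step)])
        else:
            # Percentage too low for a whole group: take leading records
            # of one group. The record count formula is an assumption.
            n = ceil(size * len(cluster) * percentage / 100)
            sample.extend(cluster[0][:n])
    return sample
```

Called with `key_fields=["key1", "key2"]` and `percentage=30`, the sketch mirrors the `group_a` rule in the configuration above: a cluster of ten single-record groups contributes three whole groups, while a cluster of only two groups contributes just the leading records of one group.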