This step can perform multiple analytical operations on multiple columns of input data in a single pass. Any data type supported by DQC can be used, provided it is compatible with the operations applied to it. The following table lists which statistics are available for each data type.
| Statistic | FLOAT, INTEGER, LONG | DAY, DATETIME | BOOLEAN | STRING |
|---|---|---|---|---|
| Record Count | yes | yes | yes | yes |
| Null Count | yes | yes | yes | yes |
| Not Null Count | yes | yes | yes | yes |
| Distinct Value Count | yes | yes | yes | yes |
| Unique Value Count | yes | yes | yes | yes |
| Sum | yes | - | yes | - |
| Average | yes | yes | yes | - |
| Median | yes | yes | yes | yes |
| Standard Deviation | yes | yes | - | - |
| Variance | yes | yes | - | - |
| First X values | yes | yes | yes | yes |
| Last X values | yes | yes | yes | yes |
| Minimum length of sequence | - | - | - | yes |
| Minimum length of non-empty sequence | - | - | - | yes |
| Average length of sequence | - | - | - | yes |
| Median length of sequence | - | - | - | yes |
| Maximum length of sequence | - | - | - | yes |
| Quantile Value | yes | yes | yes | yes |
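As an illustration of how the table maps onto a configuration, a statistic block for a DAY or DATETIME column would list only the operations marked yes for that type. The fragment below is a sketch under stated assumptions: the column name `date_value` is hypothetical, and the `type` values are the ones used in the sample step configuration in this section. Sum is left out because the table marks it as unsupported for dates.

```xml
<statistic>
  <!-- date_value is a hypothetical DAY or DATETIME input column -->
  <expression>date_value</expression>
  <columnStatistics>
    <columnStatistic name="Record Count" type="count" />
    <columnStatistic name="Null Count" type="count_nulls" />
    <columnStatistic name="Median" type="median" />
    <columnStatistic name="Average" type="avg" />
    <columnStatistic name="Standard Deviation" type="std" />
    <columnStatistic name="First 3" type="first_x" count="3" />
    <!-- no type="sum" entry: the table lists Sum as unsupported for dates -->
  </columnStatistics>
</statistic>
```

A block like this would sit inside the `<statistics>` element of the step, alongside the blocks for other columns.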
Step outputs:
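DQC returns a single row for each computed statistic, provided the statistic does not take the count parameter and does not receive an input supplied by the count parameter. In the sample configuration that follows, the statistics of type first_x, last_x, and percentiles take count.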
<step id='alg' className='cz.adastra.cif.tasks.analysis.statistics.StatisticsAlgorithm'>
  <properties>
    <statName>stat_name</statName>
    <statDistinction>stat_distinction</statDistinction>
    <defaultLocale>cs_CZ</defaultLocale>
    <statistics>
      <statistic>
        <expression>numeric_value</expression>
        <columnStatistics>
          <columnStatistic name="Record Number(Count)" type="count" />
          <columnStatistic name="Sum" type="sum" />
          <columnStatistic name="Median" type="median" />
          <columnStatistic name="Average" type="avg" />
          <columnStatistic name="Variace" type="var" />
          <columnStatistic name="Standard Deviation" type="std" />
          <columnStatistic name="Distinct Count" type="distinct" />
          <columnStatistic name="Unique Count" type="unique" />
          <columnStatistic name="Null Count" type="count_nulls" />
          <columnStatistic name="Not-Null Count" type="count_not_nulls" />
          <columnStatistic name="First 3" type="first_x" count="3" />
          <columnStatistic name="Last 5" type="last_x" count="5" />
        </columnStatistics>
      </statistic>
      <statistic locale="en_US">
        <expression>string_value</expression>
        <columnStatistics>
          <columnStatistic name="Median Seq Length" type="median_length"/>
          <columnStatistic name="Average Seq Length" type="avg_length"/>
          <columnStatistic name="20 40 60 80 Percentile" type="percentiles" count="4"/>
        </columnStatistics>
      </statistic>
    </statistics>
  </properties>
</step>
iWay Software