This step performs multiple analytical operations over multiple columns of input data in a single pass. All data types supported by DQC can be used, as long as they are compatible with the operations applied to them.

| Statistic | FLOAT, INTEGER, LONG | DAY, DATETIME | BOOLEAN | STRING |
|---|---|---|---|---|
| Record Count | yes | yes | yes | yes |
| Null Count | yes | yes | yes | yes |
| Not Null Count | yes | yes | yes | yes |
| Distinct Value Count | yes | yes | yes | yes |
| Unique Value Count | yes | yes | yes | yes |
| Sum | yes | - | yes | - |
| Average | yes | yes | yes | - |
| Median | yes | yes | yes | yes |
| Standard Deviation | yes | yes | - | - |
| Variance | yes | yes | - | - |
| First X values | yes | yes | yes | yes |
| Last X values | yes | yes | yes | yes |
| Minimum length of sequence | - | - | - | yes |
| Minimum length of non-empty sequence | - | - | - | yes |
| Average length of sequence | - | - | - | yes |
| Median length of sequence | - | - | - | yes |
| Maximum length of sequence | - | - | - | yes |
| Quantile Value | yes | yes | yes | yes |

Step outputs:
DQC returns a single row for each computed statistic, provided the statistic does not take the `count` parameter or an input supplied through the `count` parameter.
```xml
<step id='alg' className='cz.adastra.cif.tasks.analysis.statistics.StatisticsAlgorithm'>
  <properties>
    <statName>stat_name</statName>
    <statDistinction>stat_distinction</statDistinction>
    <defaultLocale>cs_CZ</defaultLocale>
    <statistics>
      <!-- Statistics computed over the numeric_value expression -->
      <statistic>
        <expression>numeric_value</expression>
        <columnStatistics>
          <columnStatistic name="Record Number(Count)" type="count" />
          <columnStatistic name="Sum" type="sum" />
          <columnStatistic name="Median" type="median" />
          <columnStatistic name="Average" type="avg" />
          <columnStatistic name="Variance" type="var" />
          <columnStatistic name="Standard Deviation" type="std" />
          <columnStatistic name="Distinct Count" type="distinct" />
          <columnStatistic name="Unique Count" type="unique" />
          <columnStatistic name="Null Count" type="count_nulls" />
          <columnStatistic name="Not-Null Count" type="count_not_nulls" />
          <columnStatistic name="First 3" type="first_x" count="3" />
          <columnStatistic name="Last 5" type="last_x" count="5" />
        </columnStatistics>
      </statistic>
      <!-- Statistics computed over the string_value expression, with its own locale attribute -->
      <statistic locale="en_US">
        <expression>string_value</expression>
        <columnStatistics>
          <columnStatistic name="Median Seq Length" type="median_length" />
          <columnStatistic name="Average Seq Length" type="avg_length" />
          <columnStatistic name="20 40 60 80 Percentile" type="percentiles" count="4" />
        </columnStatistics>
      </statistic>
    </statistics>
  </properties>
</step>
```
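
As a further illustration, the sketch below applies the same step to a date column. It assumes a hypothetical DAY or DATETIME input column named `order_date` and a hypothetical step id `date_stats`; all other elements and every `type` value are copied from the sample above. `sum` is left out because the support table marks Sum as unavailable for DAY and DATETIME. Treat it as an untested fragment, not a verified configuration.

```xml
<!-- Hypothetical example: the step id and the order_date column are assumptions -->
<step id='date_stats' className='cz.adastra.cif.tasks.analysis.statistics.StatisticsAlgorithm'>
  <properties>
    <statName>stat_name</statName>
    <statDistinction>stat_distinction</statDistinction>
    <defaultLocale>cs_CZ</defaultLocale>
    <statistics>
      <statistic>
        <expression>order_date</expression>
        <columnStatistics>
          <!-- Counting statistics, listed as supported for every data type -->
          <columnStatistic name="Record Count" type="count" />
          <columnStatistic name="Null Count" type="count_nulls" />
          <columnStatistic name="Distinct Count" type="distinct" />
          <!-- Statistics the table lists as supported for DAY and DATETIME -->
          <columnStatistic name="Average" type="avg" />
          <columnStatistic name="Median" type="median" />
          <columnStatistic name="Standard Deviation" type="std" />
          <!-- Takes the count parameter, like first_x in the sample above -->
          <columnStatistic name="First 3" type="first_x" count="3" />
        </columnStatistics>
      </statistic>
    </statistics>
  </properties>
</step>
```

Only the `first_x` entry here takes the `count` parameter, so the single-row rule described under "Step outputs" would apply to the other six statistics.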
| iWay Software |