The profiling step performs statistical analysis of data. For each profiled column, it computes statistics such as the minimum, maximum, average, and standard deviation.
The step can perform multiple analytical operations on multiple columns in a single pass over the input data.
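The single-pass idea can be illustrated with a minimal Python sketch. It is not DQC's implementation: the `ColumnStats` class, the `profile` function, and the sample rows are hypothetical names introduced only for this illustration. Each row is read once, and the running statistics of every requested column are updated as it goes by.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ColumnStats:
    """Running statistics for one column (illustrative, not a DQC class)."""
    count: int = 0
    nulls: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def update(self, value: Optional[float]) -> None:
        # Every row counts toward the data count; nulls are tallied separately.
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


def profile(rows: Iterable[dict], columns: List[str]) -> Dict[str, ColumnStats]:
    """Read each row once and update the statistics of every requested column."""
    stats = {name: ColumnStats() for name in columns}
    for row in rows:
        for name in columns:
            stats[name].update(row.get(name))
    return stats


# Hypothetical sample data with two numeric columns.
rows = [
    {"age": 30, "products_count": 2},
    {"age": None, "products_count": 0},
    {"age": 45, "products_count": 5},
]
result = profile(rows, ["age", "products_count"])
print(result["age"])
```

Statistics such as the median or quantiles generally need more state than a few running values, so a real profiler would treat them differently from the counters shown here.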
All data types supported by DQC can be profiled, as long as the selected operations apply to the given data type. The following table lists the operations available for each data type.
| Statistic | INTEGER, LONG | DAY, DATETIME | BOOLEAN | STRING |
|---|---|---|---|---|
| Data Count | yes | yes | yes | yes |
| Null Count | yes | yes | yes | yes |
| Not Null Count | yes | yes | yes | yes |
| Different Value Count | yes | yes | yes | yes |
| Unique Value Count | yes | yes | yes | yes |
| Sum | yes | - | yes | - |
| Variance (results for DAY/DATETIME values are in squared days) | yes | yes | - | - |
| Standard Deviation (results for DAY/DATETIME values are in days) | yes | yes | - | - |
| Average | yes | yes | yes | - |
| Median | yes | yes | yes | yes |
| Quantile | yes | yes | yes | yes |
| Maximum | yes | yes | yes | yes |
| Minimum | yes | yes | yes | yes |
| First X Values | yes | yes | yes | yes |
| Last X Values | yes | yes | yes | yes |
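The units stated for DAY/DATETIME variance and standard deviation can be made concrete with a short Python sketch. It assumes that dates are mapped to day numbers and that the population formulas are used. Both are assumptions made for this illustration; the source does not say whether DQC uses the population or the sample definition.

```python
from datetime import date
from statistics import pstdev, pvariance

# Hypothetical sample of DAY values, two days apart.
days = [date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 5)]

# Map each date to a day number so that arithmetic is done in whole days.
ordinals = [d.toordinal() for d in days]

# Assumption: population variance and standard deviation.
variance_sq_days = pvariance(ordinals)  # unit: squared days
std_dev_days = pstdev(ordinals)         # unit: days

print(f"variance: {variance_sq_days:.3f} squared days")
print(f"standard deviation: {std_dev_days:.3f} days")
```

Because the standard deviation is the square root of the variance, it comes back in plain days, which is usually the easier figure to interpret.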
    <?xml version='1.0' encoding='UTF-8'?>
    <step className="cz.adastra.cif.tasks.profiling.ProfilingAlgorithm" id="pa">
      <properties outputFile="output.profile" defaultLocale="en_US">
        <inputs>
          <profilingInput name="party">
            <dataToProfile>
              <profiledData expression="name">
                <frequencyAnalysis calculate="true" mask="true" />
                <standardStats extremeCount="5" quantilesCount="10" calculateAggegated="true" calculate="true"/>
              </profiledData>
            </dataToProfile>
            <bussinesRules>
              <rule name="prods_ok" expression="products_count > 0" />
            </bussinesRules>
            <pkAnalysis>
              <item name="party_pk">
                <components>
                  <item expression="party_id" />
                </components>
              </item>
            </pkAnalysis>
            <fkAnalysis>
              <item name="party_prod" parentInputName="product">
                <components>
                  <item localColumn="prod_id" parentColumn="product_id" />
                </components>
              </item>
            </fkAnalysis>
            <rollUps>
              <item name="by_dept" expression="dept_id" />
            </rollUps>
          </profilingInput>
          <profilingInput name="product">
            <dataToProfile>
              <profiledData expression="branch">
                <frequencyAnalysis calculate="true" mask="false" />
                <standardStats calculateAggegated="false" calculate="false"/>
              </profiledData>
            </dataToProfile>
          </profilingInput>
        </inputs>
      </properties>
    </step>
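To review which expressions a configuration like the one above profiles, the XML can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is not part of DQC: the `summarize` helper and the `CONFIG` string are hypothetical, and `CONFIG` is a shortened copy of the sample configuration with element and attribute names kept exactly as they appear there.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample step configuration (names kept verbatim).
CONFIG = """
<step className="cz.adastra.cif.tasks.profiling.ProfilingAlgorithm" id="pa">
  <properties outputFile="output.profile" defaultLocale="en_US">
    <inputs>
      <profilingInput name="party">
        <dataToProfile>
          <profiledData expression="name">
            <frequencyAnalysis calculate="true" mask="true" />
            <standardStats extremeCount="5" quantilesCount="10" calculateAggegated="true" calculate="true"/>
          </profiledData>
        </dataToProfile>
      </profilingInput>
      <profilingInput name="product">
        <dataToProfile>
          <profiledData expression="branch">
            <frequencyAnalysis calculate="true" mask="false" />
            <standardStats calculateAggegated="false" calculate="false"/>
          </profiledData>
        </dataToProfile>
      </profilingInput>
    </inputs>
  </properties>
</step>
"""


def summarize(config_xml):
    """List each profiled expression and which analyses are switched on."""
    root = ET.fromstring(config_xml.strip())
    summary = []
    for profiling_input in root.iter("profilingInput"):
        for data in profiling_input.iter("profiledData"):
            stats = data.find("standardStats")
            freq = data.find("frequencyAnalysis")
            summary.append({
                "input": profiling_input.get("name"),
                "expression": data.get("expression"),
                "standard_stats": stats is not None and stats.get("calculate") == "true",
                "frequency_analysis": freq is not None and freq.get("calculate") == "true",
            })
    return summary


for entry in summarize(CONFIG):
    print(entry)
```

In the sample, the `party` input profiles the `name` expression with both frequency analysis and standard statistics enabled, while the `product` input profiles `branch` with frequency analysis only.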
iWay Software