Column Analysis

The Data Analysis tab presents statistical analyses and pattern information about the data. Each column in the input data is listed as a row in the table, which presents information such as data type, value counts, and minimum/maximum values.


Basic Analysis

The Basic tab provides simple statistics about the data that has been profiled and shows a chart of duplicate and distinct data as a percentage of the whole.

Some of this data is shown in the Basic tab of the Profile viewer.



Interpreting Counts

The Counts table lists the following values: Null, Non-null, Duplicate, Distinct, Non-unique, and Unique.

The following data table is an example that illustrates the meaning of the above values.

Record No.    Value
1             John Smith
2             John Smith
3             Rebecca Davis
4             Paul Adams
5             (empty)

The following table shows the Counts for this data.

Type          Count    Records            Explanation
Null          1        Record 5           The last record is empty.
Non-null      4        Records 1-4        The first four records contain data.
Duplicate     1        Record 2           There is one duplicate of the John Smith record.
Distinct      3        Records 1, 3, 4
Non-unique    1        Record 1           John Smith has a duplicate record, and is therefore not unique.
Unique        2        Records 3 and 4    Rebecca Davis and Paul Adams appear only once in the list. They have no duplicates.
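
The counts in the table above can be reproduced with a short script. This is a minimal sketch rather than the profiler's own implementation: the function name count_summary and the use of None to stand for an empty record are assumptions made for illustration.

```python
from collections import Counter

def count_summary(values):
    """Return the six counts for one column; None marks a null record."""
    non_null = [v for v in values if v is not None]
    freq = Counter(non_null)          # occurrences of each non-null value
    distinct = len(freq)
    return {
        "null": len(values) - len(non_null),
        "non_null": len(non_null),
        "duplicate": len(non_null) - distinct,  # repeats beyond the first occurrence
        "distinct": distinct,
        "non_unique": sum(1 for c in freq.values() if c > 1),
        "unique": sum(1 for c in freq.values() if c == 1),
    }

sample = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", None]
print(count_summary(sample))
```

For the five sample records this yields 1 null, 4 non-null, 1 duplicate, 3 distinct, 1 non-unique, and 2 unique, matching the table.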



Frequency

The Frequency Analysis tab shows the number of times each value occurs in the data, both as an absolute count and as a percentage of the whole.
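
As an illustration of this kind of analysis, the sketch below computes a count and a percentage for each value. The function name frequency_table is hypothetical, and the percentage is taken over the values passed in; whether the product includes null records in the whole is not stated here, so that is an assumption.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    freq = Counter(values)
    total = len(values)
    return [(value, count, 100.0 * count / total)
            for value, count in freq.most_common()]

names = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams"]
for value, count, percent in frequency_table(names):
    print(f"{value}: {count} ({percent:.1f}%)")
```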


Domains

This is an analysis to determine the likely type of the data in each column (for example, whether the data is text, a number, or a date). The probable types are listed, along with exceptions (for example, a text string found in a list of dates).
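
The idea can be sketched as follows: classify each value, take the most common type as the likely type of the column, and report everything else as an exception. The function names, the three type labels, and the single date format are assumptions for illustration; the product's actual detection rules are not described here.

```python
from collections import Counter
from datetime import datetime

def guess_type(value, date_format="%Y-%m-%d"):
    """Classify one string as 'number', 'date', or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, date_format)
        return "date"
    except ValueError:
        return "text"

def domain_analysis(values):
    """Return the most common type and the values that do not match it."""
    types = [guess_type(v) for v in values]
    likely, _ = Counter(types).most_common(1)[0]
    exceptions = [v for v, t in zip(values, types) if t != likely]
    return likely, exceptions

column = ["2021-01-15", "2021-02-03", "unknown", "2021-03-20"]
print(domain_analysis(column))  # ('date', ['unknown'])
```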


Mask Analysis

The Mask Analysis tab shows the syntactic patterns of the data, that is, the structure of the data rather than its content. Codes (masks) are used to describe these patterns. For example, the code W is used by default to represent a word (the number of letters required to make a word can be defined in the Profiling Step properties), while L is used to represent a letter. This type of analysis can be useful when, for example, looking at a column of names where one or two words are common, but single letters and numbers are not. Finding unexpected patterns in the data can indicate the overall level of quality of the data.
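
A rough sketch of mask generation is shown below. Only the W and L codes come from the description above; the code N for a run of digits, the function name mask, and the min_word_length parameter (standing in for the word length set in the Profiling Step properties) are hypothetical.

```python
import re
from collections import Counter

def mask(value, min_word_length=2):
    """Replace each run of letters or digits with a mask code.

    Other characters, such as spaces, are kept as they are.
    """
    def code(match):
        token = match.group(0)
        if token.isdigit():
            return "N"                  # hypothetical code for a run of digits
        if len(token) >= min_word_length:
            return "W"                  # long enough to count as a word
        return "L" * len(token)         # one L per single letter
    return re.sub(r"[A-Za-z]+|[0-9]+", code, value)

names = ["John Smith", "Rebecca Davis", "J Smith", "Paul 42"]
print(Counter(mask(name) for name in names))
```

Counting the masks makes the unusual patterns stand out: here "W W" occurs twice, while "L W" and "W N" each occur once.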

The following image shows a sample Mask analysis output.


Quantiles

The Quantiles tab displays the data values that occur at designated intervals in the ordered data set. The first value in the list is at 0% and the last value is at 100%. The median value is at the 50% marker.
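
The sketch below picks values at evenly spaced positions in the sorted data. The function name quantile_values, the steps parameter, and the rule of rounding to the nearest position are assumptions; the product may select or interpolate values differently.

```python
def quantile_values(values, steps=4):
    """Return (percent, value) pairs at evenly spaced positions in sorted data."""
    ordered = sorted(values)
    if not ordered:
        return []
    last = len(ordered) - 1
    result = []
    for i in range(steps + 1):
        index = round(i * last / steps)   # nearest position in the ordered list
        result.append((100 * i // steps, ordered[index]))
    return result

data = [15, 3, 9, 27, 21, 12, 6, 18, 24]
print(quantile_values(data))
# [(0, 3), (25, 9), (50, 15), (75, 21), (100, 27)]
```

The first pair is the minimum (0%), the last is the maximum (100%), and the 50% pair is the median of the nine values.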


Groups

The Groups tab presents a different analysis of the data in the Frequency tab. It shows the number of times that each non-null frequency count is repeated. If all values are unique, the group size will be 1, as there are no duplicate values. Each time a value is repeated, it forms a new group.
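
One way to read this is as a count of counts: first count how often each value occurs, then count how many values share each frequency. The sketch below follows that reading; the function name group_sizes and the use of None for null records are assumptions, and the product's exact grouping may differ.

```python
from collections import Counter

def group_sizes(values):
    """Map each frequency count to the number of distinct values that have it."""
    freq = Counter(v for v in values if v is not None)   # ignore null records
    return dict(sorted(Counter(freq.values()).items()))

sample = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", None]
print(group_sizes(sample))  # {1: 2, 2: 1}
```

In this sample, two values occur once (group size 1) and one value occurs twice (group size 2).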


iWay Software