Column Analysis

Basic Analyses

The Basic tab provides simple statistics about the data that has been profiled and shows a chart of duplicate and distinct data as a percentage of the whole, as shown in the following image.

Top of page

Interpreting Counts

The Counts table lists the following values:

Null. Includes all data that is empty or has a value of null.
Non-null. Includes all data that is not empty or does not have a value of null. This data is divided into two categories, duplicate and distinct.
- Duplicate. Specifies the number of values that are the same as other values in the list.
- Distinct. Specifies the number of non-null values that are different from each other. This data is divided into two categories, non-unique, and unique.
  - Non-unique. Specifies the number of values that have at least one duplicate in the list.
  - Unique. Specifies the number of values that have no duplicates.

To understand the meaning of these values, consider the data in the following table.

Record No.

Value

1

John Smith

2

John Smith

3

Rebecca Davis

4

Paul Adams

5

Record No.	Value
1	John Smith
2	John Smith
3	Rebecca Davis
4	Paul Adams
5

The Counts table for this data is represented in the following table.

Type

Count

Record

Description

Null

1

Record 5

The last record is empty.

Non-null

4

Records 1-4

The first 4 records contain data.

Duplicate

1

Record 2

There is one duplicate of the John Smith record.

Distinct

3

Records 1, 3, 4

There are 3 non-null values that are different from each other.

Non-unique

1

Record 1

John Smith has a duplicate record. Therefore, it is not unique.

Unique

2

Records 3 and 4

Rebecca Davis and Paul Adams appear only once in the list; they have no duplicates.

Type	Count	Record	Description
Null	1	Record 5	The last record is empty.
Non-null	4	Records 1-4	The first 4 records contain data.
Duplicate	1	Record 2	There is one duplicate of the John Smith record.
Distinct	3	Records 1, 3, 4	There are 3 non-null values that are different from each other.
Non-unique	1	Record 1	John Smith has a duplicate record. Therefore, it is not unique.
Unique	2	Records 3 and 4	Rebecca Davis and Paul Adams appear only once in the list; they have no duplicates.

Frequency Analysis

The Frequency tab displays the Frequency Analysis. It shows the number of times that each value in the data occurs. This statistic is shown as both an absolute count and as a percentage of the whole, as shown in the following image.

Domain Analysis

The Domains tab displays the Domain Analysis. This analysis determines the type of data likely to be found in each column, for example, whether the data is text, a number, or a date. The probable types are listed, along with exceptions. An example of an exception is a text string that occurs in a list of dates.

The following image shows a sample of Domain Analysis.

Mask Analysis

The Mask tab displays the Mask Analysis. It shows the syntactic patterns of the data, that is, the structure of the data rather than the content of the data. Codes (masks) are used to describe these patterns.

For example, the code W is used by default to represent a word, while the code L is used to represent a letter. You can define the number of letters required to make a word in the Profiling step properties.

This type of analysis can be useful when, for example, you are looking at a column of names, in which one or two words are common, but single letters and numbers are not. Finding unexpected patterns in the data can provide information about the overall level of quality of the data.

The following image shows a sample of Mask Analysis.

Quantiles

The Quantiles tab shows Quantile Samples (based on n values). These are the data values that occur at designated intervals in the ordered data set. The first value in the list is at 0%, and the last value is at 100%. The median value is at the 50% marker.

The following image is an example of Quantile Samples.

Group Frequency Analysis

The Groups tab displays the Group Frequency Analysis. This presents a different analysis of the data in the Frequency tab. Group Frequency Analysis shows the number of times that each non-null frequency count is repeated. If all values are unique, the group size is 1, as there are no duplicate values. Each time a value is repeated, it forms a new group.

The following image shows a sample of Group Frequency Analysis.