Column Analysis

Basic Analysis

The Basic tab provides simple statistics about the data that has been profiled and shows a chart of duplicate and distinct data as a percentage of the whole.

Some of this data is shown in the Basic tab of the Profile viewer.

Interpreting Counts

The Counts table lists the following values:

Null. This value refers to all data that is empty or has Null as its value.
Non-null. This value refers to all data that is not empty or null (duplicate + distinct).
Duplicate. This shows the number of values that repeat a value already in the list. Each occurrence after the first is counted once.
Distinct. This refers to the number of non-null values that are different from each other (non-unique + unique).
Non-unique. This is the number of distinct values that have at least one duplicate in the list.
Unique. This is the number of values that have no duplicates.

The following data table is an example that illustrates the meaning of the above values.

Record No.	Value
1	John Smith
2	John Smith
3	Rebecca Davis
4	Paul Adams
5

The following table shows the Counts for this data.

Type	Count	Records	Explanation
Null	1	Record 5	The last record is empty.
Non-null	4	Records 1-4	The first four records contain data.
Duplicate	1	Record 2	There is one duplicate of the John Smith record.
Distinct	3	Records 1, 3, 4	There are three different non-null values: John Smith, Rebecca Davis, and Paul Adams.
Non-unique	1	Record 1	John Smith has a duplicate record, and is therefore not unique.
Unique	2	Records 3 and 4	Rebecca Davis and Paul Adams appear only once in the list. They have no duplicates.
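
The counts in the table can be reproduced with a short script. This is only a sketch of the definitions given above, not the product's own implementation. The function name basic_counts is hypothetical, and it assumes that both None and the empty string count as Null.

```python
from collections import Counter

def basic_counts(values):
    """Return the six Counts figures for a list of column values.

    Assumption: None and the empty string are both treated as Null.
    """
    non_null = [v for v in values if v is not None and v != ""]
    freq = Counter(non_null)                      # occurrences per value
    distinct = len(freq)                          # different non-null values
    non_unique = sum(1 for n in freq.values() if n > 1)
    return {
        "null": len(values) - len(non_null),
        "non_null": len(non_null),
        "duplicate": len(non_null) - distinct,    # repeats after the first
        "distinct": distinct,
        "non_unique": non_unique,
        "unique": distinct - non_unique,
    }

# The five records from the example data table (record 5 is empty).
column = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", None]
counts = basic_counts(column)
print(counts)
```

For the example data this gives Null 1, Non-null 4, Duplicate 1, Distinct 3, Non-unique 1, and Unique 2, matching the table.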

Frequency

The Frequency Analysis tab shows the number of times each value in the data occurs, both as an absolute count and as a percentage of the whole.
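
As an illustration, a frequency table of this kind could be computed as follows. The function name frequency_table is hypothetical, and the sketch assumes that empty (None) entries are counted as a value of their own when working out percentages; the product may treat them differently.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percentage) rows, most frequent first."""
    total = len(values)
    freq = Counter(values)
    return [(value, count, 100.0 * count / total)
            for value, count in freq.most_common()]

column = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", None]
rows = frequency_table(column)
for value, count, percent in rows:
    print(value, count, percent)
```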

Domains

This is an analysis to determine the likely type of the data in each column (for example, whether the data is text, a number, or a date). The probable types are listed, along with exceptions (for example, a text string found in a list of dates).
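
The idea can be sketched in a few lines. This is a simplified illustration, not the rules the product applies: it assumes that every value is a string, that dates use the ISO format YYYY-MM-DD, and that the most common type is the likely one. The names infer_type and analyse_domain are hypothetical.

```python
from collections import Counter
from datetime import datetime

def infer_type(value):
    """Classify one string as 'number', 'date', or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        # Assumption: dates are written as YYYY-MM-DD.
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"

def analyse_domain(values):
    """Return the most common type and the values that do not match it."""
    types = Counter(infer_type(v) for v in values)
    likely, _ = types.most_common(1)[0]
    exceptions = [v for v in values if infer_type(v) != likely]
    return likely, exceptions

likely, exceptions = analyse_domain(
    ["2021-01-05", "2021-02-17", "unknown", "2021-03-30"])
print(likely, exceptions)
```

Here the column is reported as a date column, with the text string "unknown" listed as the exception.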

Mask Analysis

The Mask Analysis tab shows the syntactic patterns of data, that is, the structure of the data rather than its content. Codes (masks) are used to describe these patterns. For example, the code W is used by default to represent a word (the number of letters required to make a word can be defined in the Profiling Step properties), while L is used to represent a letter. This type of analysis can be useful when, for example, looking at a column of names where one or two words are common, but single letters and numbers are not. Finding unexpected patterns in the data can provide information about the overall quality of the data.
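
A rough sketch of how a mask might be derived from a value is shown below. Only the codes W and L come from the description above. The code N for a run of digits, the parameter name min_word_length, its default of 2, and the choice to copy spaces and punctuation unchanged are all assumptions made for illustration.

```python
import re

# Runs of letters, runs of digits, or any other single character.
TOKEN = re.compile(r"(?P<letters>[^\W\d_]+)|(?P<digits>\d+)|(?P<other>.)",
                   re.DOTALL)

def mask(value, min_word_length=2):
    """Return a structural mask for a string.

    W = word, L = single letter (both from the documentation).
    N = run of digits (hypothetical code, assumed for this sketch).
    """
    parts = []
    for match in TOKEN.finditer(value):
        token = match.group()
        if match.lastgroup == "letters":
            if len(token) >= min_word_length:
                parts.append("W")
            else:
                parts.append("L" * len(token))
        elif match.lastgroup == "digits":
            parts.append("N")
        else:
            parts.append(token)   # spaces and punctuation are kept as-is
    return "".join(parts)

print(mask("John Smith"))   # W W
print(mask("J Smith"))      # L W
print(mask("John 42"))      # W N
```

In a column of names, a mask such as L W or W N would stand out against the expected W W pattern.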

The following image shows a sample Mask analysis output.

Quantiles

The Quantiles tab displays the data values that occur at designated intervals in the ordered data set. The first value in the list is at 0% and the last value is at 100%. The median value is at the 50% marker.
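
The following sketch shows one way to pick such values. The function name quantiles and the default intervals are hypothetical. It also assumes that, when a marker falls between two positions, the lower of the two values is reported; the rule the product uses is not stated here.

```python
def quantiles(values, percents=(0, 25, 50, 75, 100)):
    """Return the value found at each percentage marker of the sorted data.

    Assumption: empty (None) entries are ignored, and a marker that falls
    between two positions takes the lower one.
    """
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        return {}
    last = len(ordered) - 1
    return {p: ordered[(p * last) // 100] for p in percents}

result = quantiles([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(result)
```

For the nine values 1 to 9, the 0% marker is 1, the 50% marker (the median) is 5, and the 100% marker is 9.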

Groups

The Groups tab presents a different analysis of the data in the Frequency tab. It shows how many non-null values share each frequency count. If all values are unique, the only group size will be 1, as there are no duplicate values. A value that is repeated belongs to a group whose size is the number of times it occurs.
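
Read this way, the Groups analysis is a count of frequency counts. The sketch below illustrates that reading. The function name group_sizes is hypothetical, and it assumes that None and the empty string are the null values to be skipped.

```python
from collections import Counter

def group_sizes(values):
    """Map each group size to the number of distinct values of that size."""
    freq = Counter(v for v in values if v is not None and v != "")
    return dict(sorted(Counter(freq.values()).items()))

column = ["John Smith", "John Smith", "Rebecca Davis", "Paul Adams", None]
groups = group_sizes(column)
print(groups)
```

For the example data used in the Counts section, two values occur once (group size 1) and one value occurs twice (group size 2).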