Candidate Groups

There are four methods for establishing candidate groups. Each method defines one or more keys for each record. A key is composed of one or more components, each of which is the result of an expression evaluated on the record. A key is treated as empty if all of its components are null, or if a special no-key condition applies.

Each candidate group is identified by a number called a Candidate ID.
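
The way a key is assembled and tested for emptiness can be sketched as follows. This is a minimal illustration only, not the product's API: the function name build_key, the representation of a record as a dictionary, and the sample no-key condition are all assumptions made for the example.

```python
from typing import Any, Callable, Optional, Sequence

Record = dict[str, Any]
Component = Callable[[Record], Any]


def build_key(
    record: Record,
    components: Sequence[Component],
    no_key: Optional[Callable[[Record], bool]] = None,
) -> Optional[tuple]:
    """Return the key as a tuple of component values, or None if the key is empty."""
    values = tuple(component(record) for component in components)
    if all(value is None for value in values):
        return None  # every component is null
    if no_key is not None and no_key(record):
        return None  # the special no-key condition applies
    return values


record = {"first": "John", "last": "Smith", "test": False}
name_key = build_key(
    record,
    components=[lambda r: r["first"], lambda r: r["last"]],
    no_key=lambda r: r["test"],  # hypothetical condition: test records have no key
)
```

Here an empty key is represented by None, so the grouping logic can skip it with a single check.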


Basic Method: SimpleKey

The candidate group consists of records with the same single key.

Definition:

Records Z and Y belong to the same candidate group when key(Z) = key(Y) and this key is not empty.

Example: The following table illustrates the basic method.

Key        Group
--------   -----
Paris      1
London     2
New York   3
London     2
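
The example can be reproduced with a short sketch. The function name simple_key_groups is hypothetical, and leaving records with an empty key (None) without a Candidate ID is an assumption; the text only states that an empty key never forms a group.

```python
from typing import Hashable, Optional


def simple_key_groups(keys: list[Optional[Hashable]]) -> list[Optional[int]]:
    """Assign a Candidate ID per distinct non-empty key, in order of first appearance."""
    ids: dict[Hashable, int] = {}
    result: list[Optional[int]] = []
    for key in keys:
        if key is None:
            result.append(None)  # assumption: empty keys stay ungrouped
            continue
        if key not in ids:
            ids[key] = len(ids) + 1
        result.append(ids[key])
    return result


groups = simple_key_groups(["Paris", "London", "New York", "London"])
print(groups)  # [1, 2, 3, 2]
```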



Symmetric Merging Method: Union

Several keys are defined, each with its own no-key condition. A candidate group consists of records that share at least one equal, non-empty key.

Definition:

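Assume key_n(Z) is the nth key of record Z. Then records Z and Y belong to the same candidate group when key_i(Z) = key_i(Y) and this key is non-empty for at least one value of i. As the example shows, membership is transitive: records linked through a chain of such matches fall into the same group even if the two ends of the chain share no key.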

The previous SimpleKey method can be considered a special case of the Union method with just one key.

Example: The following table illustrates the symmetric merging method.

Key 1     Key 2        Group
-------   ----------   -----
John      Smith        1
George    Smith        1
Isaac     Newton       2
George    Washington   1
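
One way to obtain these groups is a union-find (disjoint-set) pass over the records. This is a sketch of that general technique, not a description of how the product implements the method. The function name union_groups is hypothetical, and a record whose keys are all empty simply ends up alone in its own group here, which is an assumption.

```python
from typing import Hashable, Optional, Sequence


def union_groups(records: Sequence[Sequence[Optional[Hashable]]]) -> list[int]:
    """Group records that share an equal, non-empty key in the same key position."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: dict[tuple[int, Hashable], int] = {}
    for index, keys in enumerate(records):
        for position, value in enumerate(keys):
            if value is None:
                continue  # empty keys never link records
            other = first_seen.setdefault((position, value), index)
            a, b = find(index), find(other)
            if a != b:
                parent[max(a, b)] = min(a, b)

    # Number the groups in order of first appearance.
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(records))]


rows = [
    ("John", "Smith"),
    ("George", "Smith"),
    ("Isaac", "Newton"),
    ("George", "Washington"),
]
print(union_groups(rows))  # [1, 1, 2, 1]
```

John Smith and George Washington share no key, but both are linked to George Smith, so all three receive the same Candidate ID.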



Hierarchical Merging Method: Hierarchical / ClassicHierarchical

This method defines two keys, the primary key and the secondary key, each with its own no-key condition. It is intended for widening primary groups (groups based on the primary key) with additional records that have an empty primary key but belong to the same secondary group (a group based on the secondary key) as a record from the primary group.

Note: In this context, the term primary key means the key that determines the primary grouping. It does not have its usual meaning of a key that uniquely identifies a record in a database.

Definition:

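Assume that P(Z) is the primary key and S(Z) is the secondary key of record Z, and that G(prim=p) is the candidate group for the non-empty primary key p. The following conditions apply:

1. If P(Z) = p, then Z belongs to G(prim=p), regardless of S(Z).
2. If P(Z) is empty and S(Z) is non-empty, then Z is appended to G(prim=p) only if at least one record Y with S(Y) = S(Z) has a non-empty primary key and all such records have P(Y) = p, that is, only if the secondary key leads unambiguously to one primary group.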

This method has two variants, which differ in how the primary and secondary keys and their no-key conditions are defined. The Hierarchical variant uses general keys, which can be assembled from any components, and general no-key conditions. The ClassicHierarchical variant reflects the common usage of the hierarchical method, in which the primary and secondary groups are the candidate or client groups of two preceding unifications, and the no-key conditions are fixed by the related unification roles.

Example: The following table illustrates the hierarchical merging method.

Primary Key   Secondary Key   Group   Note
-----------   -------------   -----   ----
Spanish       Mexico          1
English       Canada          2
(empty)       Mexico          1       Appended to Spanish by Mexico.
French        Canada          3
(empty)       Canada          4       Cannot append by Canada: ambiguous between English and French.
Spanish       (empty)         1       Grouping by primary even though the secondary is empty.
English       USA             2       Grouping by primary even though the secondaries are different.
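
The behavior shown in the example table can be sketched in two passes. The function name hierarchical_groups is hypothetical, and two points are assumptions that the text does not settle: records that cannot be appended are grouped together by their secondary key, and records with both keys empty receive no Candidate ID.

```python
from typing import Hashable, Optional

Key = Optional[Hashable]


def hierarchical_groups(records: list[tuple[Key, Key]]) -> list[Optional[int]]:
    """Group by primary key, appending records without one via an unambiguous secondary key."""
    # Pass 1: for every non-empty secondary key, collect the primary keys it occurs with.
    primaries_by_secondary: dict[Hashable, set[Hashable]] = {}
    for primary, secondary in records:
        if primary is not None and secondary is not None:
            primaries_by_secondary.setdefault(secondary, set()).add(primary)

    # Pass 2: label each record, then number the labels in order of first appearance.
    ids: dict[tuple[str, Hashable], int] = {}
    result: list[Optional[int]] = []
    for primary, secondary in records:
        if primary is not None:
            label = ("primary", primary)
        elif secondary is not None:
            candidates = primaries_by_secondary.get(secondary, set())
            if len(candidates) == 1:
                label = ("primary", next(iter(candidates)))  # unambiguous append
            else:
                label = ("secondary", secondary)  # assumption: separate group
        else:
            result.append(None)  # assumption: both keys empty, no group
            continue
        result.append(ids.setdefault(label, len(ids) + 1))
    return result


rows = [
    ("Spanish", "Mexico"),
    ("English", "Canada"),
    (None, "Mexico"),
    ("French", "Canada"),
    (None, "Canada"),
    ("Spanish", None),
    ("English", "USA"),
]
print(hierarchical_groups(rows))  # [1, 2, 1, 3, 4, 1, 2]
```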



Hierarchical With Union Merging Method: HierarchicalUnion

This method is a modification of the Hierarchical Merging method. It defines one primary key but several secondary keys, which are combined as in the Union method to assemble the secondary group. Under the second condition of the Hierarchical Merging method, a record with an empty primary key can be appended to a primary group if there is a chain of such records, each sharing an equal secondary key with the next, and the chain leads unambiguously to that primary group.

Example: The following table illustrates the hierarchical with union merging method.

Primary Key   Secondary Key     Group   Note
-----------   ---------------   -----   ----
Madrid        Spain             1
Toledo        Corrida, Spain    2
(empty)       Cow, Bull         2       Appended to Toledo by chain Bull-Corrida.
Sevilla       Spain, Flamingo   3
(empty)       Flamingo          3       Appended to Sevilla by Flamingo.
(empty)       Bull, Corrida     2       Appended to Toledo by Corrida.
(empty)       Spain             4       Cannot append by Spain, ambiguous.
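
The chaining in the example table can be sketched by combining a union-find pass over the records without a primary key with a lookup of the primary groups each chain touches. This is an illustration under several assumptions, not the product's implementation: the function name hierarchical_union_groups is hypothetical, secondary keys are compared as an unordered set of values (as the table suggests), a chain that cannot be appended forms its own group, and a record with no keys at all receives no Candidate ID.

```python
from typing import Hashable, Optional


def hierarchical_union_groups(
    records: list[tuple[Optional[Hashable], set[Hashable]]]
) -> list[Optional[int]]:
    """Group by primary key; append chains of primary-less records when unambiguous."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    primaries_by_value: dict[Hashable, set[Hashable]] = {}
    first_unkeyed: dict[Hashable, int] = {}
    for index, (primary, secondaries) in enumerate(records):
        for value in secondaries:
            if primary is not None:
                primaries_by_value.setdefault(value, set()).add(primary)
            else:
                # Chain records without a primary key that share a secondary value.
                other = first_unkeyed.setdefault(value, index)
                a, b = find(index), find(other)
                if a != b:
                    parent[max(a, b)] = min(a, b)

    # Primary keys reachable from each chain through any shared secondary value.
    reachable: dict[int, set[Hashable]] = {}
    for index, (primary, secondaries) in enumerate(records):
        if primary is None:
            targets = reachable.setdefault(find(index), set())
            for value in secondaries:
                targets |= primaries_by_value.get(value, set())

    ids: dict[tuple[str, Hashable], int] = {}
    result: list[Optional[int]] = []
    for index, (primary, secondaries) in enumerate(records):
        if primary is not None:
            label = ("primary", primary)
        elif not secondaries:
            result.append(None)  # assumption: no keys at all, no group
            continue
        else:
            targets = reachable[find(index)]
            if len(targets) == 1:
                label = ("primary", next(iter(targets)))  # unambiguous chain
            else:
                label = ("chain", find(index))  # assumption: separate group
        result.append(ids.setdefault(label, len(ids) + 1))
    return result


rows = [
    ("Madrid", {"Spain"}),
    ("Toledo", {"Corrida", "Spain"}),
    (None, {"Cow", "Bull"}),
    ("Sevilla", {"Spain", "Flamingo"}),
    (None, {"Flamingo"}),
    (None, {"Bull", "Corrida"}),
    (None, {"Spain"}),
]
print(hierarchical_union_groups(rows))  # [1, 2, 2, 3, 3, 2, 4]
```

The record with Cow and Bull shares no value with any primary group directly; it reaches Toledo only through the record with Bull and Corrida, which is the chain described in the note.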


iWay Software