Candidate Groups

There are four methods for establishing candidate groups. Each method defines one or more keys for each record. A key is composed of one or more components, each the result of an expression evaluated on the record. A key is considered empty if all its components are null or if it satisfies a special no-key condition.

Basic Method: SimpleKey

The candidate group consists of records with the same single key.

Definition:

Records Z and Y belong to the same candidate group when key(Z) = key(Y) and this key is not empty.

Example: The following table illustrates the basic method.

Key	Group
Paris	1
London	2
New York	3
London	2
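
The SimpleKey rule can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name simple_key_groups is hypothetical, groups are numbered in order of first appearance to match the table above, and leaving records with an empty key ungrouped (None) is an assumption, since the definition only says that such records do not match.

```python
from typing import Optional


def simple_key_groups(keys: list[Optional[str]]) -> list[Optional[int]]:
    """Return a group number per record; None marks a record with an empty key."""
    group_of_key: dict[str, int] = {}
    groups: list[Optional[int]] = []
    for key in keys:
        if not key:
            # Empty key: the record joins no candidate group (assumption).
            groups.append(None)
            continue
        # Number groups in order of first appearance of each key.
        groups.append(group_of_key.setdefault(key, len(group_of_key) + 1))
    return groups


print(simple_key_groups(["Paris", "London", "New York", "London"]))
# prints [1, 2, 3, 2]
```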

Symmetric Merging Method: Union

Several keys are defined, each with its own no-key condition. A candidate group consists of records that share at least one equal, non-empty key.

Definition:

Assume key_n(Z) is the nth key of record Z. Then records Z and Y belong to one candidate group when key_i(Z) = key_i(Y) and this key is non-empty for at least one i.

The previous SimpleKey method can be considered a special case of the Union method with just one key.

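Example: The following table illustrates the symmetric merging method. John Smith and George Washington share no key, but each shares a key with George Smith, so all three records fall into group 1.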

Key 1	Key 2	Group
John	Smith	1
George	Smith	1
Isaac	Newton	2
George	Washington	1
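
One way to sketch the Union method is with a disjoint-set (union-find) structure, which merges records that share a key at the same position and so reproduces the chaining seen in the table above. This is an illustrative sketch under stated assumptions: the function name union_groups is hypothetical, keys are compared position by position as in the definition, and records whose keys are all empty are left ungrouped (None).

```python
from typing import Optional


def union_groups(records: list[tuple[Optional[str], ...]]) -> list[Optional[int]]:
    """Group records that share an equal, non-empty key at the same position."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen: dict[tuple[int, str], int] = {}
    for idx, keys in enumerate(records):
        for pos, value in enumerate(keys):
            if not value:
                continue  # empty keys never match
            other = first_seen.setdefault((pos, value), idx)
            if other != idx:
                parent[find(idx)] = find(other)  # merge the two groups

    # Number groups in order of first appearance.
    numbers: dict[int, int] = {}
    result: list[Optional[int]] = []
    for idx, keys in enumerate(records):
        if not any(keys):
            result.append(None)  # all keys empty: ungrouped (assumption)
        else:
            result.append(numbers.setdefault(find(idx), len(numbers) + 1))
    return result


people = [
    ("John", "Smith"),
    ("George", "Smith"),
    ("Isaac", "Newton"),
    ("George", "Washington"),
]
print(union_groups(people))
# prints [1, 1, 2, 1]
```

With a single key per record, the same function behaves like the SimpleKey sketch, which mirrors the remark that SimpleKey is a special case of Union.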

Hierarchical Merging Method: Hierarchical / ClassicHierarchical

This method defines two keys, a primary key and a secondary key, each with its own no-key condition. It is intended for widening primary groups (based on the primary key) with additional records that have an empty primary key but belong to the same secondary group (based on the secondary key) as a record from the primary group.

Note: In this context, the term primary key means the key that determines the primary grouping. It does not carry its usual meaning of the unique key of a particular record in a database.

Definition:

Assume that P(Z) is the primary key and S(Z) is the secondary key of record Z, and G(prim=p) is a candidate group for the non-empty primary key p. The following apply:

All records Z with P(Z) = p belong to G(prim=p).
A record Z with an empty P(Z) belongs to G(prim=p) if S(Z) is non-empty, at least one record Y has P(Y) = p and S(Y) = S(Z), and no other record X has S(X) = S(Z) with P(X) not equal to p. That is, the secondary key must connect the record unambiguously to exactly one primary group.
Records Z with an empty P(Z) and a non-empty S(Z) = s that do not satisfy the remaining conditions of the previous rule are collected into candidate group G(sec=s).

This method has two variants that differ in how the primary and secondary keys and their no-key conditions are defined. The Hierarchical variant defines general keys, which can be assembled from any components, and general no-key conditions. The ClassicHierarchical variant reflects the common usage of the hierarchical method, in which the primary and secondary groups are candidate or client groups of two preceding unifications and the no-key conditions are fixed, being derived from the related unification roles.

Example: The following table illustrates the hierarchical merging method.

Primary Key	Secondary Key	Group	Note
Spanish	Mexico	1
English	Canada	2
	Mexico	1	Appended to Spanish by Mexico.
French	Canada	3
	Canada	4	Cannot append by Canada, ambiguous English x French.
Spanish		1	Grouping by primary even though the secondary is empty.
English	USA	2	Grouping by primary even though the secondaries are different.
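
The three rules of the definition can be sketched as follows. This is a hedged illustration only: the function names hierarchical_groups and number_groups are hypothetical, ambiguity is read as "the secondary key occurs with more than one distinct non-empty primary key", and records with both keys empty are left ungrouped (None), which the definition does not address. The sample data models the two appended records with an empty primary key, as the second rule requires.

```python
from collections import defaultdict
from typing import Optional

Record = tuple[Optional[str], Optional[str]]  # (primary key, secondary key)


def hierarchical_groups(records: list[Record]) -> list[Optional[tuple[str, str]]]:
    """Label each record with ("prim", p), ("sec", s), or None."""
    # Secondary key -> distinct non-empty primary keys that occur with it.
    primaries_by_secondary: dict[str, set[str]] = defaultdict(set)
    for primary, secondary in records:
        if primary and secondary:
            primaries_by_secondary[secondary].add(primary)

    labels: list[Optional[tuple[str, str]]] = []
    for primary, secondary in records:
        if primary:
            labels.append(("prim", primary))  # rule 1
        elif secondary:
            linked = primaries_by_secondary.get(secondary, set())
            if len(linked) == 1:
                labels.append(("prim", next(iter(linked))))  # rule 2
            else:
                labels.append(("sec", secondary))  # rule 3
        else:
            labels.append(None)  # both keys empty: ungrouped (assumption)
    return labels


def number_groups(labels: list) -> list[Optional[int]]:
    """Replace labels with group numbers in order of first appearance."""
    numbers: dict = {}
    return [
        None if label is None else numbers.setdefault(label, len(numbers) + 1)
        for label in labels
    ]


sample = [
    ("Spanish", "Mexico"),
    ("English", "Canada"),
    (None, "Mexico"),
    ("French", "Canada"),
    (None, "Canada"),
    ("Spanish", None),
    ("English", "USA"),
]
print(number_groups(hierarchical_groups(sample)))
# prints [1, 2, 1, 3, 4, 1, 2]
```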

Hierarchical With Union Merging Method: HierarchicalUnion

This method is a modification of the Hierarchical Merging method. It defines one primary key but several secondary keys, which are combined as in the Union method to assemble the secondary group. Extending the second rule of the Hierarchical Merging method, a record with an empty primary key can be appended to a primary group if there is a chain of such records, each sharing a secondary key with the next, and the chain leads unambiguously to that primary group.

Example: The following table illustrates the hierarchical with union merging method.

Primary Key	Secondary Key	Group	Note
Madrid	Spain	1
Toledo	Corrida, Spain	2
	Cow, Bull	2	Appended to Toledo by chain Bull-Corrida.
Sevilla	Spain, Flamingo	3
	Flamingo	3	Appended to Sevilla by Flamingo.
	Bull, Corrida	2	Appended to Toledo by Corrida.
	Spain	4	Cannot append by Spain, ambiguous.
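
The chaining behavior can be sketched as below. This is one possible reading of the description, not a confirmed algorithm: the function names are hypothetical, secondary keys are treated as an unordered set per record (the example matches Corrida regardless of its position), chains are built only among records with an empty primary key, and a chain counts as unambiguous when its secondary keys reach exactly one primary key. The sample data models the chained Cow, Bull record with an empty primary key.

```python
from collections import defaultdict
from typing import Optional

Record = tuple[Optional[str], set[str]]  # (primary key, secondary keys)


def hierarchical_union_groups(records: list[Record]) -> list[Optional[tuple]]:
    """Label each record with ("prim", p), ("sec", chain id), or None."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 1: chain records with an empty primary key that share a secondary key.
    first_with: dict[str, int] = {}
    for idx, (primary, secondaries) in enumerate(records):
        if not primary:
            for value in secondaries:
                parent[find(idx)] = find(first_with.setdefault(value, idx))

    # Step 2: collect the primary keys each chain reaches via its secondary keys.
    primaries_by_secondary: dict[str, set[str]] = defaultdict(set)
    for primary, secondaries in records:
        if primary:
            for value in secondaries:
                primaries_by_secondary[value].add(primary)
    reachable: dict[int, set[str]] = defaultdict(set)
    for idx, (primary, secondaries) in enumerate(records):
        if not primary:
            for value in secondaries:
                reachable[find(idx)] |= primaries_by_secondary.get(value, set())

    # Step 3: append unambiguous chains to their primary group.
    labels: list[Optional[tuple]] = []
    for idx, (primary, secondaries) in enumerate(records):
        if primary:
            labels.append(("prim", primary))
        elif not secondaries:
            labels.append(None)  # no keys at all: ungrouped (assumption)
        elif len(reachable[find(idx)]) == 1:
            labels.append(("prim", next(iter(reachable[find(idx)]))))
        else:
            labels.append(("sec", find(idx)))
    return labels


def number_groups(labels: list) -> list[Optional[int]]:
    """Replace labels with group numbers in order of first appearance."""
    numbers: dict = {}
    return [
        None if label is None else numbers.setdefault(label, len(numbers) + 1)
        for label in labels
    ]


cities = [
    ("Madrid", {"Spain"}),
    ("Toledo", {"Corrida", "Spain"}),
    (None, {"Cow", "Bull"}),
    ("Sevilla", {"Spain", "Flamingo"}),
    (None, {"Flamingo"}),
    (None, {"Bull", "Corrida"}),
    (None, {"Spain"}),
]
print(number_groups(hierarchical_union_groups(cities)))
# prints [1, 2, 2, 3, 3, 2, 4]
```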