Abstract Element: Entity Component

Definition of the component. This abstract element is the base definition common to all entity component types (for example, the dictionary entity component and the regular expression entity component; see Implementations).

Properties

Congruent
Type: Boolean. Required: Yes.
Indicates that this component's value can be evaluated for proximity to nearby values in the input text if it is not found either exactly or approximately. This property currently works only for Czech PS (postal code) values.

Input
Type: Boolean. Required: Yes.
Indicates that this component is used in input examination, that is, in searching for known values. Setting this flag to false does not exclude the component from examination by other means, for example, searching with regular expressions.

Approximative
Type: Boolean. Required: Yes.
Indicates that this component can accept input text that matches its values only approximately.

Common Name
Type: String. Required: Yes.
Common name of the component. The common name identifies a group of components with a similar meaning (e.g., PS and ZIP) and is used in pattern guessing.

Id
Type: String. Required: Yes.
Identifier of the component.

Contains Numbers
Type: Boolean. Required: Yes.
Flag indicating that this component may contain values composed solely of digits. The dictionary of such a component is not used during the first step (examination), which uses an Aho-Corasick automaton. These components are intended for the comparison phase only; values for the supporting vector search are obtained by simply scanning the input text. Whole numbers are returned even if they may be a concatenation of two or more existing numbers, as in 'Krizikova 20855 Praha'. In such cases the step cannot split the number and must work with it as is.
To find the individual numbers, you must define another component that does not have this flag set. However, because dictionary numbers often occur as substrings of longer numbers, such a component may produce many matching vectors, which can significantly slow down step processing.
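The two examination paths described under Input and Contains Numbers can be illustrated with a minimal Python sketch. It rests on assumptions: the `Component` type, its field names, the sample dictionaries, and the `examine` function are hypothetical; the Aho-Corasick automaton is written from scratch here, not taken from the product; and numeric components are assumed to receive every whole digit run as a candidate for the comparison phase.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass
class Component:
    # Hypothetical stand-in for an entity component; only the flags that
    # affect examination are modelled.
    id: str
    values: list
    input: bool = True
    contains_numbers: bool = False


class AhoCorasick:
    """Finds every occurrence of every pattern in a single pass over the text."""

    def __init__(self, patterns):
        self.goto = [{}]  # trie edges per state; state 0 is the root
        self.fail = [0]   # failure link per state
        self.out = [[]]   # patterns that end at each state
        for pattern in patterns:
            self._add(pattern)
        self._link()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _link(self):
        # Breadth-first, so shallower states get their links first.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find(self, text):
        """Yield (start, pattern) for every match in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


def examine(text, components):
    """Return sorted (start, component id, value) hits for the input text."""
    hits = []
    # First step: one automaton over the dictionaries of components that
    # take part in input examination and are not flagged as numeric.
    owners = {}
    for c in components:
        if c.input and not c.contains_numbers:
            for value in c.values:
                owners.setdefault(value, []).append(c.id)
    for start, value in AhoCorasick(owners).find(text):
        for cid in owners[value]:
            hits.append((start, cid, value))
    # Numeric components: every whole digit run becomes a candidate for the
    # comparison phase; a run such as "20855" is never split.
    for c in components:
        if c.contains_numbers:
            for m in re.finditer(r"\d+", text):
                hits.append((m.start(), c.id, m.group()))
    return sorted(hits)


components = [
    Component("street", ["Krizikova"]),
    Component("city", ["Praha"]),
    Component("number", ["208", "55"], contains_numbers=True),
    Component("number_parts", ["208", "55", "20", "5"]),
]
for hit in examine("Krizikova 20855 Praha", components):
    print(hit)
```

In this sketch the flagged `number` component sees only the whole run `20855`, while the unflagged `number_parts` component produces five substring hits from the same run, which is the kind of match multiplication that can slow processing.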



Implementations

Dictionary Entity Component
Regexp Entity Component
Dictionary Regular Expression Component
Union Entity Component
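The relationship between this abstract element and the implementations listed above can be sketched as a base type holding the six shared properties, with each implementation as a subtype. This is a minimal Python sketch under assumptions: the class and field names are hypothetical, abstractness is imitated with a runtime check, and implementation-specific properties are omitted because this section does not describe them.

```python
from dataclasses import dataclass


@dataclass
class EntityComponent:
    """Hypothetical model of the abstract element: the properties that
    every entity component type shares. Field names are illustrative."""

    id: str
    common_name: str        # groups components of similar meaning, e.g. PS and ZIP
    congruent: bool         # proximity evaluation (currently Czech PS values only)
    input: bool             # used when searching for known values
    approximative: bool     # may accept input text approximately
    contains_numbers: bool  # digit-only values, skipped by the first step

    def __post_init__(self):
        # Behave like an abstract element: only implementations can be created.
        if type(self) is EntityComponent:
            raise TypeError("EntityComponent is abstract; instantiate an implementation")


# The four implementations. Their own properties are not described in this
# section, so the sketch leaves them empty.
class DictionaryEntityComponent(EntityComponent):
    pass


class RegexpEntityComponent(EntityComponent):
    pass


class DictionaryRegularExpressionComponent(EntityComponent):
    pass


class UnionEntityComponent(EntityComponent):
    pass


zip_code = DictionaryEntityComponent(
    id="zip", common_name="ZIP", congruent=False, input=True,
    approximative=False, contains_numbers=True,
)
print(type(zip_code).__name__, zip_code.common_name)
```

Keeping the shared properties on the base type means every implementation is configured with the same six fields, which mirrors the way this section defines them once for all entity component types.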


iWay Software