Avro File Iterator Service

Syntax:

com.ibi.agents.XDIterAvroFile

Description:

The Avro File Iterator opens an Avro container, either in the input document or in a file, and returns the objects it contains one at a time. Each object is converted to XML according to the rules documented for the Avro File Read service. Because each iterated document contains only one object, its root element is always av:item.

Avro requires a schema. The container always stores the schema that was used to write the data, and the Avro File Iterator service can use that schema. Alternatively, you can specify a reader schema, in which case Avro attempts to reconcile the two schemas. The effective schema is stored in the output document, so it can serve as the default for the Avro File Emit service.
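Reconciling a writer schema with a reader schema is what the Avro specification calls schema resolution. The following Python sketch is a hypothetical, simplified model of two of its rules for flat records. Reader fields that are missing from the writer schema take their default, and writer fields that are missing from the reader schema are dropped. It is not the code that the service or the Avro library runs. The function name resolve_record and the sample schemas are illustrative only, and type promotion, aliases, unions, and nested types are ignored.

```python
# Hypothetical, simplified model of Avro schema resolution for flat records.
# Schemas are dicts in Avro JSON form. Only two rules are modeled:
#   - reader fields missing from the writer schema take their default value;
#   - writer fields missing from the reader schema are dropped.
# Type promotion, aliases, unions, and nested types are not handled.

def resolve_record(datum, writer_schema, reader_schema):
    """Return `datum`, written with writer_schema, as seen through reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = datum[name]          # present in both schemas
        elif "default" in field:
            resolved[name] = field["default"]     # reader-only field with a default
        else:
            raise ValueError(
                f"field {name!r} is missing from the writer schema and has no default")
    return resolved  # writer-only fields were never copied


writer = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
]}
reader = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},
]}

print(resolve_record({"id": 7, "email": "a@example.com"}, writer, reader))
# {'id': 7, 'country': 'unknown'}
```

If a reader field has no default and the writer schema lacks that field, resolution fails. The Avro specification treats this case as an error, and the sketch raises ValueError.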

The path to the Avro schema or the Avro data file can be a regular file system path or a URL starting with hdfs://, which indicates that the file is in the Hadoop file system. When the Hadoop file system is used, the Hadoop Configuration and Default File System parameters can optionally be specified; otherwise, they are ignored.
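The path rule above can be modeled as a small dispatch on the hdfs:// prefix. The following Python sketch is hypothetical. The helper resolve_source and its return structure are illustrative names, not part of the service. It shows only that the Hadoop settings matter for hdfs:// paths and are discarded for regular paths.

```python
def resolve_source(path, hadoop_configuration=None, default_file_system=None):
    """Hypothetical helper: classify a schema or data-file path.

    Paths starting with hdfs:// are treated as Hadoop file system locations,
    and the optional Hadoop settings are carried along. Any other path is a
    regular file system path, and the Hadoop settings are ignored.
    """
    if path.startswith("hdfs://"):
        return {
            "filesystem": "hdfs",
            "path": path,
            "hadoop_configuration": hadoop_configuration,  # e.g. core-site.xml
            "default_file_system": default_file_system,    # e.g. the namenode URI
        }
    return {"filesystem": "local", "path": path}


print(resolve_source("/data/users.avro", hadoop_configuration="core-site.xml"))
# {'filesystem': 'local', 'path': '/data/users.avro'}
print(resolve_source("hdfs://namenode/data/users.avro", "core-site.xml"))
```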

The Output Document parameter determines the final document: a status document, the original document, the result of the last iteration, or an accumulation. When accumulating, the Accumulation Root parameter determines the root element. Select av:avro when accumulating av:item elements into an XML document that represents an Avro container. This is the case with Iterated Accumulation, because the iterator always produces av:item elements, and with Loop Accumulation when the loop does not modify the root element. Selecting av:avro also sets the schema in the final document, where it can serve as the default for an Avro File Emit service.
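The following Python sketch is a rough, hypothetical model of how the Output Document and Accumulation Root choices shape the final document. It does not describe the service's internals. The status option is omitted because its structure is not documented here. The record-to-XML mapping in to_item is a placeholder for the Avro File Read rules. The av:schema child element is an invented stand-in for however the real service stores the schema. Namespace declarations are left out, so tags are plain strings.

```python
import xml.etree.ElementTree as ET

def to_item(record):
    """Hypothetical conversion of one record (a dict) to an av:item element.
    The real mapping follows the Avro File Read rules; this placeholder
    simply emits one child element per field."""
    item = ET.Element("av:item")
    for name, value in record.items():
        ET.SubElement(item, name).text = str(value)
    return item

def final_document(original, records, output_document,
                   accumulation_root="av:avro", schema_text=None):
    """Model of the Output Document choices (status document omitted)."""
    items = [to_item(r) for r in records]  # one av:item per iteration
    if output_document == "original":
        return original
    if output_document == "last":
        return items[-1] if items else None
    if output_document == "accumulate":
        root = ET.Element(accumulation_root)
        if accumulation_root == "av:avro" and schema_text is not None:
            # Hypothetical: the schema is carried as an av:schema child element.
            ET.SubElement(root, "av:schema").text = schema_text
        root.extend(items)
        return root
    raise ValueError(f"unsupported output document: {output_document!r}")


doc = final_document(ET.Element("input"), [{"id": 1}, {"id": 2}],
                     "accumulate", schema_text='{"type": "record"}')
print(ET.tostring(doc, encoding="unicode"))
```

With accumulation_root="accumulation", the same av:item elements are wrapped in a generic root and no schema is attached. This mirrors the distinction the Accumulation Root parameter draws.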

Parameters:

The following table lists and describes the parameters of the Avro File Iterator service.

Parameter

Description

Avro Schema

Path to the Avro schema file to use as the reader schema. If absent, the schema stored with the data is used.

Input Source

Whether the Avro data is in the input document or in a file.

Avro Data File

Path to the Avro data file. Ignored if the Input Source is Input Document.

Hadoop Configuration

Path to the Hadoop configuration file, normally core-site.xml. Used only when a path starts with hdfs://.

Default File System

In some Hadoop environments, this should be specified as the URI of the namenode, for example:

hdfs://[your namenode]

Output Document

Determines the final document emitted: a status document, the original document, the result of the last iteration, or an accumulation. Accumulations are memory intensive.

Accumulation Root

Determines the root element of an accumulation. Select av:avro when accumulating av:item elements; otherwise, select accumulation. Ignored unless Output Document is set to an accumulation.


iWay Software