Magnify Protocols for Indexing Documents

Data sent to Magnify for indexing must be formatted as an XML document that follows a specific feed protocol. The XML document contains a header section and a record section.

The following image illustrates the header and record elements in the XML document.

Note: The IEI Feed Agent and the FORMAT MAGNIFY command generate the final document in adherence with the Magnify protocols. However, the developer must prepare the data in accordance with these protocols.

The header section contains the following document-level Magnify feed properties:

The record element contains attributes that define record-level Magnify feed properties and the content being indexed. The information contained in the record element varies for each protocol. However, the record element defines the following for each protocol:

The content document is generated using either the iWay Process Transformation described in Creating the HTML Document or the FORMAT MAGNIFY alias naming.

Depending on the type of data being indexed within the content document, Magnify requires the data to be packaged following a specific protocol. Magnify uses the following protocols for accepting documents from a feed process:

Reference: Record Protocol

The record protocol is used for structured and semi-structured data sources, such as database records. The mime type attribute of the record must be set to text/plain. The document inserted into the content section is also an XML document, with a Target_Root element and a HEAD section. The document contains the following elements:

TITLE. Is the text assigned as the Search Results main link text. This can be enriched with HTML.
META TAG. Is the field name and its value stored in the index with the search result. For more information on the available meta tags, see Creating the HTML Document.
BODY. Content indexed and made available for searching.

Note: The BODY element is stored as IBI_CONTENT in the Magnify index library, which can be accessed using tools such as Lucene Luke.

The following image illustrates a decoded document that can be indexed using the record protocol.

Note: Magnify requires base64 encoded content and an encoded record URL.
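The packaging described above can be sketched in Python. This is a minimal illustration, not the output of the IEI Feed Agent: the element names Target_Root, HEAD, TITLE, and BODY come from the description above, while the META attribute layout (name and content), the placement of BODY next to HEAD, and the helper names build_record_content and encode_content are assumptions made for the example. Encoding of the record URL is not shown because the scheme is not specified here.

```python
import base64
import xml.etree.ElementTree as ET


def build_record_content(title, meta, body):
    """Build a record-protocol content document.

    The exact layout is an assumption: HEAD holds TITLE and META tags,
    and BODY is placed as a sibling of HEAD under Target_Root.
    """
    root = ET.Element("Target_Root")
    head = ET.SubElement(root, "HEAD")
    ET.SubElement(head, "TITLE").text = title
    for name, value in meta.items():
        # Assumed HTML-style meta tag: <META name="..." content="..." />
        ET.SubElement(head, "META", {"name": name, "content": value})
    ET.SubElement(root, "BODY").text = body
    return ET.tostring(root, encoding="unicode")


def encode_content(document):
    """Base64 encode the content document, as Magnify requires."""
    return base64.b64encode(document.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    doc = build_record_content(
        "Order 42",
        {"customer": "Acme"},
        "Widget order placed by Acme",
    )
    print(doc)
    print(encode_content(doc))
```

Decoding the encoded string with base64.b64decode returns the original XML, which is a quick way to check a feed document before it is sent.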

Reference: URL Protocol

The URL protocol is used for web-accessible files and is recommended for larger files. Magnify fetches the document, reads it, and indexes the content. The mime type attribute of the record must be set to application/openurl. Magnify locates the file based on the URL attribute value of the record. If a URL cannot be accessed or indexed, the failure is logged in the application server log files. The document inserted into the content section is also an XML document with an ENCODEDDOCUMENT root element containing HEAD, DOCUMENT, and AUTHENTICATION sections.

The following elements are contained in the HEAD section:

TITLE. Is the text assigned as the Search Results main link text. This can be enriched with HTML.
META TAG. Is the field name and its value stored in the index with the search result. For more information on the available meta tags, see Creating the HTML Document.

The DOCUMENT section contains the following attributes:

Password. Optional. Required only if the file is password protected, in which case the password is used to read the file for indexing.
Mimetype. Must be set to file/auto. The document is passed to the Magnify parser to process various file types based on information found in the document header.

The content element is empty, since Magnify fetches the content based on the URL attribute value of the record.

The AUTHENTICATION section contains the wwwauthenticateuserid and wwwauthenticatepassword attributes, which are used to access the domain where the document is located.

The contents of the indexed document are stored as IBI_CONTENT in the Magnify index library, which can be accessed using tools such as Lucene Luke.

The following image illustrates a decoded document that can be indexed using the URL protocol.

Note: Magnify requires base64 encoded content and an encoded record URL.
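As a rough sketch of the URL protocol packaging, the following Python builds an ENCODEDDOCUMENT with HEAD, DOCUMENT, and AUTHENTICATION sections and base64 encodes it. The section names, the file/auto value, and the wwwauthenticateuserid and wwwauthenticatepassword attributes come from the description above. The lowercase attribute spellings mimetype and password, the META attribute layout, and the function names are assumptions for illustration only.

```python
import base64
import xml.etree.ElementTree as ET


def build_url_content(title, meta, user_id=None, user_password=None,
                      file_password=None):
    """Build a URL-protocol content document (layout partly assumed)."""
    root = ET.Element("ENCODEDDOCUMENT")
    head = ET.SubElement(root, "HEAD")
    ET.SubElement(head, "TITLE").text = title
    for name, value in meta.items():
        # Assumed HTML-style meta tag: <META name="..." content="..." />
        ET.SubElement(head, "META", {"name": name, "content": value})

    # DOCUMENT is left empty because Magnify fetches the file itself.
    doc_attrs = {"mimetype": "file/auto"}
    if file_password is not None:
        # Only needed when the file is password protected.
        doc_attrs["password"] = file_password
    ET.SubElement(root, "DOCUMENT", doc_attrs)

    if user_id is not None:
        # Credentials for the domain where the file is located.
        ET.SubElement(root, "AUTHENTICATION", {
            "wwwauthenticateuserid": user_id,
            "wwwauthenticatepassword": user_password or "",
        })
    return ET.tostring(root, encoding="unicode")


def encode_content(document):
    """Base64 encode the content document, as Magnify requires."""
    return base64.b64encode(document.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    xml_text = build_url_content(
        "Annual Report",
        {"department": "Finance"},
        user_id="crawler",
        user_password="secret",
    )
    print(xml_text)
    print(encode_content(xml_text))
```

The file itself never appears in the feed in this sketch. Only the metadata and access details travel with the record, which is why this protocol suits larger files.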

Reference: Document Protocol

The document protocol is used when files can be embedded into the document that is being indexed. Magnify reads in and indexes the content of the document. The mime type record attribute must be set to application/encodeddocument.

The document inserted into the content section is an XML document with an ENCODEDDOCUMENT root element containing HEAD and DOCUMENT sections.

The following elements are contained in the HEAD section:

TITLE. Is the text assigned as the Search Results main link text. This can be enriched with HTML.
META TAG. Is the field name and its value stored in the index with the search result. For more information on the available meta tags, see Creating the HTML Document.

The DOCUMENT section contains the embedded file within the DOCUMENT tags, along with attributes that describe it. The encoding must be set to base64binary, and the mimetype must be set to file/auto. The document is passed to the Magnify parser, which processes various file types based on information found natively in the document header. A password is optional. It is required only if the file is password protected, in which case it is used to read the file for indexing.

The contents of the indexed document are stored as IBI_CONTENT in the Magnify index library, which can be accessed using tools such as Lucene Luke.

The following image illustrates a decoded document that can be indexed using the document protocol.

Note: Magnify requires base64 encoded content and an encoded record URL.

Files can be embedded into the Magnify feed document using the file object in the iWay Service Manager tool. For more information on the iWay Process Flow, see Creating the Indexing Process Flow. The embedded file must be base64 encoded. Because the document within the content section is itself base64 encoded before it is sent to Magnify, the embedded file ends up double encoded.
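The two layers of encoding can be illustrated with a short Python sketch. It is not the iWay Service Manager implementation. The ENCODEDDOCUMENT, HEAD, TITLE, and DOCUMENT names and the base64binary and file/auto values come from the description above, while the attribute spellings encoding and mimetype and the helper names embed_file and extract_file are assumptions for the example.

```python
import base64
import xml.etree.ElementTree as ET


def embed_file(file_bytes, title):
    """Embed a file in a document-protocol content document.

    Returns the base64-encoded content document, so the file bytes
    end up encoded twice.
    """
    # First encoding: the embedded file itself.
    inner = base64.b64encode(file_bytes).decode("ascii")

    root = ET.Element("ENCODEDDOCUMENT")
    head = ET.SubElement(root, "HEAD")
    ET.SubElement(head, "TITLE").text = title
    # Attribute names are assumed; the values come from the protocol text.
    doc = ET.SubElement(root, "DOCUMENT", {
        "encoding": "base64binary",
        "mimetype": "file/auto",
    })
    doc.text = inner

    # Second encoding: the whole content document.
    xml_text = ET.tostring(root, encoding="unicode")
    return base64.b64encode(xml_text.encode("utf-8")).decode("ascii")


def extract_file(encoded_content):
    """Reverse both encodings and return the original file bytes."""
    xml_text = base64.b64decode(encoded_content).decode("utf-8")
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.find("DOCUMENT").text)


if __name__ == "__main__":
    original = b"%PDF-1.4 example bytes"
    content = embed_file(original, "Example PDF")
    print(extract_file(content) == original)
```

Each base64 pass grows the payload by roughly one third, so a double-encoded file is noticeably larger than the original. That overhead is one reason to consider the URL protocol for larger files.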