XDNode and XDDom

Background

XML is represented in memory by trees of values and their characteristics, including the name by which the value is known and attributes associated with that value. The path to the value can uniquely select that value from amongst any others in the tree, or it can identify sets of related values.

The actual representation of the XML tree in memory is constructed by a parser that accepts a serialized tree and converts it into the memory representation. Nothing in the XML specification requires that a particular shape for the tree be implemented; usually the tree shape is dependent upon its use. In iWay core, the tree is represented by related elements called XDNodes, each containing one name and possibly one value, along with any associated attributes. The XDNodes are consistent, and differ only in the chaining of their relationships (i.e. is there a child?) and the values and attributes stored in the node.

An alternative implementation of nodes, which generally goes by the name DOM, represents the tree in a different manner. In the DOM representation nodes of different types are constructed differently, and each component of the tree is represented by a related node of the proper type.

The XDNode is particularly simple to use in Java programs; however, over time the DOM representation has become a standard. It is used as a common form for passing an XML tree between unrelated components. Without the ability to pass a DOM tree directly, components that need this canonical form must parse a serialized XML document themselves. This leads to slower operation and usually to increased memory use, because the XDNode tree must be flattened and reparsed to obtain the needed DOM. In some cases the reverse operation must also be performed: when the generalized component has completed its work, an XDNode tree must be constructed to represent the result.

To reduce this overhead, iWay Software developed several components that apply standard DOM operations directly to XDNode trees. The most common of these are the XPath and XSLT operations. However, in some situations it makes more sense to call the standard implementation of a standard than to re-implement it with XDNode. For example, it is better to call the XML Digital Signature implementation bundled with JDK 1.6 than to re-implement it from scratch. What is needed is a way to call an externally developed DOM-based component without incurring all the overhead.

Accordingly, iWay provides a DOM interface to the XDNode tree. DOM itself is simply an interface and does not mandate any specific in-memory object structure. Every effort was made to ensure that XDNode performance was not compromised, and that when operating as a DOM the performance is at least as good as that of Java's standard DOM. The result is the XDDom facility.

XDDom supports a large portion of the DOM specification on both the read and the write side. Technically, the iWay DOM implements most DOM Level 3 methods on DOM Level 1 interfaces (that is, no Level 3 features such as load and save). The few methods that were not implemented throw an exception when called. In practice, this is rarely a limitation, since Level 2 and Level 3 classes are rarely used. Performance is much better than that of the standard DOM implementation because XDDom assumes the caller is always right.

Use of external DOM components (such as XSLT and XMLDSig) has demonstrated that XDDom does meet the DOM specification. In the future, it's possible that situations may arise that require implementation of some portion of the interface that has not yet been covered.

DOM Implementation

Once you have an XDNode, you can ask for the DOM Node that corresponds to it. Method name conflicts make it undesirable to implement the DOM interface directly on the XDNode, so the DOM Node is a separate Java object.

Node domNode = xdnode.getDomNode();

The relationship between the XDNode and its DOM Node is maintained: changes you make in one are reflected in the other. We achieve this by storing all the information in the XDNode; the DOM implementation consists mostly of accessors into the XDNode. The DOM Node is created on demand and is remembered. If you call getDomNode() repeatedly on the same XDNode, you get back the same DOM Node, which preserves referential identity. This is efficient if you need to call multiple DOM methods, because the DOM objects are created only once and only for the parts of the tree that are actually traversed.
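
The create-on-demand behavior can be pictured with a small sketch. The classes below (`DataNode`, `NodeView`) are hypothetical stand-ins invented for illustration; they are not the iWay classes and make no claim about how XDNode is actually coded. The sketch only shows the general pattern: the data lives in one object, the view is a thin set of accessors, and the view is cached so that repeated requests return the same instance.

```java
class LazyViewSketch {
    /** Hypothetical stand-in for a tree node that owns all the data. */
    static final class DataNode {
        private String value;
        private NodeView view; // created on first request, then reused

        /** Returns the cached view, creating it the first time it is asked for. */
        NodeView getView() {
            if (view == null) {
                view = new NodeView(this);
            }
            return view;
        }

        String getValue() { return value; }
        void setValue(String newValue) { value = newValue; }
    }

    /** Hypothetical view: stores nothing itself, only reads and writes its owner. */
    static final class NodeView {
        private final DataNode owner;

        NodeView(DataNode owner) { this.owner = owner; }

        String getNodeValue() { return owner.getValue(); }
        void setNodeValue(String newValue) { owner.setValue(newValue); }
        DataNode getOwner() { return owner; }
    }

    public static void main(String[] args) {
        DataNode node = new DataNode();
        NodeView first = node.getView();
        NodeView second = node.getView();
        System.out.println(first == second);        // true: same view instance
        node.setValue("fromNode");
        System.out.println(first.getNodeValue());   // fromNode
        first.setNodeValue("fromView");
        System.out.println(node.getValue());        // fromView
    }
}
```

The sketch is not thread-safe; whether the real getDomNode() needs to be is not stated in this document.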

The XDNode API supports an optional string value per Element. In the DOM, a string value is represented as a Text or CDATASection child Node. We go to great lengths to keep these two views consistent. Assigning a value to an XDNode creates a Text or CDATASection child Node in the DOM, depending on the isCDATA() flag. Conversely, inserting a Text or CDATASection as the first child of a DOM Node modifies the XDNode value and its isCDATA() flag. Assigning a new string value to that first child also changes the XDNode value. We call this effect "Implied Text". The XML serialization is identical for both views:

XDNode xdNode = new XDNode("elem");
	Node domNode = xdNode.getDomNode();
	xdNode.setValue("xdValue");
	// domNode.getNodeValue() returns "xdValue"
	domNode.setNodeValue("domValue");
	// xdNode.getValue() returns "domValue"
	// xdNode now flattens to <elem>domValue</elem>
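
The DOM half of this relationship can be seen with the JDK's built-in DOM alone. The sketch below does not use XDNode or XDDom. It only shows that, in a standard DOM, an element's string value is held by a Text or CDATASection child, which is the node that Implied Text keeps in step with the XDNode value. The class and method names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

class TextChildSketch {
    /** Creates an empty Document from the JDK's default DOM implementation. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds an "elem" element whose value is stored in a Text or CDATASection child. */
    static Element elementWithValue(String value, boolean cdata) {
        Document doc = newDocument();
        Element elem = doc.createElement("elem");
        Text child = cdata ? doc.createCDATASection(value) : doc.createTextNode(value);
        elem.appendChild(child);
        return elem;
    }

    public static void main(String[] args) {
        Element elem = elementWithValue("xdValue", false);
        Node first = elem.getFirstChild();
        System.out.println(first.getNodeValue());   // xdValue
        first.setNodeValue("domValue");             // assign through the Text child
        System.out.println(elem.getTextContent());  // domValue
    }
}
```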

DOM Mixed Content

The DOM has some concepts that did not exist in XDNode. Examples are mixed element content and the DOM Document. These require special considerations.

In a business transaction environment where iWay products operate, mixed element content is rare. Nevertheless, some APIs like XML Digital Signatures must preserve that information. XDNode can support a rigid form of mixed content where the appearance of the nodes is dictated by the type of Node: comments before processing instructions before single string value before child Elements. The DOM is more general and allows us to intersperse all these nodes in any order.

The existing code using the XDNode API does not expect to see anything other than Elements in the children list. To preserve that contract and the fantastic performance of the get accessors, we maintain the same pointers as before: firstchild, lastchild, left and right. This doubly linked list contains child Elements exclusively. To store the mixed content we add 4 new pointers: mixedFirst, mixedLast, mixedLeft and mixedRight. This new doubly linked list contains the complete mixed content in the order it appears. The items in the new list are XDNodes or one of the 4 new subclasses: XDNodeText, XDNodeCDATA, XDNodeComment, XDNodeProcInst. New get and set accessors were added for the 4 new pointers.
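
The two-list arrangement can be sketched as follows. `SketchNode` is a hypothetical class written for this illustration. Only the pointer names come from the text above; the append logic is an assumption about how such a structure could be maintained, not the iWay implementation, and Implied Text is not modelled. The point is that element-only traversal never sees the extra nodes, while the mixed chain preserves full document order.

```java
import java.util.ArrayList;
import java.util.List;

class MixedListSketch {
    enum Kind { ELEMENT, TEXT, CDATA, COMMENT, PROC_INST }

    /** Hypothetical node; field names follow the pointers named in the text. */
    static final class SketchNode {
        final Kind kind;
        final String label;
        // Element-only chain: what existing element-oriented code walks.
        SketchNode firstChild, lastChild, left, right;
        // Mixed-content chain: every child, in document order.
        SketchNode mixedFirst, mixedLast, mixedLeft, mixedRight;

        SketchNode(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        /** Appends to the mixed chain always, and to the element chain for elements only. */
        void append(SketchNode child) {
            child.mixedLeft = mixedLast;
            if (mixedLast == null) { mixedFirst = child; } else { mixedLast.mixedRight = child; }
            mixedLast = child;
            if (child.kind == Kind.ELEMENT) {
                child.left = lastChild;
                if (lastChild == null) { firstChild = child; } else { lastChild.right = child; }
                lastChild = child;
            }
        }

        /** Labels seen when walking the element-only chain. */
        List<String> elementLabels() {
            List<String> out = new ArrayList<>();
            for (SketchNode n = firstChild; n != null; n = n.right) { out.add(n.label); }
            return out;
        }

        /** Labels seen when walking the complete mixed-content chain. */
        List<String> mixedLabels() {
            List<String> out = new ArrayList<>();
            for (SketchNode n = mixedFirst; n != null; n = n.mixedRight) { out.add(n.label); }
            return out;
        }
    }

    /** Builds a parent holding: text "t", element "a", comment "c", element "b". */
    static SketchNode sample() {
        SketchNode parent = new SketchNode(Kind.ELEMENT, "parent");
        parent.append(new SketchNode(Kind.TEXT, "t"));
        parent.append(new SketchNode(Kind.ELEMENT, "a"));
        parent.append(new SketchNode(Kind.COMMENT, "c"));
        parent.append(new SketchNode(Kind.ELEMENT, "b"));
        return parent;
    }

    public static void main(String[] args) {
        SketchNode parent = sample();
        System.out.println(parent.elementLabels()); // [a, b]
        System.out.println(parent.mixedLabels());   // [t, a, c, b]
    }
}
```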

XDNodeText xdText = new XDNodeText("some text value");
	XDNodeComment xdComment = new XDNodeComment("some comment");
	XDNodeProcInst xdPI = new XDNodeProcInst("target", "data");
	XDNodeCDATA xdCDATA = new XDNodeCDATA("some cdata value");
	XDNode xdMixedElem = new XDNode("mixedElem");
	xdMixedElem.setMixedLast(xdText);
	xdMixedElem.setMixedLast(xdComment);
	xdMixedElem.setMixedLast(xdPI);
	xdMixedElem.setMixedLast(xdCDATA);
	// BEWARE IMPLIED TEXT BREAKS REFERENTIAL IDENTITY IN THE XDNODE VIEW
	// xdMixedElem.getMixedFirst() returns an XDNodeImpliedText instance, not xdText
	// xdMixedElem.getMixedLast() returns xdCDATA
	// xdPI.getMixedLeft() returns xdComment
	// xdPI.getMixedRight() returns xdCDATA

Most of the complexity of XDNode mixed content can be avoided if all you want to do is add some comments or processing instructions. The old methods are still there, but they were modified to append to the end of the mixed content list. Beyond the obvious impact on ordering, this also means that a node keeps its comments and processing instructions inside it, not above it. New accessors have been added to make it more efficient to create the nodes.

xdNode.setComment("<!-- inefficient comment -->");
	xdNode.setProcessingInstruction("<?piTarget inefficient pi?>");
	xdNode.addText("some text");
	xdNode.addCdata("some cdata");
	xdNode.addComment("a comment");
	xdNode.addProcessingInstruction("piTarget", "piData");

If addText() is called before any other content is added, it will set the string value of the XDNode. If addText() is called when there is already a child node, it will create an XDNodeText child node instead, faithfully representing the mixed content created. Similarly, addCdata() will set the string value or create an XDNodeCDATA node depending on when it is called.

We expect that the new pointers will rarely be accessed directly, except internally through the DOM interface. Indeed, mixed content is mostly used for comments and processing instructions, which can be safely ignored (as the XDNode API is optimized to do). One place where mixed content cannot be ignored is flattening. The serialization methods have been improved to serialize mixed content correctly without affecting performance in the normal case.

// flattening xdMixedElem above produces
	<mixedElem>some text value
	    <!-- some comment -->
	    <?target data?><![CDATA[some cdata value]]>
	</mixedElem>

DOM Document

In DOM, the DOM Document contains the information needed to recreate the XML declaration, the comments and processing instructions that precede the root Element, and finally the root Element itself. The parent of the DOM root Element is the DOM Document, or null if the root Element is not parented. Many standard APIs expect a DOM Document rather than the root Element. In XDNode, the notion of an all-encompassing DOM Document is missing. We artificially recreated this notion with an XDNodeRootDoc. The unusual thing about an XDNodeRootDoc is that you can add children to it, but each child's parent field remains null. Therefore the parent of the XDNode root Element is still null, which preserves the old XDNode contract. We do not expect users to create an XDNodeRootDoc frequently. Instead, ask for the DOM Document encompassing the XDNode. The method getDomDocument() returns the current DOM Document if it exists. The method getRootDocument() goes up to the root of the tree and always returns a DOM Document, creating one if none is found.

// returns the DOM Document above the root element, or null if missing
	Document domDoc = xdnode.getDomDocument();
	// always returns a DOM Document, creating one if missing
	Document rootDoc = xdnode.getRootDocument();

The iWay XML parser creates an XDNodeRootDoc when it parses an XML document. It will actually contain the information preceding the root element as you expect. XDHandler.getResult() returns the root element as before. You can retrieve the XDNodeRootDoc with one of the methods above.

XDParser parser = new XDParser();
	parser.parseIt(xmlStringInput);
	XDNode xdNode = parser.getResult();
	XDNodeRootDoc xdRootDoc = xdNode.getRootDocument();

Creating a Tree from DOM

In general, converting a DOM Node to an XDNode requires an expensive serialization and reparse. In the very common case where we know the DOM Node was created by XDDom, we can be much faster: cast the DOM Node to XDDomNode and call its getXDNode() method. Note that most DOM Nodes can be converted this way, but not Attributes. The reason is simple: XDNodes are never attributes. They contain attributes, but they cannot be attributes themselves.

Node domNode = ...
	Attr attrib = ...
	XDNode xdNode = ((XDDomNode)domNode).getXDNode();
	// ((XDDomNode)attrib).getXDNode() always returns null
	// If you convert back to the DOM, you get the same DOM Node
	// xdNode.getDomNode() returns domNode

The DOM is a set of interfaces. It cannot define constructors because it defines no classes. So how do you create a DOM tree? You call the DOM factory methods on a DOM Document. And how do you get a DOM Document to begin with? DOM Level 3 has a solution based on a system property, but we did not implement it, so we are stuck calling the XDDomDocument constructor directly. Please do not call any DOM constructor directly other than the XDDomDocument constructor; doing so is very bad style. Do this instead:

XDDomDocument doc = new XDDomDocument();
Element root = doc.createElement("root");
doc.appendChild(root);
Attr attrib1 = doc.createAttribute("attrib1");
attrib1.setValue("val1");
root.setAttributeNode(attrib1);
root.setAttribute("attrib2", "val2");
Element elem1 = doc.createElement("elem1");
Text text1 = doc.createTextNode("text1");
elem1.appendChild(text1);
root.appendChild(elem1);
Comment comment1 = doc.createComment("comment1");
root.appendChild(comment1);
Text text2 = doc.createTextNode("text2");
root.appendChild(text2);
ProcessingInstruction procInst1 =
    doc.createProcessingInstruction("target1", "data1");
root.appendChild(procInst1);
Text text3 = doc.createTextNode("text3");
root.appendChild(text3);
CDATASection cdata1 = doc.createCDATASection("cdata1");
root.appendChild(cdata1);
Text text4 = doc.createTextNode("text4");
root.appendChild(text4);

It might seem unnatural that DOM does not define constructors, but this has a huge advantage. Most if not all libraries that use the DOM are built to support any DOM implementation. This means they cannot call constructors either; they must go through the DOM factory methods. All we have to do is pass our XDDomDocument to the library, and we know that every DOM object the library creates will come from our own DOM implementation. Because we maintain the relationship between the XDNode view and the DOM view, we can create an XDNode tree, pass the DOM view to a DOM library such as XSLT that modifies the tree, safely cast the result to an XDDomNode to get the XDNode back, and voilà: the XSLT library has manipulated the XDNode tree without knowing it.
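
The factory-method discipline can be illustrated with library-style code that never names a concrete DOM class. In this sketch the Document comes from the JDK's default implementation, which is an assumption made so the example runs without iWay libraries; `buildOrder` and the element names are invented for illustration. Because `buildOrder` only calls methods of the `org.w3c.dom.Document` interface, every node it creates belongs to whatever implementation supplied the Document.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class FactoryOnlySketch {
    /** Stand-in for "some DOM implementation"; here, the JDK default. */
    static Document newJdkDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Library-style code: builds a small tree using only the factory
     * methods of the Document it is handed, never a constructor.
     * The Document must be empty, since it receives the root element.
     */
    static Element buildOrder(Document doc, String id) {
        Element order = doc.createElement("order");
        order.setAttribute("id", id);
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode("widget"));
        order.appendChild(item);
        order.appendChild(doc.createComment("built through factory methods"));
        doc.appendChild(order);
        return order;
    }

    public static void main(String[] args) {
        Document doc = newJdkDocument();
        Element order = buildOrder(doc, "42");
        // Every node created by the "library" is owned by the caller's Document.
        System.out.println(order.getOwnerDocument() == doc);         // true
        System.out.println(order.getFirstChild().getTextContent());  // widget
    }
}
```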

XDNode with the DOM

In most cases, conversion between XDNodes and the equivalent objects in DOM is a straightforward matter. The following sections illustrate the use of XDDom with the standard APIs for XPath and XSLT.

XDNode with XPATH

Suppose indoc is an XDDocument that looks like this:

<root> 
<child1> 
        <grandchild>123</grandchild> 
</child1> 
<child1> 
        <grandchild>234</grandchild> 
</child1> 
<iw:child1 xmlns:iw='http://www.iwaysoftware.com'> 
        <iw:grandchild>456</iw:grandchild> 
</iw:child1> 
</root>

We want to use the core Java XPath API (javax.xml.xpath) to access the grandchild element in the iWay namespace. To get the value of the iw:grandchild node, we could write code like this:

String expr =
"//*[local-name() = 'grandchild' and namespace-uri() = " +
"'http://www.iwaysoftware.com']";
XPath xpath = XPathFactory.newInstance().newXPath();
Node n = indoc.getRoot().getRootDocument();
String result = xpath.evaluate(expr, n);

In the line that assigns n, we obtain the root of the XDDocument as a DOM node by calling XDNode's getRootDocument() method. Unlike the getDomNode() method, this ensures that the returned node is a document and, hence, that absolute XPath expressions will treat this node as the root.

Suppose that instead of getting the value of the node as a String, we wanted the node itself. For this, we could change the above example like this:

XDDomNode xddn =
    (XDDomNode) xpath.evaluate(expr, n, XPathConstants.NODE);
// getXDNode() returns the XDNode behind the DOM node
XDNode xdn = xddn.getXDNode();

If we wanted all the child1 nodes in the default namespace, we could do this:

expr = "//child1";
NodeList nl = (NodeList) xpath.evaluate(expr, n, XPathConstants.NODESET);
for (int j = 0; j < nl.getLength(); j++)
{
    XDDomNode xddn = (XDDomNode) nl.item(j);
}
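
The same queries can be tried without iWay libraries by running them against the JDK's own DOM. The sketch below is an assumption-laden stand-in: it parses the sample document with a namespace-aware DocumentBuilder instead of obtaining the Document from an XDNode, and the class and method names are invented. The expression used here selects the grandchild element in the iWay namespace. Everything after the Document is obtained uses only javax.xml.xpath, so it should carry over to a Document returned by getRootDocument().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathOnDomSketch {
    static final String IW_NS = "http://www.iwaysoftware.com";

    static final String XML =
          "<root>"
        + "<child1><grandchild>123</grandchild></child1>"
        + "<child1><grandchild>234</grandchild></child1>"
        + "<iw:child1 xmlns:iw='" + IW_NS + "'>"
        + "<iw:grandchild>456</iw:grandchild>"
        + "</iw:child1>"
        + "</root>";

    /** Parses XML text into a namespace-aware DOM Document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-uri() to work
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the string value of the grandchild element in the iWay namespace. */
    static String iwayGrandchildValue(Document doc) {
        String expr = "//*[local-name() = 'grandchild' and namespace-uri() = '" + IW_NS + "']";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Counts the child1 elements that are in no namespace. */
    static int countPlainChild1(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nl = (NodeList) xpath.evaluate("//child1", doc, XPathConstants.NODESET);
            return nl.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(iwayGrandchildValue(doc)); // 456
        System.out.println(countPlainChild1(doc));    // 2
    }
}
```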

XDNode with XSLT

In the past, using XSLT within iSM has required flattening the input document to a string (serializing) and reparsing the result into a new tree (deserializing). With XDDom, it is possible to transform using DOMSource and DOMResult objects to avoid these steps. Suppose the file c:/myxform.xsl contains the transformation we want to apply to XDDocument xddoc:

// Compile the stylesheet into a Transformer
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer(
    new StreamSource(new FileReader("c:/myxform.xsl")));
// Wrap the DOM Document of the input tree in a DOMSource
Node inNode = xddoc.getRoot().getRootDocument();
DOMSource ds = new DOMSource(inNode);
// An empty XDDomDocument receives the transformation result
XDDomDocument outDomDoc = new XDDomDocument();
DOMResult dr = new DOMResult(outDomDoc);
t.transform(ds, dr);
// The root element is null if the transform did not create one
XDDomNode domRoot = (XDDomNode) outDomDoc.getDocumentElement();
XDNode result = (domRoot == null) ? null : domRoot.getXDNode();

After setting up a Transformer in the usual way, we get the root of our input XDDocument as a DOM Document node, just as we did in the XPath examples above, and use it in a DOMSource.
Next, we create an "empty" XDDomDocument to use as the target for our transformation result. The root element will be added as a child of this DOM Document. The XDDomDocument is set into a DOMResult.
We apply the transform, using the DOMSource and DOMResult.
To retrieve the result, we first need to access the DOM root element, then get its underlying XDNode. The transform might have returned without creating a root element, hence the check for null.

Other APIs that use DOMSource and DOMResult should work similarly.
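
The DOMSource and DOMResult pattern can be exercised end to end with the JDK alone. In the sketch below, the JDK's default Document stands in for XDDomDocument and an inline stylesheet replaces the file on disk; both are assumptions made so the example is self-contained, and the stylesheet, class and method names are invented for illustration. The steps mirror the ones described above: wrap the input Document, hand an empty Document to the DOMResult, transform, then read the document element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomTransformSketch {
    /** Illustrative stylesheet: emits an "out" element holding the number of child1 elements. */
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/root'>"
        + "<out><xsl:value-of select='count(child1)'/></out>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms DOM to DOM and returns the result's root element (possibly null). */
    static Element transform(String inputXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document in = db.parse(new InputSource(new StringReader(inputXml)));
            Document out = db.newDocument(); // empty target for the result tree

            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.transform(new DOMSource(in), new DOMResult(out));

            // Null when the transform produced no root element.
            return out.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = transform("<root><child1/><child1/></root>");
        System.out.println(root.getNodeName());     // out
        System.out.println(root.getTextContent());  // 2
    }
}
```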