The W3C DOM specifies UTF-16, an encoding built on 16-bit units, for full physical compliance. This provides for “universal” interchange of XML documents. There is no issue with 8-bit character set documents being read as compliant documents, since any character encoded in UTF-8 or another 8-bit set can also be represented in UTF-16. Issues arise when the reverse is the case. UTF-16 documents may contain characters which do not convert down to a single 8-bit character and which, when read as 8-bit text, translate to two characters rather than one.
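The mismatch described above can be reproduced in a few lines of Python. This is a minimal sketch, not EATS code: the sample character, the use of Latin-1 as a stand-in for "an 8-bit character set", and the helper name `looks_like_utf16` are all assumptions made for illustration.

```python
# Illustrative only: the sample text and helper name are assumptions.
text = "é"  # one character, U+00E9

utf8_bytes = text.encode("utf-8")        # two bytes: C3 A9
utf16_bytes = text.encode("utf-16-le")   # one 16-bit unit, stored as bytes E9 00

# Reading the UTF-16 bytes with an 8-bit decoder yields two characters, not one.
misread = utf16_bytes.decode("latin-1")

# Decoding with the encoding that was actually used round-trips cleanly.
roundtrip = utf16_bytes.decode("utf-16-le")


def looks_like_utf16(data: bytes) -> bool:
    """Heuristic: report True when the data starts with a UTF-16 byte order mark."""
    return data.startswith((b"\xff\xfe", b"\xfe\xff"))
```

A check like `looks_like_utf16` is one plausible trigger for the kind of alert described in the next paragraph. It is only a heuristic, since UTF-16 data may be written without a byte order mark.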
At the time of this writing, most content in AIR and EATS has been developed and maintained using 8-bit character sets. This includes database files, which use 8-bit encoding. EATSv5.2020 will be based on UTF-8, with alerts generated on any attempt to incorporate UTF-16 documents. This allows a consistent DOM interaction model to be stabilized first, before the platform as a whole is migrated to UTF-16 for both document exchange and database management functions.
While the DOM specification is defined via Interfaces, there is a singularity of scope engineered into the DOM. The specification defines the abstraction of a singular tree structure. All content is developed and described as components of the one tree. DOMDocumentFragment allows disconnected branches to be developed and attached at arbitrary points in the DOMDocument tree. However, the DOMDocumentFragment must first be defined as a node representing an abstract potential branch of a particular DOMDocument tree. It can NOT be freely attached to any tree in a forest.
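The ownership rule for fragments can be seen with Python's standard `xml.dom.minidom`, used here only as a convenient DOM implementation and not as the EATS one. The element names are assumptions. Note that minidom does not itself reject cross-document attachment; the DOM Core specification defines a WRONG_DOCUMENT_ERR for that case.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "root", None)

# A fragment is created by, and belongs to, one particular document.
fragment = doc.createDocumentFragment()
for name in ("a", "b"):  # hypothetical element names
    fragment.appendChild(doc.createElement(name))

owner_is_doc = fragment.ownerDocument is doc

# Attaching the fragment moves its children into the tree. The fragment
# itself is not inserted as a node and is left empty afterwards.
doc.documentElement.appendChild(fragment)
child_names = [node.tagName for node in doc.documentElement.childNodes]
```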
The data model has a formal physical representation as a serialized character structure, but has no specific internal representation. While no physical implementation is dictated, the way the IDL specifications define DOM logical properties and methods orients them more toward certain languages than others.
DOM structures and their contents are defined, in the W3C specifications, using IDL (Interface Definition Language) definitions in the form of types and interfaces. EATSv5 defines these as PHP Class definitions.
- DOMString – a sequence of characters (16-bit units in the standard specification)
- DOMTimestamp – a number of milliseconds
- DOMUserData – a reference to application data
- DOMObject – an object reference
- DOMException – exceptions originating from DOM processing
These are described, along with other standard structures, in the Content material.
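As a small illustration of what "16-bit units" means for DOMString, the sketch below counts the units a string occupies in UTF-16. The helper name `utf16_units` is an assumption. Python strings count code points, so the two measures differ for characters outside the Basic Multilingual Plane.

```python
def utf16_units(s: str) -> int:
    """Count the 16-bit units needed to hold s when encoded as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


bmp_char = "A"               # U+0041 fits in a single 16-bit unit
astral_char = "\U0001F600"   # U+1F600 needs a surrogate pair (two units)

bmp_units = utf16_units(bmp_char)
astral_units = utf16_units(astral_char)
```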
EATS has a document centric architecture. AIR is document centric. AIR elements are maintained as serialized XML documents, each a separate instance of a W3C DOM object. EATSv5 also conceptualizes sets of AIR elements described collectively as documents; just as a document, in a more human sense, can be a paper, a pamphlet, a book or a multi-volume edition of a work taken as a whole, such as an encyclopedia. A document is a module, describing an entity as a compositional collection of parts. A paper, composed of paragraphs. A book, with sections and chapters. A DOM document is a module described as a tree structure of elements, with attributes. An AIR Repository is a unified forest of modules, described using DOM framing.
The W3C DOM Core doesn’t define semantics. It defines structural syntax. When extended, as in the HTML extension, a layer of semantics is provided and specific element types become meaningful in terms of intent and useful function. AIR and EATS have a structural, syntactical content architecture whose role is similar to that of the DOM Core specification. It defines the nature of the structural relationship of element types. There is also a semantic architecture described by the EATS MetaModel Specification. EATS semantics are defined as a DSL. The DSL provides a base for semantic extension through the definition of application-specific ontologies and element types, similar to the manner in which the DOM is extended with HTML, which authors then further extend with custom types to describe meaning around content for browser interaction.
The Architected Futures DOM describes a network graph model which includes cyclic paths and relational structures in a forest of elements, but it can also be unfolded and organized as a forest, or a virtual tree. Physical instantiation comes in two fundamental forms:
- Singular elements, which are serialized as W3C DOM XML strings.
- A data store of such elements with some implied cohesion, usually (at this time) implemented as a set of SQL tables.
Data stores consist of two primary tables:
- An Elements table
- A Relationships table
The Elements table is the only required table, and can be expressed as a collection of raw XML files which need not be in a database. The Relationships table is maintained by the EATS infrastructure when using a database as the data store. It provides processing efficiency, and is effectively employed as an inverted index system for the elements. A similar conceptual framework is being used in EATSv5 for the internal representation of the EATSv5 DOM model in executable form.
The Relationships table expresses the relational structure of the content of the element set as understood by EATS. It is expressed using a form of RDF statements, where a Subject and Object element pair is joined in an association defined by a Predicate element, which has defined axioms for use and semantics that are understood by the software. (To HTML, h1 identifies a header of the highest order. To EATS, afa19a1965a1b357aea285c0456de25227e2cd758000007ac6977b identifies a Cardinality Specification.)
In simplified form, the Element set is a node list identifying the nodes of concern. The Relationship set defines a node graph expressing the variety of relationships which exist between the nodes. Any single Element can be used as the root of a tree, but trees are not limited to compositional models. Pulling a recursive “composed of” set of relationships via queries on the Relationships table will extract a Bill of Materials (BOM) compositional model, similar to the meaning defined by the W3C DOM as used in HTML. Pulling a recursive “regulates” set of relationships extracts a fishbone (tree) model with distinctly different meaning.
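The two-table layout and the recursive pull described above can be sketched with SQLite. Everything concrete here is an assumption made for illustration: the table and column names, the readable identifiers (EATS identifies elements, including predicates, by long hash-like keys), the sample data, and the depth limit used as a guard against cyclic paths.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE elements (id TEXT PRIMARY KEY, xml TEXT);
    CREATE TABLE relationships (subject TEXT, predicate TEXT, object TEXT);
""")
con.executemany(
    "INSERT INTO elements VALUES (?, ?)",
    [(n, f"<element id='{n}'/>") for n in ("bike", "frame", "wheel", "spoke", "policy")],
)
con.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [
        ("bike", "composed_of", "frame"),
        ("bike", "composed_of", "wheel"),
        ("wheel", "composed_of", "spoke"),
        ("policy", "regulates", "bike"),
    ],
)


def pull_tree(root, predicate, max_depth=50):
    """Follow one predicate recursively from root; return (id, depth) rows."""
    return con.execute(
        """
        WITH RECURSIVE walk(id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT r.object, walk.depth + 1
            FROM relationships AS r JOIN walk ON r.subject = walk.id
            WHERE r.predicate = ? AND walk.depth < ?
        )
        SELECT id, depth FROM walk ORDER BY depth, id
        """,
        (root, predicate, max_depth),
    ).fetchall()


bom = pull_tree("bike", "composed_of")
regulated = pull_tree("policy", "regulates")
```

The same root-plus-predicate query yields differently shaped trees depending on which relationship is followed, which is the point made in the paragraph above.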
The initial version of AIR was developed using the PHP v4 DOM as the internal scheme for data management. A later version was implemented in Java using Apache Xerces. For EATSv5 we are using PHP 7+ and a proprietary implementation which can be integrated more holistically with the remainder of the EATS framework.
The DOM is defined as a Tree, a formal Tree. A Tree is a Graph with special properties.
Graphs may be directional, or not. They may be circular, or not. Trees are the simplest form of directed graphs where all nodes are part of a single connected graph, only one route exists between any two nodes in the graph, and a single root node leads to all other nodes. Choosing any one branch will never lead to the nodes derived as part of any other branch. A Tree is one step more complex than a list, which is a specialized Tree with a single root and a single terminal leaf, in which all other nodes have one input edge (from their predecessor) and one exit edge (to their successor). Generic graphs are complex entities and not very straightforward to code, or understand. Trees are very useful, and much easier to understand, and code.
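The properties that separate a Tree from a general directed graph can be checked mechanically. The function below is a minimal sketch; the name `is_tree` and the representation of edges as (parent, child) pairs are assumptions, and it expects every edge to refer to a listed node.

```python
from collections import defaultdict


def is_tree(nodes, edges):
    """Return True if the directed (parent, child) edges form a rooted tree."""
    nodes = set(nodes)
    if not nodes:
        return False
    children = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Exactly one root, and no node may be reached by more than one edge.
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) != 1 or any(count > 1 for count in indegree.values()):
        return False
    # The single root must lead to every other node.
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == nodes


tree_ok = is_tree("rabc", [("r", "a"), ("r", "b"), ("a", "c")])
list_ok = is_tree("xyz", [("x", "y"), ("y", "z")])             # a list is a Tree
shared = is_tree("rabc", [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")])
cyclic = is_tree("ab", [("a", "b"), ("b", "a")])
```

The `shared` case is a component used by two sub-assemblies: still a perfectly good graph, but no longer a Tree.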
Working with Trees simplifies life; if life and a problem set follow that simple model. Bill of Materials follows that model; if you don’t need to account for the same component used by multiple sub-assemblies. IMS (a hierarchical database, which was physically implemented as a hierarchical Tree) followed that model (if you ignore phantom pointers) to help send Apollo to the Moon. It works, but it has issues.
Life isn’t hierarchical. Even basic, simple, natural, organic trees aren’t fully Tree structures if you include the root systems. Roots define a Tree in one direction, branches and leaves define a second Tree in an opposite direction, joined at the base of the trunk. If you want the whole tree, you need two Trees; or one non-Tree graph. A Tree can be used to describe assembly (BOM), or distribution; but, typically, not both. (Tree edges are not bidirectional. Bidirectional branches and roots would require four Trees, one in each direction for each side.) Generic graphs are more complex to describe and process; but they have greater utility. (For example, flows in both directions: from nutrients fed by the roots to grow the branches of the tree, and back flows from photosynthesis by the leaves which help power the growth of the root system. This is not intended as a precisely accurate botanical model. Structurally, it has as many edges as four Trees, but only one graph structure, not four distinct Trees; and only one set of nodes, but with directional flow awareness.)
For a variety of purposes, EATS supports a variety of network modeling techniques which include graph processing algorithms. EATSv5 implements the DOM, internally, as a general graph structure, rather than a simple Tree.
The EATSv5 DOM is structured internally as two lists which follow the architectural model of the AIR knowledgebase.
- A node list – a uniform list of all of the nodes that are part of the document without regard to position, rank or content.
- A node relationship map – a map describing the relationships which exist between the nodes.
Both lists are constructed as doubly-linked lists. The relationship map provides a scheme for creating consistent node descriptions and behaviors: a common node definition is held once and referenced from multiple places within a DOM structure. When painted in serialized form, base element content specifications are painted multiple times, once at each point in a DOM Tree where specified by the relationship map. The scheme also helps assure consistent content and definition across multiple documents (e.g., HTML pages, and HTML versus database elements). The graph model allows, but does not force, the creation of content-normalized Trees.
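The following sketch shows one way a node list and a relationship map can cooperate so that a single node definition is painted at several points in a serialized Tree. It is an illustration under stated assumptions, not the EATSv5 implementation: plain Python dictionaries and lists stand in for the doubly-linked lists, and the node ids, tags and the `paint` function are hypothetical.

```python
from xml.sax.saxutils import escape

# Node list: one definition per node, without regard to position or rank.
nodes = {
    "page": {"tag": "page", "text": ""},
    "sec1": {"tag": "section", "text": ""},
    "sec2": {"tag": "section", "text": ""},
    "note": {"tag": "notice", "text": "Shared & reused"},
}

# Relationship map: ordered (parent, child) pairs. "note" is referenced twice.
relationships = [
    ("page", "sec1"),
    ("page", "sec2"),
    ("sec1", "note"),
    ("sec2", "note"),
]


def paint(node_id, path=()):
    """Serialize node_id, painting shared definitions once per reference."""
    if node_id in path:
        raise ValueError(f"cyclic path through {node_id!r} cannot be painted as a Tree")
    node = nodes[node_id]
    inner = escape(node["text"]) + "".join(
        paint(child, path + (node_id,))
        for parent, child in relationships
        if parent == node_id
    )
    return f"<{node['tag']}>{inner}</{node['tag']}>"


xml = paint("page")
```

Editing the single `note` definition changes both painted occurrences, which is the consistency benefit described above.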
Fundamental operations provided by EATSv5 for manipulation of the lists are based on graph processing requirements, not DOM specifications. DOM features are implemented as a layer on top of the graph processing.
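The layering can be sketched as a generic graph class with a thin DOM-style view on top. The class names, method names and the `has_child` predicate are all hypothetical, chosen only to show the shape of the idea; they are not the EATSv5 API.

```python
class Graph:
    """Generic graph operations: nodes plus labelled, directed edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = []  # (subject, predicate, object) triples

    def add_node(self, node_id, **data):
        self.nodes[node_id] = data

    def relate(self, subject, predicate, obj):
        self.edges.append((subject, predicate, obj))

    def objects(self, subject, predicate):
        return [o for s, p, o in self.edges if s == subject and p == predicate]

    def subjects(self, predicate, obj):
        return [s for s, p, o in self.edges if p == predicate and o == obj]


class DomView:
    """DOM-style methods expressed purely in terms of the graph operations."""

    CHILD = "has_child"  # hypothetical predicate name

    def __init__(self, graph):
        self.graph = graph

    def append_child(self, parent, child):
        self.graph.relate(parent, self.CHILD, child)

    def child_nodes(self, parent):
        return self.graph.objects(parent, self.CHILD)

    def parent_node(self, child):
        parents = self.graph.subjects(self.CHILD, child)
        return parents[0] if parents else None


graph = Graph()
for name in ("html", "body", "p"):
    graph.add_node(name)

dom = DomView(graph)
dom.append_child("html", "body")
dom.append_child("body", "p")

# A non-Tree relationship lives in the same graph but is invisible to the DOM view.
graph.relate("p", "cites", "html")
```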