nfoWorks: tools for document interoperability

d140502 nfoWorks devNote
 Annotating XML Documents


 0.04 2017-06-14 20:22

An annotated XML Document is a replica of an XML document in a plaintext or HTML form.  The replica is a faithful rendition of the original XML document, with the neutral introduction of white space, line numbers, and links/anchors that do not alter the essential text.  The purpose is to maintain fidelity with the original XML document while employing forms that can be extracted and/or annotated by surrounding/connected material.  Replica fidelity should be apparent and not obscured by whatever auxiliary annotations are provided.

The nfoWorks procedure for annotating XML documents involves three stages of development.

This folio demonstrates these stages using sample XML documents and available productivity tools for development of the different forms.

-- Dennis E. Hamilton
Seattle, Washington


Replicas have, at minimum, line numbers for use in making extracts and in referring to particular portions of the original XML document.  Because of this additional material, a replica cannot be used in place of the original XML document.  Replicas are better suited to commentary and annotations that apply to particular portions and are referenced by line number.

The plaintext replica is very useful for clippings incorporated in discussions of the XML document, especially schemas in XML.  It is also valuable as an intermediate form that can be inspected to ensure that there has been no material alteration to the original XML document structure and content.  The plaintext replica is usually the preferred form for insertion of extracts into word-processing documents.  Further annotation depends on the purpose and available features of the word-processing form.
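As a rough illustration of the plaintext form, the following Python sketch adds a line-number field to each line and then checks that removing the field recovers the original text exactly.  The field layout (a right-aligned number of fixed width followed by "| ") and the function names are assumptions made for this example; they are not the layout or tooling that nfoWorks prescribes.

```python
def number_lines(xml_text, width=4):
    """Prefix every line with a right-aligned line number and '| '.

    The layout is an assumption for this sketch. It holds only while the
    line count fits in `width` digits.
    """
    lines = xml_text.split("\n")
    return "\n".join(f"{n:>{width}}| {line}" for n, line in enumerate(lines, start=1))


def strip_numbers(replica, width=4):
    """Remove the line-number field to recover the original text."""
    prefix_len = width + 2  # the digits, then '| '
    return "\n".join(line[prefix_len:] for line in replica.split("\n"))


sample = '<?xml version="1.0"?>\n<note>\n  <to>Reader</to>\n</note>'
replica = number_lines(sample)

# Fidelity check: the only difference from the original is the number field.
assert strip_numbers(replica) == sample
```

Splitting on "\n" alone, rather than on every kind of line break, keeps the round trip exact even when the source uses other line-ending conventions.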

HTML replicas provide for cross-linking within the document and, when accessed from a web browser, into the document.  In this form, the line numbers are anchors for linking to specific lines.  Each line-number anchor is also a permalink for its line, which makes it easy to derive a link to that exact line in a particular HTML replica.

The baseline HTML document is equivalent to the plaintext document except that the line numbers are transformed into self-permalinks.  The derivation of the baseline HTML document adjusts for HTML's different handling of white space, line breaks, fixed-pitch spacing, and characters reserved for HTML markup.
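One way such a derivation might look is sketched below in Python.  It escapes the characters reserved for HTML markup, relies on a pre element to preserve white space, line breaks, and fixed-pitch spacing, and wraps each line number in an anchor that links to itself.  The id scheme ("L" plus the line number), the field layout, and the function name are assumptions of this sketch rather than the conventions actually used for nfoWorks replicas.

```python
import html


def baseline_html(xml_text, width=4):
    """Render XML text as a <pre> block with self-linking line numbers.

    Assumed conventions for this sketch: anchors are named L1, L2, ...
    and each number field is `width` characters wide, followed by '| '.
    """
    rows = []
    for n, line in enumerate(xml_text.split("\n"), start=1):
        number = f"{n:>{width}}"
        # The anchor carries both an id and a link to that same id.
        anchor = f'<a id="L{n}" href="#L{n}">{number}</a>'
        # Escape &, < and > so the XML shows as text instead of markup.
        rows.append(f"{anchor}| {html.escape(line, quote=False)}")
    return "<pre>\n" + "\n".join(rows) + "\n</pre>"


print(baseline_html('<note to="Reader">\n  A &amp; B\n</note>'))
```

Following a line number in the browser then places a URL ending in a fragment such as #L2 in the address bar, ready to be copied.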

In a browser, the baseline HTML document appears the same as the baseline plaintext version, except that the line numbers are distinguished as permalinks.  The baseline HTML document is not modified further.  It serves as an important intermediate form for verifying that its derivation is faithful to the plaintext replica, apart from the transformations necessary to render correctly in web browsers and the addition of permalinks to the line-number fields.
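The verification described here can be approximated mechanically: extract the text a browser would display from the baseline HTML and compare it with the plaintext replica.  The Python sketch below does this with the standard-library HTML parser.  The sample replica layout and the helper names are assumptions for illustration, and the check covers text content only, not visual rendering.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, with character references already decoded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def rendered_text(html_doc):
    parser = TextExtractor()
    parser.feed(html_doc)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


def is_faithful(html_doc, plaintext_replica):
    # Assumes the HTML wraps the lines as "<pre>\n ... \n</pre>", so the
    # newlines next to the <pre> tags are not part of the replica text.
    return rendered_text(html_doc).strip("\n") == plaintext_replica


plaintext = "   1| <a>\n   2| </a>"
html_doc = ('<pre>\n<a id="L1" href="#L1">   1</a>| &lt;a&gt;\n'
            '<a id="L2" href="#L2">   2</a>| &lt;/a&gt;\n</pre>')

assert is_faithful(html_doc, plaintext)
assert not is_faithful(html_doc.replace("&lt;/a&gt;", "&lt;/b&gt;"), plaintext)
```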

Further HTML annotation starts with a copy of the baseline HTML, with the baseline preserved intact for verification and auditing of the work.  It remains available for construction of any number of annotations of the same XML document replica.

For annotated HTML forms, a baseline copy can be reduced or expanded in form to suit the purposes of the annotation.  In all cases, the relationships to the baseline HTML document and the corresponding baseline plaintext are readily apparent.  Replica document elements and attributes that refer to other material in the same or other documents can be transformed into links to that associated material.  Places that may be linked to can also be given permalink anchors for linking to those places from elsewhere in the same document or from external material.  The use of a permalink makes it easy to use a web browser to capture the correct URL for linking into the HTML replica at that precise point.
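As a small illustration of turning a reference into a link, the Python sketch below rewrites one line of a copy of the baseline HTML so that the value of a ref attribute (as found in XML Schema documents) becomes a link to an anchor elsewhere in the replica.  The "def-" anchor prefix, the restriction to simple name characters, and the function name are hypothetical choices for this example.  The visible text of the line is unchanged, which can be checked by stripping the tags from both versions.

```python
import re

# Matches ref="name" but not the href="..." of a line-number anchor:
# \b cannot fall between the 'h' and 'r' of "href".
REF_ATTR = re.compile(r'\bref="([\w.-]+)"')


def link_refs(baseline_line):
    """Wrap each ref value in a link to a hypothetical '#def-<name>' anchor."""
    def to_link(match):
        name = match.group(1)
        return f'ref="<a href="#def-{name}">{name}</a>"'
    return REF_ATTR.sub(to_link, baseline_line)


def visible_text(html_line):
    """Drop tags, keeping the (still escaped) text between them."""
    return re.sub(r"<[^>]+>", "", html_line)


line = '<a id="L7" href="#L7">   7</a>|   &lt;xs:element ref="title"/&gt;'
annotated = link_refs(line)

# The annotation adds a link but leaves the displayed text as it was.
assert visible_text(annotated) == visible_text(line)
```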

The provenance and authenticity of replica forms can be important where the essential fidelity to the original source is a significant concern.  Provenance is addressed in terms of preservation of the chain of work products that allow the development to be audited and, if necessary, corrected.  Authenticity is based on verification of the chain of custody and of attestations provided by application of digital signatures.
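A chain of work products can be made auditable by recording a digest of each stage, with every entry also bound to the entries before it.  The Python sketch below builds such a manifest with SHA-256.  The stage names, the manifest layout, and the chaining rule are assumptions for illustration.  A digest manifest supports auditing but is not itself a digital signature; authenticity would additionally require signing the manifest with a key, which this sketch does not attempt.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(stages):
    """stages: ordered (name, bytes) pairs, from the original to the last replica.

    Each entry records the stage's own digest plus a chain value that also
    depends on every earlier entry (layout assumed for this sketch).
    """
    manifest = []
    previous = ""
    for name, data in stages:
        own = digest(data)
        chain = digest((previous + own).encode("ascii"))
        manifest.append({"name": name, "sha256": own, "chain": chain})
        previous = chain
    return manifest


def verify_manifest(stages, manifest):
    """Recompute the manifest from the work products and compare."""
    return build_manifest(stages) == manifest


stages = [("original.xml", b"<a/>"), ("replica.txt", b"   1| <a/>")]
manifest = build_manifest(stages)

assert verify_manifest(stages, manifest)
# Any change to a stage is detected at that entry and at every later one.
assert not verify_manifest([stages[0], ("replica.txt", b"   1| <b/>")], manifest)
```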


Related Material

Hamilton, Dennis E.
Annotating XML Documents.   nfoWorks devNote folio d140502 0.04, June 7, 2014.  Accessed at <>.
Revision History:
0.04 2014-06-07-09:40 Adding Embellishment, Provenance & Authenticity
The additional topics are summarized and linked
0.03 2014-05-12-15:41 Add Plaintext HTML Replica Creation
n140502e is established.
0.02 2014-05-11-20:40 Demonstrate Plaintext Baseline Replica Creation
n140502d is established.
0.01 2014-05-11-10:35 Provide Synopsis and Summary
The essential procedure and the nature of the replicas are sketched.
0.00 2014-05-10-16:54 Establish Initial Placeholder for Material
Have just enough document engineering to begin collecting my ideas on how to modularize for composition of components and work integration cases for those components.


This work is licensed under a
Creative Commons Attribution 2.5 License.

created 2014-05-10-16:54 -0700 (pdt)
$$Author: Orcmid $
$$Date: 17-06-14 20:22 $
$$Revision: 277 $