d140502: Annotating XML Documents

nfoWorks: tools for document interoperability

d140502 nfoWorks devNote
Annotating XML Documents

0.04 2017-06-14 20:22

An annotated XML Document is a replica of an XML document in a plaintext or HTML form. The replica is a faithful rendition of the original XML document, with the neutral introduction of white space, line numbers, and links/anchors that do not alter the essential text. The purpose is to maintain fidelity with the original XML document while employing forms that can be extracted and/or annotated by surrounding/connected material. Replica fidelity should be apparent and not obscured by whatever auxiliary annotations are provided.

The nfoWorks procedure for annotating XML documents involves three stages of development.

A faithful plaintext replica is created that differs from the visible layout and content of the original XML document only by the addition of line numbers at the beginning of each line. This is a useful form for verification, reference, and extraction of plaintext fragments.

A baseline HTML replica is created that, when viewed in a browser, preserves the visible layout and content of the plaintext replica. In this replica, the line numbers become permalinks to the lines on which they appear. This is a useful form for verifying fidelity with the plaintext replica, for reference, and for extracting HTML fragments for annotation purposes.

Further HTML replicas are derived from the baseline HTML replica for detailed-annotation purposes. Cross-reference links and additional permalink anchors can be added to support a particular annotation purpose.

This folio demonstrates these stages using sample XML documents and available productivity tools for development of the different forms.

-- Dennis E. Hamilton
Seattle, Washington
2014-05-11

Summary

Every replica carries at least line numbers, for use in making extracts and in referring to particular portions of the original XML document. Because of this additional material, a replica cannot be used in place of the original XML document. Replicas are instead suited to commentary and annotation that applies to particular portions, referenced by their line numbers.

The plaintext replica is very useful for clippings incorporated in discussions of the XML document, especially schemas in XML. It is also valuable as an intermediate form that can be inspected to ensure that there has been no material alteration to the original XML document structure and content. The plaintext replica is usually the preferred form for insertion of extracts into word-processing documents. Further annotation depends on the purpose and available features of the word-processing form.
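The numbering and the fidelity check described above can be sketched in a few lines of Python. This is a minimal illustration, not the nfoWorks tooling: the function names, the right-aligned four-column number field, and the single separating space are all assumptions made for the example.

```python
import re

# Assumed layout: right-aligned line number, one space, then the original line.
_PREFIX = re.compile(r"^ *\d+ ")


def number_lines(xml_text, width=4):
    """Return a plaintext replica with a line number in front of every line."""
    numbered = []
    for n, line in enumerate(xml_text.splitlines(keepends=True), start=1):
        numbered.append(f"{n:>{width}} {line}")
    return "".join(numbered)


def strip_numbers(replica):
    """Remove the number field again so the result can be compared with the original."""
    return "".join(
        _PREFIX.sub("", line, count=1)
        for line in replica.splitlines(keepends=True)
    )


if __name__ == "__main__":
    original = '<?xml version="1.0"?>\n<a>\n  <b/>\n</a>\n'
    replica = number_lines(original)
    print(replica, end="")
    # Fidelity check: nothing but the number fields was added.
    print(strip_numbers(replica) == original)
```

Stripping the number fields and comparing with the original is one way to confirm that the replica introduces no material alteration.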

HTML replicas provide for cross-linking in the document and into the document when accessed from a web browser. In this form, the line numbers are anchors for linking to specific lines. The line-number anchors are also permalinks for their lines. This makes it easy to derive links to those very lines in a particular HTML replica.

The baseline HTML document is equivalent to the plaintext document except that the line numbers are transformed into self-permalinks. Its derivation compensates for the different HTML handling of white space, line breaks, fixed-pitch spacing, and characters reserved for HTML markup.

In a browser, the baseline HTML document appears the same as the baseline plaintext version, except that the line numbers are distinguished as permalinks. The baseline HTML document is not modified further. It serves as an important intermediate form for verifying that its derivation is faithful to the plaintext replica, apart from the transformations necessary to render correctly in web browsers and the addition of the permalinks to the line-number fields.
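A baseline derivation along these lines might look like the following Python sketch. It assumes a numbered plaintext replica whose lines consist of a right-aligned number, one space, and the original text; the `L<n>` anchor ids, the `<pre>` wrapper, and the function names are hypothetical choices for the illustration.

```python
import html
import re

# Assumed replica line layout: optional padding, line number, one space, text.
_LINE = re.compile(r"^( *)(\d+) (.*)$")


def baseline_html(replica):
    """Turn a numbered plaintext replica into a <pre> block with self-permalinks."""
    rows = []
    for line in replica.splitlines():
        m = _LINE.match(line)
        if m is None:
            raise ValueError(f"not a numbered line: {line!r}")
        pad, num, text = m.groups()
        n = int(num)
        # Escape only the characters reserved for HTML markup; <pre> keeps
        # white space, line breaks, and fixed-pitch spacing as they are.
        escaped = html.escape(text, quote=False)
        rows.append(f'{pad}<a id="L{n}" href="#L{n}">{num}</a> {escaped}')
    return "<pre>\n" + "\n".join(rows) + "\n</pre>\n"


def visible_text(html_replica):
    """Recover what a browser would display, for comparison with the plaintext."""
    body = html_replica[len("<pre>\n"):-len("\n</pre>\n")]
    return html.unescape(re.sub(r"<[^>]+>", "", body))
```

Comparing `visible_text(page)` with the plaintext replica gives a mechanical check that the only changes are the markup transformations and the permalinks.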

Further HTML annotation starts with a copy of the baseline HTML, with the baseline preserved intact for verification and auditing of the work. It remains available for construction of any number of annotations of the same XML document replica.

For annotated HTML forms, a baseline copy can be reduced or expanded in form to suit the purposes of the annotation. In all cases, the relationships to the baseline HTML document and the corresponding baseline plaintext are readily apparent. Replica document elements and attributes that refer to other material in the same or other documents can be transformed into links to that associated material. Places that may be linked to can also be given permalink anchors for linking to those places from elsewhere in the same document or from external material. The use of a permalink makes it easy to use a web browser to capture the correct URL for linking into the HTML replica at that precise point.
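As one illustration of turning references into links, the sketch below works on a copy of the baseline body in which markup characters are already escaped and double quotes are left literal. It assumes a RELAX NG style schema where `ref` elements refer to `define` elements by name; the `def-` anchor prefix and the function name are hypothetical.

```python
import re

# Patterns operate on escaped replica text, e.g. &lt;define name="x"&gt;
_DEFINE = re.compile(r'(&lt;define name=")([^"]+)(")')
_REF = re.compile(r'(&lt;ref name=")([^"]+)(")')


def link_references(baseline_body):
    """Give each definition an anchor and point each reference at it."""
    # Definitions become link targets.
    linked = _DEFINE.sub(r'\1<a id="def-\2">\2</a>\3', baseline_body)
    # References become links to the matching definition.
    return _REF.sub(r'\1<a href="#def-\2">\2</a>\3', linked)


if __name__ == "__main__":
    body = (
        '&lt;define name="office-text"&gt;\n'
        '&lt;ref name="office-text"/&gt;'
    )
    print(link_references(body))
```

Because only `<a>` tags are added, removing the tags again yields the baseline text, which keeps the relationship to the baseline apparent.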

The provenance and authenticity of replica forms can be important where the essential fidelity to the original source is a significant concern. Provenance is addressed in terms of preservation of the chain of work products that allow the development to be audited and, if necessary, corrected. Authenticity is based on verification of the chain of custody and of attestations provided by application of digital signatures.
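One simple way to record a chain of work products is a manifest of digests, one per stage, that can later be rechecked or covered by a digital signature. The Python sketch below is only an illustration of that idea: the choice of SHA-256, the manifest line format, and the file names are assumptions, and the signing step itself is left out.

```python
import hashlib
from typing import List, Mapping


def manifest(stages: Mapping[str, bytes]) -> List[str]:
    """Return one 'digest  name' line per work product, in the order given."""
    return [
        f"{hashlib.sha256(data).hexdigest()}  {name}"
        for name, data in stages.items()
    ]


if __name__ == "__main__":
    # Hypothetical work products: original, plaintext replica, baseline HTML.
    chain = {
        "original.xml": b"<a/>\n",
        "replica.txt": b"   1 <a/>\n",
        "baseline.html": b'<pre>\n   <a id="L1" href="#L1">1</a> &lt;a/&gt;\n</pre>\n',
    }
    for entry in manifest(chain):
        print(entry)
```

Recomputing the manifest during an audit shows whether any stage has changed since it was recorded.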


Related Material

d140502b: Annotating XML Documents [latest]

d140502c: Approach

d140502d: Create Plaintext Baseline Replicas

d140502e: Create HTML Baseline Replicas

d140502f: Embellishing HTML Replicas

d140502g: Provenance & Authentication



n140504: ODF 1.2 Schema Reference
                for replicas produced using this procedure

n140503: ODF 1.2 Text-Document Schema Analysis
                for custom annotations using such replicas

d140502a: Diary & Job Jar

Revision History:
0.04 2014-06-07-09:40 Adding Embellishment, Provenance & Authenticity: The additional topics are summarized and linked.
0.03 2014-05-12-15:41 Add Plaintext HTML Replica Creation: n140502e is established.
0.02 2014-05-11-20:40 Demonstrate Plaintext Baseline Replica Creation: n140502d is established.
0.01 2014-05-11-10:35 Provide Synopsis and Summary: The essential procedure and the nature of the replicas are sketched.
0.00 2014-05-10-16:54 Establish Initial Placeholder for Material: Have just enough document engineering to begin collecting my ideas on how to modularize for composition of components and work integration cases for those components.

This work is licensed under a
Creative Commons Attribution 2.5 License.

created 2014-05-10-16:54 -0700 (pdt)
$$Author: Orcmid $
$$Date: 17-06-14 20:22 $
$$Revision: 277 $