d140502f: Embellishing HTML Replicas

nfoWorks: tools for document interoperability

d140502 nfoWorks devNote
Annotating XML Documents
Embellishing HTML Replicas

0.00 2017-06-14 20:22

Latest version of Annotating XML Documents: available on the Internet at
<http://nfoWorks.org/dev/2014/05/d140502b.htm>

This Embellishing HTML Replicas version is available at <http://nfoWorks.org/dev/2014/05/d140502f.htm>.

{Ed.Note: This is boilerplate as a placeholder, allowing this page to be linked to before its content is perfected.

The following topics are discussed:

   1. Variations in the derivation of the baseline HTML, such as the use of classes or in-line styling of various kinds, and how the embellishments depend very much on the nature of the XML document and the intended use of the annotated replica.
   2. Focus on schemas, and on RNG schemas in this particular case.
   3. Links to external URLs.
   4. Placing permalinks on definitions, observing that normal styles are used.
   5. Adding cross-references to definitions.
   6. Element-name permalinks with emphasis.
   7. Attribute definitions, also with emphasis, taking advantage of line numbers to prevent duplication.
   8. The grouped Attribute case.
   9. Treatment of datatypes and external references.
10. Include references to the editor used and the definitions of the regular expressions and replacement rules.}

Deriving an HTML rendition of an XML document is complicated because the XML document contains characters that have special significance in HTML. In addition, HTML does not preserve whitespace as written in the XML document. Spaces and line breaks must be preserved by special means, and the XML document characters "&", "<", and ">" must be "escaped" so that they appear literally instead of being recognized as HTML markup.
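A minimal Python sketch of the escaping step follows. The function name escape_for_html and the sample line are illustrative only. The point it shows is that "&" has to be replaced before "<" and ">"; otherwise the ampersands introduced by "&lt;" and "&gt;" would themselves be escaped a second time. Python's standard html.escape(text, quote=False) performs the same three replacements.

```python
def escape_for_html(text: str) -> str:
    """Escape the three HTML-significant characters in XML text.

    "&" is replaced first; otherwise the "&" introduced by "&lt;"
    and "&gt;" would be escaped again.
    """
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


if __name__ == "__main__":
    # An illustrative RELAX NG schema line.
    sample = '<element name="manifest:manifest">'
    print(escape_for_html(sample))
    # prints: &lt;element name="manifest:manifest"&gt;
```

Quotation marks are left alone because the escaped text appears as element content, never inside an attribute value.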

It is also somewhat complicated to convert the plaintext line-number fields into HTML that renders each line number as a permalink to its own line.
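The permalink conversion can be sketched as a regular-expression substitution. This sketch rests on hypothetical assumptions: each plaintext line begins with optional padding spaces, a decimal line number, and one separating space, and an id of the form L<number> is acceptable. It targets the <pre> variant and expects the line to be escaped already; escaping never touches digits or spaces, so the field survives. For the ordinary variant, any later conversion of spaces to &nbsp; would have to skip the inserted <a> markup, or the tag itself would be damaged.

```python
import re

# Hypothetical field layout: optional padding spaces, a decimal
# line number, then one space before the replicated XML text.
LINE_NUMBER_FIELD = re.compile(r"^( *)(\d+) ")


def add_permalink(line: str) -> str:
    """Replace the leading line-number field with a self-linking anchor."""
    def to_anchor(m: re.Match) -> str:
        pad, num = m.group(1), m.group(2)
        # The id is also the link target, so the number links to its own line.
        return f'{pad}<a id="L{num}" href="#L{num}">{num}</a> '
    return LINE_NUMBER_FIELD.sub(to_anchor, line, count=1)


if __name__ == "__main__":
    print(add_permalink("  12 &lt;element name=\"manifest:file-entry\"&gt;"))
```

Lines without a leading number field are returned unchanged.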

The entire process can be accomplished with scripts and small programs, as is also the case for creating a baseline plaintext replica. All of the steps can be automated. Here, a manual procedure is used to demonstrate "by hand" the critical steps that any automation must perform.

The procedure given here continues the manual procedure already provided for creating baseline plaintext replicas. The same XML document is used as the example [n140504b3]. The examples begin with the Microsoft Office 2013 Excel spreadsheet that is a by-product of the plaintext replica procedure [n140504d2].

The objective is simple: provide HTML text that renders the same as the plaintext rendition (e.g., [n140504d1]), but with the line-number fields replaced by permalinks to those very lines. Any further annotation is accomplished by editing the baseline HTML replica with appropriate tools.

This procedure illustrates two HTML variations. The simpler is a preformatted version that must be enclosed in HTML <pre> elements to present the original whitespace, including line breaks, as illustrated by [n140504d5]. The second is ordinary HTML for inclusion in HTML <p> (and equivalent) elements, with spaces and line breaks preserved by special HTML markup, as illustrated by [n140504d6].
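The difference between the two variations can be sketched in Python. The sketch assumes the lines are already escaped and contain no tabs or other markup; the function names are hypothetical, and wrapping the ordinary variant in a single <p> element is only one of the "equivalent" containers the text allows.

```python
def to_pre_variant(lines: list[str]) -> str:
    """Variant 1: <pre> itself preserves spaces and line breaks."""
    return "<pre>" + "\n".join(lines) + "</pre>"


def to_ordinary_variant(lines: list[str]) -> str:
    """Variant 2: preserve whitespace without <pre>.

    Every space becomes &nbsp; so runs of spaces do not collapse,
    and every line break becomes an explicit <br /> element.
    """
    converted = [line.replace(" ", "&nbsp;") for line in lines]
    return "<p>" + "<br />\n".join(converted) + "</p>"


if __name__ == "__main__":
    escaped = ["&lt;grammar&gt;", "  &lt;start/&gt;"]
    print(to_pre_variant(escaped))
    print(to_ordinary_variant(escaped))
```

Replacing spaces after escaping is safe here because escaping introduces no spaces of its own.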

1. Prerequisites
2. The Starting Point
9. References

1. Prerequisites

{Ed.Note: Update to make the basic starting point the baseline HTML. In this particular case, the ordinary HTML version is used. The basic difference is the handling of &nbsp; versus \x20. This depends on having an editor that provides good regular-expression search-and-replace machinery. Link to the editor used and to the regular-expression definitions too.}

A line-numbered plaintext replica of the XML document should already be available. It is important as a check on the work and on the fidelity of the HTML version. There must be no horizontal tab (HT) codes in the lines, and line breaks must be consistent. Here the convention is to use CR-LF code pairs for line breaks, for compatibility with the platform used. That is also the form Microsoft Office Excel produces when a document is saved as space-delimited formatted text.
For the illustrative manual procedure below, it is assumed that the spreadsheet used to produce the plaintext replica is also available. If other tools are used, it is possible to start from the plaintext replica itself. There is enough detail here to determine what any alternative approach requires.
In this manual approach, Excel is used to add more repetitive columns, and the sheet is then saved as an intermediate plaintext file. To convert the intermediate plaintext file to HTML, a text editor with good full-text search-and-replace functions is required. It must provide a means to find every line break in the plaintext file and insert constant HTML markup in front of it.
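The prerequisite checks on the plaintext replica can be automated. This Python sketch (the function name is hypothetical) reports any HT codes and any line break that is not a CR-LF pair.

```python
import re


def check_plaintext_replica(data: bytes) -> list[str]:
    """Report conditions that would spoil an HTML replica of this file."""
    problems = []
    if b"\t" in data:
        problems.append("contains horizontal tab (HT) codes")
    # A bare LF is one not preceded by CR.
    if re.search(rb"(?<!\r)\n", data):
        problems.append("contains LF without a preceding CR")
    # A bare CR is one not followed by LF.
    if re.search(rb"\r(?!\n)", data):
        problems.append("contains CR without a following LF")
    return problems
```

Read the replica in binary mode (open(path, "rb")). In Python's default text mode, CR-LF pairs and lone CRs are translated to "\n" on input, which would hide exactly the inconsistencies being checked.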
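The editor operation of inserting constant markup in front of every line break amounts to one global regular-expression replacement. This Python sketch assumes CR-LF line breaks and uses <br /> as the constant markup; both the function name and that default are illustrative, not the behavior of any particular editor.

```python
import re

# Every CR-LF line break in the intermediate plaintext file.
LINE_BREAK = re.compile(r"\r\n")


def mark_line_breaks(text: str, markup: str = "<br />") -> str:
    """Insert constant markup immediately before each CR-LF line break."""
    # A function replacement inserts `markup` literally, so backslashes
    # in it are not interpreted as replacement-template escapes.
    return LINE_BREAK.sub(lambda m: markup + m.group(0), text)
```

When reading the intermediate file in Python, open it with newline="" so that the CR-LF pairs reach the pattern untranslated.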

2. The Starting Point

{Ed.Note: Update to make the basic starting point the baseline HTML. In this particular case, the ordinary HTML version is used. The basic difference is the handling of &nbsp; versus \x20. Also, it is important to have a practice copy and to verify each search-and-replace step alongside the draft derived HTML.}

There is an existing baseline plaintext replica of the original XML document, produced in accordance with the procedure for that level of replica. The document was re-indented as necessary, and individual line numbers were added to the beginning of each line. The file is also given a .txt extension so that it is not confused with the XML document it replicates (e.g., [n140504d1]).




{Ed.Note: For the visuals, I want before and after illustrations, where possible. It might be illustrated with snippets or with screen captures, with the transformation step in the middle.}

9. References

{Ed.Note: These will be pruned and the actual variant referenced, along with the resources used and where their specifications/documentation is found.}

[n140504d6]

Hamilton, Dennis E. OpenDocument v1.2 Manifest Schema baseline ordinary HTML replica. Derived from [n140504d4] on 2014-05-14. This version results from all of the steps in the procedure here, with the original whitespace preserved in ordinary HTML by &nbsp; character entities and <br /> line-break elements.

Revision History:
0.00 2014-06-07-08:36 Initial Placeholder: Provide initial placeholder content for illustrated embellishing that is carried out in the case of a specific schema.

This work is licensed under a
Creative Commons Attribution 2.5 License.

created 2014-06-07-08:36 -0700 (pdt)
$$Author: Orcmid $
$$Date: 17-06-14 20:22 $
$$Revision: 32 $