2008-11-19

 

ISO/IEC 29500:2008 OOXML Standard Available

Thanks to the ever-vigilant Alex Brown (and his convenient proximity to European time zones), we know that IS 29500:2008, the ISO/IEC Standard for Office Open XML File Formats (OOXML to its friends), is now available from ISO.

The 4-part standard and its “electronic inserts” are provided on a CD-ROM.  The purchase price is 342 CHF (about $285 USD or 225.5 EUR).  Not exactly a holiday stocking-stuffer, but there are other ways.

Rumor has it that ECMA will now issue IS 29500:2008 as a revision of ECMA-376, and that will be available for free download (and sometimes there is a CD-ROM compilation made available from ECMA and we could expect OOXML to be included).

Even better than waiting for ECMA, IS 29500 is now on the ITTF list of Publicly Available Standards.  As of today, the individual parts and their electronic inserts are available for download.  Scroll down to the end of the Publicly Available Standards list and you will find the 7 links to the parts and their inserts.  Once you accept the license agreement for your personal use of each download, the Zip files will be on their way to your computer.

The electronic inserts include schemas in both RELAX NG and XML Schema.  The IS 29500-1 inserts also include drawing geometries, spreadsheet styles, and word-processing art borders.  The schemas with IS 29500-1 are for strict OOXML.  The transitional schemas are with IS 29500-4.

One advantage of having these downloads, today, is having desktop search for locating material in them.  That, along with the Acrobat Search for individually-opened PDFs, makes it possible to rely on this material as off-line but on-board references.


Until I stumbled on Alex Brown’s tweet following a very satisfying OASIS ODF Interoperability and Conformance TC coordination call, I had not realized how much I have been suppressing myself in anticipation of the availability of IS 29500 in tangible, public form.  It was a little bit like waiting for US Presidential Election results (which happened much more quickly, although the long run-up was certainly comparable).

I’m not sure why it was like that, since there is plenty of work to do with ODF as part of my attempt to apply the Harmony Principles.  Yet the missing-in-action status of IS 29500 was some sort of cloud over my attention and enthusiasm.   Now, instead, I am suddenly much farther behind in this work than I was just 12 hours ago, and that was behind enough.  Stay tuned …

[update 2008-11-20: I forgot to set categories.  Done now.]



2008-10-30

 

Cover Pages: W3C Multimodal Architecture and Interfaces

[update 2008-11-06 I don’t know how I failed to see that the very first sentence didn’t carry the sense I intended for it.]

A current weakness in the open-document standards arena is the poorly-specified and tacit coupling of format provisions to behavior in various document processing contexts (creation, viewing, editing/manipulation, and various “final-form” renderings and, these days, interactive performance governed by the document, whether slide-show or something more elaborate).

We’ll get to that some day, and the ways that such aspects are layered into specifications and their allowance for application innovation and conformance novelty remain to be discovered.

This Cover Pages Daily Newslink item from 2008-10-21 leads to an account of the W3C Technical Report on Multimodal Architecture and Interfaces Fifth Working Draft.  

I’m putting down an nfoWorks marker because of these intriguing passages in the Newslink:

“The main difference from the previous draft is the addition of the rules and guidelines which will allow modality experts to describe the features, capabilities and APIs for specific modality components in sufficient detail so that the components will be interoperable in implementations of the Multimodal Architecture. … The specification describes a loosely coupled architecture for multimodal user interfaces, which allows for co-resident and distributed implementations, and focuses on the role of markup and scripting, and the use of well defined interfaces between its constituents.”

I am hesitant about the following:

“This framework places very few restrictions on the individual components or on their interactions with each other, but instead focuses on providing a general means for allowing them to communicate with each other, plus basic infrastructure for application control and platform services … At runtime, the MMI architecture features loosely coupled software constituents that may be either co-resident on a device or distributed across a network. In keeping with the loosely-coupled nature of the architecture, the constituents do not share context and communicate only by exchanging events.”

There are some wise words about keeping straight the different design-time and run-time considerations.

I suspect that this is not going to bear directly on realization of the Harmony Principles, but it might provide useful conceptual underpinnings for an account of the behavioral aspects that are at least as important in document-mediated interoperability as the standard document format.



2008-10-15

 

Content Assembly for nfoWorks?

Also from the Cover Pages: XML Daily Newslink for 2008-10-14, there is word of jCAM, an open-source, Java-based CAM XML Processor and Template Editor.  The SourceForge Project has the software; the web site has more information and tutorials.

The focus of the OASIS Content Assembly Mechanism (CAM) TC is on information management of XML documents used for business transactions.  What attracts my attention is that

“The vision of the CAM work is for describing machine-processable information content flows into and out of XML structures … .”

Some of the lingo is opaque to me, but I do have interest in automated approaches that involve

I may have problems in the validation and filtering of documents and test suites that are not at the proper scale for CAM.  I do think it is worth examining for ideas and applicable techniques.  This is my placeholder reminder for that.



 

DITA for Technical Standards Publishing


From the Cover Pages: XML Daily Newslink for 2008-10-14, there is an announcement that the OASIS Darwin Information Typing Architecture (DITA) Technical Committee has formed a new subcommittee.   The DITA for Technical Standards Subcommittee has the ambitious purpose of furthering and promoting DITA use for the creation, maintenance, and support of technical standards specifications.  The idea is to have a “common standard for the creation and publication of … technical standards specifications:” 

“The first effort will be to assess and define common requirements for the maintenance and publication of technical standards.  This will provide the common requirements for the specific capabilities that DITA should provide.  Finally, the group will create necessary enhancements to DITA standards and deliverables, including the DITA Open Toolkit with a Toolkit for Technical Specifications.”

There is more in the announcement of subcommittee formation on the (semi-official?) DITA online community site.  The official subcommittee operation is to be set up on the OASIS DITA TC page.

I’m not sure that this has any near-term benefit, but it does arouse my interest in another way.  I am finding it very difficult to wrap my head around the current and in-progress OpenDocument Format (ODF) and Office Open XML (OOXML) specifications.  I need some means of wrestling out my understanding so that I can surface a conceptualization of the functions of either format, one in which their reconciliation at the Harmony Principles level can be grasped and usefully described.

DITA surfaces on my radar from time to time.  It is something I think I should know more about.  I don’t know how to apply it in the context of standard document formats, nor am I clear how it is applicable to the conceptualization and expression of document-format standards.  It does strike me that some help is needed, based on my early efforts in the analysis of ODF specifications.   (At this point, concept-mapping software might be even more useful, and I will look into that as well.)

It is time to dig deeper into DITA to see how it can support a harmonization effort with regard to office document formats and their harmonizable specification.



 

Simplifying Speech-Enabled Applications

Via the Cover Pages: XML Daily Newslink for 2008-10-14, I learn that the W3C has standards for speech-enabled/-enabling web applications.  The addition announced today is the W3C Pronunciation Lexicon Specification (PLS) Standard.  This is an accessibility as well as a convenience feature.  PLS is intended to work with Text to Speech (TTS) and VoiceXML applications. 

The PLS lexicon is an XML document and there is allowance for blended use with other namespaces.  This suggests to me that there is prospective use in interchange of office-productivity documents for various purposes.

I don’t expect that this will fit into any foreseeable level of harmonized features.  I am placing this marker because it may well feature in accessibility provisions at some point, even if accomplished via a public-profile agreement involving foreign elements.

The accessibility angle is an important one to keep an eye on for its interoperability, interchange, and preservation potential.



2008-09-07

 

Document Interoperability: The Web Lesson

"are there alternatives to google groups search for searching old USENET messages? because groups date fielded search is teh broken."

-- Richard Akerman on Twitter, 2008-08-31 

Be prepared for a dramatic shift in the reality of web-site browsing and the honoring of web-page standards.   The pending release of Microsoft Internet Explorer 8 is going to put the reality of web standards and their loose adherence in our faces.  Although Internet Explorer is indicted as the archetypical contributor to disharmony on the web, Internet Explorer 8 is going to challenge all of us to deal with the reality of our mutual contribution to the current state of affairs.

Here is a lesson, probably many lessons, for document interoperability and the way that standards for document formats evolve and harmonize, or not, over time.

The Web as Clinical Science

The movement from loosely-standard pages and their browsing to strictly-standard pages and standards-mode browsing will illustrate every aspect of the same challenge for office-productivity documents and the office suites that process them. 

Web pages are the experimental drosophilae of digital documents.  All aspects of dynamic convergence on standards, themselves evolving, and the forces of divergence, are demonstrated clearly and rapidly.  I expect it to take Internet generations for significant convergence, with no static level of standards adherence anywhere in sight.  It took us almost 20 years to get to this point on the Web; I figure it will take at least five more to dig out of it far enough to claim that there is a standards-based web in existence and in practice.  I'm optimistic, considering that HTML 5, the great stabilization, is not expected to achieve W3C Recommendation status until 2012.

No document-interoperability convergence effort is anywhere close to the promising situation of the web as Internet Explorer 8, HTML5 implementations, and other compatibility-savvy browsers roll out over the next several years.  It is useful to use that situation to calibrate how convergence and interoperability could work for document interoperability.  There are significant technical barriers.  The non-technical barriers are the most daunting.  That should be no surprise.

Versioning in Document Use

I've written on Orcmid's Lair about the IE 8.0 Disruption.  This involves changes in Internet Explorer 8.0 by which web pages are rendered in standards-mode on the assumption that pages are conformant with applicable web standards.  In the past, it was presumed that pages were loosely-standard and browsers, also loosely-standard, made a kind of best effort to present the page.  The consequences have been explained marvelously in Joel Spolsky's post on Martian Headsets.

We are similarly relying on document-format standards as a way to provide for many-to-many interchange and interoperability between different (implementations of versions of) document-format standards and different (implementations of versions of) processors of those digital documents.  That means we have a version of the loosely-standard documents with loosely-standard processing problem.  We can't be strictly standard because the standards can't (and definitely don't) have strict implementations at the moment; and there are many ways that specifications and implementations have been kept loose by design.  Accompanying that looseness by design is the simple fact of immaturity among the contending document-format standards for office applications, particularly as vehicles for interoperable applications.

For office-productivity documents as we know and love them, there are five, count 'em five "official standards." 

The "Official" Public Standards of Office Documents

For Office Open XML Format (OOXML), there is the ECMA-376 specification of December 2006.  There is also the ISO/IEC 29500:2008 Office Open XML File Formats standard once it is made available.  IS 29500 will have some substantive differences from ECMA-376.  We won't have a solid calibration of the differences until the IS 29500 specifications are available and subject to extensive review.

For the OpenDocument Format, there is the Open Document Format for Office Applications (OpenDocument) v1.0 OASIS Standard issued 1 May 2005.  There is also the ISO/IEC 26300:2006 Open Document For Office Applications (OpenDocument) v1.0 standard (also on the publicly-available listing).  IS 26300 is for the same format as the OASIS v1.0 standard, but it is on a completely-separate standards progression.  Appendix E.3 accounts for the differences of IS 26300 from the text of the May 2005 OASIS Standard.  The first page of the IS 26300:2006 document (page 5 of the PDF) identifies its source as Open Document Format for Office Applications (OpenDocument) v1.0 (Second Edition) Committee Specification 1, dated 19 July 2006, derived from document file OpenDocument-v1.0ed2-cs1.odt; this is not another OASIS Standard, however.

The second and latest OASIS Standard for ODF is Open Document Format for Office Applications (OpenDocument) v1.1 issued 2 February 2007.  This document is derived from OpenDocument v1.0 (Second Edition) Committee Specification 1, the same specification that is the source of content for ISO/IEC 26300:2006.  The changes made to arrive at ODF v1.1 from the v1.0 (Second Edition) committee specification are detailed in Appendix G.4.  There are some mildly-breaking changes from ODF v1.0 to ODF v1.1, mostly of a clarification or correction nature.  There are a few additional features that have no down-level counterparts in ODF v1.0.

A third OASIS Standard, ODF v1.2, is under development.  The current drafts, using a very different organization from v1.1, are available as public documents of the OASIS Open Document TC. 

We can expect to see more versions of ODF and of OOXML at their various standards venues.  We'll be watching here on nfoWorks as the situation becomes even more chaotic.  Notice that this diversity ignores the variety of divergent implementations of the various specifications.

Format Versions that Live Forever

It is possible for one document-format specification to officially supplant another, with the older specification deprecated.  That has not been done so far with any of the five-and-growing document-format specifications, any more than it has been done for most of the versions of HTML specifications that have been recommendations of the W3C (and IETF before the development track moved entirely to W3C). 

For example, the last full-up specification for HTML, the HTML 4.01 W3C Recommendation of 24 December 1999, has this to say about its immediate predecessor: "This document obsoletes previous versions of HTML 4.0, although W3C will continue to make those specifications and their DTDs available at the W3C Web site."  This was possible because HTML 4.0 was young and there were important defects that 4.01 cured.

The HTML 4.01 specification continues with the following recommendation: "W3C recommends that user agents and authors (and in particular, authoring tools) produce HTML 4.01 documents rather than HTML 4.0 documents. W3C recommends that authors produce HTML 4 documents instead of HTML 3.2 documents. For reasons of backward compatibility, W3C also recommends that tools interpreting HTML 4 continue to support HTML 3.2 [W3C Recommendation 14 January 1997] and HTML 2.0 [IETF rfc1866 November 1995 and the IETF-obsoleting rfc2854 June 2000] as well." 

The XHTML branch of specifications, originally derived from HTML 4.01, was intended as the basis for a future generation. 

Meanwhile, there has been work toward both XHTML 2 and HTML 5.0.

HTML 5.0 is currently intended to exist alongside XHTML 1.x and its newer arrangements while also absorbing XHTML 1.x to some degree (by having an XML form).  The current HTML 5.0 draft specifies legacy processing (in its HTML-syntax form) for variations of over 60 HTML DOCTYPE DTD flavors, extending back to HTML 1.0 and other variants.  The intention is to converge HTML and XHTML 1.x under a consistent HTML 5 processing model with only no-quirks, some-quirks, and quirks modes.  This is also intended to end the variation and extension of HTML (not XHTML) by capturing <!DOCTYPE HTML> for its own and having a concrete HTML syntax that is fully-divorced from both SGML and XML.  It is important to point out that HTML 5 is not going to eliminate the divergence that browser (user-agent) plug-in models, plug-in implementations and scripting systems (especially client side) bring to the mix.
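To make the DOCTYPE point concrete, here is a minimal Python sketch of the first decision a processor faces: is there a DOCTYPE at all, is it the short form that HTML 5 claims for itself, or is it one of the legacy flavors?  The function name classify_doctype and its three return labels are hypothetical, invented for this illustration.  The sketch deliberately stops short of choosing among no-quirks, some-quirks, and quirks modes for legacy declarations, because that choice depends on the lookup rules in the draft specification, which are not reproduced here.  It also ignores comments or other content that may precede the DOCTYPE.

```python
import re

# Matches a DOCTYPE declaration at the very start of a page (case-insensitive).
_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+([^>]*)>", re.IGNORECASE)


def classify_doctype(page):
    """Return 'missing', 'bare-html', or 'legacy' (hypothetical labels)."""
    match = _DOCTYPE.match(page)
    if match is None:
        return "missing"      # browsers typically fall back to quirks handling
    body = match.group(1).strip()
    if body.lower() == "html":
        return "bare-html"    # the short <!DOCTYPE HTML> form
    return "legacy"           # mode depends on the specification's lookup rules


if __name__ == "__main__":
    print(classify_doctype("<!DOCTYPE HTML><html></html>"))
    print(classify_doctype("<html></html>"))
```

Anything classified as legacy would still need the specification's own table to settle which of the three modes applies.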

Document-format versions are not easily abandoned.  Even if production of a format is deprecated, consumption of the format may need to continue into the indefinite future, and certainly so long as emitters of deprecated formats have significant usage.  The W3C progression of HTML is at a point where that is fully-recognized and being honored in reaching toward an HTML 5 plateau sometime in the next decade.

Considering this promising stabilization, when would I manage to change all of my web sites and blogs to clean HTML 5 pages?  Not until I know that only a small fraction of visits to those sites come from Internet Explorer versions prior to IE8 (or maybe IE9) and other browsers lacking full-up standards-mode processing.  Fortunately, the HTML 5 specification-effort promises to show me exactly how to do that in a mechanical way.  I am looking forward to automated assistance.  In my case, I'll also have the benefit of my IE 8.0 mitigation effort.  Other web sites may require other approaches, and user browser choice will involve important trade-offs for some time. 

I am surprised by the number of people who operate multiple browsers.  Although I operate multiple products for office applications these days, that's mostly to explore their interoperable use, not to ensure ability to interchange documents (well, not until I joined OASIS and the ODF TC).  I've been a serial adopter of Internet Explorer versions since IE 2.0.  As a typical late-adopter, I may finally branch out now just to have a better calibration of the migration to standards-based sites and browsers for them.

This is an important lesson for the management of the expanding variety of specifications of formats for office-application documents, formats of which HTML packagings are sometimes one of the flavors.

Reconciling office-application document-format versions does not promise to be so easy as the current effort to stabilize HTML for the web.

The Looseness of Document Specifications

Of course, OOXML and ODF are not close dialects off a single family tree, as HTML variants might be treated (and HTML 5 demonstrates, if successful).  In addition, the current specifications are not for same-conformance, interchangeable-everywhere documents:

<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    office:version="1.2">
  <office:meta>
    <meta:generator>
        OpenOffice.org/3.0_Beta$Win32 OpenOffice.org_project/300m3$Build-9328
    </meta:generator>
  </office:meta>
</office:document-meta>

This strikes me as even less appealing than the challenge of sites adjusting for browsers and browsers adjusting to HTML DOCTYPE declarations (and their absence).
   
It is not encouraging that the office:version attribute and <meta:generator> element are both optional.  It is unfortunate that the office:version attribute is generally uninformative about the processing requirements for the document file in hand, serving merely as an automatic claim of one specification the document conforms to.  The document is also likely to conform to earlier versions and probably also later versions, although it is unclear how we can determine that easily for a given document representation.
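As a rough illustration of what a consumer has to do, the following Python sketch reads both values from a metadata fragment and treats either one as possibly absent.  The function name describe_meta and the BARE sample are hypothetical, introduced only for this illustration; the namespace URIs and the SAMPLE text are taken from the fragment shown earlier.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def describe_meta(meta_xml):
    """Return (version, generator); either may be None since both are optional."""
    root = ET.fromstring(meta_xml)
    # Namespaced attributes are keyed as "{namespace-uri}local-name".
    version = root.get("{%s}version" % OFFICE_NS)
    generator = root.findtext("{%s}meta/{%s}generator" % (OFFICE_NS, META_NS))
    if generator is not None:
        generator = generator.strip() or None
    return version, generator


SAMPLE = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    office:version="1.2">
  <office:meta>
    <meta:generator>
        OpenOffice.org/3.0_Beta$Win32 OpenOffice.org_project/300m3$Build-9328
    </meta:generator>
  </office:meta>
</office:document-meta>"""

# Hypothetical document that makes no version or generator claim at all.
BARE = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"/>"""

if __name__ == "__main__":
    print(describe_meta(SAMPLE))
    print(describe_meta(BARE))
```

Even when both values are present, they only report what the producer claimed; nothing here tells the consumer which other versions the document would also satisfy.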

Prospects for Interoperable Convergence

We already have before us difficulties with interoperable convergence within the progression of a single standard and its variety of implementations.  This makes the prospect of harmonization between different standard formats rather murky.

Desktop office-application software has more promise with regard to application of Postel's Law, to be liberal in what is accepted and conservative in what is produced.  Unfortunately, the current specifications do not require conservative, interoperable implementations; the current specifications are arguably antagonistic to such an achievement.
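A small Python sketch of Postel's Law applied to a version claim may help show the asymmetry.  The names KNOWN_VERSIONS, read_version, and write_version are hypothetical, and the set of accepted version strings is an assumption made for the example, not something any of the specifications prescribes.

```python
# Assumed set of version claims this hypothetical processor understands.
KNOWN_VERSIONS = ("1.0", "1.1", "1.2")


def read_version(raw):
    """Liberal reader: tolerate a missing, padded, or unknown version claim."""
    if raw is None:
        return None  # no claim was made; the caller has to cope
    cleaned = raw.strip()
    return cleaned if cleaned in KNOWN_VERSIONS else None


def write_version(version):
    """Conservative writer: emit only an exact, known version claim."""
    if version not in KNOWN_VERSIONS:
        raise ValueError("refusing to emit unknown version: %r" % (version,))
    return 'office:version="%s"' % version


if __name__ == "__main__":
    print(read_version(" 1.1 "))
    print(write_version("1.1"))
```

The reader never fails on sloppy input, while the writer fails loudly rather than produce anything another implementation might have to guess at.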

I suspect that this is an unintended consequence mixed with some inattention to what it takes for interoperability to be achievable. 

It remains to be seen how our experience and understanding mature.   We are at the beginning, not the finish.  The journey may seem endless.


The process of IE 8.0 mitigation and preparation for a standards-mode approach to web browsing impacts this site and blog as well as every other web page I have ever posted (somewhere over 120MB worth and climbing).

I'm not going to say anything more about IE 8.0 mitigation and HTML harmonization here.  The overall effort will be tracked in that category of Professor von Clueless posts; that's the place to follow along.  The lesson for document interoperability is something that is definitely appropriate for Pursuing Harmony; there'll be much more to say about that.



2008-08-19

 

... Tweaking the Sidebar ...

I can't stand some things about the sidebar, and I must fix them now (template versions 0.04-0.05):

I need to be careful and not attempt too much of this all at once.   This is going to be one of those pages that I will probably update as I make progress.

Update 2008-08-20T23:54Z: A set of provisional changes has been made with template version 0.04, with the idea of tweaking further after seeing the template at work.

Update 2008-08-21T01:07Z: The biggest change is to get rid of link underlining and use bold-face as well as an improved link color.  This and some minor layout adjustments are accomplished with template version 0.05.  I am going to let those sit there for a while until I see how to make the layout more pleasant still.

Update 2008-08-21T01:14Z: The changes work better if I save the new template to Blogger after previewing it.
