n080301d1: nfoWorks Diary - Hard Hat Beginnings, April-May 2008

nfoWorks Diary: Hard Hat Beginnings April-May 2008	*tools for document interoperability*


2008-05-15 Choosing Source Control for nfoWorks

SourceForge, where I intend to maintain code under development and deliver nfoWorks libraries and fixtures, uses the Subversion (SVN) source control system. CVS is also supported, but SVN has more appealing Windows clients, such as TortoiseSVN, and there are ways to use the interface with Microsoft's CodePlex if SourceForge becomes unviable.
   Meanwhile, there are "distributed" source control systems that work locally and can be published wholesale on web sites, where anyone else can control how they accept changes from one source-control set into another. In effect, they are all forks and you can merge as you like. Linus Torvalds originally devised Git, a distributed source control system, and uses it for development of the Linux kernel. Mercurial is another one, although my leaning is toward Git.
   Today, Jeremy D. Miller blogs that he finally got git. Not only is Git extremely useful off-line, it is extremely fast and the compressed repositories are easy to ship around. I am linking to Jeremy as a reminder that I have been pondering the same thing.
   In my personal development model, I have reason to run Visual SourceSafe on my local systems. This has to do with integration with my IIS development server and working well on my SOHO LAN. It works just fine.
   I am prepared to work across source-control systems, taking a latest-versions snapshot out of one system and checking it in to another system. I can go from local VSS to Subversion on SourceForge that way. The histories won't carry over. Instead, the latest-versions snapshots essentially represent roll-ups of changes from the last time a snapshot was checked in. I can live with that. This gets trickier when multiple committers are working with the SourceForge repository, however.
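   The snapshot roll-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directory arguments, the function names, and the idea of skipping `.svn` metadata folders are mine, not part of any VSS or Subversion tooling, and a real check-in would still need the source-control client to register added and removed files.

```python
import filecmp
import os
import shutil

# Folder names treated as source-control metadata and left alone (an assumption).
METADATA_DIRS = {".svn"}


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, dirs, files in os.walk(root):
        # Prune metadata folders in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in METADATA_DIRS]
        for name in files:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def roll_up(snapshot_dir, working_copy_dir):
    """Make working_copy_dir match snapshot_dir and report what changed."""
    source = list_files(snapshot_dir)
    target = list_files(working_copy_dir)
    report = {"added": [], "changed": [], "removed": []}

    for rel in sorted(source):
        src = os.path.join(snapshot_dir, rel)
        dst = os.path.join(working_copy_dir, rel)
        if rel not in target:
            report["added"].append(rel)
        elif not filecmp.cmp(src, dst, shallow=False):
            report["changed"].append(rel)
        else:
            continue  # identical content, nothing to copy
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)

    # Files that exist only in the working copy were dropped since the last snapshot.
    for rel in sorted(target - source):
        os.remove(os.path.join(working_copy_dir, rel))
        report["removed"].append(rel)

    return report
```

   The returned report is the roll-up: everything added, changed, or removed since the previous snapshot, with no intermediate history.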
   I could also check a latest-versions snapshot into a Git repository for archiving and as a different way of publishing development trees on a web site as a cache and for downloading by other developers. That git repositories are single files is very useful, making distribution by file sharing rather practical. This also makes merges from multiple committers easier to deal with, something that makes Torvalds very happy with Git.
   That's the thinking so far. I want to keep my options open. The first big change will be when I learn to work with the SourceForge SVN repository later this year.
   --dh


2008-05-14 Does Document Translation Suck?

Jesper Lund Stocholm has expressed strong agreement (cache) with Rob Weir's argument that translation is never good enough and filtering to and from an internal model is the way to bring different formats in and out of a productivity application.
   Lund Stocholm says "I have been trying to pitch my idea of 'document format channels' for some time now. The basic idea is not to do translations between formats but to support the feature sets of both formats in the major applications." He links to an investigation of conversions that he reported in November, 2007 (cache).
   It is not apparent what he has in mind that keeps the application from allowing a feature that is not available in the format of the document that is being worked on. This is a contrast with the Harmony Principles, where only feature sets that work in both formats are implemented. (That is a big simplification of a complex situation, but it's the essential idea behind nfoWorks.)
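   Taken at the level of that simplification, the idea amounts to a set intersection. The Python sketch below is only an illustration: the feature names and the two inventories are hypothetical, and real harmonization involves far more than matching names.

```python
# Hypothetical feature inventories; the names are illustrative only.
FEATURES = {
    "ODF": {"paragraph-styles", "tables", "footnotes", "change-tracking"},
    "OOXML": {"paragraph-styles", "tables", "footnotes", "custom-xml-parts"},
}


def harmonized(features_by_format):
    """Return the features that every listed format supports."""
    sets = list(features_by_format.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def disharmonies(used, features_by_format):
    """Return the features a document uses that fall outside the common set."""
    return sorted(set(used) - harmonized(features_by_format))


print(sorted(harmonized(FEATURES)))
print(disharmonies({"tables", "change-tracking"}, FEATURES))
```

   A harmonizing processor would implement only what `harmonized` returns and would flag anything reported by `disharmonies`.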


2008-05-13 Spreadsheet Loading/Filtering/Translation Comparisons

Rob Weir reports (cache) some loading tests of spreadsheets in binary format, OOXML .xlsx, and ODF .ods formats. Using Office 2003 and OpenOffice.org 2.4, the OOXML and ODF formats took longer to load than the native binary formats of the corresponding applications. Oddly, even though Office 2003 uses a compatibility pack to load the OOXML version of the test spreadsheet, it loads that spreadsheet (which is larger than the corresponding ODF on disk) faster than OpenOffice.org loads the ODF version.
   But the horror story was the loading time of the ODF version into Excel 2003 via the ODF Translator plug-in. This took almost 10 minutes in contrast to the 14 seconds with OpenOffice.org.
   Although this may tell us far more about the implementation of the ODF translator, Weir argues that there is also substantiation of the inherent limitation of translators over filters (which go to a useful intermediate memory structure rather than completely into another recorded format).
   This is a useful experiment, and we need to understand results like these better, creating similar performance tests that apply to nfoWorks approaches.
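   A starting point for such tests might look like the Python sketch below. It relies on the fact that ODF and OOXML files are ZIP packages of XML parts; the function names, the choice of `content.xml` as the part to parse, and the use of element counting as the "load" are my assumptions, and this is in no way a reconstruction of Weir's method.

```python
import statistics
import time
import xml.etree.ElementTree as ET
import zipfile


def load_zip_xml(path, member):
    """Parse one XML member of a ZIP package and return its element count."""
    with zipfile.ZipFile(path) as package:
        with package.open(member) as stream:
            root = ET.parse(stream).getroot()
    return sum(1 for _ in root.iter())


def time_loader(loader, repeats=5):
    """Call loader() several times; return median and best wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        loader()
        samples.append(time.perf_counter() - start)
    return {"median": statistics.median(samples), "best": min(samples)}
```

   For example, `time_loader(lambda: load_zip_xml("test.ods", "content.xml"))` would time parsing of a spreadsheet's main content part, where `test.ods` stands for whatever test file is at hand.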
   The test files were obtained from a 2005 George Ou article (cache).


2008-05-12 Japanese Text Layout (via Rick Jelliffe)

There is a new W3C draft (cache) that provides important English-language information on the requirements for layout of Japanese text on print-formatted pages. Based on a Japanese standard (JIS X 4051), the developing document should become a comprehensive resource on the proper layout of Japanese text and other writing systems that share Japanese layout conventions.
   This sort of detail should be eye-opening for those who think strictly in terms of left-to-right writing of texts based on a Roman alphabet.
   As an interoperability concern, I find conformance of layout to be particularly daunting and something that it would be great to be able to ignore in terms of harmonization of document formats. That is, of course, not possible if I am to take the Harmony Principles seriously. There is an impact on the applications and how users see the proper layout of what they and others have written. I remain daunted.


2008-05-10 ~~Facebook~~ FrontPage and Markup Preservation Struggles

(via Doug Mahugh). The Channel 9 interview with Terry Crowley (cache) touches on some topics that are very relevant to document interoperability and also the socio-political situations around it. Crowley reflects on the problems that FrontPage experienced as an impugned brand, how difficult mark-up preservation is, how much was invested in it, and how the resulting editor lives on as a component of other products. The problem of brand reputation is probably going to arise in OOXML-ODF (and intra-OOXML, intra-ODF) interoperability (and I see strenuous efforts to frame such a situation with thoughtless disregard for the law of unintended consequences). Crowley's remarks on the difficulty of mark-up preservation in the face of model incompatibilities are particularly relevant to the nfoWorks objectives and satisfiability of the Harmony Principles.


2008-05-09 Usability and Interoperability: Table-of-Content Magic

Joannie Strangeland has posted a blog entry (cache) on how to have Microsoft Word create independent tables-of-content within selected sections of a document in addition to the higher-level overall ToC for the document. The technique is illustrated in terms of interactions with features of Microsoft Word via the program's UI.
   This is exactly the level that someone wanting to make section-level ToCs needs to understand to accomplish this in a document. I have wanted to be able to do exactly this, so the mini-tutorial caught my eye at once. It also illustrates a number of considerations around cross-product/format document interoperability.
   First, there is no discussion of how this ends up being expressed in OOXML. This is not, of course, the user's principal concern.
   Second, there is no discussion of how this might be handled in an interchange and collaboration situation (even when the collaborators are all using the same software product [version]).
   Third, this particular magic is accomplished by a combination of particular, independent features. It is easy to see how the developers of another product might not have done what is necessary to anticipate this case, even when also accepting OOXML.
   Fourth, the translation to a different document format (i.e., ODF) of this particular document might only be possible by resolving model incompatibilities. That is, what is being done (and visible only through the OOXML being interpreted) needs to be abstracted out enough so that the same effect in a different model is achieved when it is not achievable by some naive one-for-one feature translation. This is the kind of thing that involves global analysis, inference/backtracking techniques, and recognition of particular use cases that we tend to expect of expert systems using various AI-programming techniques.
   Fifth, the prospect of accomplishing the kind of transformation in the fourth point in a round trip back to the original format is nearly inconceivable.
   Finally, I don't expect the Harmony Principles to have this kind of reach in any near-term future. At the same time, I wonder how easy it will be to detect this sort of thing in a harmonizing document processor. Or will we be left with ham-handed and still complex rules (e.g., no multiple ToCs in documents) that operate above the level of the individual feature occurrence? There must be some contextual knowledge of other aspects of the document in order to recognize a disharmony in a benign-appearing local element.
   So, here's already more food for thought than I thought this interesting example would provide.


2008-05-09 Daisy Translators and OOXML

Doug Mahugh has posted (cache) a number of links on the recently-announced completion of the Daisy translator for Word. This is part of the support for accessibility of Microsoft Office programs. Oliver Bell's 2008-05-07 post (cache) has additional information.
   I hadn't realized that Daisy involves its own XML format and that documents need to be translated to the Daisy XML. This raises another interesting harmonization case and I will have to look into it more deeply.
   There is also Daisy translation for ODF and that will be an interesting basis for comparison and broader understanding of how all of this is intended to work.

2008-05-09 Microsoft's Office Labs

This may be a bit tangential. Sarah Perez reports (cache) the launching of the Microsoft Office Labs site. Although the labs develop trial-concept solutions that can be tried out by a community within and without Microsoft, there is also provision, through Community Clips, for contribution of tips, how-to's, and demos from the public. Depending on how this develops, it may also be a place to demonstrate interoperability-achieving techniques for the Microsoft Office System.


2008-05-09 Obsessing About Cache Accession and Diary Catch-Up

Every day I notice more feed items that should be captured in this diary and ultimately incorporated/reflected in content here. I went on a cruise at the end of April and so now there is a backlog.
   I have been operating under the obsession that the backlog should be rolled forward from its oldest to the newest, so that the accession numbers that I use on files (see the cache catalog) will be chronological.
   The "duhhh" moment is reminding myself that the whole principle behind using accession numberings is the ease of claiming current material and then filling-in (accessioning, if you will) the backlog as time permits. The idea is to close off the backlog so that more is not created, and the backlog is cleaned up in any order that works while staying current with the latest.
   So, slap on the head and I will now get over that and start with current and fill in the other backlog at my leisure. The accession numbers don't have to be chronological with the date of the source material, they are chronological with the sequence of accession, and that can be out of time sequence to overcome an obsessive blockage (inappropriate perfectionism, perhaps).
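   The numbering rule is simple enough to show in a short Python sketch. The class, its fields, and the sample titles are hypothetical and do not describe the actual cache catalog; the point is only that numbers follow the order of accession, while source dates can be sorted separately when needed.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AccessionRegister:
    """Assign accession numbers in order of arrival, not of source date."""
    entries: list = field(default_factory=list)

    def accession(self, title, source_date):
        number = len(self.entries) + 1
        self.entries.append((number, source_date, title))
        return number

    def by_source_date(self):
        """List entries chronologically by the date of the source material."""
        return sorted(self.entries, key=lambda entry: entry[1])


register = AccessionRegister()
register.accession("current item", date(2008, 5, 9))   # claimed first
register.accession("backlog item", date(2008, 4, 20))  # filled in later
for number, source_date, title in register.by_source_date():
    print(number, source_date.isoformat(), title)
```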
   I intend to switch from Hard Hat Beginnings to Hard Hat Era once I eliminate the backlog and have more useful content in the notes compilations. So we are extending into May. ...


2008-05-08 AIR for nfoWare Fixtures?

As part of joining Twitter, I wanted a desktop Twitter widget so I didn't have to keep a browser window open to see "tweets." All of the popular ones for Windows seemed to be implemented using the Adobe Integrated Runtime (AIR). I chose Spaz because it looked good from the web page, is open source, and runs on Windows, OSX, and someday soon, Linux.
   There are some glitches in Spaz, but I like the ease with which I was able to install it and set it up. I also like the look of the application. This makes AIR another consideration for test applications, demonstrations, and nfoWare fixtures along with standalone Silverlight.


2008-05-08 ODFDOM Available

A Document Object Model implementation for ODF is now publicly available (cache). There are JavaDoc specifications of the interface, a Java 5 reference implementation, and source code of the reference implementation. ODFDOM version 0.6 is offered as a successor to the AODL and ODF4j packages and is part of the ODF Toolkit.
   One comment on the post asks for a lower-level implementation that ports to Linux, as a libODF.
   It is interesting that LGPL v3 is claimed as the license. I could not find a license statement in a quick look at the JavaDoc download, nor is there a license file included in the runtime (.jar) package. There is an XML comment in an identity.xsl resource file that is from OpenOffice.org. The XML comment mentions LGPL v3 and refers to an absent LICENSE file. It also does not indicate how to obtain the source code of ODFDOM. ~~There is no license information in the source-code package either. This messiness is not going to have me looking at the source code.~~ The source code files do include copyright and license notices at the top of each file and at the top of some other text files, although no LICENSE file is included. I trust that later distributions will be more methodical in how license information is conveyed, especially with binaries.
   Meanwhile, the availability of this model and a Java API for it is extremely valuable as a basis for exploring the prospects of a harmonized DOM.


2008-05-07 Standards Lawyering and Existence of Conformant Documents

There was a lengthy round of blog posts between Alex Brown and Rob Weir concerning whether the RelaxNG Schema published in ISO/IEC DIS 26300 (ODF) had any valid documents at all and whether the DIS 26300 specification (ODF version) or any other allegedly-ODF document is/could-be a valid ODF document. It is not entirely clear that the matter is resolved, depending as it does on fine points of related specifications and on what schema validation of an instance document is intended to signify with regard to validity of the document in hand. The problem is complicated by the introduction of other standards by reference, some of which (such as XML Schema) modify the interpretation of other standards (XML itself, in the XML Schema case).
   The debate strikes me as a demonstration of the kind of back-and-forth that occurs in solidifying our understanding of what a standard specification requires and how implementers can disagree until interoperability considerations require some mutual accommodation and community-wide clarification of what the requirement is. A specification may also be revised to eliminate the particular misunderstanding, however it is resolved.
   There will be further intense scrutiny of DIS 26300 (ODF) and DIS 29500 (OOXML) separately and in terms of their harmonization. Meanwhile, setting aside the bickering and indignation that is expressed in the exchange, the blog entries and their comments are worthy of review for how issues of this kind are sorted out or at least taken to a point where the dispute is abandoned for now. This is also a demonstration of how important public conversations and analyses are to the strengthening of understanding and ultimately the specifications:
   -
   - Rob Weir: Achieving the impossible (cache). 2008-05-07 post demonstrating executions of two schema-validation programs using the ODF Schema and the ODF Specification. This seems to be the end of the discussion at this point.


2008-05-07 WebCite and Archiving of Referenced Materials (via Paolo Massa)

WebCite (cache) is a system for preserving material at a given URL and providing a permanent URL from which the material can be retrieved even if the original material is later changed or the URL becomes invalid.
   This mechanism and its support by "dark mirrors" would be a valid alternative to the private embargoed collections that I include as part of the caching of cited material here.
   This article explains the motivation, the mechanism that is used, and the relationship to other approaches that are in use.
   I'm not changing the cache approach for nfoWorks just yet (one motivation being the preservation of an off-line copy that I can always access). It is useful to have WebCite for explaining the motivation for the caching of material and also for the embargo of that material for which redistribution has not been explicitly granted.


2008-05-06 Backlog Activity

I have been at work on repaving of the ODMA site and moving back into ODMdev work while I also accumulate reference materials here. There are some commonalities that I want to exploit and much of the toolcraft that I'll use applies for ODMA, nfoWorks and also nfoWare. The material will seem to be rather scattered until enough pattern emerges.
   There is also a backlog of material since 2008-04-17 that merits some diary entries. I am starting that today.


2008-05-06 What Does OAI Have To Do With nfoWorks?

The Open Archives Initiative has, as one of its features, agreements on metadata, a format for web-site catalogs of metadata, a protocol for document discovery, and additional information on access of archived content. There is a document interoperability case lurking in OAI. There is a useful description of the overall OAI situation, with useful links, on ZA3038 (cache), via David Weinberger.


2008-05-06 Processing Streams of XML

Harry Pierson has two web posts on processing XML in IronPython. Although .NET libraries are discussed, these posts are valuable for their appraisal of the different ways to process XML and the adjustments Pierson made to accomplish a task of his: Stream Processing XML in IronPython (cache) and Deserializing XML with IronPython (cache). I'll be looking for more along these lines, with a variety of programming platforms.
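   As one point of comparison on another platform, here is a minimal stream-processing sketch using the `iterparse` function from the standard Python library. It is not Pierson's IronPython/.NET code; the sample document and the `count_elements` helper are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(stream, tag):
    """Count elements named tag while parsing the stream incrementally."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's text and children once it has been handled,
        # so a large document does not accumulate fully in memory.
        elem.clear()
    return count


sample = io.BytesIO(b"<rows><row>1</row><row>2</row><other/></rows>")
print(count_elements(sample, "row"))  # prints 2
```

   The design choice is the usual one for streams: handle each element as its end tag arrives and discard it, rather than building the whole tree before doing any work.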

2008-05-06 Securing Open-Source Software

David Wheeler has built a comprehensive presentation (cache) on security considerations for open-source-software. There is guidance on practices for establishing confidence in treatment of security concerns. These apply to nfoWorks libraries and fixtures of all kinds. What Wheeler refers to as the BSD-new license and those compatible with it are the licenses that will be used for deliverables from nfoWorks.


2008-05-06 Standalone Silverlight Applications

Miguel de Icaza reports (cache) on a successful approach to running standalone Silverlight applications using either .NET or Mono with Moonlight (on Linux). The OSX version of Mono doesn't have the necessary library at this point.
   This is a way to do some rapid tooling of document fixtures and tests where the .NET or Mono runtime is usable. The application is essentially running an embedded server that hosts Silverlight locally, it would appear. Still, that's potentially useful. It should even run Popfly applications, methinks.
   I mention this to put down a marker for prototyping and demonstration prospects. If I can minimize GUI development as part of establishing nfoWorks libraries and fixtures, I certainly will.


2008-04-17 Information on ODF and OpenOffice.org Progress

I don't have a good place to park this information yet, and I don't want to lose track of it.
   The GullFOSS blog provides regular accounts of development activity for OpenOffice.org releases. There are periodic "development at a glance (cache)" posts that provide a snapshot of OO.o development activities, including those related to OpenDocument format features and planned synchronization with (anticipated/pending/committee-approved) updates and additions to the ODF specification. This is where I go to learn that there is something new to find on the OpenDocument TC site.
   Michael Brauer (cache), currently the co-chair of the OASIS OpenDocument TC (cache), contributes here.
   Mathias Bauer, another OO.o developer, has just posted a great "ODF Enhancements for OpenOffice.org (cache)" article that describes how ODF changes make their way into OO.o, with an interesting guide to how public suggestions make their way into the ODF specification.
   Robert Weir (cache), the other OASIS OpenDocument TC co-chair, follows up on Mathias Bauer's post with further guidance on "Suggesting ODF Enhancements (cache)." There is a careful explanation of the measures taken to assure that suggestions are usable by OASIS and don't create intellectual-property problems for the specification.
   The information on ODF discussions and contributions is valuable to have before contributing and exploring the available ODF discussion lists.
   -- dh

2008-04-16 Licensing and Open-Source Preferences

nfoWorks deliverables are provided under open-source licenses. The web-site materials are all provided under a Creative Commons Attribution license. Software and its source codes are provided under a BSD Template license. This license is the open-source counterpart of the Creative Commons Attribution license.
   The idea is that so long as attribution requirements are satisfied there are no copyright-based constraints on the use, re-use, and derivative use of nfoWorks deliverables. These licenses are compatible with the GPL, the GNU General Public License (but not vice versa). These licenses are compatible with the creation of proprietary, closed-source software (but not vice versa).
   Attribution is important as part of the provenance and accountability that is expected as part of responsibly-built software that incorporates or derives from nfoWorks deliverables. It is important to identify the dependency so that any announcements of defects and security flaws in any version of nfoWorks deliverables can be checked against the dependencies in works that incorporate those deliverables. It is a simple matter to always know the dependencies there are on particular versions of other software, and to demonstrate that by making the dependencies known in a way that is available for inspection.
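   The dependency check described above could be as simple as the following Python sketch. The component names, version strings, and advisory records are all hypothetical; nothing here reflects an actual nfoWorks release or announcement format.

```python
# Hypothetical manifest of the nfoWorks versions a product incorporates.
DEPENDENCIES = {"nfoWorks-core": "0.3.1", "nfoWorks-odf": "0.2.0"}

# Hypothetical defect announcements: (component, affected versions, summary).
ADVISORIES = [
    ("nfoWorks-core", {"0.3.0", "0.3.1"}, "parser accepts malformed input"),
    ("nfoWorks-odf", {"0.1.0"}, "incorrect date handling"),
]


def affected(dependencies, advisories):
    """Return the advisories that apply to the declared dependency versions."""
    hits = []
    for component, versions, summary in advisories:
        if dependencies.get(component) in versions:
            hits.append((component, dependencies[component], summary))
    return hits


for component, version, summary in affected(DEPENDENCIES, ADVISORIES):
    print(f"{component} {version}: {summary}")
```

   The check only works if the manifest is kept accurate and open to inspection, which is the reason for insisting on attribution in the first place.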
   The same conditions are honored in any incorporation of other software in nfoWorks deliverables.
   Because the GPL is not compatible with the BSD Template license and its brethren, examination of GPL'd source code is avoided. Although there is no aversion to relying on GPL'd utilities and tools, GPL'd source code is not touched except for the unlikely case of submitting defect reports and repair suggestions back to the authors. Proprietary, closed-source programs, even if used in the nfoWorks laboratory, are not redistributed and only freely-available software is required for making use of nfoWorks source codes.
   Finally, these license considerations apply to copyright on software. In the event that a patent covenant is known to apply and might be unavailable for a different use of the code, precautionary notices will be attached to the software and incorporated in the source code.
   This policy has been expressed in various forms from time to time. It seems like a good idea to clear the air and summarize the considerations in this one place. -- dh


2008-04-14 Rapid Development, Throw-Aways, Refactoring and Resets

I am not very keen about throw-away code. My inclination is to refine and refactor, especially early on, and continue to grow around well-defined but extensible (and properly-versioned) interface agreements. That suits the kind of iterative progression I have in mind for nfoWorks.
   There is a tension between the desire for rapid development, easy experimentation, and having higher-level ways to use components versus dealing with performance, code footprint, and platform portability while traveling light. I don't mind throw-away demonstrations and exercises, and these are useful for others to see how to start getting their arms around a technology, too. I do mind having to do big resets and do-overs that lose the benefits of progressive improvements and knowledge-building. I don't mean to exclude refactoring born from experience, but refactoring is on code that is worthy of keeping. The tension I feel around saying how it will go in advance will probably dissipate when I get into the work, so I will now wait to see how I can shape this in practice.
   I think support for rapid use, and the use of higher-level development stacks (Java, .NET, XML transformation tools, etc.) is valuable. I'd even consider ECMAScript and languages like Python in this picture. These are great for demonstration, rapid trials, tests, and even reference implementations.
   At the same time, I envision progressive refinement of lighter-weight, lower-level solutions that probably involve C, C++, and binary COM interfaces, the latter as a way to provide contracted interfaces that are easily coordinated properly beneath high-level wrappers, whether Java, .NET, or something else (Gnome, for all I know).
   This reminds me of a conversation back in the 70's on a project where we were inventing "middleware" for ourselves. We started thinking of segregation into underware, middleware, and outerware. I want to support all three, possibly with different tool sets, and my attention is mostly on the first two. It is entirely understandable that users of nfoWorks results will be working mainly in the third category. -- dh


2008-04-13 Classifying Sources and Resources

I think the last part of the first "About nfoWorks" to be completed will be the section on related work. I notice that I am a little puzzled about those activities that are not directly technical or that have some mix of advocacy and technical that tips toward advocacy. Also, I tend to favor those projects that deliver code that is intended to be used and built-upon by others, with secondary interest in ferreting out code that is buried somewhere inside of a particular product, even though open-source.
   I suspect this will not be so worrisome when I sit down and compile sources and resources, so that may be the next nfoWare Note folio that I initiate.


2008-04-12 Conversing on the Interoperability Forums

The Microsoft-sponsored Interoperability Forum (cache) opened-up as promised on March 20. It consists of three MSDN Forums: Interoperability Conversations, Technical Interoperability Scenarios, and Achieving Interoperability through Standards.
   There seem to be some birth pangs, and I am not clear why there's not much discussion. It may be the setting, it could be something that people have about Microsoft, and it could just be that people who are seriously interested in interoperability (at least two of us) haven't found their way here. There are other possibilities.
   I would like to see these forums thrive and be lively, but it may depend on who is willing to speak up and find value in forum-based conversations. -- dh


2008-04-11 About nfoWorks and Harmony Principles

A draft "About nfoWorks" page is now available on the site and linked from the home page. This provides a new (0.1 beta) explanation of Harmony Principles and more about how the work of nfoWorks will proceed. We are close to a turnover point for going full-bore turtle into hard-hat, technical hunter-gathering activity. The 2008-03-30 diary entry on Ramping Up still applies; I need to give more attention to those other prerequisites now.
   Today, there are new blog posts on recent events and the current state of nfoWorks. If you have comments about anything you see here, those posts are a good place to leave them:
   OOXML + ODF: ISO Steps In, Orcmid's Lair, 2008-04-11 (cache)
   nfoWorks: What Are those Harmony Principles, Again? Professor von Clueless in the Blunder Dome, 2008-04-11 (cache)
   --dh


2008-04-10 Rick Jelliffe Perspective on SC34 Activity

Rick Jelliffe has a great perspective on the just-concluded SC34 plenary meeting. Because he wasn't there, he offers it in a "Fake blog from SC34 meeting in Norway (cache)." Jelliffe offers up some important items to hold onto here:
        1. The main SC34 web site (hosted in Japan by the SC34 Secretariat)
        2. The page for accessing public SC34 documents (rewards exploration)
        3. A reminder that TrueType is connected with the ISO/IEC OpenType standard, maintained in SC34 (and relevant for nfoWorks) with a hidden reminder that getting Asian scripts right is probably one of the best demonstrations of harmonization going.
        4. A useful sketch of SC34's interests and responsibilities
        5. Another reminder of my own armchair critic status, something I am working to alter
        6. A discussion of the criticality of accessibility considerations and the resources that apply in the work of SC34; a topic that it will be essential to address with regard to harmonization
        7. An injunction to become involved and where to do that (OASIS, W3C, Ecma TC45, the national mirror of SC34 in your neighborhood, etc.)
        8. Links to the DIN NIA-34 update on the harmonization investigation (PDF file, cache), great work that nfoWorks should align with
        9. An interesting side comment about the use of topic maps to present ODF-OOXML mappings (although DIN is focused on translations, not mappings, because of a number of issues that translation surfaces, including round-trip degradation in collaboration scenarios)
        10. Another side comment on how the concern for synchronizing ECMA versions and SC34 versions of OOXML might be extended to the case of OASIS and ODF as well.
   --dh


2008-04-09 ISO/IEC SC34 Takes Over OOXML

ISO/IEC Joint Technical Committee 1 (JTC1) Subcommittee 34 (SC34) held a plenary meeting in Oslo, Norway on April 5-9. Alex Brown provides a comprehensive report (cache).
   SC34 proposes to create ongoing activities to carry out its responsibilities:
        1. IS 29500 (OOXML) maintenance
        2. IS 26300 (ODF) maintenance (pending OASIS agreement)
        3. Harmonization (with a proposed work-item expected from the DIN NIA activity)
   To start things off, two ad hoc working groups have been created.
   The first ad hoc working group will propose how IS 29500 maintenance should proceed, producing a proposal by 2008-09-01, one month prior to the next SC34 meeting. This ad hoc group is chaired by Alex Brown who will lead a two-day meeting in London this July. Participation is from SC34 member bodies and I take it that ECMA TC45 members are invited to chime in.
   The second ad hoc working group is being created to capture technical comments on IS 29500 and make sure existing analysis is not lost. Within 90 days (by July 2) there will be a mechanism in operation "to compile a list of comments on ISO/IEC 29500 received from NBs, liaisons, and the general public" and then to "publish the on-going list as an open document on the SC 34 website."
   In the resolutions from the meeting (cache), I note that the final text of DIS 29500 has already been created. SC 34 requests distribution to its members no later than May 1. I don't know what the delay will be before publication as IS 29500:2008 happens, and I'll beg a copy of the final DIS 29500 before that just to make sure I don't step into some element of harmonization that is impacted by BRM-approved changes (especially the various conformance statements that are new in the final text). Also, to make any contributions to identification of defects, it is important to reference the most-authoritative available documents.
   Here's what it looks like for intercepting DIS/IS 29500 activity:
        1. Usable final text available in May for provisional use (if it can be obtained) until official IS 29500 editions are issued
        2. Mechanism for receiving defects and related comments on IS 29500 operating in July.
        3. In September, 2008, SC 34 meets in Korea and takes next steps, with meetings every six months (figure March 2009 in Prague, September 2009 in U.S., then 2010 meetings in Sweden, then South Africa).
   Working groups that will be doing the technical work are yet to be set up and they will have their own meetings, conference calls, and mailing lists as well as ones synchronized with SC34.
   (I have a current passport with lots of room for visa stamps. Now I just need a sponsor for expenses/subsistence and a national body to nominate me to a committee. Hint, hint.) --dh


2008-04-08 Featuring OOXML

Today, Microsoft announced an OOXML initiative in the public edition of its Registered Partner newsletter. (If you have a business that involves reliance on Microsoft platforms, it is not particularly difficult to be a Registered Partner. The first thing you have to get over is any reluctance about having a Microsoft Passport -- Windows Live ID -- account and providing some information about your business relationship to Microsoft products.)

This comes under the umbrella of supporting organizational activities that intersect with standards for document formats. This entry is a placeholder before there is organized material on those activities. When the time comes, I will need to draw a fine line between pure-advocacy activities and constructive development and adoption support for interoperability purposes.
   The Microsoft Partner Program is oriented to business management, particularly marketing and sales, whether for integrators, resellers, or software developers. You will sense that in the focus on building business and the Microsoft encouragement to promote and resell Microsoft licenses of various kinds as well as add your own value. The major vendors have programs like this, including ones to attract Independent Software Vendors (ISVs).
   Here is, as they say, the money quote on OOXML:

Focus is on addition of value via composite, collaborative applications that integrate with the Microsoft Office System as a platform. Featuring OOXML involves some specific support for the format, its automatic use, and, desirably, reliance on custom-content and niceties for all of this that are part of Office 2007. There is a "Featuring OOXML" mark being encouraged for adoption in product materials and packaging.
   Although I don't envision much direct intersection with nfoWorks, there are some interoperability considerations that may be of concern to this particular community (or not). The following come to mind:
   1. One way to introduce Harmony Principles and reliance on interoperability profiles (whatever those turn out to be) might well be with the addition of add-ons to the Microsoft Office System, exploiting the hooks for integration of alternative formats. It's too early to register anything like that.
   2. Just the same, it would be interesting to see what the single-slide featuring of OOXML in a harmonious offering might look like.
   3. How much blurring will there be, I wonder, between what the Office System, a specific software suite, supports for integration of OOXML solutions and what strict reliance on the format itself supports in an interchange situation? (The same question applies to the OpenOffice.org provision for plug-ins and supporting either OOXML or ODF. I presume that the IBM product is yet another potential interoperability silo.)
   I predict that Office Business Applications (OBA among friends) featuring OOXML will thrive even more than they already do. We will learn much about entropy and the erosion of standards-observance for preservation and interoperability in how this plays out. --dh


2008-04-07 Standards of Fidelity

Chasing after the Harmony Principles is going to raise questions about fidelity of document interoperability and the conditions for degrees of fidelity, whatever that will come to mean.
   I imagine that one measure of harmony will involve consistent presentation rendering of documents by allegedly harmonious processors. The "renderance (cache)" example on Dean Allen's Textism blog is an example of the not-always-subtle practical difficulty of determining rendering fidelity across platforms and processors. There is much to grapple with in terms of specified behavior and in terms of implementation glitches, identifying the deviants in the game, and profiling around the pot holes.
   My first thought is that the ability to produce identically-rendered PDF output is one kind of test that might be mechanically verifiable, bolstered with some experience-based, craftily-composed test cases. This is a second-order kind of fidelity, tied to printing models and their pipelines, and we still have to deal with differences of screen appearance even when printing seems to come out "right" (i.e., harmonious-enough). But PDF, now itself standardized, seems like the most viable stake in the ground for the moment. (Ultimately, we might throw a standardized XPS into the mix as a kind of honesty check between available final-form renditions and how harmonious products achieve agreeable fidelity.)
   This is not an easy problem and it won't be swallowed whole. There will be serious temperance through reliance on an extensive progression of the least things that could possibly work at each stage. I think that applies to the strictest fidelity that can possibly be verified, too. --dh
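
   As a minimal sketch of what a mechanically verifiable rendering check might look like, assume each processor's PDF output has already been rasterized, by some tool not shown here, into pages of grayscale pixel rows. The function names and the tolerance parameter are illustrative assumptions, not part of any existing tool or of the Harmony Principles.

```python
def same_shape(page_a, page_b):
    """True when two pages have the same number of rows and the same row lengths."""
    return len(page_a) == len(page_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(page_a, page_b)
    )


def differing_fraction(page_a, page_b):
    """Fraction of pixels that differ between two same-shaped pages."""
    if not same_shape(page_a, page_b):
        raise ValueError("pages must have the same dimensions")
    total = sum(len(row) for row in page_a)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_a, row_b in zip(page_a, page_b)
        for a, b in zip(row_a, row_b)
        if a != b
    )
    return differing / total


def harmonious_enough(doc_a, doc_b, tolerance=0.0):
    """Compare two documents page by page; tolerance is the allowed differing fraction."""
    if len(doc_a) != len(doc_b):
        return False  # different page counts can never match
    return all(
        same_shape(a, b) and differing_fraction(a, b) <= tolerance
        for a, b in zip(doc_a, doc_b)
    )


# Two tiny hypothetical 2x2 "pages" that differ in one pixel out of four.
page = [[0, 0], [0, 255]]
other = [[0, 0], [255, 255]]
```

   A tolerance of zero corresponds to the strictest fidelity that can be verified this way; anything looser is a judgment about what counts as harmonious-enough.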


2008-04-06 Correlating the Standards

It is necessary to track the individual editions of standards under each standards authority. This is because standards often reference specific editions of other standards. Also, standards show up under more than one standardization process.
   For example, ECMA-376 for Office Open XML File Formats was the same as DIS 29500, the draft that was proposed for standardization under ISO/IEC JTC1. But IS 29500 will be a very different beast, altered from ECMA-376 as a result of the Ballot Resolution Meeting that preceded achievement of approval.
   As another example, the OASIS Standard for Open Document Format (OpenDocument) v1.0 of 2005-05-01 was submitted as DIS 26300 and approved via the PAS process at ISO/IEC JTC1. But the issued ISO/IEC standard, IS 26300:2006, is the OASIS Standard for Open Document Format (OpenDocument) v1.0 (Second Edition) Committee Specification 1 of 2006-07-19.
   To illustrate the complexity of the kinds of meanderings that occur, ANSI/X9 X9.100-181-2007 Specifications for TIFF Image Format for Image Exchange is a special adaptation of TIFF for the exchange of the images of bank checks among financial institutions. The images are in Group 4 bi-level encoding and the TIFF 6.0 specification is referenced in the abstract. That makes it sound something like TIFF/F, which was standardized by CCITT. I don't know what the references in X9.100-181 are, but to my knowledge TIFF 6.0 is not under the management of any standards organization. The specification was issued on June 3, 1992 by Aldus Corporation, which held the copyright until it was inherited by Adobe in a subsequent acquisition. [The Adobe rebranding of the document preserves the technical content and even the cover date, but the front matter is modified, including elimination of the names of external contributors, a matter of some personal interest. --dh].
   So that dependencies can be tracked and the correct materials understood for references from other specifications, I propose to catalog and capture specifications in the following ways:
   1. There will be a separate sequence of pages (a folio in the nfoWare organization of web materials) for the progression of specification versions under a single authority (e.g., OASIS, ISO/IEC JTC1, Ecma International, IETF, W3C, consortia, and proprietary authorities such as Sun Microsystems and Adobe). The dependencies can be cross-referenced among the different sequences for specifications of interest for nfoWare.
   2. There are standards development activities that one might want to track for relevant background. It is not of interest for nfoWare to provide a historical account. Interest is in the authoritative editions of specifications. However, the development activities may provide important resources for questions, discussions of points of concern and clarifications that may be important to achievement of harmonious interoperability. [added 2008-04-07: We certainly want to know about errata and there may be current items of work that need to be looked at with anticipation.] A separate folio is used for any tracking of an individual standard development activity, contact information, availability of archives and discussions, etc.
   3. Available resources for implementations of a given specification, including test suites, conformance-verification tools, samples, reference implementations, and translation/conversion aids will be catalogued in one or more separate folios, probably by platform as well as particular standard.
   This is still sketchy. We'll try it out first with ODF materials and then with OOXML materials and refine things as we go. --dh
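
   The cataloging idea above can be sketched as a small data structure: one record per edition, grouped by authority, with cross-references between editions. This is only an illustrative sketch; the class names, fields, and the "relate" operation are assumptions of mine, and the sample entries are limited to the editions mentioned in this entry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edition:
    authority: str    # e.g. "OASIS", "ISO/IEC JTC1", "Ecma International"
    designation: str  # e.g. "ECMA-376", "IS 26300:2006"
    title: str


@dataclass
class Catalog:
    editions: dict = field(default_factory=dict)  # designation -> Edition
    related: dict = field(default_factory=dict)   # designation -> set of designations

    def add(self, edition):
        self.editions[edition.designation] = edition
        self.related.setdefault(edition.designation, set())

    def relate(self, source, target):
        """Record that `source` references or derives from `target`."""
        if source not in self.editions or target not in self.editions:
            raise KeyError("both editions must be cataloged first")
        self.related[source].add(target)

    def by_authority(self, authority):
        """Designations cataloged under one authority, in insertion order."""
        return [e.designation for e in self.editions.values() if e.authority == authority]

    def depends_on(self, designation):
        """All editions reachable from `designation`, directly or indirectly."""
        seen, stack = set(), [designation]
        while stack:
            for nxt in self.related[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


catalog = Catalog()
catalog.add(Edition("Ecma International", "ECMA-376", "Office Open XML File Formats"))
catalog.add(Edition("ISO/IEC JTC1", "DIS 29500", "Office Open XML File Formats"))
catalog.add(Edition("OASIS", "OpenDocument v1.0", "Open Document Format (OpenDocument) v1.0"))
catalog.add(Edition("ISO/IEC JTC1", "DIS 26300", "Open Document Format (OpenDocument) v1.0"))
catalog.relate("DIS 29500", "ECMA-376")            # the draft was the same text as ECMA-376
catalog.relate("DIS 26300", "OpenDocument v1.0")   # submitted via the PAS process
```

   Keeping one sequence per authority and storing the cross-references separately mirrors the folio arrangement proposed above, so that a question such as "what does DIS 29500 depend on?" is a lookup rather than a reading exercise.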

2008-04-05 Pouring the Foundation

I am looking at an upward-from-the-bottom organization of materials and tools. This means I am looking at processing functions more than user interaction. It is my usual inclination to not worry about the user-interaction level, if ever, until after the machinery for the process (the model, not the view, one might say) is established to a useful degree. Here are some of the layers that are at the foundation for the nfoWorks effort:
   1. File streams and their manipulation at the octet-data, binary level
   2. Containers, such as Zip files, with their items, directories, and the various uses of compression, digital checks and signatures, and encryption
   3. Character encodings using single- and double-byte encodings, Unicode
   4. XML and its processing and transformation technologies
   5. Other, similar technologies and formats that are employed with documents but are not confined to documents (e.g., image formats)
These are not directly about ODF or OOXML. They are relied upon instrumentally, but they are not a central focus of nfoWorks. They are, however, relevant more generally as nfoWare (and DMware, in the case of XML).
   Foundation elements will be covered jointly: their use on behalf of document harmonization is featured here, with the general treatment under nfoWare.

   One area I am not clear on has to do with diving into the standards and specifications for these as a consequence of their incorporation by reference (and restrictive profile, or not) in the specifications for ODF, OOXML, and other document technologies that matter for nfoWorks. --dh
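
   To make the layering concrete, here is a minimal sketch in Python that touches the first four layers in order: octet streams, a Zip container, a character encoding, and XML processing. It uses only the standard library. The item name "content.xml" and the tiny <note> document are illustrative assumptions, not a valid ODF or OOXML package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(items):
    """Return the octets of a Zip container holding the given name -> bytes items."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, data in items.items():
            package.writestr(name, data)
    return buffer.getvalue()


def read_item(package_bytes, name):
    """Extract one item from a Zip container as raw octets."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.read(name)


# Layer 3: text becomes octets only through an explicit character encoding.
xml_text = '<?xml version="1.0" encoding="UTF-8"?><note><body>caf\u00e9</body></note>'
xml_octets = xml_text.encode("utf-8")

# Layers 1 and 2: the octets are stored in, and recovered from, a container.
package_bytes = build_package({"content.xml": xml_octets})
raw = read_item(package_bytes, "content.xml")

# Layer 4: the XML parser honors the declared encoding when given octets.
root = ET.fromstring(raw)
body_text = root.find("body").text
```

   Each step depends only on the one beneath it, which is the point of working upward from the bottom: the container code knows nothing about XML, and the XML code knows nothing about Zip.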


2008-04-04 Pondering the Accumulation of Clippings

With my attention on all of the elements that go into realization of Harmony Principles, I have started noticing way too many potentially-relevant sources and links in my web log feeds. I've jotted down 21 in my notebook since yesterday.
   I could use del.icio.us, but that actually takes longer, needing to leave my Outlook list of unread posts to go to the actual web pages. Also, that doesn't get the information closer to where it belongs, which is somewhere here on nfoWare.
   I could put a del.icio.us feed here, and I might do that, but it isn't a way to capture the material. I need a way to capture, here on the site, and also have a way to publish it as a feed for anyone who cares. (It would be useful to do that with this diary too, and I have been short-sighted about that.)
   I could also finally bite the bullet and put up a wiki. The hosting service for nfoWare has a setup for MediaWiki, my favorite. I will consider that, especially as a way to be more inviting for community participation.
   Thinking about it right now, I do fancy the idea of editing directly into an RSS feed so that it is web-presentable and can be subscribed to.
   There are more cases to consider, including use of AtomPub, applying Windows Live Writer to the task, and so on. My pending infrastructure update and use of a Windows Home Server may also provide further opportunities. There may be an application of social software here.
   Still pondering ... --dh


2008-04-03 Sources of Published Standards

There are different sources to watch for published standards. For those specifications that are sold by various national bodies and non-governmental groups, such as ANSI and ISO, there are on-line catalogs.
   I have learned that Thomson Scientific operates Techstreet, a "World Standards Marketplace" promising access to the world's largest collection of industry codes and standards.
   Techstreet is apparently a reseller. I am not that thrilled about purchasing standards in this way, but the Techstreet site and their e-mail newsletters do provide useful information about newly-published standards around the world. Featured recently:
   - ANSI/X9 X9.100-181-2007 Specifications for TIFF Image Format for Image Exchange. This covers a specific application of TIFF used in the exchange of check images among financial institutions.
   - ISO/IEC 12207:2008 and 15288:2008 Set: Systems and Software Engineering - Life Cycle Processes
   There are ones that will be pertinent to nfoWorks, and I'll need to find much more economical access to those. Techstreet is a useful way to learn of specifications as they become available. --dh

2008-04-02 Tracking Defects and Analyses

There are a number of sources of analysis into the defects of public formats. Translator projects identify issues in the specifications and also in the ways that conflicting implementations arise. I will be tracking and contributing to the public activities of this kind.
   Opponents of OOXML have extensive compilations of defects. I am not interested in the attitude and point-of-view that leads to these materials, but I do want to keep an eye on them as pot holes to look out for on the highway to Harmony. Here are a representative few that I have in my collection. Their occurrence here does not mean that I concur with the claims about defects nor with the significance of the alleged defects:
   1. Reuven Lerner: OOXML: Why Is It Bad, and What Can We Do? Blog entry, OStatic, 2008-04-02 (cache)
   2. Rob Weir: OOXML's (Out of) Control Characters. An Antic Disposition (web log), 2008-03-24 (cache)
   3. Rob Weir: How Many Defects Remain in OOXML? An Antic Disposition (web log), 2008-03-18 (cache)
   --dh


2008-04-02 Model for OOXML Development Resources

On March 31, Doug Mahugh posted his guide to "Open XML Resources for Developers (cache)." This is a great compilation of the kinds of materials that I want to look for in regard to the other standards that support document interoperability, including ODF, PDF, and the additional standards for various document constituents (TIFF, SVG, MathML, etc.). This, along with guidance to various open-source implementations, is a great format for me to follow in compilations here. --dh
   [update 2008-04-03: Erika Ehrli's April 2 "Happy News for Open XML Developers (cache)" provides an additional list of resources and links, including ones not listed by Mahugh.]


2008-04-02 ISO Approves DIS 29500 (OOXML) as International Standard

The ISO announcement (cache) of the conclusion of DIS 29500 balloting after the Ballot Resolution Meeting (BRM) and reconsiderations by national bodies was made generally available today. DIS 29500-II (incorporating changes agreed at the BRM) will become ISO/IEC International Standard 29500:2008.
   If there are no formal appeals from any national bodies in the next two months, we should expect to see IS 29500:2008 become available sometime this summer, providing a clean version with all approved changes.
   In the meantime, nfoWorks will rely on ECMA-376, the ECMA Standard for Office Open XML File Formats. I'll keep an eye out for areas that are likely changed in IS 29500. There is much to do before that becomes a serious concern. --dh


2008-04-01 Being Stopped

I noticed, in "The Unbearable Overwhelm of Technical Debt (cache)" that I was being stopped around all of the incomplete efforts that I am juggling, including all the ones that I have forgotten about.
   For here, the sticking point was the unfinished repaving. I also noticed that I needed it to become April, so I could easily create more folios on nfoWorks technical matters. There was a silly obstacle. In my web-site development methodology, it is easy to use pages in one folder as boilerplate for pages in another folder. It is harder to use pages in the same folder as boilerplate for more pages. Now that I am starting the notes folder for April 2008, I can use material from the March 2008 folder. It is that simple.
   Of the dependencies noticed in my 2008-03-30 entry, there is one more of an infrastructure and plumbing nature: I am also going to upgrade the SOHO development systems and add a shared server for backing up and then archiving all nfoWorks and other development materials.
   I am now more empowered to start collecting materials and cataloging the resources needed to begin Harmony Principles experimentation. --dh


2008-03-31 and Earlier Diary Entries