tools for document interoperability


This is the web diary for nfoWorks and realization of the Harmony Principles. Pursuing Harmony tracks nfoWorks research, analysis, specification, and implementation of tools for document interoperability. There is commentary on related activities that address conformance, interoperability, and harmonization of document formats.


ODF Implementation-Support Toolkits and Libraries

I have no appraisal of the relative maturity and quality of the various toolkits that are emerging on the ODF scene (and likewise with regard to OOXML).  However, it is important to have a cataloging of what there is.  This is a random start.  I will add to this post and build an nfoWorks catalog page later:

  • lpOD: languages & platforms OpenDocument Project (also Français). 
    Definition of a Free Software API implementing the ISO/IEC 26300 standard.
    Development, for higher-level use cases, of a top-down oriented API in the Python, Perl, and Ruby languages.
    Licensing is under Free Software Foundation (FSF) versions.

My interest

An important resource for ways to harmonize document formats involves attention to the libraries and models employed for constructing document-centric software and their applications.  This applies for the development of testing and conformance tools as well as for implementation of format-supporting software products.  Indeed, one might reasonably expect that such tools would be a companion demonstration of implementation-support quality.

In the interesting case of OpenDocument Format, the availability of open-source code bases for implementations is both a risk (in that deviations or omissions in support for the standards are perpetuated through code mimicry) and an opportunity for faster tooling and testing.  Of course, closed-source implementations (and related toolkits) have their own dangers in this regard, while denying public inspection of the code.  I suspect that implementation notes are required in all cases to ensure understanding of intentions and interpretations as well as limitations and the different ways that discretionary matters are handled.

For ODF, the continuing work on toolkits and on independent open-source implementations is providing important diversity.  This can inform the search for a harmonious profile and perhaps suggest adaptations that encourage harmonious implementations.  Diversity across platforms and programming models may also help in the recognition and abstraction of essentials away from implementation incidentals.  That can also be valuable in ensuring that harmonization is on essentials and not accidents of implementation.

I will be reviewing available toolkits, libraries, and APIs as I define my own around interface contracts for abstracted levels of document models and processing support.  I expect some cross-fertilization while adhering to a model that is concentrated on harmony.




Office Shots for Confirmed ODF Interchange Fidelity

The new Office Shots service received a fair amount of attention at the recent ODF Interoperability Plugfest.  Taking a page from the “test your site with all browsers” tools that are available, Office Shots will take an uploaded ODF document and show how it renders in different ODF-supporting products.  To deal with the problem of confirming appearance of the document back to the submitter, the rendering by each application is captured in PDF.

This is a fledgling service, currently in limited beta.  It is sponsored by the same Dutch organizations that sponsored the ODF Plugfest.

The power of the service is its user-relevant confirmation of the fidelity with which a document of interest is rendered by different ODF-supporting software/platform combinations.  It is an easy way for evaluators to verify whether their important documents are rendered successfully in interchange among ODF products.  It also allows the subjective determination of success to be left in the hands of the users who know what qualifies as acceptable fidelity in each particular case.

One of the most-difficult situations in interchange of documents is when the receiver is seeing something materially different than what the sender (1) had in mind and (2) expects has been communicated.  For the parties to communicate about a suspected difficulty, they need to use a “channel” that differs from the one that has apparently failed.  Screen shots serve that purpose.  PDF is also valuable in the case where a PDF can be extracted that accurately-enough reflects what is intended and/or what is being seen.

Office Shots provides a way to check proactively, either because a problem is suspected with a local rendition or to ensure that a document and its choice of implementation-supported features are treated consistently by a variety of other implementations/platforms.

One can imagine that, over time, we could see Office Shots support links for troubleshooting specific discrepancies, finding practices for avoiding many of them, and easy reporting of problems to development teams.

Office Shots promises to provide a terrific reality-based approach to confirming the interoperability of ODF implementations as far as presentation fidelity is concerned.  This is also a first-line check on confirming difficulties with round-trip inter-product fidelity preservation.  (Of course, if the goal is solely presentation fidelity, PDF and other final-form formats may be preferable, especially when long-term preservation is also a consideration.)

I look forward to the impetus that Office Shots will provide to user recognition of practical ODF interoperability considerations.  I also think it will provide important stimulus and confirmation for developers who want to improve the interoperable use of their ODF-supporting software.

Besides the site, there are other discussions of the project and its potential:

  • Glyn Moody: ODF and the Art of Interoperability.  Open Enterprise (blog), ComputerworldUK, 2009-06-19.
  • Sander Marechal: Easily testing ODF compatibility (odp, pdf).  Presentation to the ODF Plugfest, 2009-06-15.  [In this case, the PDF renders more poorly than the ODP on my computer.  I assume the problem is in the production of the PDF via the ODP implementation, yet another Officeshots interoperability case.]
  • Sander Marechal: Product submission, 2009-02-06.




ODF Interoperability at The Hague

There’s a great event at The Hague these two days: June 15-16, 2009.  It’s all about OpenDocument Format (ODF) and interoperability.

It is sponsored by a neutral (ODF-supporting) organization. It is attended by major implementers of ODF-supporting products, including IBM, Microsoft, and Sun Microsystems.

In short, all of the right people are in the same room, some for the first time, and I am so envious that I am not among them.  There should be a great deal of creative tension.

I will be watching for materials and progress reports.  There is already Doug Mahugh’s useful pre-event post on how Microsoft tested the ODF implementation in Office 2007 SP2 to ensure that it only produced standard-conforming documents and failed in ways that did not introduce security exploits against the Office System or documents of its users.

I have been meaning to post more about my involvement with ODF and how it is fueled by my interest in the harmonious level at which we can start and expand interoperability based around standard, open formats for office-productivity applications.  I will do that separately.  For now, I just want to register my excitement for the positive stage that participation at this meeting represents.

[Update 2009-06-16-18:56Z There are little odds and ends available from the ODF Plugfest so far, and I will compile some links here for safe-keeping.  I am sure there will be additional blog posts and reports by more attendees after they have had some time for reflection]

[Update 2009-06-17-17:11Z with a few more straggling in]

[Update 2009-06-18-17:51Z as other posts show up]

[Update 2009-06-23-14:55Z with some stragglers]

[Update 2009-06-24-18:55Z and one more interesting appraisal]

  • 2009-06-23 Sven Langkamp: ODF Plugfest.  (blog post) Sven’s Blog.  Useful perspective regarding participation by KOffice, an independent implementation of the ODF specification.

[Update 2009-06-27-21:40Z and the hits keep on coming …]

[Update 2009-07-01-15:25Z wrapping up, with anything more on plugfests in future posts]




ODF and IPR/Licensing Concerns

Here are some apple-orange notions that have come to my attention in an oddly-convergent way.

New OASIS Technical Committee IPR Mode

OASIS has just announced the pending addition of a 4th IPR Mode to the set that technical committees can use as the way intellectual property (mainly essential claims of patents) will be made available to adopters of a TC-produced specification:

  1. RAND Mode, requiring the essential IPR of participants and contributors to be licensable under Reasonable And Non-Discriminatory terms
  2. RF on RAND Terms Mode, a Royalty-Free RAND mode that may have certain limitations
  3. RF on Limited Terms Mode, where the limitations allowed to RF on RAND Terms are not allowed
  4. Non-Assertion Mode, the new mode in which all contributors and participants make a non-assertion covenant with regard to the specifications that obligate them to do so

The ODF TC operates under the RF on Limited Terms Mode, the most-generous mode available until now.  As stated under the OASIS IPR Policy, a TC may not change its IPR Mode without closing and submitting a new charter.  I don’t expect such a shut-down and restart to happen, especially before ODF 1.2 becomes a ratified OASIS Standard.

Many will welcome this new mode.  I know that my willingness to participate in OASIS Technical Committee activities increases exponentially as we move down the list.  The RF on Limited Terms and the new Non-Assertion modes are the only ones that I have no hesitation about. 

The Non-Assertion Mode is comparable to everyone obligated by the IPR mode having automatically made an equivalent of the Microsoft Open-Specification Promise with regard to the specifications produced by the TC during their participation. 

Of course contributors, participants, and anyone else can provide non-assertion covenants with regard to any specification, as Sun Microsystems did for ODF in September, 2005.

Implementation License Models and Interoperability

The licenses under OASIS IPR modes apply to implementations of the applicable specifications, such as ODF.

I have recently been dealing with provisions of the ODF specification that do not seem to be understandable on their own, not even by consulting referenced source materials.  In that case, there is no way to ensure interoperability without consulting an implementation or two.  In complex cases (such as figuring out how to decrypt an ODF document that is encrypted using the approach sketched in the ODF specification), it is actually necessary to inspect code to determine what the missing but essential details might be.  (It would be better to find implementation descriptions that explain how the specification is being satisfied, but too often the code is the only reliable implementation description.)
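As a starting point for that kind of detective work, it can help to see what an encrypted package actually declares about itself before turning to anyone's code.  The following is only a minimal sketch in Python, assuming the package is a ZIP archive whose META-INF/manifest.xml uses the ODF 1.0/1.1 manifest namespace and marks encrypted entries with an encryption-data element.  The function name encryption_parameters is hypothetical, and the sketch does not decrypt anything; it only lists whatever parameters an implementation chose to write.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/manifest.xml in ODF 1.0/1.1 packages (assumed here).
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def local(name):
    """Strip the '{namespace}' part from an ElementTree qualified name."""
    return name.rsplit("}", 1)[-1]


def encryption_parameters(odf_file):
    """Map each encrypted package entry to the parameters its manifest declares.

    odf_file may be a path or a binary file-like object holding the package.
    (Hypothetical helper for illustration; it performs no decryption.)
    """
    with zipfile.ZipFile(odf_file) as package:
        manifest = ET.fromstring(package.read("META-INF/manifest.xml"))

    declared = {}
    for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry"):
        enc = entry.find(f"{{{MANIFEST_NS}}}encryption-data")
        if enc is None:
            continue  # this entry is stored unencrypted
        params = {}
        # Record every attribute on encryption-data and its descendants,
        # without assuming which parameter names an implementation writes.
        for element in enc.iter():
            for attr, value in element.attrib.items():
                params[f"{local(element.tag)}/{local(attr)}"] = value
        declared[entry.get(f"{{{MANIFEST_NS}}}full-path")] = params
    return declared
```

Comparing the output for the same document saved by two different products shows where their declared parameters agree.  What the sketch cannot show is how those parameters are meant to be combined, which is exactly the detail that may still require an implementation description.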

When the code is available in an open-source implementation, it may be possible to reverse-engineer an implementation-independent interoperable interpretation.  That is what I would look for, assuming that I could master such code well enough to resolve questions the specification leaves open. 

Consulting code works for detective work around clarification and hole-filling of the specification.  If I want to make an implementation based on that interpretation, I must be especially careful about the license on that code.  For example, LGPL and GPL code and other reciprocal-license open-source software is not useful to me in producing software under a license that I prefer (Open BSD, Apache, etc.).   I am cautious about digging around in voluminous code anyhow, but I am particularly wary about risking that I might copy GPL code.

In this case, I am reluctant to rely too strongly on an abstracted interpretation unless the specification itself is updated and issued with an interpretation I can then safely rely on.

In effect, specifications that are sufficient for implementation-independent achievement of interoperability, along with royalty-free licenses or covenants, provide the ultimate clean-room support for achievement of unencumbered independent implementations.

That’s what I’m after.




Interoperable ODF: Finding Ground Truth

Jesper Lund Stocholm has found his files from the Microsoft Document Interoperability Initiative ODF Workshop.  His post, "DII ODF Workshop - the good stuff", shares the nitty-gritty on-the-ground experience of transferring ODF documents from OpenOffice.org to Microsoft's pre-beta Office 2007 SP2 implementation and back again.  There's a download of eleven test files, each in two forms, along with PDFs of how they render.  There's an OpenOffice.org version of each document.  Then there's the Microsoft Office 2007 SP2 pre-beta ODF saving of the same document.  This is enough to discern how the two applications handle application-specific features from other applications and express application-specific features of their own.

There are some great lessons becoming available with regard to interoperable use of document formats.  Here's what I see in terms of the Microsoft Office and OpenOffice.org implementations of ODF:

  • Being standard is not the same as being interoperable. 
    Lund Stocholm points out, "The result of the validation is that all files generated by Microsoft Office 2007 SP2 are valid ODF 1.1-files."  The validation is essentially syntactical and that is not going to deal with all of the tolerated implementation variability, semantic bugs, and need for out-of-band agreements where the specification is (purposely and perhaps valuably) left wishy-washy.
  • There's a tremendous amount of binary information packaged in OO.o 2.4 and Office 2007 ODF document implementations.
    This information is carried in outside-of-ODF namespaces and MIME types for which there is no mutual agreement.  This can be reconciled among the different implementations, and we might expect more harmony before Office 2007 SP2 ships, assuming there are no intellectual-property difficulties not covered by existing non-assertion covenants.  This is a tricky area with socio-political and competition-law ramifications (illustrated by how no one seems to be bothered by the amount of binary material used in OO.o's implementation of ODF).
  • ODF-specification versioning is going to bother us for years, if not forever.
    Version churn is going to be a serious problem until those able to insist on demonstrable interoperability among applications compel some rational process for dealing with specification and implementation incompatibilities and defects.  The stakes are now raised for achieving useful up- and down-level accommodation of specification and (deviating but widespread) implementation versions.  Although I can see no way the ODF spreadsheet-formula problem could have been avoided, in particular, we must face two painful situations:
    • XML namespaces for ODF are not dealt with as contracted interfaces with explicit discrimination of additions and changes between versions of the specifications.
    • Requiring private agreement on spreadsheet formulas through at least ODF 1.1 is going to force dealing with at least three versions in the future, something like
      - a Microsoft Excel formula namespace (better: an ECMA-376 or IS 29500 one),
      - an OpenOffice.org formula namespace,
      - the default ODF OpenFormula namespace when finally introduced into ODF,
      - versions of the above with their individual defects and incompatible implementations
  • It's the application [stupid?]
    People don't deal with formats and the nuances of format versions, allowed options, and private agreements.  People deal with software and the quality (and fidelity) of the electronic document that the software provides.  Expecting individual users to be self-consciously attentive to limitations on conformance and interoperability is even more hopeless than demanding meticulous adherence to security policies and practices in ordinary office work.  What people do want is for their interoperability case (however articulated) to just work.  In reality, even "Save as ..." is asking too much.
    • The first part of this lesson is going to involve recognition of the degree to which end-users are going to address interoperability by choosing specific software and believing interoperability is achieved, the ever-popular solution.
    • The second part of this lesson is recognition of the distance between the current state and one with broader interoperability and confident substitution of alternative software choices.  The differences among major ODF implementations will reveal how easy it is to lose interoperability while conforming to the current specifications.
    • Ultimately, we may have to accept that we are unwilling to pay the price for significant interoperability assurance except under extraordinary circumstances.  The "cost of interoperability" debate is ahead of us.
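The spreadsheet-formula point above can be made concrete by looking at what a given spreadsheet document actually carries.  The following is a rough sketch in Python, assuming formulas appear in content.xml as table:formula attributes and that a formula syntax is signaled by a namespace prefix ahead of the "=" sign (prefixes such as oooc: or of: are examples of what might turn up; the sketch assumes none in particular).  The function name formula_namespaces is hypothetical.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# ODF table namespace; the formula attribute is table:formula.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
FORMULA_ATTR = f"{{{TABLE_NS}}}formula"
# A leading "prefix:" before the "=" is taken to identify the formula syntax.
PREFIX_RE = re.compile(r"^([A-Za-z_][\w.-]*):=")


def formula_namespaces(odf_file):
    """Count formulas in content.xml by (prefix, namespace URI).

    Formulas with no prefix are counted under (None, None).  Prefix bindings
    are tracked in one flat dict, so a prefix redeclared deeper in the
    document simply overwrites the earlier binding (a simplification).
    """
    with zipfile.ZipFile(odf_file) as package:
        data = package.read("content.xml")

    bindings = {}
    tally = Counter()
    # "start-ns" events expose the prefix-to-URI declarations that
    # ElementTree otherwise discards once parsing is complete.
    events = ET.iterparse(io.BytesIO(data), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            bindings[prefix] = uri
            continue
        formula = payload.get(FORMULA_ATTR)
        if formula is None:
            continue
        match = PREFIX_RE.match(formula)
        prefix = match.group(1) if match else None
        tally[(prefix, bindings.get(prefix))] += 1
    return tally
```

Running this over the two savings of the same test document would show which formula namespaces each application wrote.  It says nothing about whether the formulas themselves mean the same thing to both applications, which is where the private agreements come in.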

I don't foresee the Harmony Principles alleviating this situation in any way.  At best, I expect them to help us appreciate the cost of interoperability and its improvement over time.


This work is licensed under a
Creative Commons Attribution 2.5 License.