Harmony Initiative: Initial Approach

 

tools for document interoperability


  1. Harmony Principles
  2. Deliverables
  3. Incremental Development
  4. Start-Up Activities
  5. Related Work
  6. References and Resources

The Harmony Initiative focuses the general nfoWorks effort on interoperability of document formats and their processing.  The Initiative steers nfoWorks toward interoperability in the combined use of heterogeneous document formats.

We're also announcing a document interoperability initiative to ensure that the documents that are created by users are fully exchangeable, regardless of the tools that they are using.

-- Bob Muglia, Senior Vice President, Server and Tools Business,
Microsoft Corporation, February 21, 2008 [1]

The Initiative explores just how well documents can be made fully exchangeable, with particular attention to how that is done with a mix of different OpenDocument and Office Open XML format implementations.

The first question: What prerequisites and constraints must be satisfied to ensure that documents are fully exchangeable and that users can be confident that is the case?  What tools can assist in appraising the interoperable exchange of individual documents?

The next question: Is there enough harmonization for users to willingly create, collaborate, and preserve their work using only harmonious features of documents?

1. Harmony Principles (0.1 beta)

First sketched on February 7, 2008 [2], the Harmony Principles govern software tools and products that accept harmonious versions of standard document formats and faithfully produce harmonious versions in any of those formats.

1.1 Conditions of Satisfaction

1.1.1 Users can be confident their creations honor the format standards and depend only on their harmonious features.  The Harmony Principles are honored by default.

1.1.2 Users can be confident that documents confined to a particular profile of harmonious features can be interchanged and processed interoperably by any software program that honors the Harmony Principles for the interoperability class.

1.1.3 In the event that a document relies on features beyond the harmonious level supported by a software product, limitation to the supported features, as allowed by the profile, is explicit, automatic, and understandable to the user.

1.2 The Principles

1.2.1 Interoperability Classes: Applicability.  The Harmony Principles apply where there are standard formats intended to carry electronic-document information of essentially the same nature and having compatible needs for fidelity.  These are natural categories for interoperability.

For example, the word processing documents of Office Open XML format (OOXML) are in an interoperability class with the text documents of OpenDocument format (ODF).  One would not expect to find Rich Text Format (RTF, a format under a private authority) or SGML (an open standard for a different approach to documents) in this class.  It is conceivable that there may be future additions to a class (e.g., DocBook, a specific type of SGML/XML document that might be amenable to harmonization profiling and translation) or, more likely, provision of import and export translators to formats outside of the class.  Spreadsheet documents would be in a different interoperability class.

1.2.2 Standards Compliant.  Only open-standard formats are supported in an interoperability class.

1.2.3 Always Harmonious.  No features of a standard format are supported that are not perfectly and recognizably represented in all of the formats of the class.  Harmonious features are accurately expressible in any of the formats.  Unsupported features encountered in input documents are suppressed in a graceful way; an understandable account is provided.
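The "Always Harmonious" rule can be pictured as set arithmetic: the harmonious features of a class are those representable in every format of the class, and anything else in an input document is suppressed with an account. The Python sketch below assumes that each format's supported features can be listed as plain string identifiers. The feature names and the functions `harmonious_features` and `suppress_unsupported` are hypothetical; they do not come from nfoWorks or from either specification.

```python
from typing import Dict, List, Set, Tuple


def harmonious_features(class_formats: Dict[str, Set[str]]) -> Set[str]:
    """Return the features representable in every format of the class (hypothetical model)."""
    feature_sets = list(class_formats.values())
    if not feature_sets:
        return set()
    # A feature is harmonious only if all formats in the class can carry it.
    return set.intersection(*feature_sets)


def suppress_unsupported(
    doc_features: List[str], harmonious: Set[str]
) -> Tuple[List[str], List[str]]:
    """Keep harmonious features in document order; report each suppressed one."""
    kept = [f for f in doc_features if f in harmonious]
    account = [
        f"suppressed '{f}': not represented in all formats of the class"
        for f in doc_features
        if f not in harmonious
    ]
    return kept, account


if __name__ == "__main__":
    # Placeholder feature names; real ODF and OOXML feature inventories are far larger.
    text_class = {
        "ODF text": {"paragraph", "heading", "bold", "odf-only-feature"},
        "OOXML wordprocessing": {"paragraph", "heading", "bold", "ooxml-only-feature"},
    }
    harmonious = harmonious_features(text_class)
    kept, account = suppress_unsupported(
        ["heading", "paragraph", "ooxml-only-feature"], harmonious
    )
    print(sorted(harmonious))
    print(kept)
    for line in account:
        print(line)
```

Modeling the class as an intersection makes the rule strict by construction: adding a format to a class can only keep or shrink the harmonious set, never enlarge it.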

1.2.4 Specifically Interoperable.  Standards and their harmonious features evolve; so do individual software implementations.  Programs and their users can establish profiles that limit the (versions of) harmonious features relied on in particular documents.  Programs will conform their employment of features to the requirements of such profiles.

1.3 Becoming Definite

These principles are abstract and indefinite in many ways.  Further development will provide definite characterizations, especially for the various qualities to be satisfied by harmonious documents.

1.3.1 Identification of Interoperability Classes.  The initial interoperability classes will be more sharply defined.

1.3.2 Identification of Open-Standard Formats.  The qualification of public, open standards will need to be tightened.  The preference is for free availability of an ISO International Standard specification (including ISO/IEC ones).  Other cases will be by exception.  Specific criteria will be required.

1.3.3 Specified, Measurable Fidelity and Interoperability.  There are different kinds and degrees of faithfulness and suitability in a given context.  The notions of fidelity and harmonization will need to be made definite.  The simply stated goal is exact matching of content, interpretation, and presentation.  Context, qualifications, and explicit measures are required.

1.3.4 Profile Verification.  There must be definite ways to verify the conformance of an electronic document to the standard for its format and also to a prescribed profile of harmonious features.
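The profile half of this check can be sketched as a subset test. The Python fragment below is a minimal illustration, assuming a profile is simply a named set of allowed harmonious feature identifiers; conformance to the format standard itself (for example, schema validation) is not modeled. `Profile` and `verify_profile` are hypothetical names, not an nfoWorks interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: a named subset of harmonious features."""
    name: str
    allowed: FrozenSet[str]


def verify_profile(
    doc_features: Iterable[str], profile: Profile, harmonious: FrozenSet[str]
) -> List[str]:
    """Return a list of problems; an empty list means the document fits the profile."""
    problems = []
    # A profile may only restrict the harmonious features, never extend them.
    for f in sorted(profile.allowed - harmonious):
        problems.append(f"profile {profile.name} allows non-harmonious feature '{f}'")
    # Check each distinct feature the document relies on.
    for f in sorted(set(doc_features)):
        if f not in harmonious:
            problems.append(f"'{f}' is not a harmonious feature")
        elif f not in profile.allowed:
            problems.append(f"'{f}' is outside profile {profile.name}")
    return problems


if __name__ == "__main__":
    harmonious = frozenset({"paragraph", "heading", "bold", "table"})
    basic = Profile("basic-text", frozenset({"paragraph", "heading"}))
    print(verify_profile(["paragraph", "heading"], basic, harmonious))
    print(verify_profile(["paragraph", "table", "odf-only-feature"], basic, harmonious))
```

Returning a list of readable problems rather than a bare yes/no keeps the result in line with the requirement that any limitation be explicit and understandable to the user.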

1.3.5 Degradation of Excluded Features.  Graceful degradation of excluded features must be defined, accounting for how users are to understand and to influence what happens.

1.3.6 How Much Harmony Is Enough?  We do not know what the threshold (or thresholds) might be under important usage conditions.  Exploration starts with seeing how few harmonious features are enough to be useful for anything, expanding until there is an acceptable minimal set for one or more categories of usage.  The value of additional levels needs to be assessed and the levels well defined.  The impact of look and feel and user experience will need to be understood and addressed if it is found to be a critical factor.  There are other external conditions that may also have to be considered.

2. Deliverables

All deliverables for nfoWorks are provided under an open-source license and made freely available for download and use.  Source code and related development materials will be published and tracked on a public open-source development site.  Initial research activity will identify the opportunities to reuse and contribute to existing work.  Unique nfoWorks deliverables will address the specific achievement of the Harmony Principles.

2.1 Software Libraries, Utilities, Tests, and Reference Implementations

2.1.1 Existing Freely-Available Materials.  Existing materials will be identified, along with guidance for obtaining them from authoritative sources.  Some materials will be mirrored on the nfoWare site as a convenience and for reference.  It is not intended that nfoWare provide general redistribution of existing material.

2.1.2 Building Libraries.  Libraries will be founded on native code (in C and C++ source code) that can be ported between different operating-system and hardware platforms.  The library APIs and interfaces will be amenable to integration into higher-level libraries that deliver frameworks for use in Java, .NET, Python and other programming and integration models that support native-layer "interop."  All libraries will be constructed with, and usable with, freely available tools and compiler systems.

2.1.3 Utilities, Test Programs, and Tests.  Primarily designed for command-line and batch operation, utilities and test programs will be developed in appropriate higher-level languages when possible.  Test data, sample documents, and stress cases will be provided as they are developed.

2.1.4 Samples and Reference Implementations.  Samples and skeletal applications will be developed and provided to demonstrate use and employment of the Harmony Principles and nfoWorks libraries.  A reference implementation for a document-processing desktop application may be considered.  Performance, usability, and general fit and finish are not required to reach the level of quality or support expected of end-user productivity software.

2.1.5 Reusability, Not Product, Ambitions.  The ambition for nfoWorks is to provide libraries and utilities of a quality that makes them valuable and attractive for use in closed-source and open-source products where realization of the Harmony Principles is important.  The nfoWorks deliverables are meant to be a source of consistency for document interoperability.  It is more important to encourage and support adoption in products than to engage in direct product delivery.  Community adoption and participation are preferred.

2.2 Analysis and Guidance

2.2.1 Determination of Harmonization.  An important output of nfoWorks activity is documented analysis of the selected specifications and of the ways that features are partially or entirely harmonizable or are excluded from harmonization (whether as a temporary expedient or until standards development provides a better resolution).

2.2.2 Harmonization Guidance.  The nfoWorks experience will lead to guidance on how to safely navigate the feature sets of different standards and their major implementations.  The analysis and guidance will be used in recommendations concerning profile agreements and in suggestions to standards-development projects conducting maintenance on standards.

2.2.3 Harmonization of Harmonization.  Wherever possible, nfoWorks will align with other harmonization activities; producing collaborative results, whether at nfoWorks or at another readily accessible location, is the primary aim.

2.3 Documentation and Specifications

2.3.1 Specified Protocols, Interfaces, and Behavior.  The nfoWorks libraries and profile guidance will be supported by careful specifications that can be confirmed by inspection and tests.

2.3.2 Documentation for Usage and Adaptation.  Documentation sets will provide information and examples of usage of the libraries and utilities.  There will also be documentation to support adaptation of nfoWorks software for customization and extension.

3. Incremental Development

To provide steady progress and definite results, an incremental approach will be applied.  Each iteration adds to or expands the functionality of a software base that is always working, whether or not it is yet considered particularly useful.

3.1 The Least that Can Possibly Work

3.1.1 Starting with the first iteration, software deliverables can be built and deployed.  The software will perform a complete, end-to-end process, no matter how rudimentary the features may be.  The idea is to demonstrate a simple harmonization case and then expand the set of demonstrably harmonized features.

3.1.2 At every iteration, the least that is needed to provide some minimal feature set or feature expansion will be introduced and tested.

3.1.3 Intermediate results may be incorporated in specialized document-processing software and utilities as further demonstration of usability.  The purpose is to build confidence in the operation of the software and encourage its adaptation to practical purposes.  The primary effort will be toward increasing the set of harmonious feature implementations and translations.

3.2 Availability of Tools, Tests, Software and Experience

3.2.1 The material and the results of each iteration are made available in development folders of this site.

3.2.2 The code base, all changes, and downloadable results are maintained as open-source software and kept available and current on an open-source project site with full source-code control and bug-tracking support.

3.2.3 Everything needed to reproduce the construction and confirmation of software and tests is provided.  Those wanting to contribute to the tools or make specialized versions of their own can begin by replicating the construction of the appropriate version.

4. Start-Up Activities

Initial activities focus on gathering information and resources that are needed for the commencement of analysis and experiments.  When a starter set of initial materials has accumulated, new activities can start in parallel.  Research and collection of information and resources will continue.

The results of these activities will be observable in the growth of the nfoWorks Notes Catalog.

4.1 Gathering of Specifications, Analyses and Sources

4.1.1 nfoWorks notes will provide references to the sources and the specifications of relevant standards.  There will also be caches of the material for preservation and reference in nfoWorks activities.  Only freely redistributable material will be accessible on nfoWorks.  Instructions will be provided for independently obtaining any of those materials that remain publicly available, whether free or for sale.

4.1.2 Related analysis efforts will also be cataloged and tracked in nfoWorks notes.  The notes will provide information on participating in the related work and on obtaining available materials.  Some materials may be cached on nfoWorks as well.  The efforts associated with standards development are explored first, followed by relevant efforts that are privately conducted but public.

4.2 Collection of Usable Software, Documentation, and Examples

4.2.1 We are interested in freely-available software provided for working with standard formats, including software for translating to and from other (standard) formats.

4.2.2 The initial effort consists of cataloging what is available and identifying how it is obtained.  Software that is free to use and redistribute without limitation is preferred.  Source code that has no limitation on derivative works and their licensing is desired when that code is usefully adaptable for nfoWorks use.

4.3 Collection of Supporting Tools and Utilities

4.3.1 Tools and utilities for building nfoWorks software and tests are cataloged and collected.

4.3.2 When the tools and utilities are of general use and usable for more than nfoWorks, the collected software and supporting documentation may be hosted elsewhere.  Notes at nfoWorks will link to the general information and relate it to the specific use in nfoWorks projects.

4.3.3 It is required that all of the software-development projects for nfoWorks be freely reproducible, with or without modification.  All tools and utilities used will be ones that are freely available and have no limitation on their use.

4.4 Overlap with Other Activities

4.4.1 Some of these efforts are ongoing and will continue beyond the commencement of analysis and experimentation efforts.

4.4.2 Other efforts can proceed once there is a "starter set" of the initial resources.

4.4.3 There are also non-nfoWorks efforts underway, and these will affect the pace of nfoWorks development.  For now, nfoWorks is moving along at a leisurely pace.

5. Related Work

5.1 Availability and License Considerations

5.1.1 Preference is given to related work that is available to the public.  Works that are freely available are the first choice, and the ideal case consists of material under a Creative Commons Attribution license, or equivalent.  (Public domain works with a known authorship will be treated the same.)

5.1.2 For software projects that provide source code, that code will be relied upon only if it is furnished under a license compatible with the Apache License 2.0, the license under which nfoWorks software deliverables are furnished.

5.1.3 Software tools and utilities having reciprocal licenses (e.g., the GNU General Public License, GPL) will be used and redistributed only in binary form without modification.

5.2 Standards Development

5.2.1 DIN NIA Working Group on Translation 29500-26300.  This working group of the DIN (the German national standards body) mirror committee for ISO/IEC JTC1 SC34 proposes to identify the differences between IS 29500 (OOXML) and IS 26300 (OpenDocument) that need to be understood to accomplish harmonization and interoperability.  An initial working paper is available.  This work will also track the existing translation projects.

5.2.2 ISO/IEC JTC1 SC34 Subcommittee on Document Description and Processing Languages.  SC34 proposes to create three working groups: one each for work on IS 29500 and IS 26300, and a third for harmonization (Resolution 4 of the March 2008 plenary meeting).  The DIN NIA Working Group has presented its approach for consideration in the harmonization work.  There is also an important effort to capture all comments and known defects in IS 29500 so they are preserved for the maintenance activity.  The next actions for establishing IS 29500 maintenance and harmonization studies are expected at the SC34 plenary meeting in Korea at the end of September 2008.

5.2.3 OASIS Open Document Format for Office Applications (OpenDocument) TC

5.2.4 Ecma TC45 - Office Open XML Formats

5.3 Translation Projects

Some of these projects are dormant; the extent of completion and usable material has not been determined.  There are additional projects that remain to be identified.

5.3.1 OpenXML/ODF Translator Add-in for Office

5.3.2 Binary (doc, xls, ppt) to OpenXML Translator

5.3.3 Open XML to DAISY XML Translator M3 Beta

5.3.4 OpenOffice Filter to Microsoft Word XML

5.3.5 OpenOffice.org Writer Pre-Export Filter

5.3.6 UOF Converter for OpenOffice.org

5.4 Industry Initiatives

It appears to be quite easy to locate interoperability initiatives in which Microsoft is a participant or the sponsoring agent.  Industry initiatives are distinguished from advocacy efforts that do not involve proactive achievement of interoperability arrangements, conformance testing, and other efforts to establish document interoperability (whether exclusively or as an in-scope focus).  Non-Microsoft-centric industry initiatives are of interest too.

5.4.1 Interop Vendor Alliance

5.4.2 Document Interoperability Initiative

5.4.3 Interoperability Forum

 

5.5 Product SDKs and Import-Export Functions

Product SDKs provide opportunities for rapid construction of fixtures that work with a product-supported format.  Provisions for, and examples of, import-export functions provide additional insight into ways of working with formats and of potentially introducing harmonization via import-export or more tightly integrated document processing.

5.6 Other Efforts

There are efforts under way at governmental and academic institutions.  Their relevance to (and requirements for) harmonization efforts will be assessed.  There are also individual contributors with information on blogs, web sites, and wikis.

6. References and Resources

[1] Microsoft.
Steve Ballmer, Ray Ozzie, Bob Muglia, Brad Smith: Press Conference Call on Microsoft's Strategic Changes in Technology and Business Practices to Expand Interoperability.  (transcript), PressPass -- Information for Journalists, microsoft.com, February 21, 2008.  Accessed on 2008-04-09-14:33 -0700 at <http://www.microsoft.com/presspass/press/2008/feb08/02-21ConCallTranscript.mspx>.
  
[2] Dennis E. Hamilton.
ODF-OOXML: nfoWorks for Harmony?  (web log post), Professor von Clueless in the Blunder Dome, orcmid.com, 2008-02-07.  Accessed on 2008-04-09-14:41 -0700 at <http://orcmid.com/BlunderDome/clueless/2008/02/odf-ooxml-nfoworks-for-harmony.asp>.
  

This work is licensed under a Creative Commons Attribution 2.5 License.

0.09 2017-06-14 20:24 -0700