The Harmony Initiative provides focus on interoperability of
document formats and their processing within the general
nfoWorks efforts. The Initiative steers nfoWorks
toward achievement of interoperability in the cross-employment
of heterogeneous document
formats.
We're also announcing a
document interoperability initiative to ensure that
the documents that are created by users are fully
exchangeable, regardless of the tools that they are
using.
-- Bob Muglia,
Senior Vice President, Server and Tools Business,
Microsoft Corporation, February 21, 2008 [1]
The Initiative explores just how well documents can
be made fully exchangeable, with particular attention to how
that is achieved across a mix of different
OpenDocument and Office Open XML format implementations.
The first question: What prerequisites and constraints
must be satisfied to ensure that documents are fully
exchangeable and users can be confident that is the case?
What tools can assist in appraising the interoperable
exchange of individual documents?
The next question: Is there enough harmonization for
users to willingly create, collaborate, and preserve their work
using only harmonious features of documents?
1. Harmony Principles (0.1 beta)
First sketched on February 7, 2008 [2], the
Harmony Principles govern software tools and products that accept
harmonious versions of standard document formats and faithfully
produce harmonious versions in any of those formats.
1.1 Conditions of Satisfaction
1.1.1
Users can be confident their creations honor the format standards
and depend only on their harmonious features. The Harmony
Principles are honored by default.
1.1.2
Users can be confident that documents confined to a particular
profile of harmonious features can be interchanged and processed
interoperably by any software programs that honor the Harmony
Principles for the class.
1.1.3 In
the event that a document relies on features beyond the harmonious
level supported by a software product, profile-allowed limitation to
supported features is explicit, automatic, and user-understandable.
1.2 The Principles
1.2.1
Interoperability Classes: Applicability. The Harmony
Principles apply where there are standard formats intended to carry
electronic-document information of essentially the same nature and
having compatible needs for fidelity. These are natural
categories for interoperability.
For example, the word processing documents of
Office Open XML format (OOXML) are in an interoperability class with
the text documents of OpenDocument format (ODF). One would not
expect to find Rich Text Format (RTF, a format under a private
authority) or SGML (an open standard for a different approach to
documents) in this class. It is conceivable that there may be
future additions to a class (e.g., DocBook, a specific type of
SGML/XML document that might be amenable to harmonization profiling
and translation) or, more likely, provision of import and export
translators to formats outside of the class. Spreadsheet
documents would be in a different interoperability class.
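As a minimal sketch of how interoperability classes might be represented, the following Python fragment groups formats into classes and tests whether two formats share one. The names InteropClass, class_of, and same_class are hypothetical, and the membership of the spreadsheet class is an assumption made only for illustration; the actual classes remain to be defined (see 1.3.1).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A format is identified here by (standard, document kind).
# This identification scheme is an assumption for illustration.
Format = Tuple[str, str]


@dataclass(frozen=True)
class InteropClass:
    """A hypothetical interoperability class: a named set of formats."""
    name: str
    members: frozenset


# Membership taken from the example in the text.
WORD_PROCESSING = InteropClass(
    "word-processing",
    frozenset({("OOXML", "word processing"), ("ODF", "text")}),
)

# Assumed membership; the text only says spreadsheets form a different class.
SPREADSHEET = InteropClass(
    "spreadsheet",
    frozenset({("OOXML", "spreadsheet"), ("ODF", "spreadsheet")}),
)

CLASSES = (WORD_PROCESSING, SPREADSHEET)


def class_of(fmt: Format) -> Optional[InteropClass]:
    """Return the class containing fmt, or None if it is outside every class."""
    for candidate in CLASSES:
        if fmt in candidate.members:
            return candidate
    return None


def same_class(a: Format, b: Format) -> bool:
    """True only when both formats belong to the same known class."""
    found = class_of(a)
    return found is not None and found is class_of(b)
```

Under this sketch, a format such as RTF simply has no class, which matches the idea that it would be reached through import and export translators rather than through harmonization.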
1.2.2
Standards Compliant. Only open-standard formats are
supported in an interoperability class.
1.2.3 Always
Harmonious. No features of a standard format are
supported that are not perfectly and recognizably represented in
all of the formats of the class. Harmonious features are
accurately expressible in any of the formats. Unsupported
features encountered in input documents are suppressed in a graceful
way; an understandable account is provided.
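One way to read the Always Harmonious principle is as a set intersection: a feature is harmonious only if every format in the class can represent it. The sketch below assumes features can be named by simple strings, which is a simplification, and the function names and feature names are hypothetical. It also shows suppression of non-harmonious features together with a plain-language account for the user.

```python
from typing import Dict, Iterable, List, Set, Tuple


def harmonious_features(support: Dict[str, Set[str]]) -> frozenset:
    """Features representable in every format of the class (set intersection)."""
    if not support:
        raise ValueError("an interoperability class needs at least one format")
    return frozenset.intersection(*(frozenset(s) for s in support.values()))


def suppress_unsupported(
    document_features: Iterable[str], harmonious: frozenset
) -> Tuple[List[str], List[str]]:
    """Keep harmonious features in order; report each suppressed feature once per use."""
    kept: List[str] = []
    account: List[str] = []
    for feature in document_features:
        if feature in harmonious:
            kept.append(feature)
        else:
            account.append(
                f"Feature '{feature}' is not harmonious in this class "
                "and was suppressed."
            )
    return kept, account


# Hypothetical feature names, for illustration only.
support = {
    "ODF": {"bold", "italic", "feature-only-in-odf"},
    "OOXML": {"bold", "italic", "feature-only-in-ooxml"},
}
common = harmonious_features(support)
kept, account = suppress_unsupported(["bold", "feature-only-in-ooxml"], common)
```

Real features will not reduce to flat names, since fidelity has degrees (see 1.3.3), so this is a starting model rather than a definition.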
1.2.4
Specifically Interoperable. Standards and their harmonious
features evolve; so do individual software implementations.
Programs and their users can establish profiles that limit the
(versions of) harmonious features relied on in particular documents.
Programs will conform their employment of features to the
requirements of such profiles.
1.3 Becoming Definite
These principles are abstract and indefinite in many ways.
Further development will provide definite characterizations,
especially for the various qualities to be satisfied by harmonious
documents.
1.3.1
Identification of Interoperability Classes. The initial
interoperability classes will be more-sharply defined.
1.3.2
Identification of Open-Standard Formats. The qualification
of public, open standards will need to be tightened. The
preference is for free availability of an ISO International Standard
specification (including ISO/IEC ones). Other cases will be by
exception. Specific criteria will be required.
1.3.3
Specified, Measurable Fidelity and Interoperability. There
are different kinds and degrees of faithfulness and suitability in a
given context. The notions of fidelity and harmonization will
need to be made definite. The simply-stated goal is exact
matching of content, interpretation, and presentation.
Context, qualifications, and explicit measures are required.
1.3.4
Profile Verification. There must be definite ways to
verify the conformance of an electronic document to the standard for
its format and also to a prescribed profile of harmonious features.
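The two checks named in 1.3.4 can be sketched as a single verification step that reports both results. This is an illustration under stated assumptions: Profile, VerificationResult, and verify are hypothetical names, features are modeled as plain strings, and standard_check is a placeholder for a real format validator that is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Profile:
    """A hypothetical profile: a named set of allowed harmonious features."""
    name: str
    allowed: frozenset


@dataclass
class VerificationResult:
    conforms_to_standard: bool
    outside_profile: List[str]  # features used but not allowed by the profile

    @property
    def ok(self) -> bool:
        """A document passes only if both checks succeed."""
        return self.conforms_to_standard and not self.outside_profile


def verify(
    document_features: Iterable[str],
    profile: Profile,
    standard_check: Callable[[List[str]], bool],
) -> VerificationResult:
    """Check conformance to the format standard and to the profile."""
    features = list(document_features)
    return VerificationResult(
        conforms_to_standard=standard_check(features),
        outside_profile=sorted(set(features) - profile.allowed),
    )


# Usage with a stand-in validator that accepts everything (assumption).
basic = Profile("basic-text", frozenset({"bold", "italic"}))
result = verify(["bold", "footnote"], basic, lambda feats: True)
```

Keeping the two checks separate in the result lets a tool explain whether a failure comes from the standard itself or from the agreed profile.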
1.3.5
Degradation of Excluded Features. Graceful degradation of
excluded features must be defined, accounting for how users are to
understand and to influence what happens.
1.3.6 How
Much Harmony Is Enough? We do not know what the threshold
(or thresholds) might be under important usage conditions.
Exploration starts with seeing how few harmonious features are
enough to be useful for anything, expanding until there's an
acceptable minimal set for one or more categories of usage.
The value of additional levels needs to be assessed and the levels
well-defined. The impact of look and feel and user experience
will need to be understood and addressed if it is found to be a
critical factor. There are other external conditions that may
also have to be considered.
2. Deliverables
All deliverables for nfoWorks are provided under
open-source license and made freely available for download and use.
Source code and related development materials will be published and
tracked on a public open-source development site. Initial
research activity will identify the opportunities to reuse and
contribute to existing work. Unique nfoWorks
deliverables will address the specific achievement of the Harmony
Principles.
2.1 Software Libraries, Utilities, Tests, and Reference Implementations
2.1.1
Existing Freely-Available Materials. Existing materials
will be identified, along with guidance for obtaining them from
authoritative sources. Some materials will be mirrored on the
nfoWare site as a convenience and for reference. It is
not intended that nfoWare provide general
redistribution of existing material.
2.1.2
Building Libraries. Libraries will be founded on native
code (in C and C++ source code) that can be ported between different
operating-system and hardware platforms. The Library APIs and
interfaces will be amenable to integration into higher-level
libraries that deliver frameworks for use in Java, .NET, Python and
other programming and integration models that support native-layer "interop."
All libraries will be constructed with and useable with
freely-available tools and compiler systems.
2.1.3
Utilities, Test Programs, and Tests. Primarily designed
for command-line and batch operation, utilities and test programs
will be developed in appropriate higher-level languages when
possible. Test data, sample documents, and stress cases will
be provided as they are developed.
2.1.4
Samples and Reference Implementations. Samples and
skeletal applications will be developed and provided to demonstrate
use and employment of the Harmony Principles and
nfoWorks libraries. A reference implementation for a
document-processing desktop application may be considered.
Performance, usability, and general fit and finish are not required
to be at the level of quality or support expected of end-user
productivity software.
2.1.5
Reusability, Not Product, Ambitions. The ambition for
nfoWorks is to provide libraries and utilities of a quality
that makes them valuable and attractive for use in
closed-source and open-source products where realization of the
Harmony Principles is important. The
nfoWorks deliverables are meant to be a source of
consistency for document interoperability. It is more
important to encourage and support adoption in products than
to engage in direct product delivery. Community adoption and
participation are preferred.
2.2 Analysis and Guidance
2.2.1
Determination of Harmonization. An important output of
nfoWorks activity is documented analysis of the selected
specifications and the way that features are partially or entirely
harmonizable or excluded from harmonization (whether as a temporary
expedient or until standards-development provides better
resolution).
2.2.2
Harmonization Guidance. The nfoWorks
experience will lead to guidance on how to safely navigate the
feature sets of different standards and their major implementations.
The analysis and guidance will be used in recommendations concerning
profile agreements and in suggestions to standards-development
projects conducting maintenance on standards.
2.2.3
Harmonization of Harmonization. Wherever possible,
nfoWorks will be aligned with other harmonization activities.
Producing collaborative results, whether at nfoWorks or at
another readily accessible location, takes priority.
2.3 Documentation and Specifications
2.3.1
Specified Protocols, Interfaces, and Behavior. The
nfoWorks libraries and profile guidance will be supported by
careful specifications that can be confirmed by inspection and
tests.
2.3.2
Documentation for Usage and Adaptation. Documentation sets
will provide information and examples of usage of the libraries and
utilities. There will also be documentation to support
adaptation of nfoWorks software for customization and
extension.
3. Incremental Development
To provide steady progress and definite results, an incremental
approach will be applied. This involves iterations that add
and expand functionality in a software base that is
always working, whether or not it is yet considered particularly useful.
3.1 The Least that Can Possibly Work
3.1.1
Starting with the first iteration, software deliverables can be
built and deployed. The software will perform a complete,
end-to-end process, no matter how rudimentary the features may be.
The idea is to demonstrate a simple harmonization case and then
expand the set of demonstrably-harmonized features.
3.1.2 At
every iteration, the least that is needed to provide some minimal
feature set or feature expansion will be introduced and tested.
3.1.3
Intermediate results may be incorporated in specialized
document-processing software and utilities as further demonstration
of usability. The purpose is to build confidence in the
operation of the software and encourage its adaptation to practical
purposes. The primary effort will be toward increasing the set
of harmonious feature implementations and translations.
3.2 Availability of Tools, Test, Software and Experience
3.2.1
The material and the results of each iteration are made available in
development folders of this site.
3.2.2
The code base, all changes, and downloadable results are maintained
as open-source software and kept available and current on an
open-source project site with full source-code control and
bug-tracking support.
3.2.3
Everything needed to reproduce the construction and confirmation of
software and tests is provided. Those wanting to contribute to
the tools or make specialized versions of their own can begin by
replicating the construction of the appropriate version.
4. Start-Up Activities
Initial activities focus on gathering information and resources
that are needed for the commencement of analysis and experiments.
When a starter set of initial materials has accumulated, new
activities can start in parallel. Research and collection of
information and resources will continue.
The results of these activities will be observable in the growth
of the nfoWorks Notes
Catalog.
4.1 Gathering of Specifications, Analyses and Sources
4.1.1
nfoWorks notes will provide references to the sources and
the specifications of relevant standards. There will also be
caches of the material for preservation and reference in
nfoWorks activities. Only freely redistributable materials
will be accessible on nfoWorks.
Instructions will be provided for independently obtaining any of
those materials that remain publicly available, whether free or for
sale.
4.1.2
Related analysis efforts will also be catalogued and tracked in
nfoWorks notes. The notes will provide information on
participating in the related work and on obtaining available
materials. Some materials may be cached on
nfoWorks as well. The efforts associated with
standards development are explored first, followed by relevant
privately-conducted but public efforts.
4.2 Collection of Usable Software, Documentation, and Examples
4.2.1 We
are interested in freely-available software provided for working
with standard formats, including software for translating to and
from other (standard) formats.
4.2.2
The initial effort consists of cataloging what is available and
identifying how it is obtained. Software that is free to use
and redistribute without limitation is preferred. Source code
that has no limitation on derivative works and their licensing is
desired when that code is usefully adaptable for
nfoWorks use.
4.3 Collection of Supporting Tools and Utilities
4.3.1
Tools and utilities for building nfoWorks
software and tests are cataloged and collected.
4.3.2
When the tools and utilities are of general use and useable for more
than nfoWorks, the collected software and supporting
documentation may be hosted elsewhere. Notes at nfoWorks
will link to the general information and relate it to the specific
use in
nfoWorks projects.
4.3.3 It
is required that all of the software-development projects for
nfoWorks be freely reproducible, with or without
modification. All tools and utilities used will be ones that
are freely available and have no limitation on their use.
4.4 Overlap with Other Activities
4.4.1
Some of these efforts are ongoing and will continue beyond the
commencement of analysis and experimentation efforts.
4.4.2
Other efforts can proceed once there is a "starter set" of the
initial resources.
4.4.3
There are also non-nfoWorks efforts underway, and
these will impact the pace of nfoWorks development.
For now, nfoWorks is moving along at a
leisurely pace.
5. Related Work
5.1 Availability and License Considerations
5.1.1
Preference is given to related work that is available to the public.
Works that are freely-available are the first choice and the ideal
case consists of material under a Creative Commons Attribution
license, or equivalent. (Public domain works with a known
authorship will be treated the same.)
5.1.2
For software projects that provide source code, that code will be
relied upon only if it is furnished under a license compatible with
the Apache License 2.0 furnished for
nfoWorks software deliverables.
5.1.3
Software tools and utilities having reciprocal licenses (e.g., the
GNU General Public License, GPL) will be used and redistributed only
in binary form without modification.
5.2 Standards Development
5.2.1
DIN NIA Working Group on Translation 29500-26300. This
working group of the DIN (German Standards National Body) mirror of
ISO/IEC JTC1 SC34 proposes to identify the differences in IS 29500 (OOXML)
and IS 26300 (OpenDocument) that need to be understood to accomplish
harmonization and interoperability. An initial working paper
is available. This work will also track the existing
translation projects.
5.2.2
ISO/IEC JTC1 SC34 Subcommittee on Document Description and Processing
Languages. SC34 proposes to create three working groups: one each
for work on IS 29500 and IS 26300, and another for
harmonization (Resolution 4 of the
March 2008 plenary meeting). The DIN NIA Working Group has
presented its approach for consideration in the harmonization work
(1.7MB PDF file). There is also an important effort to capture
all comments and known defects in IS 29500 so they are preserved for
the maintenance activity. The next actions for establishment
of IS 29500 maintenance and harmonization studies are expected at
the SC34 Plenary meeting in Korea at the end of September 2008.
5.2.3
OASIS Open Document Format for Office Applications (OpenDocument) TC
5.2.4
Ecma TC45 - Office Open XML Formats
5.3 Translation Projects
Some of these projects are dormant; the extent of completion and
usable material has not been determined. There are additional
projects that remain to be identified.
5.3.1
OpenXML/ODF Translator Add-in for Office
5.3.2
Binary (doc, xls, ppt) to OpenXML Translator
5.3.3
Open XML to
DAISY XML Translator M3 Beta
5.3.4
OpenOffice
Filter to Microsoft Word XML
5.3.5
OpenOffice.org Writer Pre-Export Filter
5.3.6
UOF
Converter for OpenOffice.org
5.4 Industry Initiatives
It appears to be quite easy to locate interoperability
initiatives in which Microsoft is a participant or the sponsoring
agent. Industry initiatives are distinguished from
advocacy efforts that do not involve pro-active achievement of
interoperability arrangements, conformance testing, and other
efforts to establish document interoperability (whether exclusively
or as an in-scope focus). Industry initiatives that are not
Microsoft-centric are of interest too.
5.4.1
Interop Vendor Alliance
5.4.2
Document Interoperability Initiative
5.4.3
Interoperability Forum
5.5 Product SDKs and Import-Export Functions
Product SDKs provide opportunities for rapid construction of
fixtures that work with a product-supported format.
Provisions for, and examples of, import-export functions provide
additional insight into ways of working with formats and
potentially introducing harmonization via import-export or
more-tightly integrated document processing.
5.6 Other Efforts
There are efforts under way at governmental and academic institutions.
Their relevance to (and requirements for) harmonization efforts will
be assessed. There are also individual contributors who publish
information on blogs, web sites, and wikis.
6. References and Resources
- [1] Microsoft. Steve Ballmer, Ray Ozzie, Bob Muglia, Brad Smith: Press
Conference Call on Microsoft's Strategic Changes in Technology and
Business Practices to Expand Interoperability (transcript),
PressPass -- Information for Journalists,
microsoft.com, February 21, 2008. Accessed on
2008-04-09-14:33 -0700 at <http://www.microsoft.com/presspass/press/2008/feb08/02-21ConCallTranscript.mspx>.
- [2] Dennis E. Hamilton. ODF-OOXML: nfoWorks for Harmony? (web log post),
Professor von Clueless in the Blunder Dome, orcmid.com,
2008-02-07. Accessed on 2008-04-09-14:41 -0700 at <http://orcmid.com/BlunderDome/clueless/2008/02/odf-ooxml-nfoworks-for-harmony.asp>.