formats. Using Office 2003 and OpenOffice.org 2.4,
the OOXML and ODF formats took longer to load than the
native binary formats of the corresponding applications.
Oddly, even though Office 2003 uses a compatibility pack
to load the OOXML version of the test spreadsheet, it
loads that spreadsheet (which is larger than the
corresponding ODF on disk) faster than OpenOffice.org
loads the ODF version.
2008-05-15 Choosing Source Control for nfoWorks
On SourceForge, where I intend to maintain code under development and
deliver nfoWorks libraries and fixtures, the choice is the
Subversion (SVN) source-control system.
Other systems are also supported, but SVN has more appealing Windows
clients, such as
TortoiseSVN, and there are ways to use the same interface with
CodePlex if SourceForge becomes unviable.
Meanwhile, there are "distributed" source-control systems that work
locally and can be published wholesale on web sites, and
anyone can control how they accept changes from one
source-control set into another. In effect, they
are all forks and you can merge as you like. Git is the
distributed source-control system that Linus Torvalds devised
and that he uses for development of the Linux
kernel. Mercurial is another one, although my
leaning is toward Git.
Today, Jeremy D. Miller blogs that he
finally got git. Not only is Git extremely
useful off-line, it is extremely fast and the compressed
repositories are easy to ship around. I am linking
to Jeremy as a reminder that I have been pondering the alternatives.
In my personal development model, I have reason to run Visual
Source Safe on my local systems. This has to do
with integration on my IIS development server and
working well on my SOHO LAN. It works just fine.
I am prepared to work across source-control systems, taking a
latest-versions snapshot out of one system and checking
it in to another system. I can go from local VSS
to Subversion on SourceForge that way. The
histories won't carry over. Instead, the
latest-versions snapshots essentially represent roll-ups
of changes from the last time a snapshot was checked-in.
I can live with that. This gets trickier when
multiple committers are working with the SourceForge repository.
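The snapshot hand-off described above can be sketched in a few lines. The metadata patterns are illustrative assumptions (Subversion and Git keep `.svn`/`.git` directories; VSS tooling can leave `*.scc` files), not a prescribed nfoWorks procedure, and the paths are throwaway.

```python
# Sketch: take a "latest-versions snapshot" out of one source-control
# working copy for check-in to another system, dropping each system's
# bookkeeping along the way.
import shutil
import tempfile
from pathlib import Path

# Directory/file patterns that are source-control metadata, not content.
VCS_METADATA = shutil.ignore_patterns(".svn", ".git", ".hg", "*.scc")

def export_snapshot(working_copy: Path, destination: Path) -> Path:
    """Copy a working tree, ignoring source-control bookkeeping."""
    shutil.copytree(working_copy, destination, ignore=VCS_METADATA)
    return destination

# Demonstration against a throwaway tree.
root = Path(tempfile.mkdtemp())
src = root / "project"
(src / ".svn").mkdir(parents=True)      # bookkeeping that should be dropped
(src / "lib").mkdir()
(src / "lib" / "main.c").write_text("int main(void){return 0;}\n")
out = export_snapshot(src, root / "snapshot")
print(sorted(p.name for p in out.rglob("*")))
```

The snapshot tree can then be checked in to the destination system as a roll-up commit, which is exactly the history-flattening trade-off described above.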
I could also check a latest-versions snapshot into a Git repository for
archiving and as a different way of publishing
development trees on a web site as a cache and for
downloading by other developers. That a Git
repository can be packed into a single file is very useful, making
distribution by file sharing rather practical.
This also makes merges from multiple
committers easier to deal with, something that makes
Torvalds very happy with Git.
That's the thinking so far. I want to keep my options open.
The first big change will be when I learn to work with
the SourceForge SVN repository later this year.
2008-05-14 Does Document Translation Suck?
- Jesper Lund Stocholm has expressed
strong agreement (cache)
with Rob Weir's argument that
translation is never good enough and filtering to and
from an internal model is the way to bring different
formats in and out of a productivity application.
Lund Stocholm says, "I have been trying to pitch my idea of
'document format channels' for some time now. The basic
idea is not to do translations between formats but to
support the feature sets of both formats in the major
applications." He links to an investigation of
conversions that he reported in
November 2007 (cache).
It is not apparent what he has in mind that keeps the application
from allowing a feature that is not available in the
format of the document that is being worked on.
This contrasts with the Harmony Principles, where
only feature sets that work in both formats are
implemented. (That is a big simplification of a
complex situation, but it's
the essential idea behind nfoWorks.)
2008-05-13 Spreadsheet Loading/Filtering/Translation
- Rob Weir reports
some loading tests of spreadsheets in binary and XML formats.
But the horror story was the loading time of the ODF version into
Excel 2003 via the ODF Translator plug-in. This
took almost 10 minutes, in contrast to the 14 seconds for the native binary format.
Although this may tell us far more about the implementation of the
ODF Translator, Weir argues that there is also
substantiation of the inherent limitation of translators
over filters (which go to a useful intermediate memory
structure rather than completely into another recorded format).
These are useful experiments, and we need to understand them better,
creating similar performance tests that apply to nfoWorks.
The test files were obtained from a 2005
George Ou article (cache).
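For performance tests of the kind contemplated here, the harness can start as simple wall-clock timing around a loader. The in-memory XML below is a stand-in for real test documents (not the George Ou files), and the loader is just the standard-library parser.

```python
# Sketch: time how long a loader takes on a document, the core of any
# load-time comparison between formats.
import time
import xml.etree.ElementTree as ET

def timed_load(load, *args):
    """Run a loader callable, returning (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = load(*args)
    return result, time.perf_counter() - start

# Example: time the parse of a small synthetic "spreadsheet".
doc = "<spreadsheet>" + "<row><c>1</c></row>" * 1000 + "</spreadsheet>"
tree, elapsed = timed_load(ET.fromstring, doc)
print(f"parsed {len(tree)} rows in {elapsed:.4f}s")
```

A real comparison would repeat each load several times and report medians, since disk caching makes the first load unrepresentative.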
2008-05-12 Japanese Text Layout
There is a new document
that provides important English-language information on
the requirements for layout of Japanese text on
print-formatted pages. Based on a Japanese
standard (JIS X 4051), the developing document should
become a comprehensive resource on the proper layout of
Japanese text and other writing systems that share
Japanese layout conventions.
This sort of detail should be eye-opening for those who think
strictly in terms of left-to-right writing of texts
based on a Roman alphabet.
As an interoperability concern, I find conformance of layout to be
particularly daunting and something that it would be
great to be able to ignore in terms of harmonization of
document formats. That is, of course, not possible
if I am to take the Harmony Principles seriously.
There is an impact on the applications and how users see
the proper layout of what they and others have
written. I remain daunted.
FrontPage and Markup Preservation
(via Doug Mahugh). The Channel 9
interview with Terry Crowley (cache) touches on some
topics that are very relevant to document
interoperability and also the socio-political situations
around it. Crowley reflects on the problems that
FrontPage experienced as an impugned brand, how
difficult mark-up preservation is, how much was invested
in it, and how the resulting editor lives on as a
component of other products. The problem of brand
reputation is probably going to arise in OOXML-ODF (and
intra-OOXML, intra-ODF) interoperability (and I see
strenuous efforts to frame such a situation with
thoughtless disregard for the law of unintended
consequences). Crowley's remarks on the difficulty
of mark-up preservation in the face of model
incompatibilities are particularly relevant to the
nfoWorks objectives and satisfiability of the Harmony Principles.
2008-05-09 Usability and Interoperability:
Joannie Strangeland has posted a
blog entry (cache)
on how to have Microsoft Word create independent
tables-of-content within selected sections of a
document in addition to the higher-level overall ToC
for the document. The technique is illustrated in
terms of interactions with features of Microsoft Word
via the program's UI.
This is exactly the level that someone wanting to make
section-level ToCs needs to understand to accomplish
this in a document. I have wanted to be able to do
exactly this, so the mini-tutorial caught my eye at
once. It also illustrates a number of
considerations around cross-product/format document interoperability.
First, there is no discussion of how this ends up being expressed
in OOXML. This is not, of course, the user's concern.
Secondly, there is no discussion of how this might be handled in an
interchange and collaboration situation (even when the
collaborators are all using the same software product).
Third, this particular magic is accomplished by a combination of
particular, independent features. It is easy to
see how the developers of another product might not have
done what is necessary to anticipate this case, even
when also accepting OOXML.
Fourth, the translation to a different document format (i.e., ODF)
of this particular document might only be possible by
resolving model incompatibilities. That is, what
is being done (and visible only through the OOXML being
interpreted) needs to be abstracted out enough so that
the same effect in a different model is achieved when it
is not achievable by some naive one-for-one feature
translation. This is the kind of thing that
involves global analysis, inference/backtracking
techniques, and recognition of particular use cases that
we tend to expect of expert systems using various heuristics.
Fifth, the prospect of accomplishing the kind of transformation in
the fourth level in a round trip back to the original
format is nearly inconceivable.
Finally, I don't expect the Harmony Principles to have this kind of
reach in any near-term future. At the same time, I
wonder how easy it will be to detect this sort of thing
in a harmonizing document processor, or whether we will be left
with ham-handed and still complex rules (e.g., no
multiple ToCs in documents) that reach beyond
the individual feature occurrence -- there must
be some contextual knowledge of other aspects of the
document in order to recognize a disharmony in a
benign-appearing local element.
So, here's already more food for thought than I thought this
interesting example would provide.
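For the curious, here is a hedged sketch of where the section-level ToC magic can land in the WordprocessingML: a TOC field whose \b switch restricts it to a bookmarked region. The bookmark name is invented for illustration, and real Word output also carries cached field results that are omitted here.

```python
# Sketch: build a simple TOC field (w:fldSimple) limited to a bookmark,
# the mechanism Word uses for a section-level table of contents.
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

def section_toc_field(bookmark: str) -> ET.Element:
    """A TOC field instruction restricted to the named bookmark."""
    fld = ET.Element(f"{{{W}}}fldSimple")
    # \b limits the ToC to the bookmark; \o selects heading levels 1-3.
    fld.set(f"{{{W}}}instr", f' TOC \\b {bookmark} \\o "1-3" ')
    return fld

xml = ET.tostring(section_toc_field("SectionTwo"), encoding="unicode")
print(xml)
```

The point of the sketch is the translation problem described above: a harmonizing processor has to recognize that the \b switch plus a bookmark spanning a section together mean "section-level ToC," which no one-for-one feature mapping exposes.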
2008-05-09 Daisy Translators and OOXML
Doug Mahugh has
a number of links on the recently-announced completion
of the Daisy translator for Word. This is part of
the support for accessibility of Microsoft Office
programs. Oliver Bell's
2008-05-07 post (cache)
has additional information.
I hadn't realized that Daisy involves its own XML format and that
documents need to be translated to the Daisy XML.
This raises another interesting harmonization case and I
will have to look into it more deeply.
There is also Daisy translation for ODF and that will be an
interesting basis for comparison and broader
understanding of how all of this is intended to work.
2008-05-09 Microsoft's Office Labs
This may be a bit tangential. It concerns
the launching of the
Microsoft Office Labs site. Although the labs
develop trial-concept solutions that can be tried out by
a community within and without Microsoft, there is also
provision, through Community Clips, for contribution of
tips, how-to's, and demos from the public.
Depending on how this develops, it may also be a place
to demonstrate interoperability-achieving techniques
for the Microsoft Office System.
2008-05-09 Obsessing About Cache Accession and Backlog
Every day I notice more feed items
that should be captured in this diary and ultimately
incorporated/reflected in content here. I went on
a cruise at the end of April and so now there is a backlog.
I have been operating under the obsession that the backlog should
be rolled forward from its oldest to the newest, so that
the accession numbers that I use on files (see the
cache catalog) will be in chronological order.
The "duhhh" moment is reminding myself that the whole principle
behind using accession numberings is the ease of
claiming current material and then filling-in
(accessioning, if you will) the backlog as time permits.
The idea is to close off the backlog so that more is not
created, and the backlog is cleaned up in any order that
works while staying current with the latest.
So, slap on the head and I will now get over that and start with
current and fill in the other backlog at my leisure.
The accession numbers don't have to be chronological
with the date of the source material, they are
chronological with the sequence of accession, and that
can be out of time sequence to overcome an obsessive
blockage (inappropriate perfectionism, perhaps).
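The accession discipline above can be sketched directly: numbers issue in the order items are accessioned, independent of the source material's own dates, so current items can be claimed now and the backlog filled in later. The numbering format is invented for illustration.

```python
# Sketch: accession numbering decoupled from source-material chronology.
from datetime import date

class AccessionLog:
    def __init__(self):
        self._next = 1
        self.entries = []

    def accession(self, source_date: date, title: str) -> str:
        """Issue the next accession number, whatever the source date."""
        number = f"A{self._next:05d}"
        self._next += 1
        self.entries.append((number, source_date, title))
        return number

log = AccessionLog()
# Claim current material first...
n1 = log.accession(date(2008, 5, 9), "Usability and Interoperability")
# ...then backfill older items out of time sequence.
n2 = log.accession(date(2008, 4, 17), "Information on ODF and OpenOffice.org")
print(n1, n2)
```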
I intend to switch from Hard Hat Beginnings to Hard Hat Era once I
eliminate the backlog and have more useful content in
the notes compilations. So we are extending into
2008-05-08 AIR for nfoWare Fixtures?
As part of joining Twitter, I wanted
a desktop Twitter widget so I didn't have to keep a
browser window open to see "tweets." All of
the popular ones for Windows seemed to be implemented
using the Adobe Integrated Runtime (AIR).
I chose Spaz because it looked good from the web page, is
open source, and runs on Windows, OSX, and, someday soon, Linux.
There are some glitches in Spaz, but I like the ease with which I was
able to install and set it up. I also like the look of
the application. This makes AIR another
consideration for test applications, demonstrations, and
nfoWare fixtures along with
2008-05-08 ODFDOM Available
A Document Object Model implementation for ODF is
publicly available (cache).
There are JavaDoc specifications of the interface, a
Java 5 reference implementation, and source code of the
reference implementation. ODFDOM version 0.6 is
offered as a successor to the AODL and ODF4j packages
and is part of the ODF Toolkit.
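Beneath any ODF DOM is the fact that an ODF document is a ZIP package whose body lives in content.xml. A standard-library-only sketch of that lower layer follows; a real package carries more parts (styles.xml, a manifest), and the mimetype entry must be the first, uncompressed member, which is glossed over here.

```python
# Sketch: build a tiny (incomplete) ODF-style package in memory and
# read the paragraph text back out of content.xml.
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

content = (
    '<office:document-content '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    f'xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Hello, ODF</text:p>"
    "</office:text></office:body></office:document-content>"
)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as pkg:
    pkg.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    pkg.writestr("content.xml", content)

with zipfile.ZipFile(buf) as pkg:
    root = ET.fromstring(pkg.read("content.xml"))
paragraphs = [p.text for p in root.iter(f"{{{TEXT_NS}}}p")]
print(paragraphs)
```

ODFDOM's value is exactly that it replaces this kind of hand-rolled package and namespace handling with a typed API.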
One comment on the post asks for a lower-level implementation that
ports to Linux, as a libODF.
It is interesting that LGPL v3 is claimed as the license. I
could not find a license statement in a quick look at
the JavaDoc download, nor is there a license file
included in the runtime (.jar)
package. There is an XML comment in a
resource file that is from OpenOffice.org. The XML
comment mentions LGPL v3 and refers to an absent
file. It also does not indicate how to obtain the
source code of ODFDOM.
There is no license
information in the source-code package either.
This messiness is not going to keep me from looking at the
source code. The source code files do
include copyright and license notices at the top of each
file and at the top of some other text files, although no
license file is included.
I trust that later distributions will
be more methodical in how license information is
conveyed, especially with binaries.
Meanwhile, the availability of this model and a Java API for it is
extremely valuable as a basis for exploring the
prospects of a harmonized DOM.
2008-05-07 Standards Lawyering and Existence of Valid Documents
There was a lengthy round of blog
posts between Alex Brown and Rob Weir concerning whether
the RelaxNG Schema published in ISO/IEC DIS 26300 (ODF)
had any valid documents at all and whether the DIS 26300
specification (ODF version) or any other allegedly-ODF
document is/could-be a valid ODF document. It is not
entirely clear that the matter is resolved, depending as
it does on fine points of related specifications and on
what schema validation of an instance document is
intended to signify with regard to validity of the
document in hand. The problem is complicated by
the introduction of other standards by reference, some
of which (such as XML Schema) modify the interpretation
of other standards (XML itself, in the XML Schema case).
The debate strikes me as a demonstration of the kind of
back-and-forth that occurs in solidifying our
understanding of what a standard specification requires
and how implementers can disagree until interoperability
considerations require some mutual accommodation and
community-wide clarification of what the requirement is.
A specification may also be revised to eliminate the
particular misunderstanding, however it is resolved.
There will be further intense scrutiny of DIS 26300 (ODF) and
DIS 29500 (OOXML) separately and in terms of their
harmonization. Meanwhile, setting aside the
bickering and indignation that is expressed in the
exchange, the blog entries and their comments are worthy
of review for how issues of this kind are sorted out or
at least taken to a point where the dispute is abandoned
for now. This is also a demonstration of how
important public conversations and analyses are to the
strengthening of understanding and, ultimately, the standards themselves.
- Rob Weir:
Achieving the impossible (cache).
A 2008-05-07 post demonstrating executions of two
schema-validation programs using the ODF Schema and the
ODF Specification. This seems to be the end of the
discussion at this point.
2008-05-07 WebCite and Archiving of Referenced Material
WebCite is a system for preserving material at a given URL and
providing a permanent URL from which the material can be
retrieved even if the original material is later changed
or the URL becomes invalid.
This mechanism and its support by "dark mirrors" would be a valid
alternative to the private embargoed collections that I
include as part of the caching of cited material here.
This article explains the motivation, the mechanism that is used,
and the relationship to other approaches that are in use.
I'm not changing the cache approach for nfoWorks just
yet (one motivation being the preservation of an
off-line copy that I can always access). It is
useful to have WebCite for explaining the motivation for
the caching of material and also for the embargo of that
material for which redistribution has not been granted.
2008-05-06 Backlog Activity
I have been at work on repaving of
the ODMA site and moving back into ODMdev work while I
also accumulate reference materials here. There
are some commonalities that I want to exploit and much
of the toolcraft that I'll use applies to ODMA,
nfoWorks, and also nfoWare.
The material will seem to be rather scattered until
enough pattern emerges.
There is also a backlog of material since 2008-04-17 that merits
some diary entries. I am starting that today.
2008-05-06 What Does OAI Have To Do With nfoWorks?
The Open Archives Initiative has, among its
features, agreements on metadata, a format
for web-site catalogs of metadata, a protocol for
document discovery, and additional information on access
of archived content. There is a document
interoperability case lurking in OAI. There is a
useful description of the overall OAI situation, with
useful links, on
2008-05-06 Processing Streams of XML
Harry Pierson has two web posts on
processing XML in IronPython. Although .NET
libraries are discussed, these posts are valuable for
their appraisal of the different ways to process XML and
the adjustments Pierson made to accomplish a task:
Stream Processing XML in IronPython (cache)
Deserializing XML with IronPython (cache).
I'll be looking for more along these lines, with a
variety of programming platforms.
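The stream-processing side of that contrast looks like this with Python's standard library: events are pulled from the parse and elements discarded as soon as they are handled, which is what keeps memory flat on large documents. The sample data here is synthetic.

```python
# Sketch: pull-style stream processing with iterparse, as opposed to
# materializing the whole tree before touching any of it.
import io
import xml.etree.ElementTree as ET

data = "<log>" + "".join(f"<entry id='{i}'/>" for i in range(100)) + "</log>"

count = 0
for event, elem in ET.iterparse(io.BytesIO(data.encode()), events=("end",)):
    if elem.tag == "entry":
        count += 1
        elem.clear()          # release the element once processed
print(count)
```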
2008-05-06 Securing Open-Source Software
David Wheeler has built a
comprehensive presentation (cache)
on security considerations for open-source software.
There is guidance on practices for establishing
confidence in treatment of security concerns.
These apply to nfoWorks libraries and
fixtures of all kinds. What Wheeler refers to as
the BSD-new license and those compatible with it are the
licenses that will be used for deliverables from nfoWorks.
2008-05-06 Standalone Silverlight Applications
Miguel de Icaza posts
on a successful approach to running standalone
Silverlight applications using either .NET or Mono with
Moonlight (on Linux). The OSX version of Mono
doesn't have the necessary library at this point.
This is a way to do some rapid tooling of document fixtures and
tests where the .NET or Mono runtime is usable.
The application is essentially running an embedded
server that hosts Silverlight locally, it would appear.
Still, that's potentially useful. It should even
run Popfly applications, methinks.
I mention this to put down a marker for prototyping and
demonstration prospects. If I can minimize GUI
development as part of establishing nfoWorks
libraries and fixtures, I certainly will.
2008-04-17 Information on ODF and OpenOffice.org
I don't have a good place to park
this information yet, and I don't want to lose track of
it. An OpenOffice.org blog provides regular accounts of development
activity for OpenOffice.org releases. There are
at a glance (cache)" posts that provide a snapshot of OO.o
development activities, including those related to
OpenDocument format features and planned synchronization
with (anticipated/pending/committee-approved) updates
and additions to the ODF specification. This is
where I go to learn that there is something new to find
on the OpenDocument TC site.
Michael Brauer (cache), currently the co-chair of the
OASIS OpenDocument TC (cache), contributes here.
Mathias Bauer, another OO.o developer, has just posted a great "ODF
Enhancements for OpenOffice.org (cache)" article that
describes how ODF changes make their way into OO.o, with
an interesting guide to how public suggestions make
their way into the ODF specification.
Rob Weir (cache), the other OASIS OpenDocument TC co-chair,
follows up on Mathias Bauer's post with further guidance in
"ODF Enhancements (cache)." There is a careful
explanation of the measures taken to assure that
suggestions are usable by OASIS and don't create
intellectual-property problems for the specification.
The information on ODF discussions and contributions is valuable to
have before contributing and exploring the available
ODF discussion lists.
2008-04-16 Licensing and Open-Source Preferences
nfoWorks deliverables are provided under open-source licenses.
The web-site materials are all provided under a Creative
Commons Attribution license. Software and its
source codes are provided under a BSD Template license.
This license is the open-source counterpart of the
Creative Commons Attribution license.
The idea is that so long as attribution requirements are satisfied
there are no copyright-based constraints on the use,
re-use, and derivative use of nfoWorks
deliverables. These licenses are compatible with
the GPL, the GNU General Public License (but not vice versa).
These licenses are compatible with the creation of
proprietary, closed-source software (but not vice versa).
Attribution is important as part of the provenance and
accountability that is expected as part of
responsibly-built software that incorporates or derives
from nfoWorks deliverables. It is
important to identify the dependency so that any
announcements of defects and security flaws in any
version of nfoWorks deliverables can be
checked against the dependencies in works that
incorporate those deliverables. It is a simple
matter to always know the dependencies on
particular versions of other software, and to
demonstrate that by making the dependencies known in a
way that is available for inspection.
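One way to make such dependencies available for inspection is a small machine-readable manifest naming each incorporated work and the exact version relied upon. The manifest format here is invented for illustration, not an nfoWorks convention, and the names in it are hypothetical.

```python
# Sketch: a shippable dependency manifest and a check of it against a
# hypothetical security advisory for one of the dependencies.
import json

manifest = {
    "name": "example-fixture",        # hypothetical deliverable
    "version": "0.1.0",
    "dependencies": [
        {"name": "some-library", "version": "2.4.1", "license": "BSD-3-Clause"},
    ],
}

def affected_by(manifest, name, bad_versions):
    """True if the manifest records a dependency on an affected version."""
    return any(
        d["name"] == name and d["version"] in bad_versions
        for d in manifest["dependencies"]
    )

record = json.dumps(manifest, indent=1)   # ship this alongside the binary
print(affected_by(manifest, "some-library", {"2.4.0", "2.4.1"}))
```

The payoff is exactly the scenario described above: when a defect or security flaw is announced for a particular version, anyone holding the manifest can check exposure without inspecting the code.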
The same conditions are honored in any incorporation of other
software in nfoWorks deliverables.
Because the GPL is not compatible with the BSD Template license and
its brethren, examination of GPL'd source code is
avoided. Although there is no aversion to relying
on GPL'd utilities and tools, GPL'd source code is not
touched except for the unlikely case of submitting
defect reports and repair suggestions back to the
authors. Proprietary, closed-source programs, even
if used in the nfoWorks laboratory, are
not redistributed and only freely-available software is
required for making use of nfoWorks source code.
Finally, these license considerations apply to copyright on
software. In the event that a patent covenant is
known to apply and might be unavailable for a different
use of the code, precautionary notices will be attached
to the software and incorporated in the source code.
This policy has been expressed in various forms from time to time.
It seems like a good idea to clear the air and summarize
the considerations in this one place.
2008-04-14 Rapid Development, Throw-Aways,
Refactoring and Resets
I am not very keen about throw-away
code. My inclination is to refine and refactor,
especially early on, and continue to grow around
well-defined but extensible (and properly-versioned)
interface agreements. That's especially so for an effort
of the kind I have in mind for nfoWorks.
There is a tension between the desire for rapid development, easy
experimentation, and having higher-level ways to use
components versus dealing with performance, code
footprint, and platform portability while traveling
light. I don't mind throw-away demonstrations and
exercises, and these are useful for others to see how to
start getting their arms around a technology, too.
I do mind having to do big resets and do-overs that lose
the benefits of progressive improvements and
knowledge-building. I don't mean to exclude
refactoring born from experience, but refactoring is on
code that is worthy of keeping. The tension I feel
around saying how it will go in advance will probably
dissipate when I get into the work, so I will now wait
to see how I can shape this in practice.
I think support for rapid use, and the use of higher-level
development stacks (Java, .NET, XML transformation
tools, etc.) is valuable. I'd even consider
EcmaScript and languages like Python in this picture.
These are great for demonstration, rapid trials, tests,
and even reference implementations.
At the same time, I envision progressive refinement of
lighter-weight, lower-level solutions that probably
involve C, C++, and binary COM interfaces, the latter as
a way to provide contracted interfaces that are easily
coordinated properly beneath high-level wrappers,
whether Java, .NET, or something else (Gnome, for all I know).
This reminds me of a conversation back in the 70's on a project
where we were inventing "middleware" for ourselves.
We started thinking of segregation into underware,
middleware, and outerware. I want to support all
three, possibly with different tool sets, and my
attention is mostly on the first two. It is
entirely understandable that users of nfoWorks
results will be working mainly in the third category.
2008-04-13 Classifying Sources and Resources
I think the last part of the first
"About nfoWorks" to be completed will be
the section on related work. I notice that I am a
little puzzled about those activities that are not
directly technical or that have some mix of advocacy and
technical that tips toward advocacy. Also, I tend
to favor those projects that deliver code that is
intended to be used and built-upon by others, with
secondary interest in ferreting out code that is buried
somewhere inside of a particular product, even though
that code may be instructive.
I suspect this will not be so worrisome when I sit down and compile sources and resources, so that may be
the next nfoWare Note folio that I produce.
2008-04-12 Conversing on the Interoperability Forums
The Interoperability Forum (cache) opened up as promised on
March 20. It consists of
three MSDN Forums: Interoperability Conversations,
Technical Interoperability Scenarios, and Achieving
Interoperability through Standards.
There seem to be some birth pangs, and I am not clear why
there's not much discussion. It may be the
setting, it could be misgivings that people have about
Microsoft, and it could just be that people who are
seriously interested in interoperability (at least two
of us) haven't found their way here yet.
I would like to see these forums thrive and be lively, but it may
depend on who is willing to speak up and find value in
forum-based conversations. -- dh
2008-04-11 About nfoWorks and Harmony
A draft "About
nfoWorks" page is now available on the
site and linked from the home page. This provides
a new (0.1 beta) explanation of Harmony Principles and
more about how the work of nfoWorks will
proceed. We are close to a turnover point for
going full-bore turtle into hard-hat, technical
hunter-gathering activity. The
2008-03-30 diary entry
on Ramping Up still applies; I need to give more
attention to those other prerequisites now.
Today, there are new blog posts on recent events and the current
state of nfoWorks. If you have
comments about anything you see here, those posts are a
good place to leave them:
OOXML + ODF: ISO Steps In, Orcmid's Lair,
nfoWorks: What Are those Harmony Principles, Again?
Professor von Clueless in the Blunder Dome,
2008-04-10 Rick Jelliffe Perspective on SC34
Rick Jelliffe has a great perspective
on the just-concluded SC34 plenary meeting.
Because he wasn't there, he offers it in a "Fake
blog from SC34 meeting in Norway (cache)." Jelliffe
offers up some important items to hold onto here:
1. The main
SC34 web site (hosted in Japan by the SC34 secretariat)
2. The page for accessing public
SC34 documents (rewards exploration)
3. A reminder that TrueType is
connected with the ISO/IEC Open Type standard,
maintained in SC34 (and relevant for nfoWorks)
with a hidden reminder that getting Asian scripts right
is probably one of the best demonstrations of interoperability
4. A useful sketch of SC34's
interests and responsibilities
5. Another reminder of my own
armchair critic status, something I am working to alter
6. A discussion of the criticality of
accessibility considerations and the resources that
apply in the work of SC34; a topic that it will be
essential to address with regard to harmonization
7. An injunction to become involved
and where to do that (OASIS, W3C, Ecma TC45, the
national mirror of SC34 in your neighborhood, etc.)
8. Links to the DIN NIA-34 update on
the harmonization investigation (PDF
cache), great work that nfoWorks should track.
9. An interesting side comment about
the use of topic maps to present ODF-OOXML mappings
(although DIN is focused on translations, not mappings,
because of a number of issues that translation surfaces,
including round-trip degradation in collaboration)
10. Another side comment on how the
concern for synchronizing ECMA versions and SC34
versions of OOXML might be extended to the case of OASIS
and ODF as well.
2008-04-09 ISO/IEC SC34 Takes Over OOXML
ISO/IEC Joint Technical Committee 1
(JTC1) Subcommittee 34 (SC34) held a plenary meeting in
Oslo, Norway on April 5-9. Alex Brown provides a
comprehensive report (cache).
SC34 proposes to create ongoing activities to carry out its responsibilities:
1. IS 29500 (OOXML) maintenance
2. IS 26300 (ODF) maintenance
(pending OASIS agreement)
3. Harmonization (with a proposed
work-item expected from the DIN NIA activity)
To start things off, two ad hoc working groups have been created.
The first ad hoc working group will propose how IS 29500
maintenance should proceed, producing a proposal by
2008-09-01, one month prior to the next SC34 meeting.
This ad hoc group is chaired by Alex Brown, who
will lead a two-day meeting in London this July.
Participation is from SC34 member bodies and I take it
that ECMA TC45 members are invited to chime in.
The second ad hoc working group is being created to capture
technical comments on IS 29500 and make sure existing
analysis is not lost. Within 90 days (by July 2)
there will be a mechanism in operation "to compile a
list of comments on ISO/IEC 29500 received from NBs,
liaisons, and the general public" and then to "publish
the on-going list as an open document on the SC 34
web site." In the resolutions from the meeting (cache), I note that the final
text of DIS 29500 has already been created. SC 34
requests distribution to its members no later than May
1. I don't know what the delay will be before
publication as IS 29500:2008 happens, and I'll beg a
copy of the final DIS 29500 before that just to make
sure I don't step into some element of harmonization
that is impacted by BRM-approved changes (especially the
various conformance statements that are new in the final
text). Also, to make any contributions to
identification of defects, it is important to reference
the most-authoritative available documents.
Here's what it looks like for intercepting DIS/IS 29500 activity:
1. Usable final text available in May
for provisional use (if it can be obtained) until
official IS 29500 editions are issued
2. Mechanism for receiving defects
and related comments on IS 29500 operating in July.
3. In September, 2008, SC 34 meets in
Korea and takes next steps, with meetings every six
months (figure March 2009 in Prague, September 2009 in
U.S., then 2010 meetings in Sweden, then South Africa).
Working groups that will be doing the technical work are yet to be
set up and they will have their own meetings, conference
calls, and mailing lists as well as ones synchronized with the plenary meetings.
(I have a current passport with lots of room for visa stamps.
Now I just need a sponsor for expenses/subsistence and a
national body to nominate me to a committee. Hint, hint.)
2008-04-08 Featuring OOXML
Today, Microsoft announced an OOXML initiative in
the public edition of its Registered Partner newsletter.
(If you have a business that involves reliance on
Microsoft platforms, it is not particularly difficult to
be a Registered Partner. The first thing you have
to get over is any reluctance about having a Microsoft
Passport -- Windows Live ID -- account and providing
some information about your business relationship to Microsoft.)
This comes under the umbrella of supporting
organizational activities that intersect with standards
for document formats. This entry is a placeholder
before there is organized material on those activities.
When the time comes, I will need to draw some fine line
between pure-advocacy activities and constructive
development and adoption support for interoperability.
The Microsoft Partner Program is oriented to business
management, particularly marketing and sales, whether
for integrators, resellers, or software developers. You
will sense that in the focus on building business and
the Microsoft encouragement to promote and resell
Microsoft licenses of various kinds as well as add your
own value. The major vendors have programs like
this, including ones to attract Independent Software
Vendors (ISVs).
Here is, as they say, the money quote on OOXML:
Focus is on addition of value via composite,
collaborative applications that integrate with the
Microsoft Office System as a platform. Featuring
OOXML involves some specific support for the format, its
automatic use, and, desirably, reliance on
custom-content and niceties for all of this that are
part of Office 2007. There is a "Featuring OOXML"
mark being encouraged for adoption in product materials.
2008-04-07 Standards of Fidelity
Chasing after the Harmony Principles
is going to raise questions about fidelity of document
interoperability and the conditions for degrees of
fidelity, whatever that will come to mean.
Although I don't envision much direct intersection with
nfoWorks, there are some interoperability
considerations that may be of concern to this particular
community (or not). The following come to mind:
1. One way to introduce Harmony Principles and reliance on
interoperability profiles (whatever those turn out to
be) might well be the addition of add-ons to the
Microsoft Office System, exploiting the hooks for
integration of alternative formats. It's too early
to register anything like that.
2. Just the same, it would be interesting to see what the
single-slide featuring of OOXML in a harmonious offering
might look like.
3. How much, I wonder, will the line blur between what the Office
System, a specific software suite, supports for
integration of OOXML solutions and what strict reliance
on the format alone supports in an interchange
situation? (The same question applies
to the OpenOffice.org provision for plug-ins
supporting either OOXML or ODF. I presume that the
IBM product is yet another potential interoperability case.)
I predict that Office Business Applications (OBA among friends)
featuring OOXML will thrive even more than they already
are. We will learn much about entropy and the
erosion of standards-observance for preservation and
interoperability in how this plays out.
I imagine that one measure of harmony will involve consistent
presentation rendering of documents by allegedly
harmonious processors. The "renderance"
example on Dean Allen's
Textism blog illustrates the not-always-subtle
practical difficulty of determining rendering fidelity
across platforms and processors. There is much to
grapple with in terms of specified behavior,
implementation glitches, identifying the
deviants in the game, and profiling around the pot holes.
My first thought is that the ability to produce identically-rendered
PDF output is one kind of test that might be
mechanically verifiable, bolstered with some
experience-based, craftily-composed test cases.
This is a second-order kind of fidelity, tied to
printing models and their pipelines, and we still have
to deal with differences of screen appearance even when
printing seems to come out "right" (i.e.,
harmonious-enough). But PDF, now itself
standardized, seems like the most viable stake in the
ground for the moment. (Ultimately, we might throw
a standardized XPS into the mix as a kind of honesty check between
available final-form renditions and how harmonious
products achieve agreeable fidelity.)
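As a sketch of what mechanical verification might look like, the following compares two renditions page-by-page by hashing each page's rasterized bitmap. The function names, and the assumption that pages arrive as pre-rendered byte strings, are my own illustration, not an established tool:

```python
import hashlib

def page_fingerprints(pages):
    """Hash each rendered page bitmap so renditions can be compared cheaply."""
    return [hashlib.sha256(page).hexdigest() for page in pages]

def mismatched_pages(pages_a, pages_b):
    """Return indexes of pages whose renderings differ, including
    pages present in only one of the two renditions."""
    fp_a = page_fingerprints(pages_a)
    fp_b = page_fingerprints(pages_b)
    count = max(len(fp_a), len(fp_b))
    return [i for i in range(count)
            if i >= len(fp_a) or i >= len(fp_b) or fp_a[i] != fp_b[i]]
```

An empty result means the two renditions are bitwise-identical page for page; a real comparison would also need tolerances for anti-aliasing and font-hinting differences between rendering pipelines.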
This is not an easy problem and it won't be swallowed whole.
There will be serious temperance through reliance on an
extensive progression of the least things that could
possibly work at each stage. I think that applies
to the strictest fidelity that can possibly be verified as well.
2008-04-06 Correlating the Standards
It is necessary to track the
individual editions of standards under each standards
authority. This is because standards often
reference specific editions of other standards.
Also, standards show up under more than one authority.
For example, ECMA-376 for Office Open XML File Formats was the
same as DIS 29500, the draft that was proposed for
standardization under ISO/IEC JTC1. But IS 29500
will be a very different beast, altered from ECMA-376 as
a result of the Ballot Resolution Meeting that preceded
achievement of approval.
As another example, the OASIS Standard for Open Document Format
(OpenDocument) v1.0 of 2005-05-01 was
submitted as DIS 26300 and approved via the PAS process
at ISO/IEC JTC1. But the issued ISO/IEC standard,
IS 26300:2006, is the OASIS Standard for Open
Document Format (OpenDocument) v1.0 (Second Edition)
Committee Specification 1 of 2006-07-19.
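The correspondences can be captured as simple records; this is only an illustrative sketch (the field names and the lookup function are my own invention), populated with the two examples above:

```python
# Each record ties one published edition to its counterparts under other
# authorities, with a note on whether the texts actually stay identical.
EDITIONS = [
    {"authority": "Ecma International",
     "id": "ECMA-376",
     "same_text_as": ["ISO/IEC DIS 29500"],
     "note": "IS 29500:2008 will differ, per BRM-approved changes"},
    {"authority": "OASIS",
     "id": "ODF v1.0 (2005-05-01)",
     "same_text_as": ["ISO/IEC DIS 26300"],
     "note": "IS 26300:2006 matches ODF v1.0 Second Edition CS1 (2006-07-19)"},
]

def counterparts(spec_id):
    """Look up which editions under other authorities share a given text."""
    for record in EDITIONS:
        if record["id"] == spec_id:
            return record["same_text_as"]
    return []
```

The point of keeping "same text as" separate from the note is exactly the trap described above: two designations can name the same text at one moment and diverge at the next revision.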
To illustrate the complexity of the kinds of meanderings that
occur, ANSI/X9 X9.100-181-2007 Specifications for
TIFF Image Format for Image Interchange is a special
TIFF for the exchange of the images of bank checks
among financial institutions. The images are in
Group 4 bi-level encoding and the TIFF 6.0 specification
is referenced in the abstract. That makes it sound
something like TIFF/F, which was standardized by CCITT.
I don't know what the references in X9.100-181 are, but
to my knowledge TIFF 6.0 is not under the management of
any standards organization. The specification was
issued on June 3, 1992 by Aldus Corporation, which held
the copyright until it was inherited by Adobe in a
subsequent acquisition. [The Adobe rebranding of
the document preserves the technical content and even
the cover date, but the front-matter is modified,
including elimination of the names of external
contributors, a matter of some personal interest. --dh].
So that dependencies can be tracked and the correct materials
understood for references from other specifications, I
propose to catalog and capture specifications in the
following way:
1. There will be a separate sequence of pages (a folio in
the nfoWare organization of web materials)
for the progression of specification versions under a
single authority (e.g., OASIS, ISO/IEC JTC1, Ecma
International, IETF, W3C, consortia, and proprietary
authorities such as Sun Microsystems and Adobe).
The dependencies can be cross-referenced among the
different sequences for specifications of interest here.
2. There are standards development activities that one might want
to track for relevant background. It is not of
interest for nfoWare to provide a
historical account. Interest is in the
authoritative editions of specifications. However,
the development activities may provide important
resources for questions, discussions of points of
concern and clarifications that may be important to
achievement of harmonious interoperability. [added
2008-04-07: We certainly want to know about errata and
there may be
current items of work that need to be looked at with
anticipation.] A separate folio is
used for any tracking of an individual standard
development activity, contact information, availability
of archives and discussions, etc.
3. Available resources for implementations of a given
specification, including test suites,
conformance-verification tools, samples, reference
implementations, and translation/conversion aids will be
catalogued in one or more separate folios, probably by
platform as well as particular standard.
This is still sketchy. We'll try it out first with ODF
materials and then with OOXML materials and refine
things as we go. --dh
2008-04-05 Pouring the Foundation
I am looking at an upward-from-the-bottom
organization of materials and tools. This
means I am looking at processing functions more than
presentation ones.
It is my usual inclination not to worry about the
user-interaction level, if ever, until after the
machinery for the process (the model, not the view, one
might say) is established to a useful degree.
Here are some of the layers that are at the foundation
for the nfoWorks effort:
1. File streams and their manipulation at the octet-data, binary level
2. Containers, such as Zip files, with their items, directories,
and the various uses of compression, digital checks and
signatures, and encryption
3. Character encodings using single- and double-byte encodings
4. XML and its processing and transformation technologies
5. Other, similar technologies and formats that are employed with
documents but are not confined to documents (e.g., image formats)
These are not directly about ODF or OOXML. They
are relied upon instrumentally, but they are not a
central focus of nfoWorks.
They are, however, relevant more generally as
nfoWare (and DMware, in the case of XML).
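As an illustration of the container layer, both ODF and OOXML packages are Zip files, so even the Python standard-library zipfile module can already open them. Here is a toy ODF-style package built and inspected in memory; the part names follow ODF conventions, but the document content is a placeholder of my own:

```python
import io
import zipfile

# Build a toy ODF-style package in memory: a 'mimetype' item plus an XML part.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    # ODF expects 'mimetype' to be the first entry, stored uncompressed;
    # a bare ZipInfo defaults to ZIP_STORED, which accomplishes that here.
    z.writestr(zipfile.ZipInfo("mimetype"),
               "application/vnd.oasis.opendocument.text")
    z.writestr("content.xml", "<office:document-content/>")

# Reopen the package and inspect its items, as a consumer would.
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
    mimetype = z.read("mimetype").decode("ascii")
```

Real packages add manifests, styles, settings, and (for OOXML) relationship parts, but the container mechanics are exactly this layer.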
Attention to foundation elements will be covered jointly, in a way
that their use on behalf of document harmonization is
featured here, with the general treatment under nfoWare.
One area I am not clear
on has to do with diving into the standards and
specifications for these as a consequence of their
incorporation by reference (and restrictive profile, or
not) in the specifications for ODF, OOXML, and other
document technologies that matter for nfoWorks.
2008-04-04 Pondering the Accumulation of Clippings
With my attention on all of the
elements that go into realization of Harmony Principles,
I have started noticing way too many
potentially-relevant sources and links in my web log
feeds. I've jotted down 21 in my notebook since
I could use del.icio.us, but that actually takes longer, needing to
leave my Outlook list of unread posts to go to the
actual web pages. Also, that doesn't get the
information closer to where it belongs, which is
somewhere here on nfoWare.
I could put a del.icio.us feed here, and I might do that, but it
isn't a way to capture the material. I need a way
to capture, here on the site, and also have a way to
publish it as a feed for anyone who cares. (It
would be useful to do that with this diary too, and I
have been short-sighted about that.)
I could also finally bite the bullet and put up a wiki. The
hosting service for nfoWare has a setup for MediaWiki,
my favorite. I will consider that, especially as a
way to be more inviting for community participation.
Thinking about it right now, I do fancy the idea of editing
directly into an RSS feed so that it is web-presentable
and can be subscribed to.
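A minimal version of that feed idea, assuming nothing more than the standard library (the function name, its arguments, and the example URLs are hypothetical):

```python
import xml.etree.ElementTree as ET

def clipping_feed(title, link, items):
    """Assemble a minimal RSS 2.0 feed from (title, url) clipping pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    for item_title, item_url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_url
    return ET.tostring(rss, encoding="unicode")
```

Editing clippings directly into a file like this would make them both web-presentable and subscribable, though a full feed would also want per-item dates, descriptions, and GUIDs.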
There are more cases to consider, including use of AtomPub,
applying Windows Live Writer to the task, and so on.
My pending infrastructure update and use of a Windows
Home Server may also provide further opportunities.
There may be an application of social software here.
Still pondering ... --dh
2008-04-03 Sources of Published Standards
There are different sources to watch
for published standards. For those specifications
that are sold by various national bodies and
non-governmental groups, such as ANSI and ISO, there are
commercial resellers to watch.
I have learned that Thomson Scientific operates
TechStreet, a "World Standards Marketplace"
promising access to the world's largest collection of
industry codes and standards.
Techstreet is apparently a reseller. I am not that thrilled
about purchasing standards in this way, but the
Techstreet site and their e-mail newsletters do provide
useful information about newly-published standards
around the world. Featured recently:
ANSI/X9 X9.100-181-2007 Specifications for TIFF
Image Format for Image Exchange. This covers a
specific application of TIFF used in the exchange of
check images among financial institutions.
ISO/IEC 12207:2008 and 15288:2008 Set: Systems and
Software Engineering - Life Cycle Processes
Some of these will be pertinent to nfoWorks, and I'll need to
find much more economical access to those.
TechStreet is a useful way to learn of specifications
as they become available. --dh
Defects and Analyses
There are a number of sources of
analysis of the defects of public formats.
Translator projects identify issues in the
specifications and also in the ways that conflicting
implementations arise. I will be tracking and
contributing to the public activities of this kind.
Opponents of OOXML have extensive compilations of defects. I
am not interested in the attitude and point-of-view that
leads to these materials, but I do want to keep an eye
on them as pot holes to look out for on the highway to
Harmony. Here are a representative few that I have
in my collection. Their occurrence here does not
mean that I concur with the claims about defects nor
with the significance of the alleged defects:
1. Reuven Lerner:
OOXML: Why Is It Bad, and What Can We Do? Blog
entry, OStatic, 2008-04-02 (cache)
2. Rob Weir:
OOXML's (Out of) Control Characters. An
Antic Disposition (web log), 2008-03-24 (cache)
3. Rob Weir:
How Many Defects Remain in OOXML? An Antic
Disposition (web log), 2008-03-18 (cache)
2008-04-02 Model for OOXML Development Resources
On March 31, Doug Mahugh posted his
guide to "Open XML Resources for Developers" (cache). This is a great
compilation of the kinds of materials that I want to
look for in regard to the other standards that support
document interoperability, including ODF, PDF, and the
additional standards for various document constituents
(TIFF, SVG, MathML, etc.). This, along with
guidance to various open-source implementations, is a
great format for me to follow in compilations here. --dh.
[update 2008-04-03: Erika Ehrli's April 2 "Happy
News for Open XML Developers" (cache) provides an additional
list of resources and links, including ones not listed
by Doug.]
ISO Approves DIS 29500 (OOXML) as International Standard
ISO announcement (cache) of the conclusion of DIS 29500
balloting after the Ballot Resolution Meeting (BRM) and
reconsiderations by national bodies was made generally
available today. DIS 29500-II (incorporating
changes agreed at the BRM) will become ISO/IEC
International Standard 29500:2008.
If there are no formal appeals from any national bodies in the next
two months, we should expect to see availability of the
IS 29500:2008 some time this Summer, providing a clean
version with all approved changes.
In the meantime, nfoWorks will rely on ECMA-376, the
ECMA Standard for Office Open XML File Formats.
I'll keep an eye out for areas that are likely changed
in IS 29500. There is much to do before that
becomes a serious concern. --dh
2008-04-01 Being Stopped
I noticed, in "The
Unbearable Overwhelm of Technical Debt (cache)" that I was
being stopped around all of the incomplete efforts that
I am juggling, including all the ones that I have set aside.
For here, the sticking point was the unfinished repaving. I
also noticed that I needed it to become April, so I
could easily create more folios on nfoWorks
technical matters. There was a silly obstacle.
In my web-site development methodology, it is easy to
use pages in one folder as boilerplate for pages in
another folder. It is harder to use pages in the
same folder as boilerplate for more pages. Now
that I am starting the notes folder for April 2008, I
can use material from the March 2008 folder. It is that simple.
Of the dependencies noticed in my notes,
there is one more of an infrastructure and plumbing
nature: I am also going to upgrade the SOHO development
systems and add a shared server for backing up and then
archiving all nfoWorks and other materials.
I am now more empowered to start collecting materials and
cataloging the resources needed to begin Harmony work. --dh