Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship have provided this article for us, comparing OpenDocument Format and Microsoft's new MS XML format technically, not legally. Groklaw will be doing that separately, but this article addresses interoperability. That is the point of XML, after all, is it not?
Format comparison between ODF and MS XML
OpenDocument: OpenDocument Format (.odt)
MS XML: Microsoft Office Open XML (.docx)
There has been a lot of attention to the legal encumbrances in
Microsoft's new MS XML format. In this article we'll look at the
technical side, and try to show you how the design of these formats
affects interoperability. After all, that is the purpose of open
OpenDocument benefits from five years of development involving many
experts from diverse backgrounds (Boeing, National Archives of
Australia, Society of Biblical Literature, etc.). It was written with the explicit purpose of being interoperable across different
platforms. In contrast, MS XML has not gone through a peer-review
process, and was written with only one product in mind. This
difference shows in the design of the formats.
What you should already know
We've tried to write this article for a general audience, but the ideal reader is familiar with HTML.
What to watch out for
As you read this article, think of the following:
- Which format is more understandable?
The easier a format is to learn, the
easier it is to support. The programmers who create the tools you use
will be able to create them more efficiently and reliably with the more understandable format.
- Which format reuses existing standards?
Reusing existing standards allows the
programmer to reuse her existing skills and her existing tools. Also,
existing standards are well tested and mature. We know that they work.
MS XML and OpenDocument are both ZIP archives containing several files and directories.
You can download a sample file of each from http://blogs.msdn.com/brian_jones/archive/2005/06/20/430892.aspx
The Microsoft files were produced by Brian Jones (of Microsoft). The .odt file is OpenDocument and was translated from the .doc file; the .docx is MS XML. Download the .odt and .docx, and unzip them.
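Because both files are ordinary ZIP archives, any ZIP tool can show what is inside them. Here is a minimal Python sketch using only the standard `zipfile` module. The function name `list_parts` is our own choice for illustration, not part of either format.

```python
import zipfile


def list_parts(path_or_file):
    """Return the names of all entries in a ZIP-based document, sorted."""
    with zipfile.ZipFile(path_or_file) as archive:
        return sorted(archive.namelist())
```

Calling `list_parts` on the downloaded .odt and .docx files (under whatever names you saved them) lets you compare how each format splits a document into parts before you open any of the XML.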
Mixed content model
OpenDocument uses a mixed content model, whereas the MS XML format does not. But what is mixed content?
In non-mixed content, an element contains either other elements or text as its immediate children, but not both. A mixed content document, though, allows text and elements to be freely mixed. As an example:
1001 Washington St.
<para>Please welcome <name>Joe</name>
to our team.</para>
<para>He is <age>45</age> years
old and lives in <city>Pekin</city>,
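The distinction can also be checked mechanically. The following Python sketch, assuming the standard `xml.etree.ElementTree` module, reports whether an element mixes text and child elements directly. The `<address>` and `<street>` element names in the usage note are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element):
    """True if the element has child elements AND non-whitespace text
    sitting directly inside it (before, between, or after the children)."""
    children = list(element)
    if not children:
        return False  # text only (or empty): not mixed
    if element.text and element.text.strip():
        return True   # text before the first child
    # ElementTree stores text that follows a child in that child's .tail
    return any(child.tail and child.tail.strip() for child in children)
```

A paragraph such as `<para>Please welcome <name>Joe</name> to our team.</para>` is mixed, while a record such as `<address><name>Joe</name><street>1001 Washington St.</street></address>` is not.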
Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing). This sort of mismatch leads to awkward markup:
MS XML:
<w:t>This is a </w:t>
<w:t> document </w:t>
<w:t> formatting, and a </w:t>
<w:hyperlink w:rel="rId4" w:history="1">
<w:rStyle w:val="Hyperlink" />
OpenDocument:
This is a <text:span text:style-name="T1">
very basic</text:span> document <text:span
text:style-name="T2"> with some </text:span>
formatting, and a <text:a xlink:type="simple"
XHTML:
This is a <i>very basic</i> document
<b>with some</b> formatting, and a
Now, ask yourself:
- Would you rather transform MS XML to XHTML or transform OpenDocument to XHTML?
- If you already know XHTML, which format allows you to reuse your skills more?
The choice of model also affects how each format
handles formatting. The mixed-content model makes more sense, and is
closer to what a developer will be familiar with:
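MS XML:
<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>
OpenDocument:
<text:span text:style-name="Strong_20_Emphasis">this is bold</text:span>
XHTML:
<b>this is bold</b>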
If you are a developer, used to existing technologies, which format allows you to reuse your current skills most?
Separation of content and presentation
Let's go back to the OpenDocument code sample above:
<text:span text:style-name="Strong_20_Emphasis">this is bold</text:span>
Here, "Strong_20_Emphasis" refers to a style located elsewhere in the document. OpenDocument always uses
styles for formatting. This separation of content and presentation
makes some operations simpler. For example, say that instead of bold,
you want to use a different font type to emphasize text. You just edit
the style definition.
<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>
That <w:b /> means "bold". The formatting is embedded into the tag itself.
To be fair, MS XML does make an attempt to
separate content and presentation. Both formats give you some
separation, and neither format gives you perfect separation. But
OpenDocument goes much further in that direction.
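To make the "just edit the style definition" point concrete, here is a hedged Python sketch. The style fragment is a hypothetical example of what a definition for "Strong_20_Emphasis" might look like; the namespace URIs are the OpenDocument ones, and `restyle_emphasis` is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical style definition, for illustration only.
SAMPLE_STYLE = f"""
<style:style xmlns:style="{STYLE}" xmlns:fo="{FO}"
             style:name="Strong_20_Emphasis" style:family="text">
  <style:text-properties fo:font-weight="bold"/>
</style:style>
"""


def restyle_emphasis(style_element, font_family):
    """Replace bold with a different font family, in place.

    Every span that refers to this style picks up the change;
    the document content itself is never touched.
    """
    props = style_element.find(f"{{{STYLE}}}text-properties")
    props.attrib.pop(f"{{{FO}}}font-weight", None)   # drop bold
    props.set(f"{{{FO}}}font-family", font_family)   # use another font
    return style_element


style = restyle_emphasis(ET.fromstring(SAMPLE_STYLE), "Courier")
```

One edit to one element changes the appearance of every run of text that names the style, which is the practical benefit of keeping presentation out of the content.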
Separation into files
Along similar lines, MS XML and OpenDocument both separate the
document into several XML files (which are then zipped together).
However, they go about it in different ways. For example:
<w:hyperlink w:rel="rId1" w:history="1">
<w:t>This is a hyperlink</w:t>
This is a hyperlink
Notice that with MS XML we don't know where the hyperlink
points to - we have to look that up in a completely separate file.
With OpenDocument we immediately know where the link is pointing. This example takes us to our next point...
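The difference shows up as an extra lookup step in code. The Python sketch below rests on several assumptions: the link target `http://example.org/` is made up, the `W` namespace URI is a placeholder rather than Microsoft's real one, and the `RELATIONSHIPS` dictionary merely stands in for the separate file that maps ids such as `rId1` to targets.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W = "urn:example:ms-xml"  # placeholder namespace, NOT the real one

# Hypothetical fragments modelled on the hyperlink examples above.
ODF_LINK = (
    f'<text:a xmlns:text="{TEXT}" xmlns:xlink="{XLINK}" '
    'xlink:type="simple" xlink:href="http://example.org/">'
    "This is a hyperlink</text:a>"
)
MSXML_LINK = (
    f'<w:hyperlink xmlns:w="{W}" w:rel="rId1" w:history="1">'
    "<w:t>This is a hyperlink</w:t></w:hyperlink>"
)
# Stand-in for the separate file that resolves relationship ids.
RELATIONSHIPS = {"rId1": "http://example.org/"}


def odf_target(link):
    """OpenDocument: the target is an attribute on the link itself."""
    return link.get(f"{{{XLINK}}}href")


def msxml_target(link, relationships):
    """MS XML: read the id, then resolve it through a second source."""
    return relationships[link.get(f"{{{W}}}rel")]


odf_url = odf_target(ET.fromstring(ODF_LINK))
msxml_url = msxml_target(ET.fromstring(MSXML_LINK), RELATIONSHIPS)
```

Both calls arrive at the same URL, but the second cannot work with the content file alone.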
Reuse of standards
OpenDocument reuses existing standards whenever possible. It uses
SVG for drawings, MathML for equations, etc. This makes the format
far more transparent to someone familiar with XML technologies.
It also allows you to reuse existing tools that understand these
standards. In contrast, Microsoft has decided to reinvent the wheel at
Look back at the example hyperlink above; you'll see that a number of attributes in the OpenDocument markup are prefixed with xlink:. What is XLink?
XLink is the XML Linking Language (XLink) Version 1.0,
a W3C Recommendation for expressing links in XML documents. Rather than reinventing
the wheel, OpenDocument simply uses the existing mechanism.
XLink is used in many ways in OpenDocument -
for example, embedding images. MS XML and OpenDocument both reference
an image file within the ZIP archive. But compare:
OpenDocument lets a developer reuse her
existing knowledge of XML technologies. And her XLink-aware tools will
work with OpenDocument.
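As a rough illustration of what "XLink-aware" means, the Python sketch below collects every `xlink:href` in a document without knowing anything about OpenDocument's own elements. The sample fragment, including the `Pictures/image1.png` path and the link URL, is hypothetical; only the namespace URIs are taken from the published standards.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical OpenDocument-style fragment with a link and an image.
SAMPLE = """
<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
        xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink">
  <text:a xlink:type="simple" xlink:href="http://example.org/">a link</text:a>
  <draw:image xlink:type="simple" xlink:href="Pictures/image1.png"/>
</text:p>
"""


def xlink_targets(xml_text):
    """Return every xlink:href value, in document order.

    The function only knows the XLink namespace; it never inspects
    element names, so it works on any XML vocabulary that uses XLink.
    """
    root = ET.fromstring(xml_text)
    return [el.get(XLINK_HREF) for el in root.iter()
            if el.get(XLINK_HREF) is not None]


targets = xlink_targets(SAMPLE)
```

The same function would find links in an SVG drawing or any other XLink-based vocabulary, which is the kind of tool reuse the text describes.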
Once again, where OpenDocument relies on a published standard, MS
XML re-invents the wheel. OpenDocument uses the Dublin Core metadata
standard. Any DC-aware application can add/view/update the metadata
without having to understand OpenDocument.
In this case the MS XML markup is very
similar. But since it's not quite the same, a standard tool that knows
Dublin Core will not automatically understand Microsoft's format (XML
is case sensitive). You still need a new tool.
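The following Python sketch shows, under stated assumptions, why "very similar" is not enough. It reads only elements in the Dublin Core namespace. The first sample is a hypothetical OpenDocument-style metadata fragment; the second is an invented lookalike in a made-up namespace, not Microsoft's actual markup.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical metadata using real Dublin Core elements.
DC_SAMPLE = f"""
<meta xmlns:dc="{DC}">
  <dc:title>Sample document</dc:title>
  <dc:creator>Jane Doe</dc:creator>
</meta>
"""

# Invented lookalike: similar names, different namespace and case.
LOOKALIKE_SAMPLE = """
<props xmlns:x="urn:example:other">
  <x:Title>Sample document</x:Title>
  <x:Creator>Jane Doe</x:Creator>
</props>
"""


def dublin_core_fields(xml_text):
    """Return {local name: text} for every Dublin Core element found."""
    prefix = f"{{{DC}}}"
    root = ET.fromstring(xml_text)
    return {el.tag[len(prefix):]: el.text
            for el in root.iter()
            if isinstance(el.tag, str) and el.tag.startswith(prefix)}


found = dublin_core_fields(DC_SAMPLE)
missed = dublin_core_fields(LOOKALIKE_SAMPLE)
```

The generic reader recovers the title and creator from the first fragment and nothing from the second, so the lookalike needs its own tool.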
Some rights reserved
This article was a collaboration between several members of the
OpenDocument Fellowship: Alex Hudson, J. David Eisenberg, Bruce D'Arcus
and Daniel Carrera. You are free to use it under the terms of the
Creative Commons Attribution ShareAlike license.