Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Friday, November 25 2005 @ 02:46 PM EST

Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship have provided this article for us, comparing OpenDocument Format and Microsoft's new MS XML format technically, not legally. Groklaw will cover the legal side separately, but this article addresses interoperability. That is the point of XML, after all, is it not?


Format comparison between ODF and MS XML

OpenDocument: OpenDocument Format (.odt)
MS XML: Microsoft Office Open XML (.docx)


There has been a lot of attention to the legal encumbrances in Microsoft's new MS XML format. In this article we'll look at the technical side, and try to show you how the design of these formats affects interoperability. After all, that is the purpose of open standards.

OpenDocument benefits from 5 years of development involving many experts from diverse backgrounds (Boeing, National Archives of Australia, Society of Biblical Literature, etc.). It was written with the explicit purpose of being interoperable across different platforms. In contrast, MS XML has not gone through a peer-review process, and was written with only one product in mind. This difference shows in the design of the formats.

What you should already know

We've tried to write this article for a general audience, but the ideal reader is familiar with HTML.

What to watch out for

As you read this article, think of the following:

  • Which format is more understandable?
The easier a format is to learn, the easier it is to support. The programmers who create the tools you use will be able to create them more efficiently and reliably with the more understandable format.
  • Which format reuses existing standards?
Reusing existing standards allows the programmer to reuse her existing skills and her existing tools. Also, existing standards are well tested and mature. We know that they work.

The basics

MS XML and OpenDocument are both ZIP archives containing several files and directories.

You can download a sample file of each from

The Microsoft files were produced by Brian Jones (of Microsoft). The .odt file is OpenDocument and was translated from the .doc file; the .docx is MS XML. Download the .odt and .docx, and unzip them.
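
Because both formats are ordinary ZIP archives, any ZIP library can show you what is inside. The following is a minimal Python sketch, not tied to the sample files above: it builds a tiny stand-in archive in memory (the part names `content.xml` and `styles.xml` are used purely for illustration, and the contents are not a valid document) and lists its members with the standard `zipfile` module.

```python
import io
import zipfile


def list_parts(package_bytes: bytes) -> list[str]:
    """Return the member names of a ZIP-based document package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def build_demo_package() -> bytes:
    """Build a tiny stand-in package in memory (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", "<doc>Hello</doc>")
        package.writestr("styles.xml", "<styles/>")
    return buffer.getvalue()


if __name__ == "__main__":
    for name in list_parts(build_demo_package()):
        print(name)
```

To inspect a real file, you would read its bytes from disk and pass them to `list_parts` in the same way.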

Mixed content model

OpenDocument uses a mixed content model, whereas the MS XML format does not - but what is mixed content?

In non-mixed content, an element contains either other elements or text as its immediate children, but not both. A mixed content document, though, allows text and elements to be freely mixed. As an example:

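Non-mixed content

   <document>
     <name>Joe</name>
     <age>45</age>
     <address>
       <street>
         1001 Washington St.
       </street>
       <city>Pekin</city>
       <state>IL</state>
     </address>
   </document>

Mixed content

   <document>
   <para>Please welcome <name>Joe</name>
   to our team.</para>
   <para>He is <age>45</age> years
   old and lives in <city>Pekin</city>,
   <state>Illinois</state>.</para>
   </document>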

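To see the difference from a programmer's point of view, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The helper `flatten` is our own illustrative name; it walks an element and reports, in document order, whether each piece is text or a child element. In the mixed sample, text and elements interleave inside one parent; in the non-mixed sample, each element holds either children or text.

```python
import xml.etree.ElementTree as ET

MIXED = "<para>Please welcome <name>Joe</name> to our team.</para>"
NON_MIXED = "<address><city>Pekin</city><state>IL</state></address>"


def flatten(element: ET.Element) -> list[tuple[str, str]]:
    """Walk an element in document order, returning (kind, value) pairs."""
    pieces = []
    if element.text:
        pieces.append(("text", element.text))
    for child in element:
        pieces.append(("element", child.tag))
        pieces.extend(flatten(child))
        # Text that follows a child inside the same parent is its "tail".
        if child.tail:
            pieces.append(("text", child.tail))
    return pieces


if __name__ == "__main__":
    print(flatten(ET.fromstring(MIXED)))
    print(flatten(ET.fromstring(NON_MIXED)))
```
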
Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing). This sort of mismatch leads to awkward markup:

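MS XML

<w:p>
 <w:r>
  <w:t>This is a </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:b />
  </w:rPr>
  <w:t>very basic</w:t>
 </w:r>
 <w:r>
  <w:t> document </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:i />
  </w:rPr>
  <w:t>with some</w:t>
 </w:r>
 <w:r>
  <w:t> formatting, and a </w:t>
 </w:r>
 <w:hyperlink w:rel="rId4" w:history="1">
  <w:r>
   <w:rPr>
    <w:rStyle w:val="Hyperlink" />
   </w:rPr>
   <w:t>hyperlink</w:t>
  </w:r>
 </w:hyperlink>
</w:p>
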

OpenDocument

<text:p text:style-name="Standard">
   This is a <text:span text:style-name="T1">
   very basic</text:span> document <text:span
   text:style-name="T2"> with some </text:span>
   formatting, and a <text:a xlink:type="simple"
   xlink:href="">hyperlink
   </text:a>
</text:p>

XHTML

<p>
  This is a <i>very basic</i> document
  <b>with some</b> formatting, and a
  <a href="">hyperlink</a>
</p>


Now, ask yourself:

  • Would you rather transform MS XML to XHTML or transform OpenDocument to XHTML?
  • If you already know XHTML, which format allows you to reuse your skills more?
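
To make the first question concrete, here is a rough Python sketch of an OpenDocument-to-XHTML transformation for a paragraph like the one above. It is only an illustration under stated assumptions: the mapping from the automatic style names `T1` and `T2` to `<i>` and `<b>` is hard-coded here, whereas a real converter would look the styles up, and the function name `to_xhtml` is our own. Because both formats use mixed content, the conversion is essentially a renaming of elements.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Assumed mapping from automatic style names to XHTML tags; a real
# converter would resolve the styles instead of hard-coding them.
STYLE_TO_TAG = {"T1": "i", "T2": "b"}

SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}" '
    'text:style-name="Standard">This is a '
    '<text:span text:style-name="T1">very basic</text:span> document '
    '<text:span text:style-name="T2">with some</text:span> formatting, and a '
    '<text:a xlink:type="simple" xlink:href="">hyperlink</text:a></text:p>'
)


def to_xhtml(node: ET.Element) -> ET.Element:
    """Rename ODF text elements to XHTML ones, keeping text and tails."""
    local = node.tag.split("}")[-1]
    if local == "span":
        style = node.get(f"{{{TEXT_NS}}}style-name")
        out = ET.Element(STYLE_TO_TAG.get(style, "span"))
    elif local == "a":
        out = ET.Element("a", {"href": node.get(f"{{{XLINK_NS}}}href", "")})
    else:
        out = ET.Element(local)
    out.text, out.tail = node.text, node.tail
    for child in node:
        out.append(to_xhtml(child))
    return out


if __name__ == "__main__":
    print(ET.tostring(to_xhtml(ET.fromstring(SAMPLE)), encoding="unicode"))
```

A comparable converter for the MS XML sample would first have to reassemble the separate runs into a single stream of text before it could emit inline tags.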


The choice of model also affects how each format handles formatting. The mixed-content model makes more sense, and is closer to what a developer will be familiar with:

MS XML

<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>

OpenDocument

<text:span text:style-name="Strong_20_Emphasis">
    this is bold
</text:span>

XHTML

<b>this is bold</b>

If you are a developer, used to existing technologies, which format allows you to reuse your current skills most?

Separation of content and presentation

Let's go back to the above code sample:

<text:span text:style-name="Strong_20_Emphasis">
    this is bold
</text:span>

Here, "Strong_20_Emphasis" refers to a style located elsewhere in the document. OpenDocument always uses styles for formatting. This separation of content and presentation makes some operations simpler. For example, say that instead of bold, you want to use a different font type to emphasize text. You just edit the style definition.

Compare the MS XML version:

<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>

That <w:b /> means "bold". The formatting is embedded into the tag itself.

To be fair, MS XML does make an attempt to separate content and presentation. Both formats give you some separation, and neither format gives you perfect separation. But OpenDocument goes much further in that direction.
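
The practical benefit of named styles can be sketched in a few lines of Python. This is a deliberately simplified model, not the real OpenDocument schema: the element and attribute names (`styles`, `style`, `span`, `style-name`) are un-namespaced stand-ins, and `restyle` is a hypothetical helper. The point is only that one edit to the style definition changes the appearance of every span that refers to it.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced stand-in for a document with named styles.
# Element and attribute names are illustrative, not the real schema.
DOC = """
<document>
  <styles>
    <style name="Strong_20_Emphasis" font-weight="bold"/>
  </styles>
  <body>
    <p>Some <span style-name="Strong_20_Emphasis">bold</span> and
       <span style-name="Strong_20_Emphasis">more bold</span> text.</p>
  </body>
</document>
"""


def restyle(root: ET.Element, style_name: str, properties: dict[str, str]) -> int:
    """Replace one named style's properties; return how many spans use it."""
    for style in root.iter("style"):
        if style.get("name") == style_name:
            style.attrib.clear()
            style.set("name", style_name)
            for key, value in properties.items():
                style.set(key, value)
    return sum(
        1 for span in root.iter("span") if span.get("style-name") == style_name
    )


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    affected = restyle(root, "Strong_20_Emphasis", {"font-family": "Courier"})
    print(f"{affected} spans restyled with a single edit")
```

With formatting embedded in each run, the same change would mean visiting and rewriting every affected run individually.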

Separation into files

Along similar lines, MS XML and OpenDocument both separate the document into several XML files (which are then zipped together). However, they go about it in different ways. For example:

MS XML

<w:hyperlink w:rel="rId1" w:history="1">
   <w:r>
      <w:t>This is a hyperlink</w:t>
   </w:r>
</w:hyperlink>

OpenDocument

<text:a xlink:type="simple"
    xlink:href="">
    This is a hyperlink
</text:a>

Notice that with MS XML we don't know where the hyperlink points to - we have to look that up in a completely separate file. With OpenDocument we immediately know where the link is pointing. This example takes us to our next point...
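
The extra lookup can be illustrated with a short Python sketch. Everything here is a simplified assumption: the `relationships` part, its `id` and `target` attributes, and the placeholder address `http://example.org/` are invented for the example and do not reproduce Microsoft's actual packaging format. The sketch only contrasts resolving a link through an id table with reading it directly from the element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified relationships part: maps ids to targets.
RELATIONSHIPS = """
<relationships>
  <relationship id="rId1" target="http://example.org/"/>
</relationships>
"""

# The indirect link carries only an id; the target lives elsewhere.
INDIRECT_LINK = '<hyperlink rel="rId1">This is a hyperlink</hyperlink>'

# The direct link carries its target inline.
DIRECT_LINK = '<a href="http://example.org/">This is a hyperlink</a>'


def resolve_indirect(link_xml: str, relationships_xml: str) -> str:
    """Resolve a link by looking its id up in a second XML part."""
    link = ET.fromstring(link_xml)
    table = {
        rel.get("id"): rel.get("target")
        for rel in ET.fromstring(relationships_xml).iter("relationship")
    }
    return table[link.get("rel")]


def resolve_direct(link_xml: str) -> str:
    """Read the link target straight from the element."""
    return ET.fromstring(link_xml).get("href")


if __name__ == "__main__":
    print(resolve_indirect(INDIRECT_LINK, RELATIONSHIPS))
    print(resolve_direct(DIRECT_LINK))
```

Both functions return the same address, but the indirect version cannot work without parsing a second part of the package.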

Reuse of standards

OpenDocument reuses existing standards whenever possible. It uses SVG for drawings, MathML for equations, etc. This makes the format far more transparent to someone familiar with XML technologies. It also allows you to reuse existing tools that understand these standards. In contrast, Microsoft has decided to reinvent the wheel at every turn.

Look back at the example hyperlink above; you'll see that a number of attributes in the OpenDocument sample are prefixed with xlink:. What is XLink?

XLink is the XML Linking Language (XLink) Version 1.0, a W3C Recommendation for describing links between resources. Rather than reinventing the wheel, OpenDocument simply uses the existing mechanism. XLink is used in many ways in OpenDocument - for example, embedding images. MS XML and OpenDocument both reference an image file within the ZIP archive. But compare:

MS XML

  <v:imagedata w:rel="rId1"
           o:title="My Image"/>

OpenDocument

  <draw:image xlink:href=""
        xlink:type="simple" xlink:show="embed"/>


OpenDocument lets a developer reuse her existing knowledge of XML technologies. And her XLink-aware tools will work with OpenDocument.
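
As a rough illustration of what "XLink-aware" can mean in practice, the Python sketch below collects every `xlink:href` in a document without knowing anything about the surrounding vocabulary. The element names `doc`, `link` and `picture` and the two targets are made up for the example; only the XLink namespace itself is the real one.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Made-up vocabulary; only the xlink attributes matter to the tool.
SAMPLE = """
<doc xmlns:xlink="http://www.w3.org/1999/xlink">
  <link xlink:type="simple" xlink:href="http://example.org/page">a page</link>
  <picture xlink:type="simple" xlink:href="Pictures/example.png" xlink:show="embed"/>
</doc>
"""


def collect_links(xml_text: str) -> list[str]:
    """Return every xlink:href in document order, whatever the element."""
    return [
        element.get(XLINK_HREF)
        for element in ET.fromstring(xml_text).iter()
        if XLINK_HREF in element.attrib
    ]


if __name__ == "__main__":
    for target in collect_links(SAMPLE):
        print(target)
```

The same function would find hyperlinks and embedded images alike, because both are expressed with the same standard attribute.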


Once again, where OpenDocument relies on a published standard, MS XML re-invents the wheel. OpenDocument uses the Dublin Core metadata standard. Any DC-aware application can add/view/update the metadata without having to understand OpenDocument.

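MS XML

<Title>My document</Title>
<Creator>Joe User</Creator>
<DateCreated>2005-11-24T01:26:30</DateCreated>

OpenDocument

<dc:title>My document</dc:title>
<dc:creator>Joe User</dc:creator>
<dc:date>2005-11-24T01:26:30</dc:date>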



In this case the MS XML markup is very similar. But since it's not quite the same, a standard tool that knows Dublin Core will not automatically understand Microsoft's format (XML is case sensitive). You still need a new tool.
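
A small Python sketch shows why "very similar" is not enough for a generic tool. The wrapper element `meta` and the function name `dublin_core_title` are our own assumptions for the example; the Dublin Core namespace is the real one. A tool that looks for the namespace-qualified, case-sensitive name `dc:title` finds it in the first fragment and finds nothing in the second.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Both fragments use a hypothetical <meta> wrapper for the example.
ODF_META = (
    f'<meta xmlns:dc="{DC_NS}">'
    "<dc:title>My document</dc:title>"
    "<dc:creator>Joe User</dc:creator>"
    "</meta>"
)
MS_META = "<meta><Title>My document</Title><Creator>Joe User</Creator></meta>"


def dublin_core_title(xml_text: str):
    """Return the dc:title text, or None when no such element exists."""
    element = ET.fromstring(xml_text).find(f"{{{DC_NS}}}title")
    return None if element is None else element.text


if __name__ == "__main__":
    print(dublin_core_title(ODF_META))
    print(dublin_core_title(MS_META))
```
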

Some rights reserved

This article was a collaboration between several members of the OpenDocument Fellowship: Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera. You are free to use it under the terms of the Creative Commons Attribution ShareAlike license.



295 comments
Corrections here
Authored by: MathFox on Friday, November 25 2005 @ 05:55 PM EST
if needed.

When people start to comment on the form of a message, it is a sign that they
have trouble accepting the truth of the message.


OT here
Authored by: SpaceLifeForm on Friday, November 25 2005 @ 05:57 PM EST
Please post links in HTML.


To be honest
Authored by: Nick_UK on Friday, November 25 2005 @ 06:02 PM EST
Did anyone believe MS would follow the standard?

This was old news even before it was analysed, on the presumption
of MS's past practices.



The choice is clear for maintenance
Authored by: SpaceLifeForm on Friday, November 25 2005 @ 06:21 PM EST
If I have a document that contains various font types and/or sizes, and I want to change all occurrences of a particular type-size combination, it will be much easier, quicker, and safer to make those changes to an OpenDocument formatted document.


Parser ...
Authored by: ikocher on Friday, November 25 2005 @ 06:35 PM EST
All I can see is parser pandemonium for ms-xml ...

To the simple eye ms-xml might seem easier to parse, but it is not. xml
parsing is not really easy, but when you need to carry state outside the
section you are in, dereference links, and other curiosities... that is
really hard to do.

I think this is simple "ms parse style". I think that there is an
"intelligent guy" who had thought about making some kind of
optimization before design. This is not the first time I have seen this
"style" from ms. One of the funniest optimizations I have seen is in
windows 95, where a file called msdos.sys or something like that needed to be at
least 1024 bytes long, for the parser/tokenizer to be simpler and faster. The
funny part is that this file is read only _once_ when windows loads... great
optimization, full speed for a small file that is processed only once at power
up. This is what I call "ms parse style".

The format reflects these "ideas". Wonder... they should patent them



Unnecessary use of gender specific pronouns
Authored by: Nivag on Friday, November 25 2005 @ 06:39 PM EST
Unless it is appropriate, can people please avoid using gender specific
pronouns. It doesn't matter in this article whether, or not, the developer is
female - but it keeps referring to "her"; it would be better to use
"their" instead. Constant reference to "her" implies that
most developers are female.

Traditionally "he" has been used to indicate either gender, and it is
the greatest common string of characters in "He" and "She". Also,
"she" has traditionally been associated with females only. But, I would
recommend using "their" and "they" rather than
"hers/his" and "he/she".

Also using gender neutral language avoids offending people who identify
themselves with both genders (they are rare, but I have spoken with several). Note
that using "themselves" is a lot more elegant and shorter than saying
"herself or himself".

Constant inappropriate use of gender distracts from the otherwise excellent
technical article.

- Nivag


I'm convinced, stick with XHTML (nt)
Authored by: Anonymous on Friday, November 25 2005 @ 07:10 PM EST

Hasn't someone mentioned the future is computers not paper..... Oh it's not about paper it's about translation of data.

Then why did Microsoft want to lock up the standard? Why would you pick a standard that is hard to translate? And why all this presentation stuff? Shouldn't you just say it's a paragraph and let the local presentation layer work out what to do with it?

Shakes head and goes back to his editor and types in <P> to start a paragraph. I and every browser known to man know what that means. Even OpenOffice and Microsoft Office have worked that out.

Crazy Engineer.


Obligatory XML/Lisp comparison
Authored by: adobriyan on Friday, November 25 2005 @ 07:29 PM EST
Speaking of wheel reinventing:
> <document>
> <name>Joe</name>
> <age>45</age>
> <address>
> <street>
> 1001 Washington St.
> </street>
> <city>Pekin</city>
> <state>IL</state>
> </address>
> </document>

That would be:

((name "Joe")
(age "45")
(address (street "1001 Washington St.")
(city "Pekin")
(state "IL")))

> <document>
> <para>Please welcome <name>Joe</name>
> to our team.</para>
> <para>He is <age>45</age> years
> old and lives in <city>Pekin</city>,
> <state>Illinois</state>.</para>
> </document>

That would be:

((para "Please welcome " (name "Joe") " to our team.")
(para "He is " (age "45") " years old and lives in "
(city "Pekin") ", " (state "Illinois") "."))

[snip unreadable MS w:r w:t ... crap, they have zero taste]

> <text:p text:style-name="Standard">
> This is a <text:span text:style-name="T1">
> very basic</text:span> document <text:span
> text:style-name="T2"> with some </text:span>
> formatting, and a <text:a xlink:type="simple"
> xlink:href="">hyperlink
> </text:a>
> </text:p>

That would be:

(p (style-name "Standard") "This is a "
(span (style-name "T1") "very basic") " document "
(span (style-name "T2") "with some") " formatting, and a "
(a (type "simple") (href "") "hyperlink"))

> <p>
> This is a <i>very basic</i> document
> <b>with some</b> formatting, and a
> <a href="">hyperlink</a>
> </p>

That would be:

(p "This is a " (i "very basic") " document " (b "with some")
" formatting, and a "
(a (href "") "hyperlink"))

> <Title>My document</Title>
> <Creator>Joe User</Creator>
> <DateCreated>2005-11-24T01:26:30</DateCreated>

That would be:

((Title "My document") (Creator "Joe User") (DateCreated "2005-11-24T01:26:30"))

> <dc:title>My document</dc:title>
> <dc:creator>Joe User</dc:creator>
> <dc:date>2005-11-24T01:26:30</dc:date>

That would be:

(dc (title "My document")
(creator "Joe User")
(date "2005-11-24T01:26:30"))

[and so on ...]
I think it's a little strange that some ODF (XML-based) people blame
MS XML (XML-based) people for wheel reinventing, when XML has been reinventing
Lisp since day 1.

P.S.1: office documents are _that_ huge now, so they need splitting into
multiple files and compressing?... Time to think about how this happened?

P.S.2: XML is more verbose than needed. MS XML snippets demonstrate that you can
make it extremely verbose. Now, was your favourite XML parsing library audited
for security wrt malicious input?


Was the MS format designed on a desert island?
Authored by: Prototrm on Friday, November 25 2005 @ 07:51 PM EST
The MS format is interesting. To me, it has the earmarks of a programmer who
only had access to the basic XML specs, and nothing else (Been there, done that,
got the T-shirt). It appears to me that they did no research to discover how
people were using XML, and may in fact, have been instructed not to do so. IMO,
the format's designer had not worked with XML before, and found themselves
making up the rules as they went along. This approach is acceptable if the
result isn't going to be used outside the confines of a non-open piece of software,
because it doesn't hurt anyone but you. The result is the worst of all possible
worlds: you get the slow response of text parsing, and the inability to re-use
that comes from using a binary format.

I don't know about you people, but I'd rather stick with the old *.doc format.
This *.docx has no advantages, and lots of disadvantages, by comparison. And MS
claims *.docx is better than Open Document? I can't believe they can do so with
a straight face. It's clear that the benefits of an XML document format are the
very characteristics that MS ignored.


"Reveal Codes"
Authored by: Anonymous on Friday, November 25 2005 @ 08:26 PM EST

I'm by no means an expert on this, and I haven't looked carefully, but I have the following impressions (please correct me if I'm out of it).

This pitch kind of reminds me of the Word Perfect vs. Word interchanges of some time back. In particular, Word Perfect always had a "reveal codes" mode, whereby Word Perfect would display, and allow you to alter, the "codes" that control formatting, layout, fonts, and what have you.

My impression of Word, on the other hand, was that the model for formatting the documents was to break down the document into a hierarchy of nested objects (as in object oriented programming). That approach doesn't lend itself to decomposition into displayable format codes, so Word never did that. Many people liked Word Perfect much more than Word simply because of the reveal codes approach - when the document wasn't formatting the way you wanted it, you could reveal the codes and find out why. When Word decides to format a document in a way other than expected, sometimes even "experts" have a heck of a time figuring out why, and getting things straightened out.

A quick glance at the pitch in this story gives me the impression that the current dichotomy, while not the same, is quite similar, with ODF being more like Word Perfect, and MS XML being more like the old MS Word.


Office 2003 or Office 12 XML?
Authored by: micheal on Friday, November 25 2005 @ 09:01 PM EST
It is not clear if the article is about Office 2003 XML or Office 12 XML. I
presume (since Office 12 is not yet out) that it is Office 2003. If so, I think
this should be emphasized.


If I have anything to give, made of this life I live, it is this song, which I
have made. Now in your keeping it is laid.


Format Comparison Between ODF and MS XML
Authored by: wap3 on Friday, November 25 2005 @ 10:30 PM EST
Ok, after reading this, I have one experience to share, and then what the Dept. of Homeland Security is doing right now. [may be long]

A few years back I wrote an application in Delphi.
There was a *template file* in RTF that would allow the user to customize the output.
During testing I was notified that there were issues, the RTF template was mangled.
Well after several emails and getting a copy of the mangled RTF I asked what they had done.
The answer was they used Word/WordPad to edit it to their style.
Come to find out Word/WordPad *does not* comply with *standard RTF*.
So I had to write an editor and include it along with mandatory instructions in the manual to use the included editor or a *standards compliant one* and never use MS Word/WordPad.

'Nuf said, MS has never and will never cooperate with *standards*.

Here is another arrow in the quiver supporting OpenDocument.
The Department of Homeland Security this year mandated that all law enforcement and other service providers stop using the old 10-4 codes [read what MS-XML looks like - what is <w.r> anyway?] for the purpose of inter-operability.
In the 10-4 codes there is a bank of codes that are open for the agency to use as they see fit.
During the recovery efforts of the space shuttle this came to light.
One person got on the radio and announced, I have a 10-88, meaning they found a body part [according to the way they used their unassigned codes] but another group thought they were saying they had found a wounded person and needed an ambulance and medic [according to how they used the code].

So if plain English works for DHS then it should be held that public documents also use plain legible standards.

Albeit that every dispatcher and law enforcement agent thoroughly hates and despises the change, it is mandated and future money and funding is at risk for non-compliance.

Oh, back in the days, 10-x codes were great in fooling the crooks that had bought new C.B. radios, which is what law enforcement used.
But now with the digital multi-channel units that trick got squashed -- until the bad guys started [and legal by the way] having scanners that listen in on the new law enforcement bands/channels.

That's my $0.175 [was 2 cents - thanks GWB] and I'm sticking to it, since I'm the Technical Operations Manager for nineteen 9-1-1 agencies.

Link to radio communications requirement is here.



I'd say it was deliberate
Authored by: Altair_IV on Saturday, November 26 2005 @ 01:54 AM EST
After reading this, I don't get the feeling that MSXML is a product of poor
design or planning. Rather, I believe that M$ deliberately made their XML
schema as difficult to work with as possible. While they may have made the
decision to fully document and "open up" the format (and I'll believe
that when I see it), I think they also wanted to obfuscate it as much as they
could, confusing and hindering the ability of anyone else to implement it.

So while it would be relatively easy to transform an ODF (for example) document
to MSXML, creating scripts to go the other way would be much more difficult and
time-consuming to do (not impossible, of course, but definitely more difficult).
They have always wanted to encourage one-way compatibility only, and I think this
is just another small step in the same direction.

The poster formerly known as m(_ _)m
(I finally got around to creating a new account.)

Monsters from the id!!


Finishing Touch
Authored by: tredman on Saturday, November 26 2005 @ 01:55 AM EST
I think the References section says it all:

"Microsoft's Open Packaging Conventions - another proprietary standard you
must understand (and licence) in order to interoperate with Office 12."

So, even if Microsoft opens up the specification to the document schema itself,
it still has restrictive licenses on the packaging. Taken in the context of the
Massachusetts ODF melee, once again MS does what's best for public perception
instead of the customer.

"I drank what?" - Socrates, 399 BCE

OpenOffice in use in more French administration offices
Authored by: Anonymous on Saturday, November 26 2005 @ 08:31 AM EST
After the French "gendarmerie" (a police force with military status),
a department of the economics ministry is now choosing open formats and
OpenOffice (the article is in French).

There is also a site to help administrations move to open source software.

But Linux is still not really competing with Windows on the desktop.


Authored by: Anonymous on Saturday, November 26 2005 @ 10:07 AM EST
I haven't seen anything that weird since I had to reverse engineer old files
from a typesetting machine in the 1980s. Where the heck is the structure?


KDE developer blog entry on this
Authored by: arand on Saturday, November 26 2005 @ 10:49 AM EST
Here is a blog entry on ms-xml format. I think that says it the best.

A little correction: shouldn't the last example in the article be the other way around?


Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Authored by: Anonymous on Saturday, November 26 2005 @ 03:41 PM EST
Um, now we're talking about standards, shouldn't the XHTML be like this:

  This is a <em>very basic</em>
document <strong>with some</strong> formatting, and a <a
href="">hyperlink</a>


Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Authored by: Anonymous on Sunday, November 27 2005 @ 07:07 AM EST
The worst abuse of XML I've ever seen. The authors are, of course, not a third


The problem with Open but Proprietary 'standards'
Authored by: greengrass on Monday, November 28 2005 @ 06:59 AM EST
This is a very old article from 1998, but it does a fair job dishing Open but Proprietary standards.


OSML Open For Business


Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Authored by: Anonymous on Tuesday, November 29 2005 @ 11:04 AM EST
This article is misleading. The Open Office XML format (and the example on Brian
Jones's web site) is perfectly acceptable and easily understandable once the XML is
indented and formatted for readability.


Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Authored by: Anonymous on Tuesday, November 29 2005 @ 11:45 AM EST
Isn't it possible that in their production format, Microsoft chose to optimize file size at the expense of readability by shortening all the tags? If the tags were more human-readable, would the format be better? You could do a simple search and replace on their tags with more verbose ones to get a "better" format:
<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>
becomes this:
<word:text_element>
    <word:text_formatting>
        <word:bold />
    </word:text_formatting>
    <word:text>this is bold</word:text>
</word:text_element>
I agree that the non-mixed vs. mixed argument holds some water, but the unreadability is probably just a filesize measure. It will save disk space, load time, and memory use.

