Format Comparison Between ODF and MS XML ~ by Carrera, D'Arcus, Eisenberg
Friday, November 25 2005 @ 02:46 PM EST

Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera of the OpenDocument Fellowship have provided this article for us, comparing OpenDocument Format and Microsoft's new MS XML format technically, not legally. Groklaw will be doing that separately, but this article addresses interoperability. That is the point of XML, after all, is it not?

****************************************************************************

Format comparison between ODF and MS XML

OpenDocument: OpenDocument Format (.odt)
MS XML: Microsoft Office Open XML (.docx)

Introduction

There has been a lot of attention to the legal encumbrances in Microsoft's new MS XML format. In this article we'll look at the technical side, and try to show you how the design of each format affects interoperability. After all, that is the purpose of open standards.

OpenDocument benefits from five years of development involving many experts from diverse backgrounds (Boeing, National Archives of Australia, Society of Biblical Literature, etc.). It was written with the explicit purpose of being interoperable across different platforms. In contrast, MS XML has not gone through a peer-review process, and was written with only one product in mind. This difference shows in the design of the formats.

What you should already know

We've tried to write this article for a general audience, but the ideal reader is familiar with HTML.

What to watch out for

As you read this article, think of the following:

  • Which format is more understandable?
The easier a format is to learn, the easier it is to support. The programmers who create the tools you use will be able to create them more efficiently and reliably with the more understandable format.
  • Which format reuses existing standards?
Reusing existing standards allows the programmer to reuse her existing skills and her existing tools. Also, existing standards are well tested and mature. We know that they work.

The basics

MS XML and OpenDocument are both ZIP archives containing several files and directories.

You can download a sample file of each from http://blogs.msdn.com/brian_jones/archive/2005/06/20/430892.aspx

The Microsoft files were produced by Brian Jones (of Microsoft). The .odt file is OpenDocument and was translated from the .doc file; the .docx is MS XML. Download the .odt and .docx, and unzip them.
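
If you would rather inspect the archives from a script than unzip them by hand, the following is a minimal sketch using Python's standard zipfile module. The helper name list_package_parts is our own, and the in-memory archive with its two member names is only a stand-in so the example runs without the sample files; pass the path of the downloaded .odt or .docx instead to see the real contents.

```python
import io
import zipfile


def list_package_parts(package):
    """Return the member names stored in a ZIP-based document package.

    `package` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()


# Stand-in archive built in memory (illustrative member names only),
# so the sketch runs without the downloaded sample files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<doc/>")
    archive.writestr("styles.xml", "<styles/>")

buffer.seek(0)
parts = list_package_parts(buffer)
print(parts)
```

Both formats can be opened this way because both are ordinary ZIP archives; the differences discussed below are in the XML files inside.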

Mixed content model

OpenDocument uses a mixed content model, whereas the MS XML format does not - but what is mixed content?

In non-mixed content, an element contains either other elements or text as its immediate children, but not both. A mixed content document, though, allows text and elements to be freely mixed. As an example:

Non-mixed content
<document>
   <name>Joe</name>
   <age>45</age>
   <address>
      <street>1001 Washington St.</street>
      <city>Pekin</city>
      <state>IL</state>
   </address>
</document>
Mixed content
<document>
   <para>Please welcome <name>Joe</name>
   to our team.</para>
   <para>He is <age>45</age> years
   old and lives in <city>Pekin</city>,
   <state>Illinois</state>.</para>
</document>
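
To make the distinction concrete, here is a small sketch of how a common XML parser (Python's standard xml.etree.ElementTree) exposes mixed content. The describe helper and the shortened sample paragraph are our own illustrations. In this API, text that precedes the first child is stored on the parent's text attribute, and text that follows a child is stored on that child's tail attribute.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the mixed-content sample above.
MIXED = (
    "<para>He is <age>45</age> years old and lives in "
    "<city>Pekin</city>.</para>"
)


def describe(element):
    """List the text runs and child elements of `element` in document order."""
    pieces = []
    if element.text:
        pieces.append(("text", element.text))
    for child in element:
        pieces.append(("element", child.tag))
        if child.tail:  # text that follows the child inside the parent
            pieces.append(("text", child.tail))
    return pieces


para = ET.fromstring(MIXED)
pieces = describe(para)
plain_text = "".join(para.itertext())

for kind, value in pieces:
    print(kind, repr(value))
print(plain_text)
```

Text and elements alternate as siblings here, which is exactly what a non-mixed format forbids.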
 

Non-mixed documents usually represent structured data; mixed documents are usually used to represent narrative. MS XML uses the non-mixed model to represent narrative (word processing). This sort of mismatch leads to awkward markup:

MS XML
<w:p>
 <w:r>
  <w:t>This is a </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:b />
  </w:rPr>
  <w:t>very basic</w:t>
 </w:r>
 <w:r>
  <w:t> document </w:t>
 </w:r>
 <w:r>
  <w:rPr>
   <w:i />
  </w:rPr>
  <w:t>with some</w:t>
 </w:r>
 <w:r>
  <w:t> formatting, and a </w:t>
 </w:r>
 <w:hyperlink w:rel="rId4" w:history="1">
  <w:r>
   <w:rPr>
    <w:rStyle w:val="Hyperlink" />
   </w:rPr>
   <w:t>hyperlink</w:t>
  </w:r>
 </w:hyperlink>
</w:p>
OpenDocument
<text:p text:style-name="Standard">
   This is a <text:span text:style-name="T1">very
   basic</text:span> document <text:span
   text:style-name="T2">with some</text:span>
   formatting, and a <text:a xlink:type="simple"
   xlink:href="http://example.com">hyperlink</text:a>
</text:p>
XHTML
<p>
  This is a <b>very basic</b> document
  <i>with some</i> formatting, and a
  <a href="http://example.com">hyperlink</a>
</p>

Now, ask yourself:

  • Would you rather transform MS XML to XHTML or transform OpenDocument to XHTML?
  • If you already know XHTML, which format allows you to reuse your skills more?
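
As a rough illustration of the first question, the sketch below converts a cut-down OpenDocument paragraph to XHTML with a short recursive walk. It assumes the OpenDocument text namespace and the XLink namespace shown in the constants; the function name to_xhtml and the choice to map text:span to an XHTML span with a class attribute are our own simplifications, not part of either specification.

```python
import xml.etree.ElementTree as ET
from html import escape

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Cut-down version of the OpenDocument paragraph shown above.
ODF_PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">'
    'This is a <text:span text:style-name="T1">very basic</text:span>'
    ' document with a <text:a xlink:type="simple"'
    ' xlink:href="http://example.com">hyperlink</text:a>'
    '</text:p>'
)


def to_xhtml(element):
    """Recursively convert a small subset of ODF text markup to XHTML."""
    local_name = element.tag.split("}")[-1]
    if local_name == "p":
        open_tag, close_tag = "<p>", "</p>"
    elif local_name == "span":
        style = escape(element.get(f"{{{TEXT_NS}}}style-name", ""))
        open_tag, close_tag = f'<span class="{style}">', "</span>"
    elif local_name == "a":
        href = escape(element.get(f"{{{XLINK_NS}}}href", ""))
        open_tag, close_tag = f'<a href="{href}">', "</a>"
    else:
        open_tag = close_tag = ""  # unknown elements: keep only their text

    # Mixed content: text before the first child, then each child
    # followed by the text that trails it.
    parts = [open_tag, escape(element.text or "")]
    for child in element:
        parts.append(to_xhtml(child))
        parts.append(escape(child.tail or ""))
    parts.append(close_tag)
    return "".join(parts)


xhtml = to_xhtml(ET.fromstring(ODF_PARAGRAPH))
print(xhtml)
```

Because both vocabularies use mixed content, the walk is nearly one-to-one. Converting the MS XML paragraph would additionally require merging adjacent runs and translating run properties into wrapper elements.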

Formatting

The choice of content model affects how each format handles formatting. The mixed-content model makes more sense, and is closer to what a developer will be familiar with:

MS XML

<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>
OpenDocument
<text:span text:style-name="Strong_20_Emphasis">
    this is bold
</text:span>

XHTML
<b>this is bold</b>
 

If you are a developer, used to existing technologies, which format allows you to reuse your current skills most?

Separation of content and presentation

Let's go back to the code sample above:

OpenDocument
<text:span text:style-name="Strong_20_Emphasis">
    this is bold
</text:span>

 

Here, "Strong_20_Emphasis" refers to a style located elsewhere in the document. OpenDocument always uses styles for formatting. This separation of content and presentation makes some operations simpler. For example, say that instead of bold, you want to use a different font type to emphasize text. You just edit the style definition.
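
The lookup from a style name to its definition can be sketched as follows. The style and fo namespace URIs are the ones OpenDocument uses; the wrapper element named styles, the sample style definition, and the helper text_properties are simplified assumptions for illustration, not a copy of a real styles.xml.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Simplified, hypothetical style sheet: one named text style.
STYLES = f"""
<styles xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}">
  <style:style style:name="Strong_20_Emphasis" style:family="text">
    <style:text-properties fo:font-weight="bold"/>
  </style:style>
</styles>
""".strip()


def text_properties(styles_root, style_name):
    """Return the text properties of the named style, or None if not found."""
    for style in styles_root.iter(f"{{{STYLE_NS}}}style"):
        if style.get(f"{{{STYLE_NS}}}name") == style_name:
            props = style.find(f"{{{STYLE_NS}}}text-properties")
            if props is None:
                return {}
            # Drop the namespace part of each attribute name for readability.
            return {key.split("}")[-1]: value
                    for key, value in props.attrib.items()}
    return None


styles_root = ET.fromstring(STYLES)
strong = text_properties(styles_root, "Strong_20_Emphasis")
missing = text_properties(styles_root, "No_20_Such_20_Style")
print(strong)
```

Changing how emphasized text looks then means editing one style definition; every span that refers to the style by name picks up the change.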

MS XML:
<w:r>
  <w:rPr>
    <w:b />
  </w:rPr>
  <w:t>this is bold</w:t>
</w:r>
 

That <w:b /> means "bold". The formatting is written directly into the run's properties rather than referenced through a named style.

To be fair, MS XML does make an attempt to separate content and presentation. Both formats give you some separation, and neither format gives you perfect separation. But OpenDocument goes much further in that direction.

Separation into files

Along similar lines, MS XML and OpenDocument both separate the document into several XML files (which are then zipped together). However, they go about it in different ways. For example:

MS XML
<w:hyperlink w:rel="rId1" w:history="1">
   <w:r>
      <w:t>This is a hyperlink</w:t>
   </w:r>
</w:hyperlink>
OpenDocument
<text:a xlink:type="simple"
        xlink:href="http://example.com">
    This is a hyperlink
</text:a>

 

Notice that with MS XML we don't know where the hyperlink points; we have to look up the identifier rId1 in a completely separate file. With OpenDocument we immediately know where the link is pointing. This example takes us to our next point...
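
The extra step can be sketched like this. Both XML snippets below are hypothetical stand-ins with namespaces omitted: the element and attribute names in the relationships part (Relationships, Relationship, Id, Target) are assumptions for illustration and may not match the actual files in the sample package. The point is only the shape of the work: load a second file, build a lookup table, then resolve the identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-ins for the two parts involved.
DOCUMENT_PART = (
    '<hyperlink rel="rId1"><r><t>This is a hyperlink</t></r></hyperlink>'
)
RELATIONSHIPS_PART = (
    '<Relationships>'
    '<Relationship Id="rId1" Target="http://example.com"/>'
    '</Relationships>'
)


def load_targets(relationships_xml):
    """Map each relationship identifier to its target."""
    root = ET.fromstring(relationships_xml)
    return {rel.get("Id"): rel.get("Target")
            for rel in root.iter("Relationship")}


def resolve_hyperlink(hyperlink_xml, targets):
    """Return (link text, target); the target is None if the id is unknown."""
    link = ET.fromstring(hyperlink_xml)
    label = "".join(link.itertext())
    return label, targets.get(link.get("rel"))


targets = load_targets(RELATIONSHIPS_PART)
resolved = resolve_hyperlink(DOCUMENT_PART, targets)
print(resolved)
```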

Reuse of standards

OpenDocument reuses existing standards whenever possible. It uses SVG for drawings, MathML for equations, etc. This makes the format far more transparent to someone familiar with XML technologies. It also allows you to reuse existing tools that understand these standards. In contrast, Microsoft has decided to reinvent the wheel at every turn.

Look back at the example hyperlink above; you'll see that a number of attributes in the OpenDocument markup are prefixed with xlink:. What is XLink?

XLink

XLink is the XML Linking Language (XLink) Version 1.0, a W3C Recommendation for describing links between resources. Rather than reinventing the wheel, OpenDocument simply uses the existing mechanism. XLink is used in many ways in OpenDocument - for example, embedding images. MS XML and OpenDocument both reference an image file within the ZIP archive. But compare:

MS XML
<w:pict>
  <v:imagedata w:rel="rId1"
           o:title="My Image"/>
</w:pict>

OpenDocument
<draw:frame>
  <draw:image
        xlink:href="Pictures/000000001.jpg"
        xlink:type="simple" xlink:show="embed"
        xlink:actuate="onLoad"/>
</draw:frame>

 

OpenDocument lets a developer reuse her existing knowledge of XML technologies. And her XLink-aware tools will work with OpenDocument.
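
Here is a minimal sketch of what "XLink-aware" can mean in practice. The collect_links helper knows nothing about OpenDocument; it only looks for the standard xlink:href attribute. The wrapper element doc and the combined sample are our own construction, built from the hyperlink and image fragments shown earlier.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced attribute as "{namespace-uri}local-name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical wrapper around the two OpenDocument fragments shown above.
SAMPLE = (
    '<doc xmlns:xlink="http://www.w3.org/1999/xlink"'
    ' xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:a xlink:type="simple" xlink:href="http://example.com">'
    'hyperlink</text:a>'
    '<draw:frame><draw:image xlink:href="Pictures/000000001.jpg"'
    ' xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>'
    '</draw:frame>'
    '</doc>'
)


def collect_links(root):
    """Return every xlink:href value in document order, whatever the element."""
    return [element.get(XLINK_HREF)
            for element in root.iter()
            if XLINK_HREF in element.attrib]


links = collect_links(ET.fromstring(SAMPLE))
print(links)
```

The same function finds both the hyperlink and the embedded image, because both use the same standard attribute.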

Metadata

Once again, where OpenDocument relies on a published standard, MS XML re-invents the wheel. OpenDocument uses the Dublin Core metadata standard. Any DC-aware application can add/view/update the metadata without having to understand OpenDocument.

MS XML:
<Title>My document</Title>
<Creator>Joe User</Creator>
<DateCreated>2005-11-24T01:26:30</DateCreated>
OpenDocument:
<dc:title>My document</dc:title>
<dc:creator>Joe User</dc:creator>
<dc:date>2005-11-24T01:26:30</dc:date>

 

In this case the MS XML markup is very similar. But since it's not quite the same, a standard tool that knows Dublin Core will not automatically understand Microsoft's format: XML is case sensitive, so <Title> is not <title>, and some element names differ outright (DateCreated versus date). You still need a new tool.
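
To see why "very similar" is not enough, consider this sketch of a tool that knows only Dublin Core. It matches elements by the Dublin Core namespace, http://purl.org/dc/elements/1.1/, and ignores everything else. The meta wrapper elements and the dublin_core helper are our own illustrative assumptions; the child elements are the ones from the samples above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical wrappers around the metadata samples shown above.
ODF_META = (
    f'<meta xmlns:dc="{DC_NS}">'
    '<dc:title>My document</dc:title>'
    '<dc:creator>Joe User</dc:creator>'
    '<dc:date>2005-11-24T01:26:30</dc:date>'
    '</meta>'
)
MS_META = (
    '<meta>'
    '<Title>My document</Title>'
    '<Creator>Joe User</Creator>'
    '<DateCreated>2005-11-24T01:26:30</DateCreated>'
    '</meta>'
)


def dublin_core(root):
    """Collect the Dublin Core elements found under `root` as a dict."""
    prefix = f"{{{DC_NS}}}"
    return {element.tag[len(prefix):]: element.text
            for element in root.iter()
            if element.tag.startswith(prefix)}


odf_fields = dublin_core(ET.fromstring(ODF_META))
ms_fields = dublin_core(ET.fromstring(MS_META))
print(odf_fields)
print(ms_fields)
```

The generic reader recovers all three fields from the OpenDocument sample and nothing from the MS XML sample as shown, even though the two carry the same information.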

Some rights reserved

This article was a collaboration between several members of the OpenDocument Fellowship: Alex Hudson, J. David Eisenberg, Bruce D'Arcus and Daniel Carrera. You are free to use it under the terms of the Creative Commons Attribution ShareAlike license.


Groklaw © Copyright 2003-2013 Pamela Jones.
All trademarks and copyrights on this page are owned by their respective owners.
Comments are owned by the individual posters.

PJ's articles are licensed under a Creative Commons License.