Arnaud's Blog

Opinions on open source, standards, and other things

Let’s be clear: The Apache Software Foundation does NOT support OOXML.

OK, I’ll admit that nobody has claimed otherwise. Yet.

But these days you can never be too careful. It wouldn’t surprise me to see this or some similarly fancy claim published eventually.

Indeed, in its desperate, last-minute attempts to convince National Bodies around the world that OOXML is happening anyway, so they might as well support it as an ISO standard, Microsoft is eager to claim the support of as many companies and organizations as possible.

As evidence, in its latest piece of OOXML propaganda, an open letter, Microsoft lists IBM among the companies that have “already adopted (or announced adoption of) Open XML in their products”. This, despite a clear explanation to the contrary by Rob Weir, published two months ago! Does anyone believe they haven’t seen or heard about it? I sure don’t. And if there was any room left for misunderstanding, Bob Sutor’s statement closed it.

A colleague in another country even reported that, at a National Body meeting, a Microsoft representative tried to silence him through intimidation, insisting that IBM supported OOXML, contrary to what he was saying.

Microsoft’s disregard for IBM’s denials is clearly not accidental. It is part of a well-crafted, sustained, and disingenuous plan to win over NBs at all costs. There is already so much evidence of Microsoft going far beyond what most would consider normal lobbying behavior that it is sickening. For one, I’m not about to forget the case of the NGOs in India. Talk about dirty practices.

But what really lies at the bottom of Microsoft’s claims is the idea that basically any software that handles XML supports OOXML. While this is technically true to a certain degree, such a bold claim without any further qualification is pure misinformation. Making a format, whatever it is, easier to handle is obviously one of the fundamental benefits of using XML. But as I previously touched on in my entry XML vs Open, there is a big difference between being able to handle XML files at the XML level and truly supporting the particular format at hand.

Supporting OOXML cannot be claimed merely on the basis that a piece of software can read or store OOXML files. If that were the case, then any XML parser could be said to support OOXML, and the Apache Software Foundation could be said to support OOXML because its XML parser, Xerces, can read OOXML files (one would actually have to unzip them first, but it’s not like Microsoft would stop at that kind of detail). It takes much more than that to really support OOXML.
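To make the “unzip, then parse” point concrete, here is a minimal sketch, assuming Python’s standard-library `zipfile` and `xml.etree.ElementTree` as stand-ins for Xerces. The package is a hypothetical toy built in memory: it holds only a `word/document.xml` part (the usual name of a WordprocessingML main document part), whereas a real package contains many more parts. The generic parser reads it without complaint, yet nothing in it knows what any of the elements mean.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML main document parts (transitional).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def make_toy_package() -> bytes:
    """Build a tiny, hypothetical stand-in for a .docx package in memory.
    A real package also holds [Content_Types].xml, relationship parts, etc."""
    doc = (
        f'<w:document xmlns:w="{W}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", doc)
    return buf.getvalue()

def read_part(package: bytes, name: str) -> ET.Element:
    """Step 1: unzip the package. Step 2: hand the part to a generic XML parser."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return ET.fromstring(zf.read(name))

root = read_part(make_toy_package(), "word/document.xml")

# The parser walks the tree happily, but it has no idea what "p", "r" or "t" mean.
local_names = [elem.tag.split("}", 1)[1] for elem in root.iter()]
print(local_names)
```

Reading the file this way is all that “support” at the XML level amounts to; interpreting those elements is a separate job entirely.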

One has to understand the actual structure beyond the XML representation, and the semantics associated with each and every piece of data found in an OOXML file. That’s what the 6000+ pages of the specification are supposed to define; unfortunately, they do so extremely poorly.

The good news is that I don’t think Microsoft is fooling that many people. Based on my own observation of Microsoft representatives and the way they talk to people, they seem completely oblivious to how they come across: as if they thought the people they are talking to were too stupid to see through their tired arguments. I’ve got news for them: people aren’t that stupid. Thankfully. And I’m hopeful the results at the end of the month will make that clear.

The other good news is that whether OOXML gets approved or not, I believe Microsoft will pay a high price for all of its mischief and its image will come out of this badly damaged, something they can only blame themselves for.

In the meantime, don’t take any claims of support for OOXML from Microsoft for granted. The fact that Microsoft claims IBM has adopted OOXML can only make one wonder about all the other companies they list…

March 19, 2008 | Posted in standards | 15 Comments

XML vs Open

I heard Microsoft claim that OOXML is open because it is in XML. By “open” they mean that anyone can use, process, manipulate, and interpret OOXML documents. Is that really so? I say no!

A while ago my colleague Kelvin Lawrence wrote a blog entry titled “It uses XML so it is a standard right? wrong!”, about a kind of abuse regarding XML: people claiming that because their format is in XML, it is a standard. I then commented on Kelvin’s entry, pointing out another fallacy regarding XML: that because a format is in XML, anybody can process it.

Microsoft’s claim that OOXML is open because it is an XML format hits the very point I was making. This is just plain wrong, and people need to understand why. So I’m going to expand a bit on what I said in my comment on Kelvin’s entry.

The best analogy I’ve found to help people understand why this assertion is false is that saying your format is in XML is about the same as saying your language uses the Roman alphabet. That alone clearly doesn’t guarantee that anyone who knows the Roman alphabet can understand your language.

At most, knowing the Roman alphabet guarantees that you can decipher the letters, one by one. It certainly doesn’t guarantee that you will be able to understand the words the letters form, let alone the sentences.

The same is true of XML formats. Knowing that a format is in XML merely guarantees that you can parse the document. In computer science, parsing is the process of scanning a document, typically a file, to extract the information it contains. XML makes this operation easy and lets you turn the content of an XML document into a structure in memory. But what that structure represents, and what its individual pieces represent, you don’t know. They are just bits and pieces in a hierarchical form.
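As an illustration of what a parser alone gives you, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`. The sample document and the helper name `to_tree` are hypothetical. The result is nothing but nested tuples of tags, attributes, and text; note that “123” is still just a string.

```python
import xml.etree.ElementTree as ET

def to_tree(elem: ET.Element):
    """Convert a parsed element into plain nested data:
    (tag, attributes, text, children). No meaning is attached to anything."""
    return (
        elem.tag,
        dict(elem.attrib),
        (elem.text or "").strip(),
        [to_tree(child) for child in elem],
    )

# Hypothetical sample document: the parser cannot tell what "P" or "X" are.
SAMPLE = '<doc><P align="center">Hello</P><X n="1">123</X></doc>'

tree = to_tree(ET.fromstring(SAMPLE))
print(tree)
```

The same helper works on any well-formed XML document, which is exactly why it can tell you nothing specific about any one format.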

Because XML is a text-based format in which data is tagged, you, as a human being, might actually be able to guess a bit more by looking inside the document. If you see a tag called “table”, for instance, it’s probably safe to infer that this part of the document contains tabular data. But you’re unlikely to get much further than that, and a program certainly won’t do any of that guessing.

If the document comes with a schema, such as an XML Schema, the structure in memory may be a bit richer. For one thing, instead of only character strings, you’ll have typed data: instead of the character string “123”, you may have the number 123. You may also know that a certain set of data items is grouped into some kind of record called “customer”. But you still won’t have much more than that.
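The sketch below mimics, very loosely, what schema-driven typing adds. It does not use XML Schema itself: `TOY_SCHEMA` is a hypothetical, hand-written mapping from element names to Python types, and `read_typed_record` is an illustrative helper. You get the number 123 instead of the string “123”, but nothing tells you what a “balance” means or what currency it is in.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for what an XML Schema would declare:
# element name -> Python type used to convert that element's text.
TOY_SCHEMA = {"id": int, "name": str, "balance": float}

def read_typed_record(xml_text: str, schema: dict) -> dict:
    """Parse a flat record and convert each child's text to its declared type.
    Elements the schema does not mention are left as strings."""
    record = ET.fromstring(xml_text)
    return {
        child.tag: schema.get(child.tag, str)(child.text or "")
        for child in record
    }

customer = read_typed_record(
    "<customer><id>123</id><name>Ada</name><balance>9.5</balance></customer>",
    TOY_SCHEMA,
)
print(customer)
```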

Tim Berners-Lee intends to go one step further with a set of technologies the W3C has been developing under the umbrella of the “Semantic Web”. However, it remains to be seen how far this will get us, and in any case it doesn’t apply to formats such as OOXML, which don’t use this technology.

So the only way to know more is to have documentation that tells you what the format is really made of, what each tag corresponds to, and how the tags relate to each other. This is where the specification comes into play.

The specification is the document that tells you that the “P” tag corresponds to a paragraph and that you can expect to find on the “P” tag the “align” attribute, which specifies the paragraph alignment. The specification is what defines the semantics, the meaning of what’s in the document, beyond the XML format.

Only by carefully reading the specification, and writing code that interprets the document content accordingly, will you be able to fully process the document as intended. Without the specification, how are you supposed to know that “P” stands for a paragraph rather than, say, a person?
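To show that the meaning lives in the specification and not in the XML, here is a minimal sketch in which the very same parsed element is interpreted under two hypothetical “specifications”. The classes `Paragraph` and `Person` and both interpreter functions are made up for illustration; the XML parser output is identical in both cases.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class Paragraph:
    text: str
    align: Optional[str]

@dataclass
class Person:
    name: str

def interpret_as_document(elem: ET.Element) -> Paragraph:
    """Hypothetical spec A: "P" is a paragraph; "align" is its alignment."""
    return Paragraph(text=elem.text or "", align=elem.get("align"))

def interpret_as_directory(elem: ET.Element) -> Person:
    """Hypothetical spec B: "P" is a person; the element text is their name."""
    return Person(name=elem.text or "")

elem = ET.fromstring('<P align="center">Alice</P>')
print(interpret_as_document(elem))   # same bytes, read as a paragraph
print(interpret_as_directory(elem))  # same bytes, read as a person
```

Both interpretations are equally valid as far as XML is concerned; only a specification can say which one is intended.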

This is why the specification is so important, and this is one of the reasons so many people have been complaining about OOXML. OOXML is so poorly defined that there is no way two engineers in two different places in the world can sit down, implement the specification, and expect the same behavior. The OOXML specification has far too many unspecified or incompletely specified features.
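A toy illustration of how an incompletely specified feature leads to diverging implementations: suppose a hypothetical specification says only that “the align attribute specifies the paragraph alignment” and is silent about what happens when the attribute is absent. The two functions below are made-up implementations by two engineers who each filled that gap differently; this is not taken from the OOXML specification itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical underspecified rule: "align" gives the paragraph alignment,
# but nothing says which alignment applies when the attribute is missing.

def engineer_a_alignment(p: ET.Element) -> str:
    # Engineer A guesses that a missing attribute means left alignment.
    return p.get("align", "left")

def engineer_b_alignment(p: ET.Element) -> str:
    # Engineer B guesses that a missing attribute means justified text.
    return p.get("align", "justify")

for xml_text in ['<P align="center">Hi</P>', "<P>Hi</P>"]:
    p = ET.fromstring(xml_text)
    # Both engineers agree when the attribute is present, and disagree otherwise.
    print(xml_text, engineer_a_alignment(p), engineer_b_alignment(p))
```

Each implementation is a faithful reading of the rule as written, yet the same document renders differently; multiply that by every gap in a 6000-page specification.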

This isn’t to say that there is no value in a format being XML based. Obviously, I wouldn’t have spent several years working on XML if I thought so. Having a format in XML allows you to use existing code to parse the document into memory, rather than having to write a different parser for every document format. There is definitely value in that, and it does contribute to making a format more open by lowering the cost of implementation, but that alone isn’t enough to make it “open”.

Interestingly enough, if Microsoft fully documented its existing binary format for Office and made that documentation freely available to all without any legal barrier, their binary format could be more open than OOXML is, even though it’s not XML based.

Of course the fact that Microsoft keeps referring to its format as “Open XML” only makes the situation more confusing.

In any case, don’t fall for it. Look beyond the claims.

October 23, 2007 | Posted in open, standards | 2 Comments