Arnaud's Blog

Opinions on open source, standards, and other things

OOXML and legacy documents

One of the stated goals of OOXML is to address legacy documents and the need for long-term preservation. The Office Open XML Overview states that “Preserving the financial and intellectual investment in those documents (both existing and new) has become a pressing priority.” I think we can all relate to that but I fail to see how OOXML addresses the preservation of existing documents.

We’ve all had to face the challenge of keeping old files alive by converting them to the latest format as a new version of the software comes out. This is typically a tedious process that consists of opening each and every one of your files and saving it back in the new format. So, there is no doubt that organizations like the British Library are interested in a solution to that problem. But is OOXML really the solution?

I already discussed how the mere fact that OOXML is in XML is no guarantee that the format is more open than its binary sibling. It is indeed no guarantee that anybody other than Microsoft will effectively be able to process OOXML files, either today or many years from now. In fact, given the poor quality of the current specification, it is actually guaranteed that nobody but Microsoft can do that.

But the point that strikes me as the oddest about this statement is that, even if the OOXML specification were of reasonable quality and truly allowed for complete implementations other than Microsoft Office, it still wouldn’t do any good to existing documents, simply because existing documents are NOT in the OOXML format.

This is a point I already touched on in my previous entry on migration cost. It seems so obvious that one would think it’s not even worth mentioning but evidence shows that many people don’t know that.

I’ve talked to government people who understood the issues the OOXML specification raises but were worried that voting against OOXML as an ISO standard was going to jeopardize the future of their existing files. They truly believed that OOXML was going to save their files from being doomed in the future. I just got off the phone with someone who also didn’t realize the proposed standard was not the format that is in use today. This gentleman thought Microsoft was merely standardizing the format which is already a de facto standard. Little did he know that OOXML wasn’t that format.

People don’t appear to understand that OOXML is a different format. They don’t realize that using it implies getting new software and converting all their files to the new format. They don’t understand that basically only Microsoft is in a position to reliably perform this conversion because they are the only ones to really know what’s in their binary format, which they did not open.

If Microsoft really cared about people’s concerns with regard to the preservation of their existing files, they would have done just that: open their binary format. That’s the format that is being used, the format existing files are in. Opening that format would mean fully documenting it and removing any legal barrier to fully implementing it.

So, how exactly are people supposed to take advantage of OOXML to preserve their existing files from the adversity of ever-changing software? The reality is they need to buy Microsoft Office 2007 and, once again, open each and every one of their existing files and save them back using the new format. I was told Microsoft is working on a tool that will allow converting files in batch mode. That sure would be helpful but does anybody think that tool will be free? I doubt it.

So, in practice, to take advantage of the OOXML standards-wannabe and be, in theory, free from Microsoft lock-in, it appears that one has to at least buy Microsoft Office one more time. A sort of toll on the use of a so-called “open standard”. Rather odd I think, don’t you?


November 27, 2007 Posted by | standards | 3 Comments

Format vs Tool (continued)

One thing I should have added to my previous entry is that I believe the reason some people think the tool is more important than the format is that they are confusing the means with the end.

We use tools to achieve specific tasks. Because the tools are what we are primarily interacting with, tools take a prominent role and some people end up thinking that the tools are what matters most. But I believe this is wrong.

The tool is merely a means to an end. The end being to capture, create, process, communicate, and share information. The information is the end game, not the tool. Making this distinction is fundamental. The tool is merely what we use to manipulate the information which is really what we care about.

In this context, having a standard format that can represent the information is tremendously more important than any specific feature a particular tool may have. In fact, having a standard format enables the information to be manipulated using different tools, allowing you to change tools based on your needs and what is available. This in turn leads to having more features at your disposal.

This model is undoubtedly more powerful than being stuck with a single tool, no matter how great that tool may be at a given time, and depending on a single vendor to provide you with all the features you may need or want. Having a standard format enables competition, which leads to more innovation and better tools.

I know not everybody agrees with that last point; some people think that standards stifle innovation, but I disagree and plan to discuss this in a future entry.

November 21, 2007 Posted by | standards | 1 Comment

Format vs Tool – where is the value?

At Goscon last month, Jason Matusow of Microsoft stated that what matters most is not how information is stored but how you access it. According to Jason, the real value is in the tool and this is what you should worry about; the format used to store the information is an implementation detail.

I understand why Microsoft would say that in light of the increasing demand for open standards like ODF. When you enjoy a quasi-monopoly status you don’t necessarily want to open your formats and enable competition. But the fact remains that this argument strikes me as terribly retrograde and at odds with the era we’re in.

Contrary to Microsoft’s claim, I think the tool is no longer the center of interest, the information is. When I made that point at Goscon somebody in the audience applauded, and I’m confident that this view is shared by many people, but experience shows that what I think is common knowledge often is not. I’ve also learned that only through repetition do things eventually sink in. So, I want to discuss this a bit further. Hopefully this will have some value even to those of you who are already convinced.

We’ve all used tools that function like black boxes. You use a specific tool to generate information, and you use that same tool to retrieve that information back. The information is literally imprisoned in some form of storage only known to the application you’ve been using.

When you think about it, if you create a book using Microsoft Word, even though the content of the book is yours, you are not free to access it the way you want. You can only access your own book through Microsoft Word.

But this is a model of the past. It was ok when all we did was create documents that lived on one computer and stayed there, when sharing a document meant printing it and mailing or faxing it. But this is no longer acceptable in a world where information is primarily destined to be shared via some digital medium, email or other.

The web has demonstrated the power of separating the way the data is represented from the application we use to access it. It is thanks to standards like HTML and CSS that we can all browse the web independently of what computer and browser we use. It is thanks to these standards that people can use whatever hardware and software they like to create and deliver web pages.

Similarly, having used ODF for a while now, I’ve experienced firsthand the pleasure of being able to try new tools as they come out, and to switch tools depending on what I’m doing and what I like, all without having to convert my documents from one format to another. It may sound like I’m preaching but it is very real. Freedom is exhilarating!
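To make the separation of format and tool concrete, here is a minimal sketch in Python that reads the text of a document using nothing but the standard library, with no office suite involved. It relies on two facts about ODF: a document is a ZIP package, and its body lives in a file named content.xml whose paragraphs are text:p elements. The helper names make_sample_package and read_paragraphs are mine, and the sample package is a bare-bones stand-in for illustration, not a complete ODF file (a real one also carries a mimetype entry and a manifest).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_sample_package():
    """Build a tiny in-memory ZIP holding only content.xml (illustrative stand-in)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>First paragraph.</text:p>"
        "<text:p>Second paragraph.</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


def read_paragraphs(package_file):
    """Return the text of every text:p element found in content.xml."""
    with zipfile.ZipFile(package_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


paragraphs = read_paragraphs(make_sample_package())
for paragraph in paragraphs:
    print(paragraph)
```

Any other program, in any other language, could do the same, which is the whole point: the information is reachable without the tool that created it.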

There is no doubt in my mind that people who have had a taste of the freedom provided by this new model of separating the data format from the application will no longer accept the old model. They will no longer accept a model that ties their information to the application they happened to use to create it.

Those of us who are old enough to have known the old model will keep wanting more freedom, and the younger crowd will simply expect it. The future generations will demand it, and will reject anything that doesn’t respect what is fundamentally a right. The right to access YOUR information the way YOU want.

November 20, 2007 Posted by | standards | 2 Comments

Good chuckle

Because I was one of the editors of the HTML 4 specification my name and former email address are in the related HTML4 DTD files.

The DTD files are the files containing the formal definition of the HTML language used for web pages. As such they are often referenced from web pages and, although it’s normally completely transparent to users, occasionally people run into one of these files. When they see my name in there some of them assume I have something to do with the page they are dealing with.
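For readers who have never looked, this is how a page points at one of those DTD files: a DOCTYPE declaration at the top of the page names the DTD by a public identifier and a URL. The small Python sketch below pulls both out of a page. The function name and the regular expression are mine, written for illustration, and they handle only the common double-quoted form of the declaration.

```python
import re

# Matches the common form: <!DOCTYPE HTML PUBLIC "public id" "system URL">
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*>',
    re.IGNORECASE,
)


def find_dtd(page_source):
    """Return (public_id, dtd_url) from the page's DOCTYPE, or None if absent."""
    match = DOCTYPE_PATTERN.search(page_source)
    if match is None:
        return None
    return match.group(1), match.group(2)


page = '''<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>Example</title></head><body></body></html>'''

result = find_dtd(page)
print(result)
```

Follow that URL out of curiosity and you land in the DTD file itself, editors’ names included.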

Usually, the consequences are rather mundane. The most common case is when they are facing some technical difficulty and they just send me (and my co-editors) an email asking for help. I’m used to that and these messages don’t surprise me anymore.

On the other hand, I was not prepared for the one I recently received. Below is the top portion of it.

From: Nancy ****

To: ‘RITA’; ‘sales@ ****’; ‘’

Cc: ‘robert@ ****’; ‘’; ‘’

Subject: RE: status call this morning

Dave Raggett, Arnaud Le Hors, Ian Jacobs , RITA, ROBERT,


Out of sympathy for “Nancy” and respect for the company this involved (which may very well be at fault, but I’m not here to judge), I anonymized it a bit. Amazingly enough, Nancy took the time to hide the balance amount from her attached bank statement but not her account number, address, etc.

This definitely beats every email I have ever received related to my involvement in HTML. 🙂

November 9, 2007 Posted by | Uncategorized

CDF and interoperability

Andy Updegrove published an enlightening piece on why the recent claims from the founders of the OpenDocument Foundation regarding the W3C Compound Document Format (CDF) have been puzzling many of us. I just want to add a tidbit of information regarding CDF which is in line with my previous post on XML vs Open.

CDF is just another piece of technology that helps raise the level of interoperability achievable between software components exchanging XML data. It provides us with a formal way of describing how various XML vocabularies are being used together. This is definitely useful and that’s why IBM, for one, has been participating in its development. Yet, this is no magic bullet either.

CDF is merely a framework, a container. As such, CDF itself does not ensure interoperability. Interoperability can only be achieved with regard to a specific “CDF profile”. A CDF profile lists a specific set of XML vocabularies and how they are to be mixed. Interoperability is only achieved between applications that support the same CDF profile(s).

That is, applications that not only support CDF but also support every one of the XML vocabularies being used in that particular profile, as well as the particular way they are being used together (CDF supports various combination models).
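The rule can be written down in a few lines. Below is a rough sketch in Python, not anything taken from the CDF specifications: the class names, the two example profiles, and the two example applications are all made up for illustration. Only the namespace URIs are real. A profile is modeled as a set of vocabularies plus a combination model, and two applications interoperate on a profile only if both support all of it.

```python
from dataclasses import dataclass

# Real namespace URIs, used here only as vocabulary identifiers.
XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
MATHML = "http://www.w3.org/1998/Math/MathML"


@dataclass(frozen=True)
class Profile:
    """Hypothetical model of a profile: which vocabularies, mixed how."""
    name: str
    vocabularies: frozenset
    combination: str  # e.g. "by-reference" or "by-inclusion"


@dataclass(frozen=True)
class Application:
    """Hypothetical application with the vocabularies and models it handles."""
    name: str
    vocabularies: frozenset
    combinations: frozenset

    def supports(self, profile):
        # Every vocabulary in the profile AND its combination model are required.
        return (profile.vocabularies <= self.vocabularies
                and profile.combination in self.combinations)


def can_interoperate(a, b, profile):
    """Interoperability holds only when both sides support the same profile."""
    return a.supports(profile) and b.supports(profile)


profile_a = Profile("A", frozenset({XHTML, SVG}), "by-reference")
profile_b = Profile("B", frozenset({XHTML, MATHML}), "by-inclusion")

viewer = Application("viewer", frozenset({XHTML, SVG}), frozenset({"by-reference"}))
editor = Application("editor", frozenset({XHTML, SVG, MATHML}),
                     frozenset({"by-reference", "by-inclusion"}))

print(can_interoperate(viewer, editor, profile_a))
print(can_interoperate(viewer, editor, profile_b))
```

Both applications “support CDF” in this sketch, yet they can only exchange documents written to profile A; the viewer knows neither MathML nor the by-inclusion model that profile B requires.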

I’m sure you’ve had the same experience as I have with video files you can open but your media player won’t play because it doesn’t have the right codec. That’s the exact same problem. The MPEG video format is a container that lets the player discover what video compression is used in a standard way. This is nice but, as experience shows, it doesn’t guarantee that your player will be able to render all videos, merely that it can figure out what’s in the file and whether it can render it or not.

So, again, let’s be careful not to jump to conclusions too fast. Just like XML itself and many other technologies, CDF is useful but it does not in and of itself guarantee interoperability.

November 9, 2007 Posted by | standards | 1 Comment

OOXML, ODF, and migration costs

There is an important point I want to make about what people need to consider when contemplating whether to move to ODF or not.

Some people seem to think that the choice they have is to either stick with the status quo (Microsoft Office) or disrupt the status quo and adopt ODF, with the assumption that the former is easier and a more natural progression than the latter.

This is missing a very important point: OOXML is NOT the status quo, it is a NEW format, just like ODF. As such its adoption presents challenges very similar to those of the adoption of ODF.

OOXML, just like ODF, requires a migration.

Moving to ODF means deploying new software, training people on that software, developing support for it, etc., plus disrupting your work environment by introducing a format not everybody may be ready to deal with. The cost of that migration is undoubtedly the biggest barrier to the adoption of ODF. Yet, the same applies to OOXML.

Indeed, moving to OOXML means moving to the new Microsoft Office application, training people on it, developing support for it, etc., and, just like with ODF, disrupting your work environment by introducing a new format.

Even though I don’t have actual numbers to back this up, I think it’s fair to say that the incurred cost ought to be similar on the migration front. Given that, and considering that there are several freely available offerings for ODF, I’ll then venture to say that migration to ODF is actually likely to be cheaper because it saves you from having to pay Microsoft Office license fees.

In a desperate attempt to disrupt the momentum behind ODF Microsoft hurried to create a standard they could claim to support. Yet, this new proprietary format in disguise faces the same challenges as the format they are trying to stop: cost of migration.

So, remember that when it’s time for you to choose. The choice you have to make is not between adopting a new format, ODF, or sticking with Microsoft Office. It is between migrating to ODF or migrating to OOXML, both new formats, ODF being an open standard for which offerings are freely available.

November 2, 2007 Posted by | standards | 4 Comments

Censorship, moving to

When I talk about “open” these days I’m usually talking about open source and standards and what they mean, as in my previous post on XML vs Open. This post is about another form of openness, or lack of openness rather: censorship.

I used to associate “digital divide” with the mere fact that some people simply don’t have access to computers and the internet. I’ve now come to realize that it goes beyond that. Indeed, there are also the people who have access to computers and the internet but are only offered a crippled version of it.

In this day and age people like myself, who live in the western part of the world, tend to lose sight of how much freedom we enjoy. My recent trip to China made this very clear to me.

I’m sure you’ve heard of the censorship that exists in China. I certainly had. Yet, when I was in China a few weeks ago, sitting in my hotel room trying to find information on Beijing and what sights I might want to see, it took me a while to realize that what was preventing me from accessing certain websites wasn’t a faulty connection or some network misconfiguration but censorship.

Interestingly enough, I was able to circumvent it very easily by pointing my DNS settings at a US-based IBM server I had access to through my VPN connection. A rather weak way of implementing censorship, I thought. But then I realized that, even though this was trivial for me to do, the censorship is probably still effective for the vast majority of users in China. Indeed, if the internet used to be reserved for the geeks, they have long been outnumbered, and by a huge margin. The vast majority of internet users don’t have a clue what a DNS server is, and there is no reason it should be otherwise.

Back home from China, a few days later, I reset my network settings to their usual configuration and didn’t expect to have to worry about China’s censorship anymore. At least, not until I go back to China. Little did I know that this censorship was going to hit me again so quickly. Right here, at home. This time, it affects me in that people in China cannot directly access this blog. Indeed, I was informed by a colleague from China that to access this site he has to go through an anonymizer. Bummer.

So, this is it for Blogger. I’m moving to WordPress. Why Blogger is censored and WordPress is not is beyond me. If anything, it only proves how arbitrary censorship can be. But shush!… Let’s not advertise this too much, or the Chinese people’s window to the world might close a bit more…

My new blog address is:
See you there!

November 1, 2007 Posted by | blogs, censorship, open | 5 Comments