Microsoft Word’s formatting garbage, quantified

Anyone who’s spent any amount of time working on the web dreads it: content delivered in Microsoft Word format. Word adds tons of formatting garbage that results in bloated files and messes up the presentation when content gets brought into HTML.

When Microsoft released Office 2007, they touted switching to an XML-based document format for all of the apps. But not all XML is created equal.

Case in point: I am currently working on a project that is going to involve receiving content for a number of web pages in a tabular form, either in Word or Excel format. A spreadsheet, essentially (if not technically), with each page represented by a row, and its text content in a cell. I will be writing a PHP script to parse the spreadsheet data and generate a set of HTML files with the content loaded in them.

I’m currently trying to determine if Word or Excel would be the better format to receive the content in, which involves opening up .xlsx and .docx files in BBEdit and looking at the raw data stored within them. I’ve managed to identify the embedded XML files in each that hold the actual content. These files store the same actual text content, but their XML schemas vary based on the needs of Word and Excel.
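If you'd rather not poke around in BBEdit: .docx and .xlsx files are both ZIP archives, so a few lines of Python can list the XML parts inside and how big each one is. This is only a rough sketch, not the method I used. The `list_xml_parts` function and the demo archive are made up for illustration, and `word/document.xml` is simply where Word typically keeps the main body text.

```python
import io
import zipfile


def list_xml_parts(source):
    """Return (name, uncompressed_size) for each XML part, largest first.

    `source` can be a path to a .docx/.xlsx file or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        parts = [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.endswith(".xml")
        ]
    return sorted(parts, key=lambda part: part[1], reverse=True)


def build_demo_archive():
    """Build a tiny in-memory stand-in for a .docx (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<doc>" + "x" * 100 + "</doc>")
        archive.writestr("word/styles.xml", "<styles/>")
        archive.writestr("docProps/thumbnail.jpeg", "not xml")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Swap build_demo_archive() for a real path such as "content.docx".
    for name, size in list_xml_parts(build_demo_archive()):
        print(f"{size:>8} bytes  {name}")
```

Pointed at a real file, the largest entry in the list is usually the one holding the content.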

So… how do they match up? The XML file I pulled out of Excel is 14 KB. The one from Word is 202 KB. For the mathematically inclined amongst you, that’s a little more than 14 times larger. Yes… another (perhaps more hyperbolic) way you could say it is that the Word document is exponentially larger.

That’s just ridiculous.

What makes up the difference? Well, the Excel file’s XML is nothing but basic tags. There are no attributes on any of the tags, as far as I can tell. It’s pure semantic structure. The Word XML, on the other hand, is almost nothing but attributes. And there’s nothing smart about them either. Most of them are assigning fonts to the text. The same font names, over and over and over again throughout the file.
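To put a number on the repetition, something like the following could tally the attributes and font names in the Word XML. Again, a sketch only: it assumes the fonts are assigned through `w:rFonts` elements in the standard WordprocessingML namespace, which is where Word normally puts them. The `attribute_stats` function and the `SAMPLE` snippet are invented for illustration and are not taken from my actual file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard WordprocessingML namespace used by .docx body XML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical two-paragraph document, each run carrying its own font settings.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr>'
    '<w:t>One</w:t></w:r></w:p>'
    '<w:p><w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr>'
    '<w:t>Two</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def attribute_stats(xml_text):
    """Return (element_count, attribute_count, font_name_counts)."""
    root = ET.fromstring(xml_text)
    elements = 0
    attributes = 0
    fonts = Counter()
    for element in root.iter():
        elements += 1
        attributes += len(element.attrib)
        # Each attribute on w:rFonts names a font for one script category.
        if element.tag == f"{{{W_NS}}}rFonts":
            fonts.update(element.attrib.values())
    return elements, attributes, fonts


if __name__ == "__main__":
    elements, attributes, fonts = attribute_stats(SAMPLE)
    print(f"{elements} elements, {attributes} attributes")
    for font, count in fonts.most_common():
        print(f"{font}: named {count} times")
```

Even in this toy sample, two words of actual content drag along four mentions of the same font.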

That’s… beyond ridiculous.

Giving Microsoft a ribbin’ over the ribbon

OK, that was an incredibly lame title; I guess I’ve just read too many headline puns in Entertainment Weekly over the years.

Anyway, I’d like to take a moment out of my ongoing obsession with translucent menu bars and open source operating systems (OSOSes?) and turn to the “dark side,” if you will. (That’d be Microsoft.)

A few weeks ago I took a training course for work. The course was not actually on Office 2007, but the computers in the training room were equipped with it, and it did come into play a few times. This was my first exposure to this version of Office, and needless to say I was stunned (and not necessarily in a good way) by the radically altered user interface.

I wouldn’t say I have any kind of unhealthy attachment to the lowly menu bar, but it is, after all, one of the cornerstones of a graphical user interface. I suppose being a Mac user has an effect on my sense of its importance, since it is ever-present at the top of the screen. I do think the Windows approach, where the menus are integrated into the application window, makes more sense and is — perhaps (gasp!) — more intuitive for novice users. But regardless of where it is, in most applications it just needs to be there, and without it I’m as lost as I’d be if I were looking not at a computer screen but at the inscrutable LCD display of a photocopier or a fax machine. (Have I ever mentioned how much I hate photocopiers and fax machines?)

If you’ve not yet seen Office 2007, you may not understand where I’m going with this, but, yes… it’s true… the menu bar is gone — GONE!!! — in all Office 2007 applications. Instead, you have… this:

[Image: Microsoft Word 2007 ribbon]

Credit where credit is due (so Microsoft will not sue, since this image is surely copyrighted), I swiped this screenshot from here.

Maybe it’s just the effect of Steve Ballmer’s voice ringing incessantly in the ears of their developers, but Microsoft actually has the audacity to suggest that this “ribbon” reduces clutter. Never mind the fact that you likely will have no idea where your formerly familiar menu options have gotten off to in this sea of buttons. And do not for a moment ask yourself why, if the tabbed ribbons have replaced the menus, they couldn’t have at least given them familiar names and organization (“File, Edit, View,” etc.).

Maybe I’m too “old school.” Maybe I’m a “dinosaur” or a “curmudgeon.” Some have made the valid argument that this interface may in fact be more intuitive to a new user who’s not familiar with the older versions of Word, Excel and the rest (yes, PowerPoint and Outlook are the Professor and Mary Ann of Office). But I have to ask this: how many people who are going to be using this really have never used Word (or for that matter, a computer with a GUI) before? And even if they haven’t, is an interface that assaults the new user with no fewer than sixty-one (according to my count in the above screenshot) buttons, tabs, or other clickable thingamabobbers really going to instill in them a sense of ease, comfort and self-confidence at the keyboard?

But the ironic beauty (for us Apple fanboys) of this new interface is more than skin deep. For me, the most, erm, (I’ll use the word again) stunning thing about the interface is the magical, shiny, multi-colored and oh-so-enticing (yet strangely off-putting) Office button in the upper left corner, which not only beckons to you like a mercury-flavored Spree in this screenshot, but in fact pulses (yeah, that effect was cool in 2001) to the point of literally begging you to click it.

Go ahead. Click it.

But only click it once. For if you click it once, it spreads before you the most wondrous, the most essential (and for that matter, just about the only) menu in the entire application, containing options for opening, saving, printing and whatnot.

Click it twice, though, and guess what. No really, come on. Take a wild guess. That’s right, it closes the program. Brilliant! That’s really taking the novice into consideration. If there’s one thing I know about novice computer users, it’s that they don’t understand the difference between a single click and a double click. In fact, it seems the human brain must be hardwired to intuitively grasp that any quick poking motion with a finger should be done twice in rapid succession, and it’s only through years of experience with a computer that the tech-savvy among us have trained ourselves out of this habit. Why else would so many websites (the first realm in computing that so boldly ventured into the netherworld of the single mouse click) have to plaster their pages with warnings not to click “Submit” buttons twice, lest Amazon should send you a duplicate copy of The Birds in My Life? (For the record, I found that particular item by going to Amazon and typing “stuff old clueless people like” into the search box.)

Now where was I? Oh yeah… my desktop. Because that’s what I’m looking at now that I accidentally double-clicked the mercury Spree. I assume that button is intended to be the Office counterpart to the new Start menu icon in Windows Vista. I have yet to use Windows Vista, or even to encounter a computer that has it installed. Nor have I yet talked to anyone who’s actually purchased it or a computer that came with it, but I’d guess that’s mainly because I don’t know anyone like this guy:

[Image: A typical Windows Vista user]

Yes, that guy was in a picture on this page. I went to Microsoft’s website, looking for information about Windows Vista, and the first human face I encountered was that of Andy Samberg’s stoner (or would it be “stoner-er”?) little brother.

OK, well… I don’t really know how to wrap this up. It’s almost 2 AM and I’m spent. I might go weeks (make that minutes) before I can find anything more to criticize about Microsoft. But don’t worry, when I do, you’ll be the first to know.