Microsoft Word’s formatting garbage, quantified

Anyone who’s spent any amount of time working on the web dreads it: content delivered in Microsoft Word format. Word adds tons of formatting garbage that results in bloated files and messes up the presentation when content gets brought into HTML.

When Microsoft released Office 2007, they touted switching to an XML-based document format for all of the apps. But not all XML is created equal.

Case in point: I am currently working on a project that is going to involve receiving content for a number of web pages in a tabular form, either in Word or Excel format. A spreadsheet, essentially (if not technically), with each page represented by a row, and its text content in a cell. I will be writing a PHP script to parse the spreadsheet data and generate a set of HTML files with the content loaded in them.

I’m currently trying to determine if Word or Excel would be the better format to receive the content in, which involves opening up .xlsx and .docx files in BBEdit and looking at the raw data stored within them. I’ve managed to identify the embedded XML files in each that hold the actual content. These files store the same actual text content, but their XML schemas vary based on the needs of Word and Excel.
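For anyone who'd rather not poke around by hand: .docx and .xlsx files are ordinary ZIP archives, so the embedded XML parts and their sizes can be listed with a few lines of script. What follows is only a minimal sketch in Python (not the PHP script mentioned above), and the function names and the demo package are made up for illustration. In a real Word file the body text usually sits in word/document.xml, and in a real Excel file the cell text usually sits in xl/sharedStrings.xml, but the sketch doesn't depend on either name.

```python
import io
import zipfile


def xml_part_sizes(package):
    """Return {part name: uncompressed size in bytes} for each XML part.

    `package` is a path or a binary file-like object holding a .docx or
    .xlsx file (both are plain ZIP archives).
    """
    with zipfile.ZipFile(package) as archive:
        return {
            info.filename: info.file_size
            for info in archive.infolist()
            if info.filename.endswith(".xml")
        }


def build_demo_package(parts):
    """Build a tiny in-memory ZIP that stands in for a real Office file.

    Hypothetical helper, here only so the example runs without a real document.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in parts.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    demo = build_demo_package({
        "word/document.xml": "<w:document>" + "<w:r/>" * 100 + "</w:document>",
        "word/styles.xml": "<w:styles/>",
    })
    # Largest part first, which is usually the one holding the content.
    for name, size in sorted(xml_part_sizes(demo).items(), key=lambda kv: -kv[1]):
        print(f"{size:>8} bytes  {name}")
```

To try it on a real file, pass the path instead of the demo buffer, e.g. xml_part_sizes("content.docx"), where content.docx is whatever file you happen to have.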

So… how do they match up? The XML file I pulled out of Excel is 14 KB. The one from Word is 202 KB. For the mathematically inclined amongst you, that’s a little more than 14 times larger. Yes… another (perhaps more hyperbolic) way you could say it is that the Word document is exponentially larger.

That’s just ridiculous.

What makes up the difference? Well, the Excel file’s XML is nothing but basic tags. There are no attributes on any of the tags, as far as I can tell. It’s pure semantic structure. The Word XML, on the other hand, is almost nothing but attributes. And there’s nothing smart about them either. Most of them are assigning fonts to the text. The same font names, over and over and over again throughout the file.

That’s… beyond ridiculous.
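If you want a number to go with the eyeballing, counting the attributes is easy enough. Here's a rough sketch in Python using the standard library's ElementTree. The sample fragment is hand-written in the style of Word's markup (a w:rFonts element repeating a font name on every run), not pulled from my actual file, and attribute_stats is a name I made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written fragment imitating Word's markup; an assumption for
# illustration only, not data from the document discussed above.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr><w:t>Hello</w:t></w:r>'
    '<w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr><w:t>world</w:t></w:r>'
    '</w:document>'
)


def attribute_stats(xml_text):
    """Count elements, attributes, and repeated attribute values.

    Returns (element_count, attribute_count, Counter of attribute values).
    Namespace declarations (xmlns:...) are not counted as attributes,
    because ElementTree does not expose them in `attrib`.
    """
    root = ET.fromstring(xml_text)
    elements = 0
    attributes = 0
    values = Counter()
    for element in root.iter():
        elements += 1
        attributes += len(element.attrib)
        values.update(element.attrib.values())
    return elements, attributes, values


if __name__ == "__main__":
    elements, attributes, values = attribute_stats(SAMPLE)
    print(f"{elements} elements, {attributes} attributes")
    # The most frequently repeated attribute values, most common first.
    for value, count in values.most_common(5):
        print(f"{count:>6} x {value!r}")
```

Run against the XML pulled out of a real Word file, the most_common list should show fairly quickly whether the same font names really are being repeated over and over.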

Scene of the crime

Here are a couple of photos I took of my office/studio for the RPM Challenge, and I figured I might as well post them here too, so you can get a glimpse into the world where the magic happens. Well, where… something happens.

Bringing pleasure to computerized machines

If you ever visit a large and/or busy post office, you may have seen one of the US Postal Service’s latest advances in self-service technology, the Automated Postal Center.

The post office near my downtown office building has one of these, and I love it. I use it every chance I get. Not to slight the job performance of postal workers (never cross a postal worker), but I find these machines to be faster and more efficient than going to the window, plus there’s almost never any line. Granted, maybe someday when everyone learns to love technology as much as I do (fax machines and photocopiers excluded), things will change, but for now I can usually just walk right up, take care of my business, and move on.

But there’s something about these machines I don’t like: the illogically friendly, human tone of the on-screen text, especially at the conclusion of the transaction:

Thanks. It’s been a pleasure serving you.

Really? Has it? Can a machine derive pleasure from anything? And if so, from serving me? Well, I suppose we do want our sentient utilitarian devices to be as servile as possible. But we’re not there yet. Some human wrote the computer program that operates this equipment, and they put that string of text into it. Who are they fooling? And why are those people being allowed out in public?

Wouldn’t “Thank you for your business” have sufficed? I’d feel a lot more comfortable with that.

Giving Microsoft a ribbin’ over the ribbon

OK, that was an incredibly lame title; I guess I’ve just read too many headline puns in Entertainment Weekly over the years.

Anyway, I’d like to take a moment out of my ongoing obsession with translucent menu bars and open source operating systems (OSOSes?) and turn to the “dark side,” if you will. (That’d be Microsoft.)

A few weeks ago I took a training course for work. The course was not actually on Office 2007, but the computers in the training room were equipped with it, and it did come into play a few times. This was my first exposure to this version of Office, and needless to say I was stunned (and not necessarily in a good way) by the radically altered user interface.

I wouldn’t say I have any kind of unhealthy attachment to the lowly menu bar, but it is, after all, one of the cornerstones of a graphical user interface. I suppose being a Mac user has an effect on my sense of its importance, since it is ever-present at the top of the screen. I do think the Windows approach, where the menus are integrated into the application window, makes more sense and is — perhaps (gasp!) — more intuitive for novice users. But regardless of where it is, in most applications it just needs to be there, and without it I’m as lost as I’d be if I were looking not at a computer screen but at the inscrutable LCD display of a photocopier or a fax machine. (Have I ever mentioned how much I hate photocopiers and fax machines?)

If you’ve not yet seen Office 2007, you may not understand where I’m going with this, but, yes… it’s true… the menu bar is gone — GONE!!! — in all Office 2007 applications. Instead, you have… this:

Microsoft Word 2007 ribbon

Credit where credit is due (so Microsoft will not sue, since this image is surely copyrighted), I swiped this screenshot from here.

Maybe it’s just the effect of Steve Ballmer’s voice ringing incessantly in the ears of their developers, but Microsoft actually has the audacity to suggest that this “ribbon” reduces clutter. Never mind the fact that you likely will have no idea where your formerly familiar menu options have gotten off to in this sea of buttons. And do not for a moment ask yourself why, if the tabbed ribbons have replaced the menus, they couldn’t have at least given them familiar names and organization (“File, Edit, View,” etc.).

Maybe I’m too “old school.” Maybe I’m a “dinosaur” or a “curmudgeon.” Some have made the valid argument that this interface may in fact be more intuitive to a new user who’s not familiar with the older versions of Word, Excel and the rest (yes, PowerPoint and Outlook are the Professor and Mary Ann of Office). But I have to ask this: how many people who are going to be using this really have never used Word (or for that matter, a computer with a GUI) before? And even if they haven’t, is an interface that assaults the new user with no fewer than sixty-one (according to my count in the above screenshot) buttons, tabs, or other clickable thingamabobbers really going to instill in them a sense of ease, comfort and self-confidence at the keyboard?

But the ironic beauty (for us Apple fanboys) of this new interface is more than skin deep. For me, the most, erm, (I’ll use the word again) stunning thing about the interface is the magical, shiny, multi-colored and oh-so-enticing (yet strangely off-putting) Office button in the upper left corner, which not only beckons to you like a mercury-flavored Spree in this screenshot, but in fact pulses (yeah, that effect was cool in 2001) to the point of literally begging you to click it.

Go ahead. Click it.

But only click it once. For if you click it once, it spreads before you the most wondrous, the most essential (and for that matter, just about the only) menu in the entire application, containing options for opening, saving, printing and whatnot.

Click it twice, though, and guess what. No really, come on. Take a wild guess. That’s right, it closes the program. Brilliant! That’s really taking the novice into consideration. If there’s one thing I know about novice computer users, it’s that they don’t understand the difference between a single click and a double click. In fact, it seems the human brain must be hardwired to intuitively grasp that any quick poking motion with a finger should be done twice in rapid succession, and it’s only through years of experience with a computer that the tech savvy among us have trained ourselves out of this habit. Why else would so many websites (the first realm in computing that so boldly ventured into the netherworld of the single mouse click) have to plaster their pages with warnings not to click “Submit” buttons twice, lest Amazon.com should send you a duplicate copy of The Birds in My Life? (For the record, I found that particular item by going to Amazon and typing “stuff old clueless people like” into the search box.)

Now where was I? Oh yeah… my desktop. Because that’s what I’m looking at now that I accidentally double-clicked the mercury Spree. I assume that button is intended to be the Office counterpart to the new Start menu icon in Windows Vista. I have yet to use Windows Vista, or even to encounter a computer that has it installed. Nor have I yet talked to anyone who’s actually purchased it or a computer that came with it, but I’d guess that’s mainly because I don’t know anyone like this guy:

A typical Windows Vista user

Yes, that guy was in a picture on this page. I went to Microsoft’s website, looking for information about Windows Vista, and the first human face I encountered was that of Andy Samberg’s stoner (or would it be “stoner-er”?) little brother.

OK, well… I don’t really know how to wrap this up. It’s almost 2 AM and I’m spent. I might go weeks (make that minutes) before I can find anything more to criticize about Microsoft. But don’t worry, when I do, you’ll be the first to know.

Entropy

6:00 AM. The strident shriek of my alarm clock jolts me awake.

I slap the snooze button.

6:09 AM. Another shriek. Another slap.

6:18 AM. I put the clock and myself out of our collective misery and stumble to the bathroom.

Less than 3% of the water on Earth is considered “fresh,” which is to say it is not seawater. A far smaller fraction of that so-called “fresh” water is actually potable. I crank the faucet on the shower and ease myself under the steam and hot spray. Several gallons of pure, drinkable, truly fresh water mix with soap suds and a day’s worth of human sweat and oil, and swirl in a clockwise motion (the Coriolis effect being, at this magnitude, a misunderstood non-phenomenon) down the drain. Into the sewer system. Into the next phase of their existence as part of that 99%+ of the world’s non-potable water.

I dry myself off, get dressed, fill my Thermos, and walk to the car. I turn the key, hear the engine roar. Its pistons fire, burning a highly refined form of petroleum that was once, millions of years ago, the flesh and substance of untold species of flora and fauna. They lived their lives, died, decomposed, were covered over by the decomposed substance of their progeny, subsumed beneath the surface, compressed over the eons, turning to a mysterious black liquid that one day would become more valuable than gold to a species that did not yet exist. A substance that would generate untold wealth and wars, things that also did not yet exist.

A gallon of this refined liquid, formed over the millennia, transports me in comfort and — barring an unexpected collision with an SUV, the playground bully of the Interstate highway — safety from home to office.

8:30 AM. I turn the key, open the door, and walk to my desk. I sit down in front of a box of metal and plastic, a precision device, assembled in Mexico by laborers whose annual wages might… perhaps… allow them to afford one of these devices themselves, were it not for more basic needs such as food and shelter.

This box is already obsolete, and those laborers are hard at work even now as I sit at my desk, assembling the latest replacement units that will themselves pass with great haste into obsolescence, soon to find their permanent (for the next several tens of thousands of years, anyway) home in a landfill, next to the mounds of paper towels I used to dry my hands in the office lavatory and the styrofoam container and waxed-paper cup from my lunch today and eventually the larger box of metal and plastic as well, the one with 4 wheels, which burns 2 gallons of that refined liquid daily to transport me to this office and back home again.