Microsoft Word’s formatting garbage, quantified

Anyone who’s spent any amount of time working on the web dreads it: content delivered in Microsoft Word format. Word adds tons of formatting garbage that results in bloated files and messes up the presentation when content gets brought into HTML.

When Microsoft released Office 2007, they touted switching to an XML-based document format for all of the apps. But not all XML is created equal.

Case in point: I am currently working on a project that is going to involve receiving content for a number of web pages in a tabular form, either in Word or Excel format. A spreadsheet, essentially (if not technically), with each page represented by a row, and its text content in a cell. I will be writing a PHP script to parse the spreadsheet data and generate a set of HTML files with the content loaded in them.

I’m currently trying to determine if Word or Excel would be the better format to receive the content in, which involves opening up .xlsx and .docx files in BBEdit and looking at the raw data stored within them. I’ve managed to identify the embedded XML files in each that hold the actual content. These files store the same actual text content, but their XML schemas vary based on the needs of Word and Excel.

So… how do they match up? The XML file I pulled out of Excel is 14 KB. The one from Word is 202 KB. For the mathematically inclined amongst you, that’s a little more than 14 times larger. Yes… another (perhaps more hyperbolic) way you could say it is that the Word document is exponentially larger.
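The arithmetic is easy to check. Here is a quick sketch in JavaScript that uses only the two file sizes quoted above; the variable names are made up for illustration.

```javascript
// File sizes reported above, in kilobytes.
const excelXmlKB = 14;
const wordXmlKB = 202;

// How many times larger the Word XML is than the Excel XML.
const ratio = wordXmlKB / excelXmlKB;

console.log(ratio.toFixed(1) + ' times larger'); // prints "14.4 times larger"
```

A big multiple, then, though strictly speaking a constant factor rather than anything truly exponential.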

That’s just ridiculous.

What makes up the difference? Well, the Excel file’s XML is nothing but basic tags. There are no attributes on any of the tags, as far as I can tell. It’s pure semantic structure. The Word XML, on the other hand, is almost nothing but attributes. And there’s nothing smart about them either. Most of them are assigning fonts to the text. The same font names, over and over and over again throughout the file.
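To put a number on that repetition, something like the following rough sketch could tally how often each attribute value shows up. The sample string is invented for illustration, not copied from my actual file. It assumes the font assignments look like WordprocessingML's w:rFonts elements, and countAttributeValues is a hypothetical helper name.

```javascript
// Hypothetical sample, made up for illustration; not taken from the real file.
const sampleXml =
  '<w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr><w:t>One</w:t></w:r>' +
  '<w:r><w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial"/></w:rPr><w:t>Two</w:t></w:r>';

// Tally how often each attribute value appears in an XML string.
// A regex is good enough for a rough count; it is not a real XML parser.
function countAttributeValues(xml) {
  const tally = {};
  const pattern = /[\w:.-]+="([^"]*)"/g;
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    const value = match[1];
    tally[value] = (tally[value] || 0) + 1;
  }
  return tally;
}

const counts = countAttributeValues(sampleXml);
console.log(counts); // prints { Arial: 4 }
```

Two short runs of text, four copies of the same font name. Scale that up to a full document and the size difference starts to make sense.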

That’s… beyond ridiculous.

Light, pollution, memory

Light pollution

I remember the first time I ever observed light pollution. I didn’t know what it was, and I’m not sure it even had a name back then.

It was 1993. I was in college, and I was home for Easter. In fact it was early Easter morning. My uncle was staying with us, in my room, which was in the process of becoming the guest room. He always stayed in my room when he stayed with us. Eventually I would stay in that room, as a guest room, not my room, once I was no longer a resident of the house, but a guest.

At the time, though, I was not yet a guest, though no longer quite a resident. Nonetheless, he was visiting, so he got my room and I was relegated to the couch in the family room. The family room, which had been added on in 1987, when I was 13, had two skylights. One was directly above the couch, so when I was lying on the couch I could look directly up at the sky.

When I was growing up, cities had not yet switched over to sodium-based street lights. At least, the small town in which I grew up had not. (I always thought of it as a city, despite its modest population of 26,210. That was no longer the actual population, but it had been in the 1970 census, and the city could not yet bring itself to acknowledge the loss of over 10% of its population in the subsequent decades, so that number still appeared on the signs as you drove into town.) However, this particular small town/city had made the switch in the brief time since I had gone off to college in an even smaller town, one small enough that even I could make no pretense as to its being a “city.”

I awoke in the middle of the night. Technically, the early morning, Easter morning. It was overcast, and as I now know well, in a city illuminated by sodium streetlights on an overcast night, it is never truly dark, never truly nighttime. Instead, the best you get is an eerie orange twilight, which is what I observed for the first time in my life, that early Easter morning in 1993, 20 years ago.

It was perhaps 2 AM, and as I awoke, then arose, and walked to the kitchen to get a better view, I beheld the city aglow in an unnatural orange luminescence, and… well… it freaked the shit out of me. I had never seen anything like it, and I didn’t understand what could be causing it. Being Easter morning, and being highly impressionable, especially to my own half-lucid, half-dreamlike fantasies, I was sure Armageddon, or… something… was nigh.

Of course, it was not. And eventually I made the connection between the reference to sodium lights I’d heard on Sting’s The Soul Cages album and the eerie orange light, which has since become commonplace in my mostly urban adult life, where I am usually far too busy or distracted or just simply tired to bother to look up into the sky at night and think the kinds of existential, philosophical, cosmic, spiritual, infinite thoughts I used to dwell on so much between the ages of 5 and 22.

But tonight, for a brief moment, I lingered at my back door in south Minneapolis, with a glass of scotch in one hand and my iPhone in the other. On that late night/early Easter morning 20 years ago, I’m not sure which of the two would have seemed more out-of-place in my hands. Surely both would be just as out-of-place as apocalyptic paranoia in my 2013 brain. But still, the connection to that moment half a lifetime ago was there, and I was transported back to a place where I can stare into the sky at night, silently, and wonder.

Can a developer use an iPad as their only portable “computer”?

I am at a crossroads in my work situation. Since 2008 I have worked as a freelance web developer, which naturally meant using a laptop as my primary/only computer. I worked mostly from home, but I would frequently go to coffeehouses, and occasionally work on-site at client offices. A portable computer was a must.

The same week that Steve Jobs announced the 11-inch MacBook Air, I went out and purchased one. It was exactly what I wanted: a full-blown Mac, (almost) as small as an iPad (which of course I already owned as well, but mainly used for testing, occasional gaming, and watching all six seasons of Lost in the span of a month on Netflix, not for “real work”). I loved the MacBook Air. I said it was the best Mac I’d ever owned, though I admitted it was a tad underpowered. Enough so that when SLP needed a new computer 6 months later, I gave her my MacBook Air instead and bought myself a new, slightly more powerful version of the same.

That MacBook Air has been my only computer ever since. In fact, shortly after switching to it full-time, I wrote a glowing review of it right here. But last April I moved my business into a storefront studio space. I’m not going to coffeehouses anymore. Now, more often than not, clients come to me instead of the other way around. And all of this time I’ve been sitting at a desk, with that same 11-inch MacBook Air hooked up to an HP 23-inch LCD. (Yes, HP. I may be a self-proclaimed Apple fanboy, but even I can’t justify the expense of one of their Cinema Display monitors.)

It’s in this context that I’ve finally really become aware of the performance limitations of the 2010 MacBook Air. It’s unbearably slow with Adobe Creative Suite apps. It’s even unbearably slow running Panic’s Coda. And no computer today should choke up on what is essentially a glorified text editor. (That said… As much as I love Coda, it does seem bloated and slow almost everywhere I’ve used it. There’s no comparison to the blazing speed of BBEdit, which I also love, but Coda has some features I prefer, so it remains my main coding tool.)

Over the past few months, as my workload has increased and my patience has diminished, I can no longer pretend that the 2010 MacBook Air’s performance is adequate for my needs. I know the 2012 Airs are at least 3 times faster than the one I have, and I’m sure this year’s will be even faster, and maybe even have a Retina display, and therein lies the problem: I’ve been desperately wanting to upgrade my Mac, but I couldn’t bring myself to buy one of the current 11-inch Airs (the only portable I will consider) when they’re getting so close to a refresh.

At the same time, I have a major crunch at work over the next 3 months. I couldn’t afford to wait on my creaky old Air anymore. So last weekend I settled on a compromise, born of the fact that I almost never touch my MacBook Air outside of the studio anymore. I got a Mac mini for the studio. I went with the more powerful quad-core i7 model, which is rated on Geekbench as at least 6 times faster than my old MacBook Air, and almost twice as fast as the current ones.

I’ve already noticed a huge difference. Adobe Creative Suite is way faster, almost to the point of no longer being infuriating. (But that’s another story.) Coda is still occasionally sluggish, but that may have more to do with the fact that I’m working with files on our local file server over a questionable WiFi connection. I should try putting the files directly on my hard drive to see if it makes a difference.

But now I am faced with a weird dilemma. This is the first desktop computer I’ve owned since the Dell I had back in 2001, and the first Mac desktop since even before that. (It was a Bondi blue G3 tower, if you were wondering.) The dilemma is this: in a world of iPads, where I am already pretty much never touching my MacBook Air outside of work, do I really need a portable Mac at all?

I still have the Air, of course, and have continued to lug it around next to my iPad in my Tom Bihn bag this week. But why? In the two meetings I had this week at client offices, I only used my iPad. Maybe the iPad is really all I need. Maybe?

I have a few months to find out. I won’t consider buying another MacBook Air until the new models are out, so in the meantime I will experiment. I will try using only the iPad for any and all computing tasks outside of the studio. I’ve begun that today, by typing this blog post on it as I sit at the kitchen counter with my Saturday morning coffee. It’s been a bit of a challenge. I gave up on using the WordPress web interface and switched to the (marginally better) dedicated iPad app. And I’ve made lots of typos… some that iOS auto-corrected, some it didn’t, and some false positives it shouldn’t have. (C’mon, iPad… use some context, would ya? Why would anyone ever write “you we’re”?)

The biggest challenge will be if I have to write some actual code. But it’s a far different world for that than it was even a year ago. I have a handful of coding apps on my iPad, though nothing I have could be more valuable than another pair of apps from Panic: Diet Coda (great name, BTW) and Prompt, a terminal app. I haven’t had much call to use either of them… yet. But I’ve been comforted knowing they’re there.

At the end of this month we’re planning a family vacation to Utah. That may prove to be the ultimate test. Do I dare leave for a week with only my iPad? Honestly, I’m not sure I can. It will depend on the state of my various work projects at that time. But I’d like to be able to give it a try.

I’ll post follow-ups here as the experiment continues.

Building a centered gallery grid with flexible column count for responsive web pages

It took an untold number of fruitless Google searches and a couple of hours of trial and error to get this to work. I think part of the problem may have been that I simply didn’t really know how to describe what I’m trying to do in a way that would yield good search results. And so, I hope now that I have a solution, sharing it here might help someone else.

The situation: I have a web page that contains a gallery of square images. The page is responsive but the sizes of the images are fixed. I want the page to automatically show as many images across as will fit in the layout on any particular screen, creating anywhere from one to five columns as needed. And, it needs to stay centered.

I got all of this going pretty easily… all except the “it has to stay centered” part. I was able to get it to work if there was only a single line of images, but as soon as they wrapped to multiple lines, the container element went to a full width and the images became left-aligned. It took considerable effort to discover a solution, although that solution itself is embarrassingly simple. I was hung up on a couple of possible approaches that got me nowhere, which probably contributed to the problems I had finding the right way to do it.

So… here we go. We’ll start with an unordered list:

<ul class="gallery">
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
  <li><img src="image.jpg" alt="" /></li>
</ul>

And here is that embarrassingly simple CSS:

ul.gallery {
  text-align: center;
}

ul.gallery li {
  display: inline-block;
}

OK, that’s not really all of the CSS. Your li elements need height and width properties, and you may want to give them a margin as well. But those values are going to be specific to your project.
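The browser works out the column count for you, so none of the following is needed to make the layout work. Still, if you want a feel for how your chosen sizes will play out at different widths, here is a rough sketch of the arithmetic. The pixel values are hypothetical, columnsThatFit is a made-up helper name, and the calculation ignores the small gap that whitespace between inline-block items can add.

```javascript
// Estimate how many fixed-size tiles fit across a container, clamped to a
// minimum of one column and a chosen maximum.
function columnsThatFit(containerWidth, itemOuterWidth, maxColumns) {
  const fit = Math.floor(containerWidth / itemOuterWidth);
  return Math.max(1, Math.min(maxColumns, fit));
}

// Assumed sizes: 200px square images with a 10px margin on each side.
const itemOuterWidth = 200 + 10 + 10;

console.log(columnsThatFit(320, itemOuterWidth, 5));  // prints 1
console.log(columnsThatFit(768, itemOuterWidth, 5));  // prints 3
console.log(columnsThatFit(1440, itemOuterWidth, 5)); // prints 5
```

In the real page the upper limit would presumably come from a max-width on the container rather than from any script.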

TinyMCE and the non-breaking space problem

Let’s get right to it then: TinyMCE is great, but I am annoyed by its willingness to take users’ multiple spaces literally! Collapsing multiple spaces is a basic characteristic of HTML, and allowing users to carelessly (or intentionally, which they still shouldn’t do) insert multiple spaces by converting every other one of those spaces into the &nbsp; (non-breaking space) character is BAD!

IMHO.

Anyway… start with a pinch of Stack Overflow, add a dash of the official TinyMCE documentation, along with a heaping tablespoon of reading between the lines, and I have a working solution to the problem. My installation of TinyMCE now automatically converts any &nbsp; characters in the text back into regular ol’ spaces.

It’s a bit draconian; after all, there are legitimate uses for non-breaking spaces. But 95% of the time, the ones TinyMCE inserts are user accidents, and another 4.9% are abuses like faked “tabs” that would be better solved by another approach altogether. (There are reasonable CSS-based solutions that work in some cases, but let’s talk about HTML’s need for tabs another time.)

Anyway… here’s the gist of the solution. You need to create a callback function. Here’s mine:

function my_cleanup_callback(type,value) {
  switch (type) {
    case 'get_from_editor':
      // Remove &nbsp; characters
      value = value.replace(/&nbsp;/ig, ' ');
      break;
    case 'insert_to_editor':
    case 'submit_content':
    case 'get_from_editor_dom':
    case 'insert_to_editor_dom':
    case 'setup_content_dom':
    case 'submit_content_dom':
    default:
      break;
  }
  return value;
}

It may look like there’s a lot of extra stuff in here you don’t need; I included all possible values for type inside the switch to be prepared for the future. You do want to check for type == 'get_from_editor' though; otherwise your replace() is going to run under way too many conditions and may cause weird behavior like new paragraphs appearing when you just want to insert new text into an existing one, or browser-generated warnings about leaving the page when you try to save. (I ran into both as I was fine-tuning this.)
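To see why the type check matters, you can call the function by hand outside of TinyMCE. This sketch repeats the callback, trimmed to the one case that does any work, and feeds it a made-up HTML string with two different type values.

```javascript
// Same callback as above, trimmed to the one case that does any work.
function my_cleanup_callback(type, value) {
  switch (type) {
    case 'get_from_editor':
      // Remove &nbsp; characters
      value = value.replace(/&nbsp;/ig, ' ');
      break;
    default:
      break;
  }
  return value;
}

// Made-up sample content for illustration.
const html = '<p>Two&nbsp; spaces&nbsp;here.</p>';

const cleaned = my_cleanup_callback('get_from_editor', html);
const untouched = my_cleanup_callback('insert_to_editor', html);

console.log(cleaned);            // the &nbsp; entities are now plain spaces
console.log(untouched === html); // prints true
```

Only the get_from_editor call changes anything; every other type hands the value back exactly as it came in.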

Now that you have your callback function, you just need to… you know… call it. That’s done inside tinyMCE.init(). You’ll need to include this line somewhere:

cleanup_callback: 'my_cleanup_callback',

Be sure to check if cleanup_callback is already declared somewhere, and also don’t forget the comma at the end, unless you’re inserting this as the last line.
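For context, here is roughly what that looks like inside a complete call. Treat it as a sketch: the mode and theme values are placeholders that assume an older TinyMCE 3.x style setup, editorOptions is a name I made up, and your own init() almost certainly has more options than this.

```javascript
// Options object for tinyMCE.init(). Everything except cleanup_callback is a
// placeholder; keep whatever your existing configuration already has.
const editorOptions = {
  mode: 'textareas',  // assumed: attach the editor to every textarea
  theme: 'advanced',  // assumed: TinyMCE 3.x theme name
  cleanup_callback: 'my_cleanup_callback' // last entry, so no trailing comma
};

// Only call init() where TinyMCE has actually been loaded on the page.
if (typeof tinyMCE !== 'undefined') {
  tinyMCE.init(editorOptions);
}
```

Note that the callback is passed as a string containing the function's name, the same way it appears in the single line above.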

Once you’ve got it all rolled out to your site, you’ll need to clear your cache. I’ve found TinyMCE’s configuration files can be annoyingly persistent in the browser cache.

Yes… you have correctly observed that I had to use non-breaking spaces myself in this post, to get the indents in the code samples to show. Pay no attention to the man behind the curtain. And remember my complaint about the lack of tab characters in HTML. Another day.