OK, maybe not you per se. This is not a judgment of your merits as a developer, or as a human being. But it does mean the problem is almost certainly something specific to the code you’ve written.
The Hierarchy of Coding Errors
If your code isn’t working, the source of the problem is one of the following, in order from most likely to least likely:
- New procedural code you’ve just written
- New object-oriented code you’ve just written*
- Custom functions or objects you built, but have used before
- Third-party/open source add-ons to the software platform you’re using (e.g. WordPress plugins)
- Standard functions or objects in the software platform (e.g. WordPress core)
- Public code libraries that are included in your chosen software platform (e.g. jQuery)
- Browser bugs
- OS bugs
- Internet protocol bugs
- Quantum fluctuations in the fabric of spacetime
- Gremlins
You may have guessed correctly at this point that this blog post is not just idle Friday afternoon musings. I’ve spent the majority of the day today troubleshooting a very strange issue with a website I’m currently building. I fixed the problem, but not after being forced to — once again — confront this humbling reality. If something’s not working, it’s probably your own fault. Especially if you’re the only person with the problem.
Googling the issue got me (almost) nowhere… which was the most obvious clue that it was my own fault
Aside from the natural human inclination to deflect blame, the tools we have for troubleshooting these types of problems are not necessarily well suited to forcing us to be honest with ourselves. It’s too easy to blame external forces.
Here’s my scenario. I found out last week while presenting work-in-progress to a client at their office that there was a JavaScript-related problem with the website. It only affected Internet Explorer (and Edge), which I had not yet tested the site in, and, weirdest of all, it didn’t always happen. I’d say maybe 10-20% of the time, the page loaded normally. But the rest of the time, it got an error.
Since this was only affecting one browser, my natural inclination was to start all the way at number 7 on the list, blaming Internet Explorer. But I’ve learned that as much as I want to blame it, issues with IE usually just shine a light on something in my own code that other browsers are more forgiving about. So it was time to walk backwards down the list. (Again… not really, but this is how it played out.)
The error that the browser reported was a “security problem” with jQuery Migrate. First I had to figure out what the hell jQuery Migrate was and why it was being loaded. (Turns out, it’s a place the jQuery team dumped deprecated code it pulled from version 1.9. It’s loaded by default by WordPress.)
With that in mind, this should be affecting every site I’ve built recently, since they’re all in WordPress. But it was only affecting this one site. So I had to try to narrow down where the problem exists. With WordPress, there are two main “variables” in the implementation: themes and plugins. When in doubt, try switching your theme and disabling the plugins you’re using. I started by disabling all of the plugins, one by one. No change. I found the error didn’t occur if I disabled Advanced Custom Fields, but that’s because half of the page didn’t load without it! (That’s another error on my part but let’s ignore that for now, shall we?)
OK, so it’s not a plugin. Next I swapped in the standard Twenty Sixteen theme in place of my custom theme. Not surprisingly, the error didn’t occur, but that didn’t help much because none of my Advanced Custom Fields content was in the pages. I still couldn’t rule out ACF as the culprit. But I tend to reuse field groups from site to site, so once again, if this were attributable to an ACF issue — even something specific to my field groups — it would’ve cropped up on another site.
So now I had little left to do but selectively comment out elements of the theme so I could narrow down where the problem was. (I make this all sound like a logical progression; in fact my debugging process is a lot more chaotic than this description — I actually did this commenting-out process haphazardly and repetitively throughout the afternoon.)
Eventually I pinpointed the troublesome block of code. Yes, it was #1 from the list. But as is usually the case with hard-to-diagnose problems, the complete picture here is that #1 included a combination of #3 and #5, which triggered an error message generated by #6, but only in the context of #7.
Yes. That’s what happened.
In the footer of the page, I had a link to the client’s email address. As is my standard (but by now probably outmoded) practice, I have a custom-built function I wrote years ago to obfuscate the email address by randomly converting most (but not all) of the characters in the string into HTML ampersand entities. My problem was not that function itself, which is tried and true. It’s that in this particular instance I called it on a string that included the mailto:
pseudo-protocol, not just the email address itself.
I think the colon in mailto:
is particularly significant to the problem, as evidenced by the fact that around 10-20% of the time the problem didn’t occur, and the page loaded normally. Since my obfuscation function randomly leaves characters in the string alone, that’s about how often the colon would have been kept untouched.
But even then, what difference should it make? Browsers decode those entity strings and can handle them in the href
attribute of links just fine. However in this particular case I didn’t just use my obfuscation function. Without giving it much thought, in this particular site I had decided to wrap the obfuscated string in the standard WordPress esc_url()
function. Trying to properly sanitize things, like a good developer. Right? Except — and I took a quick look at the source code to confirm it — there’s special handling in esc_url()
for strings that don’t contain a colon. So the roughly 86% of the time that my string didn’t contain a colon, esc_url()
was prepending http://
onto the string.
This situation was causing a particular piece of code in jQuery Migrate to barf… but only in Internet Explorer and Edge, for reasons I still don’t understand, but it has to do with how the different browsers handle security warnings in JavaScript. I found along the way (but before I had pinpointed the real problem) that if I commented out a particular segment of code in jQuery Migrate pertaining to the handling of selectors containing hashtags (see, the HTML ampersand entities again) I could get the page to load normally.
So, like I said, my newly written procedural code (#1), which itself included calls to both an existing custom function I wrote (#3) and a function baked into the WordPress core (#5), caused jQuery Migrate to issue an error (#6) but it was one that only a particular browser (Internet Explorer/Edge) cared to acknowledge (#7).
No wonder it took all afternoon to figure it out.
* The only reason I break out OO from procedural code is that OO has more structured patterns that are less likely to result in sloppy mistakes. Slightly.