When switching servers breaks code: a WordPress mystery

Earlier this week I launched a brand new WordPress site for a long-time client. Break out the champagne! But of course it’s never that simple, is it?

The client’s live server is a newly configured VPS running Ubuntu 16.04 LTS and PHP 7.0; meanwhile, our staging server is still chugging away on Ubuntu 14.04 LTS and PHP 5.5. So, clearly, a difference there. But I was pleased to find that, for the most part, the site functions perfectly on the new server.

But then the client discovered a problem: on one page, content from a custom post type query wasn’t displaying.

Here’s a short version of the pertinent code:

$people = new WP_Query(array(
  'order' => 'ASC',
  'orderby' => 'menu_order',
  'posts_per_page' => -1,
  'post_type' => 'person',
));

if ($people->have_posts()) {
  while ($people->have_posts()) {
    $people->the_post();
    ?>
    <article>

      <header><h2><?php the_title(); ?></h2></header>
      <div><?php the_content(); ?></div>

    </article>
    <?php
  }
}

Strangely, the_title() was working fine, but the_content() wasn’t. It had been — still is, in fact — working on our staging server, all other things within the WordPress context being equal. (Identical, up-to-date versions of the theme files and all plugins, and WP itself.) And the client confirmed that the content was present in WP admin.

I found, confusingly, that get_the_content() works, even though the_content() doesn’t. But of course you don’t get all of the proper formatting (like paragraph breaks) without some WP filters that the_content() applies, so I tried this:

<?php echo apply_filters('the_content', get_the_content()); ?>

Still didn’t work. After a bit more research I was reminded that the pertinent function that filter runs is wpautop(), so I just called that directly:

<?php echo wpautop(get_the_content()); ?>

Now I have the content displaying nicely, but this is clumsy and I really do not get what might be different. I know the new server is running PHP 7.0 and our staging server is running PHP 5.5… but I’m struggling to understand what kind of changes in PHP could cause this specific problem.

Since get_the_content() works, and the_content() doesn’t, the problem has to lie in something that’s happening with the filters on the_content(). Why? Because the_content() calls get_the_content() right up front. In fact, there’s not a lot to the_content() at all. This function lives in wp-includes/post-template.php (beginning at line 230 in WP 4.6). Here it is in its entirety (reformatted slightly for presentation here):

function the_content( $more_link_text = null, $strip_teaser = false) {
  $content = get_the_content( $more_link_text, $strip_teaser );

  /**
  * Filters the post content.
  *
  * @since 0.71
  *
  * @param string $content Content of the current post.
  */
  $content = apply_filters( 'the_content', $content );
  $content = str_replace( ']]>', ']]&gt;', $content );
  echo $content;
}

As you can see, it’s really just 4 lines of actual code. It calls get_the_content() to retrieve the content, applies filters, does an obscure string replacement (which I think I understand but is not really pertinent here), and then echoes the results out to the page.

It’s pretty clear to me that the problem has to lie in one (or more) of the filters in the 'the_content' stack. I have to admit that even after years of working with it, I only have a rather nebulous understanding of how hooks work, so I’m not even sure where to begin dissecting the filter stack here to pinpoint the source of the trouble.
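For anyone equally hazy on hooks, here is a minimal sketch of the idea. It is a toy model, not WordPress's actual implementation: toy_apply_filters() and toy_find_breaking_filter() are hypothetical names made up for illustration, and the stack is a plain array keyed by priority. The point is that each callback receives the previous callback's return value, so a single filter that returns nothing can blank the content for everything after it.

```php
<?php
// Toy model of a filter stack. NOT WordPress core code: both function
// names are hypothetical and exist only for illustration.

// Runs every callback in priority order, feeding each one the previous result.
function toy_apply_filters(array $stack, $value) {
    ksort($stack); // lower priority numbers run first
    foreach ($stack as $callbacks) {
        foreach ($callbacks as $callback) {
            $value = $callback($value);
        }
    }
    return $value;
}

// Steps through the same stack and reports the first callback that
// returns null or an empty string. Returns null if none does.
function toy_find_breaking_filter(array $stack, $value) {
    ksort($stack);
    foreach ($stack as $priority => $callbacks) {
        foreach ($callbacks as $index => $callback) {
            $value = $callback($value);
            if ($value === null || $value === '') {
                return array('priority' => $priority, 'index' => $index);
            }
        }
    }
    return null;
}

$stack = array(
    10 => array('trim'),
    20 => array(function ($content) {
        // A broken filter: it does some work but forgets to return it.
        strtoupper($content);
    }),
    30 => array('strval'),
);

var_dump(toy_apply_filters($stack, ' Hello '));        // the content ends up empty
print_r(toy_find_breaking_filter($stack, ' Hello '));  // points at priority 20
```

In real WordPress the registered callbacks live in the global $wp_filter, whose internal structure differs between versions, so this model is only meant to explain the behavior, not to be pasted into a theme.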

Whenever I know something works in one place and doesn’t work in another place, the first course of action in troubleshooting is to try to identify all of the differences between the two environments. Obviously we have some big differences here as I noted at the top of this post. But I am going to assume that the problem does not lie at the OS layer. Most likely it’s either a difference between PHP 5.5 and 7.0, or, even more likely, a difference between the PHP configurations on the two servers… specifically, modules that are or are not active. See my previous post on The Hierarchy of Coding Errors for my rationale here. Also keep in mind that I personally was responsible for installing LAMP on the server and configuring PHP, and it’s pretty obvious that we’re looking at the sysadmin equivalent of #1 or #2 in that list.

The next step, were I to care to pursue it much further (and if I didn’t have 200 other more important things to do, now that I have the problem “fixed”), would be to run phpinfo() on both servers and identify all of the differences.
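Reading two phpinfo() pages side by side is tedious, so a rough sketch of a quicker comparison might look like the following. compare_extensions() is a hypothetical helper, and the two extension lists are made-up sample data, not the real configuration of either server. get_loaded_extensions() and PHP_VERSION are standard PHP.

```php
<?php
// Sketch: compare the PHP extension lists of two servers.
// compare_extensions() is a hypothetical helper, not a built-in function.
function compare_extensions(array $staging, array $live) {
    // Compare names case-insensitively.
    $staging = array_map('strtolower', $staging);
    $live = array_map('strtolower', $live);
    $onlyStaging = array_values(array_diff($staging, $live));
    $onlyLive = array_values(array_diff($live, $staging));
    sort($onlyStaging);
    sort($onlyLive);
    return array('only_on_staging' => $onlyStaging, 'only_on_live' => $onlyLive);
}

// On each server, save the output of this line to a file:
//   echo json_encode(array('php' => PHP_VERSION, 'extensions' => get_loaded_extensions()));
// Then pass the two decoded lists to compare_extensions().
// The lists below are invented sample data for illustration only.
$stagingExtensions = array('Core', 'mbstring', 'mysql', 'pcre');
$liveExtensions = array('Core', 'mbstring', 'mysqli', 'pcre');

print_r(compare_extensions($stagingExtensions, $liveExtensions));
```

This only covers loaded extensions. Differences in php.ini settings would need a similar comparison of ini_get_all() output.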

That’s one possible path, at least. Another thing to consider is that the_content() actually is working just fine in other parts of the site, so maybe it would be worth digging into that WordPress filter stack first.

At this point, because as I said I have a few other more important things to work on, I will probably leave the mystery unresolved here. But I’d welcome any ideas from readers as to an explanation for all of this.


Update! I just couldn’t leave well enough alone, so a few minutes after I published this post, with the client’s permission, I restored the old version of the template, turned on WP_DEBUG and installed the Debug Bar plugin. Jackpot! Debug Bar returned the following error message when I was calling the_content(), but not when I had my “fixed” code in place:

[Screenshot: Debug Bar error message, captured 2016-09-01]

Well, how about that? As it turns out, the problem is due to a filter I myself had added, using a previously written function. (That’s #3 on the hierarchy list.) Combine that with deprecated functionality that was removed in PHP 7.0, and problem solved. And I even figured out why the problem is only occurring on this page and not site-wide… because my filter only runs if there’s an email address link in the content.
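The removed feature isn't named here, so the following is an illustration of the general shape of such a problem, not the actual filter from the site. One feature that PHP 7.0 did remove is the /e (eval) modifier for preg_replace(): under PHP 7 such a call emits a warning and returns NULL, which would blank the content. Assuming something along those lines, a version built on preg_replace_callback() could look like this. The names obfuscate_mailto_links() and encode_all_chars() are hypothetical.

```php
<?php
// Illustration only: NOT the actual filter from the site. The function
// names are hypothetical and the /e modifier is an assumed culprit.

// Encodes every character of an ASCII string as a decimal HTML entity.
function encode_all_chars($text) {
    $out = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $out .= '&#' . ord($text[$i]) . ';';
    }
    return $out;
}

// PHP 5.x-era code could have used the /e modifier, for example:
//   preg_replace('/mailto:([^"]+)/e', "'mailto:' . encode_all_chars('\\1')", $content);
// preg_replace_callback() does the same job without /e.
function obfuscate_mailto_links($content) {
    if (strpos($content, 'mailto:') === false) {
        return $content; // no email link, so the regex never runs
    }
    return preg_replace_callback(
        '/mailto:([^"]+)/',
        function ($matches) {
            return 'mailto:' . encode_all_chars($matches[1]);
        },
        $content
    );
}

echo obfuscate_mailto_links('<a href="mailto:a@b.c">Write</a>'), "\n";
```

The early return mirrors the observation above: pages without an email link never reach the problematic call, which is why the failure would show up on only some pages.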

Rules, Rules, Rules

I think a lot about rules. I’m not a rigid stickler for rules. I believe a lot in taking rules in context. There are times rules matter, and times they really don’t. But I do think it’s important to understand the rules. There are two things to understand about rules: 1) that they exist to keep things running smoothly, and 2) that there is (or at least should be) a reason behind any good rule. Rules that have no clear, broadly agreeable purpose or that are difficult to follow should be reconsidered.

But a lot of rules are pretty simple. Like the rules of the road. And I think a lot about rules of the road, because I’m on the road a lot, in various ways — in a car, on a bike, or on foot.

The rules of the road are simple, but they don’t seem quite as simple to me living in Minneapolis in the 2010s as they did when I was a kid growing up in a small town in the 1980s. Back then, roads were for cars. The only people who biked were kids and adults who had lost their licenses for DUI. And you biked on the sidewalk.

People only walked with their dogs, and generally only in a 2-block radius of their house, the lone exception being the one Vietnam vet with untreated PTSD who refused to wear shoes or use the sidewalk. He could always be seen around town with his dreadlocks, blanket and bare feet, shuffling along in the boulevard grass. I hope things got better for him.

But I digress. That was the 1980s. In contemporary Minneapolis, everyone uses the roads for just about everything. And sometimes it gets messy. There are many places in the park areas of the city where there are three parallel strips of asphalt: a pedestrian path, a bike path, and the road. All clearly marked for their intended purpose. In most of these places, the road is very narrow — two lanes with no shoulder or parking lanes. But you get pedestrians on the bike path (why? who the hell knows?) and bikes on the road (why? to get away from the dumbass pedestrians! or because they think they’re in the Tour de France!) and things get tangled up.

Even when you’re on regular city streets, biking can be a hazardous endeavor. The rule (whether it’s codified as a city ordinance or just a gentle encouragement on road signs) is “SHARE THE ROAD”. But there are cars that nearly run bikers onto the curb, just as there are bikers who ride side-by-side at such a leisurely speed that I wonder how they keep their balance, backing up car traffic for blocks. SHARE THE ROAD goes both ways.

Me? I’m too scared as a biker to ride on major thoroughfares if I can possibly avoid it. I usually stick to those dedicated bike paths when I can. Otherwise, I try to ride on low-traffic residential streets, generally a block or two over from the major thoroughfare. It just seems much, much safer.

But you do encounter clueless drivers. Drivers who will stop for you at an intersection when they don’t have a stop sign and you (the biker) do, and are clearly coming to a stop. Or, drivers who will breeze right through their own stop signs even when you (the biker) have the right-of-way, either because they didn’t see you or because they live a block away and always breeze through that stop sign.

Yes… you may have guessed that I am not just speaking hypothetically here. Both of those situations in the previous paragraph have happened to me. In fact, they happened on the same street, one block apart. Obviously the latter situation (which happened last year) is far more dangerous, and it led to me braking so abruptly I nearly flipped my bike, followed by a loud string of profanity hurled in the semi-apologetic driver’s direction.

The former situation happened to me just this morning, and prompted today’s rant. I was approaching a stop sign, and slowing to a stop. To my left, a white SUV also came to a stop, even though they didn’t have a stop sign. Presumably they didn’t trust that I was going to stop, even though I was vigorously waving them on with my left arm as I braked with my right. Even though I came to a complete stop, got off my bike, and even more vigorously waved them on. Finally they did go and I got the satisfaction of having successfully enforced the rules. (Sort of. I mean, I didn’t actually give the universal “stop” hand signal. Yes, I broke a rule. But I figured my vigorous waving-on covered it.)

But that got me thinking about the rules themselves. You have the official rules of the road, which tell you that you stop at a stop sign and don’t stop when you don’t have one. Bikes are supposed to follow the same road rules as cars, with (as I recently learned) a few exceptions designed to facilitate faster movement, most notable being that it’s OK for a biker to run a red light, if they have first come to a complete stop and verified that there is no other traffic (cross traffic or oncoming left turns, for example). But I doubt a lot of drivers know this rule, and when they see bikers doing it, probably assume (like I would have before) that it’s one more biker breaking the rules.

Which gets me to the second kind of rules — the unwritten, unspoken rules that grow naturally from collective experience. There are so many bikers who completely ignore all of the rules of the road that many drivers either a) assume the worst out of any biker they encounter and exert excessive caution or b) hit the bikers. (Or, as happened last year, they lose their fucking minds and drive around hurling cinder blocks out their car windows.)

I feel like the situation I ran into today was due to the second type of rule. The driver of the white SUV has encountered enough unpredictable bikers — who are best known for their peculiarly selective blindness to red octagons — that they weren’t going to take any chances with me. So the fact that I did follow the rules and stop for a stop sign actually caused a problem. A minor problem, to be sure, but still enough that it lingered with me all morning. (I wonder what that driver is thinking about right now. Almost certainly not me. This is my affliction.)

So, we are living in a society where we have two types of rules: the official rules, and the unspoken ones. Often in direct conflict. Which rules take precedence? Sadly, as much as I want to live in a world where the official rules are logical, reasonable, fair to all, and easy to follow, I fear that we really live in a world where the official rules are so often inconsistent, incomprehensible, unjust or just simply a burden — not to mention out of touch with the realities of human behavior — that the unspoken rules become the ones that people actually follow.

So then what? Should I just give up on the official rules? Should I breeze through stop signs on my bike because “everyone else is doing it”? Should I stubbornly adhere to my way of doing things and get my dander up every time I have to frantically gesture at someone else to get them to accept their own right-of-way?

Or, should I just lighten the hell up?

In that spirit, I come to the third type of rules. The Rules.

The Rules is a tongue-in-cheek book of… rules… written by a former coworker and bandmate who is obsessed with cycling to a level I will never be. I ride a secondhand bike to get around town. I have become quite a fan of watching the Tour de France every July, in part just because there’s an app for it that I feel really does 21st century sportscasting right and I wish every sport were covered this way. But mostly because I enjoy seeing the French countryside, admiring the intensity and endurance of the riders, and, occasionally, moments like riders punching morons on the sidelines.

Anyway… forget about city ordinances or social norms. The real rules of cycling are another matter entirely. And far more entertaining than my rants will ever be.

Reflections on a particularly rough week for race relations in America in 2016

Note: I initially posted this on Facebook. But things on Facebook have a tendency to get lost in the noise. Better to also preserve it here in the musty silence of my blog.

Seeing some pretty extreme responses on social media from some white people in the wake of the past few days’ events. If I could say anything to white people who are scared and/or angry and/or, God forbid, arming themselves for a race war, it would be this:

Social justice is not a zero sum game. You don’t have to do worse for others to do better. To quote the late, great Paul Wellstone, “We all do better when we all do better.”

You may have a vision of what “America” is, or what an “American” is, and that vision may be a particular color. But Americans who aren’t white are still Americans, just as much as you are.

Black Americans who are reacting to their friends and relatives being gunned down by police at routine — far too routine for many of them — traffic stops have a right to be scared, and angry. But Black Lives Matter is not about revenge. It’s not about starting a war. It’s about JUSTICE. About bringing more PEACE to our streets, our cities, our country. It’s about the “American” in “African American.”

At least, that’s how I see it. And living in the city, I’m probably in much closer proximity to BLM than most white Americans who are themselves scared or angry right now. So please, don’t be. Stop. Listen. Think. Feel. Understand.

Our fellow Americans who weren’t born with the inherited privileges conferred by white skin are living under the burden our ancestors placed on them — a burden that we perpetuate every day that we don’t actively acknowledge and work to counteract it. Hear their voices. Amplify them. Don’t silence them.

And when something happens like the shootings in Dallas, wait for the facts. BLM is a peaceful movement. Dallas PD has a good relationship with BLM. Officers were there to PROTECT the BLM marchers. The shooters do not represent Black Lives Matter or its goals.

I could go on, but I’ve already spoken too long. But don’t stop listening. Seek out those voices that are demanding peace and justice and hear what they are saying. And I will continue trying to do the same.

Web developers: learn how to Google. If no one else has the same problem, the problem is you.

OK, maybe not you per se. This is not a judgment of your merits as a developer, or as a human being. But it does mean the problem is almost certainly something specific to the code you’ve written.

The Hierarchy of Coding Errors

If your code isn’t working, the source of the problem is one of the following, in order from most likely to least likely:

  1. New procedural code you’ve just written
  2. New object-oriented code you’ve just written*
  3. Custom functions or objects you built, but have used before
  4. Third-party/open source add-ons to the software platform you’re using (e.g. WordPress plugins)
  5. Standard functions or objects in the software platform (e.g. WordPress core)
  6. Public code libraries that are included in your chosen software platform (e.g. jQuery)
  7. Browser bugs
  8. OS bugs
  9. Internet protocol bugs
  10. Quantum fluctuations in the fabric of spacetime
  11. Gremlins

You may have guessed correctly at this point that this blog post is not just idle Friday afternoon musings. I’ve spent the majority of the day today troubleshooting a very strange issue with a website I’m currently building. I fixed the problem, but not before being forced, once again, to confront this humbling reality. If something’s not working, it’s probably your own fault. Especially if you’re the only person with the problem.

Googling the issue got me (almost) nowhere… which was the most obvious clue that it was my own fault

Aside from the natural human inclination to deflect blame, the tools we have for troubleshooting these types of problems are not necessarily well suited to forcing us to be honest with ourselves. It’s too easy to blame external forces.

Here’s my scenario. I found out last week while presenting work-in-progress to a client at their office that there was a JavaScript-related problem with the website. It only affected Internet Explorer (and Edge), which I had not yet tested the site in, and, weirdest of all, it didn’t always happen. I’d say maybe 10-20% of the time, the page loaded normally. But the rest of the time, it got an error.

Since this was only affecting one browser, my natural inclination was to start all the way at number 7 on the list, blaming Internet Explorer. But I’ve learned that as much as I want to blame it, issues with IE usually just shine a light on something in my own code that other browsers are more forgiving about. So it was time to walk backwards down the list. (Again… not really, but this is how it played out.)

The error that the browser reported was a “security problem” with jQuery Migrate. First I had to figure out what the hell jQuery Migrate was and why it was being loaded. (Turns out, it’s a place the jQuery team dumped deprecated code it pulled from version 1.9. It’s loaded by default by WordPress.)

With that in mind, this should be affecting every site I’ve built recently, since they’re all in WordPress. But it was only affecting this one site. So I had to try to narrow down where the problem exists. With WordPress, there are two main “variables” in the implementation: themes and plugins. When in doubt, try switching your theme and disabling the plugins you’re using. I started by disabling all of the plugins, one by one. No change. I found the error didn’t occur if I disabled Advanced Custom Fields, but that’s because half of the page didn’t load without it! (That’s another error on my part but let’s ignore that for now, shall we?)

OK, so it’s not a plugin. Next I swapped in the standard Twenty Sixteen theme in place of my custom theme. Not surprisingly, the error didn’t occur, but that didn’t help much because none of my Advanced Custom Fields content was in the pages. I still couldn’t rule out ACF as the culprit. But I tend to reuse field groups from site to site, so once again, if this were attributable to an ACF issue — even something specific to my field groups — it would’ve cropped up on another site.

So now I had little left to do but selectively comment out elements of the theme so I could narrow down where the problem was. (I make this all sound like a logical progression; in fact my debugging process is a lot more chaotic than this description — I actually did this commenting-out process haphazardly and repetitively throughout the afternoon.)

Eventually I pinpointed the troublesome block of code. Yes, it was #1 from the list. But as is usually the case with hard-to-diagnose problems, the complete picture here is that #1 included a combination of #3 and #5, which triggered an error message generated by #6, but only in the context of #7.

Yes. That’s what happened.

In the footer of the page, I had a link to the client’s email address. As is my standard (but by now probably outmoded) practice, I have a custom-built function I wrote years ago to obfuscate the email address by randomly converting most (but not all) of the characters in the string into HTML ampersand entities. My problem was not that function itself, which is tried and true. It’s that in this particular instance I called it on a string that included the mailto: pseudo-protocol, not just the email address itself.

I think the colon in mailto: is particularly significant to the problem, as evidenced by the fact that around 10-20% of the time the problem didn’t occur, and the page loaded normally. Since my obfuscation function randomly leaves characters in the string alone, that’s about how often the colon would have been kept untouched.
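As a rough sketch of the kind of obfuscation described above: the real function isn't shown in this post, so obfuscate_email() and its $keepChance parameter are hypothetical, and the 15% default is an assumption chosen to sit inside the 10-20% range mentioned here.

```php
<?php
// Sketch of random entity obfuscation. obfuscate_email() and $keepChance
// are hypothetical; the 0.15 default is an assumption, not a known value.
// Assumes ASCII input, since it works byte by byte.
function obfuscate_email($text, $keepChance = 0.15) {
    $out = '';
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];
        // mt_rand() / mt_getrandmax() yields a float between 0 and 1.
        if (mt_rand() / mt_getrandmax() < $keepChance) {
            $out .= $char;                   // leave this character alone
        } else {
            $out .= '&#' . ord($char) . ';'; // decimal HTML entity
        }
    }
    return $out;
}

$encoded = obfuscate_email('mailto:someone@example.com');
echo $encoded, "\n";
// A literal colon survives only when that one character is left alone.
var_dump(strpos($encoded, ':') !== false);
```

Because the entities themselves contain no colon, the output has a literal colon only on the runs where the random draw happens to spare it.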

But even then, what difference should it make? Browsers decode those entity strings and can handle them in the href attribute of links just fine. In this particular case, however, I didn’t just use my obfuscation function. Without giving it much thought, on this site I had decided to wrap the obfuscated string in the standard WordPress esc_url() function. Trying to properly sanitize things, like a good developer. Right? Except (and I took a quick look at the source code to confirm it) there’s special handling in esc_url() for strings that don’t contain a colon. So the roughly 80-90% of the time that my string didn’t contain a literal colon, esc_url() was prepending http:// onto the string.
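A simplified model of that check makes the failure easier to see. This is not the real esc_url(): model_prepend_scheme() is a hypothetical name, it skips everything else WordPress does when cleaning a URL, and the exemption for strings starting with /, # or ? is recalled from the core source and should be treated as an assumption.

```php
<?php
// Simplified model of the scheme check. NOT the real esc_url();
// model_prepend_scheme() is a hypothetical name for illustration.
function model_prepend_scheme($url) {
    if ($url === '') {
        return $url;
    }
    // Assumption: relative-looking strings are left alone.
    $looksRelative = in_array($url[0], array('/', '#', '?'), true);
    if (strpos($url, ':') === false && !$looksRelative) {
        return 'http://' . $url; // no literal colon, so treat it as a bare hostname
    }
    return $url;
}

// Colon left as a literal character: the string passes through untouched.
echo model_prepend_scheme('mailto:&#115;omeone@example.com'), "\n";
// Colon encoded as &#58;: no literal colon, so http:// gets prepended.
echo model_prepend_scheme('mailto&#58;&#115;omeone@example.com'), "\n";
```

Obfuscating only the address and adding the literal mailto: prefix afterwards would keep the colon intact on every page load.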

This situation was causing a particular piece of code in jQuery Migrate to barf… but only in Internet Explorer and Edge. I still don’t fully understand why, though it has to do with how the different browsers handle security warnings in JavaScript. I found along the way (but before I had pinpointed the real problem) that if I commented out a particular segment of code in jQuery Migrate pertaining to the handling of selectors containing hash characters (see, the HTML ampersand entities again) I could get the page to load normally.

So, like I said, my newly written procedural code (#1), which itself included calls to both an existing custom function I wrote (#3) and a function baked into the WordPress core (#5), caused jQuery Migrate to issue an error (#6) but it was one that only a particular browser (Internet Explorer/Edge) cared to acknowledge (#7).

No wonder it took all afternoon to figure it out.

* The only reason I break out OO from procedural code is that OO has more structured patterns that are less likely to result in sloppy mistakes. Slightly.