When switching servers breaks code: a WordPress mystery

Earlier this week I launched a brand new WordPress site for a long-time client. Break out the champagne! But of course it’s never that simple, is it?

The client’s live server is a newly configured VPS running Ubuntu 16.04 LTS and PHP 7.0; meanwhile, our staging server is still chugging away on Ubuntu 14.04 LTS and PHP 5.5. So, clearly, a difference there. But I was pleased to find that, for the most part, the site functions perfectly on the new server.

But then the client discovered a problem: on one page, content from a custom post type query wasn’t displaying.

Here’s a short version of the pertinent code:

$people = new WP_Query(array(
  ‘order’ => ‘ASC’,
  ‘orderby’ => ‘menu_order’,
  ‘posts_per_page’ => -1,
  ‘post_type’ => ‘person’,
));

if ($people->have_posts()) {
  while ($people->have_posts()) {
    $people->the_post();
    ?>
    <article>

      <header><h2><?php the_title(); ?></h2></header>
      <div><?php the_content(); ?></div>

    </article>
    <?php
  }
}

Strangely, the_title() was working fine, but the_content() wasn’t. It had been — still is, in fact — working on our staging server, all other things within the WordPress context being equal. (Identical, up-to-date versions of the theme files and all plugins, and WP itself.) And the client confirmed that the content was present in WP admin.

I found, confusingly, that get_the_content() works, even though the_content() doesn’t. But of course you don’t get all of the proper formatting (like paragraph breaks) without some WP filters that the_content() applies, so I tried this:

<?php echo apply_filters(‘the_content’, get_the_content()); ?>

Still didn’t work. After a bit more research I was reminded that the pertinent function that filter runs is wpautop(), so I just called that directly:

<?php echo wpautop(get_the_content()); ?>

Now I have the content displaying nicely, but this is clumsy and I really do not get what might be different. I know the new server is running PHP 7.0 and our staging server is running PHP 5.5… but I’m struggling to understand what kind of changes in PHP could cause this specific problem.

Since get_the_content() works, and the_content() doesn’t, the problem has to lie in something that’s happening with the filters on the_content(). Why? Because the_content() calls get_the_content() right up front. In fact, there’s not a lot to the_content() at all. This function lives in wp-includes/post-template.php (beginning at line 230 in WP 4.6). Here it is in its entirety (reformatted slightly for presentation here):

function the_content( $more_link_text = null, $strip_teaser = false) {
  $content = get_the_content( $more_link_text, $strip_teaser );

  /**
  * Filters the post content.
  *
  * @since 0.71
  *
  * @param string $content Content of the current post.
  */
  $content = apply_filters( ‘the_content’, $content );
  $content = str_replace( ‘]]>’, ‘]]>’, $content );
  echo $content;
}

As you can see, it’s really just 4 lines of actual code. It calls get_the_content() to retrieve the content, applies filters, does an obscure string replacement (which I think I understand but is not really pertinent here), and then echoes the results out to the page.

It’s pretty clear to me that the problem has to lie in one (or more) of the filters in the 'the_content' stack. I have to admit that even after years of working with it, I only have a rather nebulous understanding of how hooks work, so I’m not even sure where to begin dissecting the filter stack here to pinpoint the source of the trouble.

Whenever I know something works in one place and doesn’t work in another place, the first course of action in troubleshooting is to try to identify all of the differences between the two environments. Obviously we have some big differences here as I noted at the top of this post. But I am going to assume that the problem does not lie at the OS layer. Most likely it’s either a difference between PHP 5.5 and 7.0, or, even more likely, a difference between the PHP configurations on the two servers… specifically, modules that are or are not active. See my previous post on The Hierarchy of Coding Errors for my rationale here. Also keep in mind that I personally was responsible for installing LAMP on the server and configuring PHP, and it’s pretty obvious that we’re looking at the sysadmin equivalent of #1 or #2 in that list.

The next step, were I to care to pursue it much further (and if I didn’t have 200 other more important things to do, now that I have the problem “fixed”), would be to run phpinfo() on both servers and identify all of the differences.

That’s one possible path, at least. Another thing to consider is that the_content() actually is working just fine in other parts of the site, so maybe it would be worth digging into that WordPress filter stack first.

At this point, because as I said I have a few other more important things to work on, I will probably leave the mystery unresolved here. But I’d welcome any ideas from readers as to an explanation for all of this.


Update! I just couldn’t leave well enough alone, so a few minutes after I published this post, with the client’s permission, I restored the old version of the template, turned on WP_DEBUG and installed the Debug Bar plugin. Jackpot! Debug Bar returned the following error message when I was calling the_content(), but not when I had my “fixed” code in place:

Screen Shot 2016-09-01 at 9.24.16 AM

Well, how about that? As it turns out, the problem is due to a filter I myself had added, using a previously written function. (That’s #3 on the hierarchy list.) Combine that with deprecated functionality that was removed in PHP 7.0, and problem solved. And I even figured out why the problem is only occurring on this page and not site-wide… because my filter only runs if there’s an email address link in the content.

Web developers: learn how to Google. If no one else has the same problem, the problem is you.

OK, maybe not you per se. This is not a judgment of your merits as a developer, or as a human being. But it does mean the problem is almost certainly something specific to the code you’ve written.

The Hierarchy of Coding Errors

If your code isn’t working, the source of the problem is one of the following, in order from most likely to least likely:

  1. New procedural code you’ve just written
  2. New object-oriented code you’ve just written*
  3. Custom functions or objects you built, but have used before
  4. Third-party/open source add-ons to the software platform you’re using (e.g. WordPress plugins)
  5. Standard functions or objects in the software platform (e.g. WordPress core)
  6. Public code libraries that are included in your chosen software platform (e.g. jQuery)
  7. Browser bugs
  8. OS bugs
  9. Internet protocol bugs
  10. Quantum fluctuations in the fabric of spacetime
  11. Gremlins

You may have guessed correctly at this point that this blog post is not just idle Friday afternoon musings. I’ve spent the majority of the day today troubleshooting a very strange issue with a website I’m currently building. I fixed the problem, but not after being forced to — once again — confront this humbling reality. If something’s not working, it’s probably your own fault. Especially if you’re the only person with the problem.

Googling the issue got me (almost) nowhere… which was the most obvious clue that it was my own fault

Aside from the natural human inclination to deflect blame, the tools we have for troubleshooting these types of problems are not necessarily well suited to forcing us to be honest with ourselves. It’s too easy to blame external forces.

Here’s my scenario. I found out last week while presenting work-in-progress to a client at their office that there was a JavaScript-related problem with the website. It only affected Internet Explorer (and Edge), which I had not yet tested the site in, and, weirdest of all, it didn’t always happen. I’d say maybe 10-20% of the time, the page loaded normally. But the rest of the time, it got an error.

Since this was only affecting one browser, my natural inclination was to start all the way at number 7 on the list, blaming Internet Explorer. But I’ve learned that as much as I want to blame it, issues with IE usually just shine a light on something in my own code that other browsers are more forgiving about. So it was time to walk backwards down the list. (Again… not really, but this is how it played out.)

The error that the browser reported was a “security problem” with jQuery Migrate. First I had to figure out what the hell jQuery Migrate was and why it was being loaded. (Turns out, it’s a place the jQuery team dumped deprecated code it pulled from version 1.9. It’s loaded by default by WordPress.)

With that in mind, this should be affecting every site I’ve built recently, since they’re all in WordPress. But it was only affecting this one site. So I had to try to narrow down where the problem exists. With WordPress, there are two main “variables” in the implementation: themes and plugins. When in doubt, try switching your theme and disabling the plugins you’re using. I started by disabling all of the plugins, one by one. No change. I found the error didn’t occur if I disabled Advanced Custom Fields, but that’s because half of the page didn’t load without it! (That’s another error on my part but let’s ignore that for now, shall we?)

OK, so it’s not a plugin. Next I swapped in the standard Twenty Sixteen theme in place of my custom theme. Not surprisingly, the error didn’t occur, but that didn’t help much because none of my Advanced Custom Fields content was in the pages. I still couldn’t rule out ACF as the culprit. But I tend to reuse field groups from site to site, so once again, if this were attributable to an ACF issue — even something specific to my field groups — it would’ve cropped up on another site.

So now I had little left to do but selectively comment out elements of the theme so I could narrow down where the problem was. (I make this all sound like a logical progression; in fact my debugging process is a lot more chaotic than this description — I actually did this commenting-out process haphazardly and repetitively throughout the afternoon.)

Eventually I pinpointed the troublesome block of code. Yes, it was #1 from the list. But as is usually the case with hard-to-diagnose problems, the complete picture here is that #1 included a combination of #3 and #5, which triggered an error message generated by #6, but only in the context of #7.

Yes. That’s what happened.

In the footer of the page, I had a link to the client’s email address. As is my standard (but by now probably outmoded) practice, I have a custom-built function I wrote years ago to obfuscate the email address by randomly converting most (but not all) of the characters in the string into HTML ampersand entities. My problem was not that function itself, which is tried and true. It’s that in this particular instance I called it on a string that included the mailto: pseudo-protocol, not just the email address itself.

I think the colon in mailto: is particularly significant to the problem, as evidenced by the fact that around 10-20% of the time the problem didn’t occur, and the page loaded normally. Since my obfuscation function randomly leaves characters in the string alone, that’s about how often the colon would have been kept untouched.

But even then, what difference should it make? Browsers decode those entity strings and can handle them in the href attribute of links just fine. However in this particular case I didn’t just use my obfuscation function. Without giving it much thought, in this particular site I had decided to wrap the obfuscated string in the standard WordPress esc_url() function. Trying to properly sanitize things, like a good developer. Right? Except — and I took a quick look at the source code to confirm it — there’s special handling in esc_url() for strings that don’t contain a colon. So the roughly 86% of the time that my string didn’t contain a colon, esc_url() was prepending http:// onto the string.

This situation was causing a particular piece of code in jQuery Migrate to barf… but only in Internet Explorer and Edge, for reasons I still don’t understand, but it has to do with how the different browsers handle security warnings in JavaScript. I found along the way (but before I had pinpointed the real problem) that if I commented out a particular segment of code in jQuery Migrate pertaining to the handling of selectors containing hashtags (see, the HTML ampersand entities again) I could get the page to load normally.

So, like I said, my newly written procedural code (#1), which itself included calls to both an existing custom function I wrote (#3) and a function baked into the WordPress core (#5), caused jQuery Migrate to issue an error (#6) but it was one that only a particular browser (Internet Explorer/Edge) cared to acknowledge (#7).

No wonder it took all afternoon to figure it out.

* The only reason I break out OO from procedural code is that OO has more structured patterns that are less likely to result in sloppy mistakes. Slightly.

WordPress dev tip: How to move the Featured Image box up… to just below the Publish box

Whenever I’m doing development on a WordPress site that makes heavy use of taxonomies (it happens with meta data-rich portfolios for architects, for instance, which seems to be a niche for me), I get really annoyed with how much WordPress devalues the Featured Image meta box. I don’t want it shoved way down below all of the taxonomies, mainly because users will probably forget or never even know that it’s there!

What I really want is to have the Featured Image box near — but not at — the top of the sidebar. Specifically, I want it to come just below the Publish meta box.

I’ve found some resources online that almost got me there, but not quite. However a minor tweak to this example solves the problem for me.

I’m taking some shortcuts here, some of which you may not like. First, most tutorials on manipulating meta boxes encourage you to remove them and then add duplicates with a slightly different ID. I think what’s happening here though (not having inspected the source code!) is that your modifications to the add_meta_boxes action run before the standard WordPress meta boxes get loaded (possibly/probably because, as you’ll see, we’re setting the priority to high), so if you’ve created one with the same ID as a default box, yours takes precedence.

The other shortcut I’m taking, which I suspect will be more controversial (but it’s just the way I like to do this) is that I am creating an anonymous function directly within the add_action() call. That’s just a personal preference, but I like to do it because 1) it keeps the code more compact and 2) it avoids creating a bunch of named functions that have no business ever being called anywhere else anyway.

So what’s happening here? First, I’m creating the Publish meta box. Then I’m creating the Featured Image meta box. By giving them both high priority, WordPress makes them the first two meta boxes in the sidebar. The reason I have to create the Publish meta box is that, if I didn’t, Featured Image would come first, above it. I don’t want that.

I’ve set the $screen parameter to null so this will happen on all editing screens, but if you only wanted to move Featured Image up on posts and not on pages, for example, you could set it to 'post'.

Here’s the full code:

add_action('add_meta_boxes', function() {
  add_meta_box('submitdiv', __('Publish'), 'post_submit_meta_box', null, 'side', 'high');
  add_meta_box('postimagediv', __('Featured Image'), 'post_thumbnail_meta_box', null, 'side', 'high');
});

For more background, check out the official documentation on the add_meta_box() function.

Make Advanced Custom Fields smarter about handling date fields

I love Advanced Custom Fields almost as much as I love WordPress itself. But that’s not to say it doesn’t have its problems. Most are obscure, and minor… and incredibly aggravating once you stumble upon them.

Here’s one such case. Date Picker fields are great, but no one seems to be able to agree on how to store dates in a database… other than insisting on avoiding Unix timestamps, the obvious choice.

ACF stores its dates, for some reason, in YYYYMMDD format (or, as we’d express it in PHP Land, Ymd). No delimiters at all. If you’re not going to use Unix timestamps, why not at least use the MySQL convention of Y-m-d H:i:s? But I digress.

I’m presently working on a project that merges some functionality of ACF and Gravity Forms, along with some custom code, to create a jobs board. It’s super-slick how Gravity Forms can create posts from a form submission, and even set them to pending review so a site editor can come in and review them before publishing.

But… dates. Jobs boards have a lot of dates. And while Gravity Forms offers a wealth of options for date string format, Ymd isn’t one of them. So it ends up storing the date value in the database in a format ACF doesn’t like. Because ACF is very picky. It wants that format, and no other. If the value in the field is not in Ymd format, the value displayed on the admin editing screen is just… blank. And then when you save, whatever was previously saved in that field is erased.

It doesn’t have to be this way. And thanks to the following bit of code, it won’t be. Now bear in mind, this is only altering what ACF renders on the editing screen. Once you’ve saved again from that point, the date will be stored in ACF’s preferred format, but up until then, it will be in whatever other format it was in when it landed in the database.

If you’re writing your front end code proactively, that won’t matter. Because you’re already assuming data inconsistency and using strtotime() to standardize any dates you’re working with in your templates, right? Of course you are.

OK, then. So the real goal here is just to get ACF to display the correct, saved date when you go in to edit the post, so it doesn’t then wipe out the date when you hit Save Changes.

In your functions.php file, or wherever you think is best (a plugin would be nice), do this:

function acf_smart_dates($field) {
  if ($field['value']) {
    $field['value'] = date('Ymd',strtotime($field['value']));
  }
  return $field;
}
add_filter('acf/prepare_field/type=date_picker','acf_smart_dates');

That’ll do.

The simple way to add a “force SSL” option in WordPress

There are plugins for just about everything in WordPress. But cluttering up your site with plugins isn’t always such a great idea. They add bloat that can slow down your site, and if poorly written can cause potential conflicts.

Plus, a lot of them are simply over-engineered. I hate that.

That said, I love good plugins, especially Advanced Custom Fields by Elliot Condon. I consider it essential to every WordPress site I build… which is a lot these days.

A site I’m currently working on has some pages that require SSL encryption. And I want the client to be able to turn SSL on or off on a per-page basis. But I didn’t want to use a plugin (besides ACF) to do it. [Disclaimer: you don’t actually need to use ACF for this; the standard Custom Fields capability will do.]

Before getting started, make sure your site actually has an SSL certificate installed and properly configured. And for this to work as shown, the cert needs to use the same domain name as the site itself. If it’s different, your redirect URLs will be a bit more complicated and will require some customization of this code, but it’ll still work.

First, set up your custom field. I’m using force_https as the field name, but it can be whatever. Make it a True/False field. (If you’re not using ACF, just remember you’ll be entering 0 or 1 as the value.)

Next, in your functions.php file, add the following:

add_action('template_redirect', 'my_force_https');
function my_force_https() {
  if (is_ssl()) { return null; }
  global $post;
  if (!empty($post->ID)) {
    if (!empty(get_field('force_https',$post->ID))) {
      $ssl_url = str_replace('http://','https://',get_the_permalink($post->ID));
      wp_redirect($ssl_url); exit;
    }
  }
  return false;
}

Picking this apart: We’re using the template_redirect hook. By this point the post has been loaded. We have all of the data we need, and it’s kind of our “last chance” to get WordPress to do a redirect before it starts rendering the page.

First we run the built-in WordPress function is_ssl(), because if we’re already on an SSL connection, there’s nothing to do. (And, if we didn’t do this check, we’d create a redirect loop.)

Next, if we have a post ID, we use the ACF function get_field() to check whether the force_https custom field is checked for this page. If it is, then we modify the page’s permalink to start with https:// and trigger the redirect. That’s it! [Note: If you’re not using ACF, you’ll need to use the built-in WordPress function get_post_meta() instead of get_field().]

Of course, that’s not quite it. You may notice after you have this working that your page loads with an https:// URL, but you’re still not getting the reassuring lock icon in your browser’s address bar. What gives?

Well, that’s because you probably have assets in the page that are loading over a non-SSL connection. In my experience, this is almost always because of images in your content that have a full, non-SSL URL. Browsers won’t give you the lock icon unless every asset on the page was loaded over an SSL connection.

So I’ve added this second function that strips the protocol from any instance of src="http://... in a text string. If you’re not familiar with “protocol-less” URLs, modern browsers allow you to omit the protocol — http:// or https:// — from URLs in your HTML, using just // instead, and the browser will automatically handle those with the same protocol used to load the page.

Using the filter the_content, this will automatically get applied to most page content and you’ll probably be good.

add_filter('the_content', 'my_strip_protocol');
function my_strip_protocol($content) {
  if (is_ssl()) {
    $content = str_replace('src="http://','src="//',$content);
  }
  return $content;
}

If you add this and you’re still not getting the lock, it means you probably have <img> tags within your theme that include full non-SSL URLs, or possibly some CSS or JavaScript assets that are being loaded over non-SSL connections. You’ll have to troubleshoot that yourself, but with the developer tools built into modern web browsers, that shouldn’t be too difficult. Remember, now that you have it, you can always use this my_strip_protocol() function directly in your theme files as well.

Just a side note about one of my idiosyncratic coding conventions: I always use !empty() when checking for a value that evaluates to true, but you don’t have to. It comes from my prior experience working with CakePHP. The benefit is that you won’t trigger any PHP warnings if the variable you’re evaluating hasn’t been declared.