When switching servers breaks code: a WordPress mystery

Earlier this week I launched a brand new WordPress site for a long-time client. Break out the champagne! But of course it’s never that simple, is it?

The client’s live server is a newly configured VPS running Ubuntu 16.04 LTS and PHP 7.0; meanwhile, our staging server is still chugging away on Ubuntu 14.04 LTS and PHP 5.5. So, clearly, a difference there. But I was pleased to find that, for the most part, the site functions perfectly on the new server.

But then the client discovered a problem: on one page, content from a custom post type query wasn’t displaying.

Here’s a short version of the pertinent code:

$people = new WP_Query(array(
  'order' => 'ASC',
  'orderby' => 'menu_order',
  'posts_per_page' => -1,
  'post_type' => 'person',
));

if ($people->have_posts()) {
  while ($people->have_posts()) {
    $people->the_post();
    ?>
    <article>

      <header><h2><?php the_title(); ?></h2></header>
      <div><?php the_content(); ?></div>

    </article>
    <?php
  }
}

Strangely, the_title() was working fine, but the_content() wasn’t. It had been — still is, in fact — working on our staging server, all other things within the WordPress context being equal. (Identical, up-to-date versions of the theme files and all plugins, and WP itself.) And the client confirmed that the content was present in WP admin.

I found, confusingly, that get_the_content() works, even though the_content() doesn’t. But of course you don’t get all of the proper formatting (like paragraph breaks) without some WP filters that the_content() applies, so I tried this:

<?php echo apply_filters('the_content', get_the_content()); ?>

Still didn’t work. After a bit more research I was reminded that the pertinent function that filter runs is wpautop(), so I just called that directly:

<?php echo wpautop(get_the_content()); ?>

Now I have the content displaying nicely, but this is clumsy and I really do not get what might be different. I know the new server is running PHP 7.0 and our staging server is running PHP 5.5… but I’m struggling to understand what kind of changes in PHP could cause this specific problem.

Since get_the_content() works, and the_content() doesn’t, the problem has to lie in something that’s happening with the filters on the_content(). Why? Because the_content() calls get_the_content() right up front. In fact, there’s not a lot to the_content() at all. This function lives in wp-includes/post-template.php (beginning at line 230 in WP 4.6). Here it is in its entirety (reformatted slightly for presentation here):

function the_content( $more_link_text = null, $strip_teaser = false) {
  $content = get_the_content( $more_link_text, $strip_teaser );

  /**
  * Filters the post content.
  *
  * @since 0.71
  *
  * @param string $content Content of the current post.
  */
  $content = apply_filters( 'the_content', $content );
  $content = str_replace( ']]>', ']]&gt;', $content );
  echo $content;
}

As you can see, it’s really just 4 lines of actual code. It calls get_the_content() to retrieve the content, applies filters, does an obscure string replacement (which I think I understand but is not really pertinent here), and then echoes the results out to the page.
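
To make that flow concrete, here's a minimal stand-alone sketch of a filter chain. It is not WordPress's actual hook implementation: run_filters() and the sample callbacks are hypothetical names I made up for illustration. The point is only that each callback receives the previous callback's return value, so a single callback that returns null wipes out the content for everything after it.

```php
<?php
// Hypothetical stand-in for a filter stack; NOT WordPress's real implementation.
function run_filters(array $callbacks, $value) {
  foreach ($callbacks as $callback) {
    // Each callback receives whatever the previous callback returned.
    $value = call_user_func($callback, $value);
  }
  return $value;
}

$raw = "  Contact: someone@example.com  ";

// Two well-behaved callbacks: trim whitespace, then wrap in a paragraph.
$healthy = array(
  'trim',
  function ($text) { return '<p>' . $text . '</p>'; },
);

// The same stack plus one callback that fails and returns null,
// as some PHP functions do when they hit an error.
$broken = array_merge($healthy, array(
  function ($text) { return null; },
));

echo run_filters($healthy, $raw), "\n"; // the wrapped paragraph
var_dump(run_filters($broken, $raw));   // NULL: nothing left to echo
```

That would match the symptom here: get_the_content() has the text, but by the time the filters are done there is nothing left to print.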

It’s pretty clear to me that the problem has to lie in one (or more) of the filters in the 'the_content' stack. I have to admit that even after years of working with it, I only have a rather nebulous understanding of how hooks work, so I’m not even sure where to begin dissecting the filter stack here to pinpoint the source of the trouble.
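
One place to begin is simply listing what's attached to the hook. WordPress keeps registered callbacks in the global $wp_filter, keyed by hook name and then by priority. The sketch below is one way to print that list; describe_callbacks() is a hypothetical helper of my own, not a WordPress function, and it assumes the callbacks are stored either as a plain array (as in WP 4.6) or on a public callbacks property of a hook object (as in later versions).

```php
<?php
// Hypothetical helper: describe_callbacks() is NOT part of WordPress.
function describe_callbacks($hook) {
  // Assumption: newer WordPress wraps callbacks in an object with a public
  // ->callbacks array; older versions store the array directly.
  $by_priority = (is_object($hook) && isset($hook->callbacks))
    ? $hook->callbacks
    : (array) $hook;

  $lines = array();
  foreach ($by_priority as $priority => $callbacks) {
    foreach ($callbacks as $entry) {
      $fn = $entry['function'];
      if (is_string($fn)) {
        $name = $fn; // plain function name
      } elseif (is_array($fn)) {
        // array( object-or-class, method )
        $class = is_object($fn[0]) ? get_class($fn[0]) : $fn[0];
        $name = $class . '::' . $fn[1];
      } else {
        $name = 'closure or invokable object';
      }
      $lines[] = $priority . ': ' . $name;
    }
  }
  return $lines;
}

// Inside WordPress only; outside of it this block is skipped.
if (isset($GLOBALS['wp_filter']['the_content'])) {
  echo implode("\n", describe_callbacks($GLOBALS['wp_filter']['the_content'])), "\n";
}
```

Anything in that list that isn't a core function (a theme or plugin callback, say) would be the first thing I'd look at.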

Whenever I know something works in one place and doesn’t work in another place, the first course of action in troubleshooting is to try to identify all of the differences between the two environments. Obviously we have some big differences here as I noted at the top of this post. But I am going to assume that the problem does not lie at the OS layer. Most likely it’s either a difference between PHP 5.5 and 7.0, or, even more likely, a difference between the PHP configurations on the two servers… specifically, modules that are or are not active. See my previous post on The Hierarchy of Coding Errors for my rationale here. Also keep in mind that I personally was responsible for installing LAMP on the server and configuring PHP, and it’s pretty obvious that we’re looking at the sysadmin equivalent of #1 or #2 in that list.

The next step, were I to care to pursue it much further (and if I didn’t have 200 other more important things to do, now that I have the problem “fixed”), would be to run phpinfo() on both servers and identify all of the differences.
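
Reading two phpinfo() pages side by side is tedious, so a rough alternative is to capture just the PHP version and the loaded extensions on each server and diff them. This is only a sketch: environment_snapshot() and compare_snapshots() are hypothetical helper names, and the file names in the comments are assumptions. It also won't catch differences in php.ini settings, only in which modules are loaded.

```php
<?php
// Hypothetical helpers for illustration; they use only built-in PHP functions.

// Capture the PHP version and the sorted list of loaded extensions.
function environment_snapshot() {
  $extensions = get_loaded_extensions();
  sort($extensions);
  return array(
    'php_version' => PHP_VERSION,
    'extensions'  => $extensions,
  );
}

// Report which extensions appear in only one of two snapshots.
function compare_snapshots(array $first, array $second) {
  return array(
    'only_in_first'  => array_values(array_diff($first['extensions'], $second['extensions'])),
    'only_in_second' => array_values(array_diff($second['extensions'], $first['extensions'])),
  );
}

// Run this on each server and save the output, for example as
// staging.json and live.json (file names are assumptions).
echo json_encode(environment_snapshot()), "\n";
```

With both files in hand, passing json_decode(file_get_contents('staging.json'), true) and the live equivalent to compare_snapshots() would list the modules that exist on only one side.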

That’s one possible path, at least. Another thing to consider is that the_content() actually is working just fine in other parts of the site, so maybe it would be worth digging into that WordPress filter stack first.

At this point, because as I said I have a few other more important things to work on, I will probably leave the mystery unresolved here. But I’d welcome any ideas from readers as to an explanation for all of this.


Update! I just couldn’t leave well enough alone, so a few minutes after I published this post, with the client’s permission, I restored the old version of the template, turned on WP_DEBUG and installed the Debug Bar plugin. Jackpot! Debug Bar returned the following error message when I was calling the_content(), but not when I had my “fixed” code in place:

[Screenshot of the Debug Bar error message, taken 2016-09-01]

Well, how about that? As it turns out, the problem comes from a filter I myself had added, using a previously written function. (That’s #3 on the hierarchy list.) Combine that with deprecated functionality that PHP 7.0 removed, and the mystery is solved. I even figured out why the problem only occurs on this page and not site-wide: my filter only runs if there’s an email address link in the content.
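
For anyone chasing a similar bug, here's a sketch of what this class of fix can look like. To be clear, this is not my actual filter, and it is only an assumption about which removed feature is involved. One common candidate in email-link filters is the /e modifier for preg_replace(), which was deprecated in PHP 5.5 and removed in 7.0; to my understanding, on PHP 7 such a call emits a warning and returns null, which would blank the content. The function name obfuscate_email_links() and the entity-encoding behavior are hypothetical.

```php
<?php
// Hypothetical filter for illustration; NOT the function from this post.
function obfuscate_email_links($content) {
  // Skip content that has no mailto links at all.
  if (strpos($content, 'mailto:') === false) {
    return $content;
  }

  // Older code might have used preg_replace() with the /e modifier here.
  // preg_replace_callback() does the same job and works on PHP 5.3 through 7.
  $result = preg_replace_callback(
    '/mailto:([^"\'\s>]+)/',
    function ($matches) {
      // Encode each character of the address as a numeric HTML entity.
      $encoded = '';
      foreach (str_split($matches[1]) as $char) {
        $encoded .= '&#' . ord($char) . ';';
      }
      return 'mailto:' . $encoded;
    },
    $content
  );

  // preg_replace_callback() returns null on error; never pass that along.
  return ($result === null) ? $content : $result;
}

// Inside WordPress, this would be registered on the content filter.
if (function_exists('add_filter')) {
  add_filter('the_content', 'obfuscate_email_links');
}
```

The null check at the end is the design choice that matters most: a filter should hand back the original content rather than nothing when its own processing fails.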