From the Stupid PHP Tricks files: rounding numbers and creeping inaccuracy

This morning as I walked to the studio I was doing what geeks do best: pondering a slightly esoteric mathematical quandary.

Glass Half Full by S_novaIngraining the American spirit of optimism at a young age, and under dubious circumstances, our schools always taught rounding numbers in a peculiar way. You always round your decimal values to the nearest integer. That part makes sense. But what if the decimal is .5 — exactly half? In my education, at least until late in high school (or was it college?), we were always taught to round up! The glass is half full. Optimism.

Eventually — far later than it should have been, I think — the concept was introduced that always rounding .5 up is not really that accurate, statistically speaking. It might be nice in the case of a single number to be an optimist and think a solid half is good as a whole, but in aggregate this thinking introduces a problem.

If you have a whole lot of numbers, and you’re always rounding your halves up, eventually your totals are going to be grossly inaccurate.

Of course, the same would happen if you were ever the pessimist and always rounded down.

The solution, I later learned, was to round halves up or down, depending upon the integer value that precedes them. Which way you go doesn’t really matter, as long as you’re consistent, but as it happens, I learned it as such: if the integer is odd, round up; if it is even, round down.

In my work, I write a lot of PHP code. Most of it is of the extremely practical variety; I’m building websites for clients, after all. But every once in a while I like to indulge my coding abilities in a bit of frivolous experimentation, and so today I produced a little PHP script that generates 10,000 random numbers between 1 and 100, with one decimal place, and then it shows the actual sum and average of those numbers, along with what you get as the sum and average if you go through all 10,000 numbers and round them to whole integers by the various methods described above. Try it for yourself!

Any time the rounded average is different from the “precise” (and I use that term somewhat loosely) average, it is displayed in red. Interestingly, and not at all surprisingly, when you always round halves in one direction or the other, at least one of those directions will (almost) always yield an incorrect average. Yet if you use the “even or odd” methods, both of those methods will almost always yield a correct average.

It’s all about the aggregate.

Sendmail not working? Maybe your server’s IP is on a block list

This is pretty arcane, even for me, but since I spent several hours troubleshooting this problem this week — and the solution was nowhere to be found on Google — I figured it was worth sharing.

My CMS, cms34, as I’ve mentioned a few times before, is built on CakePHP. Some features of cms34 include automatically generated email messages. CakePHP has a nice email component that facilitates a lot of this work. It can be configured to use an SMTP server, but by default it sends mail directly from the web server using whatever you have installed on the server, either the ubiquitous sendmail or the more powerful (and capitalized) Postfix. Don’t unleash a deluge of flame comments on me, but I’m using sendmail. So be it.

All was working well until a few weeks ago, when suddenly none of the mails were being sent. There were no errors on the website; the messages just wouldn’t go through. What was more confusing was that messages being sent to my own domain did go through, but for those being sent to my clients’ domains, nothing.

Nothing except log entries, that is. Specifically, the mail log was filling up with lines like this:

Sep 13 13:45:56 redacted sm-mta[28158]: o8DIjsx0028156:
to=<redacted@redacted.com>, ctladdr=<redacted@redacted.com>
(33/33), delay=00:00:02, xdelay=00:00:01,
mailer=esmtp, pri=120799, relay=redacted.com.
[123.456.789.000], dsn=5.7.1, stat=Service unavailable

(Note that I’ve removed the real email and IP addresses to protect the innocent, namely myself.)

“Service unavailable,” huh? I researched that error extensively, without finding much. Eventually I was led to believe it may be an issue with my hostname, hosts, hosts.allow and/or hosts.deny files.

A few relevant points: 1) my hosts.allow file only contains one (uncommented) line: sendmail: LOCAL and 2) likewise, the hosts.deny file only contains: ALL: PARANOID. I’ll save you some time right here: the problem I had ended up having nothing whatsoever to do with any of these host-related files. Leave ’em alone.

After following a number of these dead ends, I was inspired to check the mail file on the server for the user Apache runs as, in my case www-data. On Ubuntu Linux (and probably other flavors), these mail files can be found in /var/mail. Indeed, there were some interesting things to be found there, namely, a number of references to this URL:

http://www.spamhaus.org/query/bl?ip=123.456.789.000

(Again, the IP address has been changed… and yes, I know that’s not a valid IP address. That’s the point.)

I was not previously aware of The Spamhaus Project, but perhaps I should have been. The reason my messages weren’t getting through was because my server’s IP address was on the PBL: Policy Block List. Essentially, that is a list of all of the IP addresses (or IP ranges) in the world that, according to a well-defined set of rules, have no business acting like SMTP (Simple Mail Transfer Protocol*) servers — the servers that send mail out.

It stands to reason that my server was on this list; technically it’s not an SMTP server. But it’s perfectly legitimate for a web server to be running sendmail or Postfix or something of that nature, and sending messages out from the web applications it runs. Fortunately, it’s easy to get legitimate servers removed from the PBL. Simply fill out a form, verify your identity (via a code sent to you in an email message), and within about an hour, the changes will propagate worldwide.

Success! So if you’re in the same kind of situation I was in, where everything seems to be configured properly but your messages just aren’t going out for some reason, try checking Spamhaus to see if your IP is on the PBL.

* If you made it this far in the post, I shouldn’t have to explain the acronym. But I will anyway, as is my wont.

The new site design, phase two

Lots to choose from...As I mentioned the other day, the initial launch of this new site design was just phase one — window dressing. Window dressing I happen to like a lot, but still… same old clunky underlying structure. But not anymore.

WordPress has a pretty rudimentary home page structure by default. Everything’s just *SPLAT* right out there on the home page. Sure, you can use the <!--more--> tag to trigger some automagic stuff with “Read more” links, but overall, it’s not too fancy. Which isn’t to say it’s bad. Now, the default page layouts for some other open source CMSes like Drupal or Joomla, on the other hand… yeah, they’re bad. (Or at least they were, the last time I bothered to care.)

So while WordPress out of the box doesn’t do much fancy stuff with the home page layout, it’s extensible enough that a crafty developer (or a well-read tinkerer) can do some pretty nifty stuff with it. And that was my goal with this new redesign: I’ve got hundreds of posts dating back over seven years now, and most are eternally buried in the archives. I’m hopeful this new approach will bring some of that older content to light, with the random articles list on the home page and the related articles list at the end of each article page.

I didn’t do it all alone… I had some help from a nifty article from Smashing Magazine: 10 Exceptional WordPress Hacks. In particular I made use of numbers 5 through 8 on this list… with some modification. Some of my changes were purely to suit my taste, but others were to improve the usability of the features and in at least one case to fix a bug in the provided sample code. It’s probably worth discussing the details here.

5. Display Related Posts Without A Plug-In

I set this up in single.php so it will appear at the end of each of my articles. I experimented a bit with matching both tags and categories, but I found (for reasons I didn’t dig deep enough to explain) that WP_Query() does well with either tags or categories, but not both.

And, most importantly, I found (or actually, SLP did) that this sample as given breaks comments on the page. Some commenters on the original post mentioned this problem too, along with its solution: you need to call wp_reset_query(); at the end, to tell WordPress to go back to working with the original post’s content.

I also modified the example to look at all of the tags associated with the post, not just the first (can’t really figure out why the original version did that), and tweaked the HTML/CSS output to suit my design. Here’s a simplified version of the code I’m running:

<?php
// Get related posts
$tags = wp_get_post_tags($post->ID);
if (!empty($tags)) {
  $tag_list = array();
  foreach ((array)$tags as $tag) {
    $tag_list[] = $tag->term_id;
  }
  $args = array(
    'tag__in' => (array)$tag_list,
    'post__not_in' => (array)$post->ID,
    'showposts' => 5,
    'caller_get_posts' => 1,
    'orderby' => 'date'
  );
  $related_posts = new WP_Query($args);
  if ($related_posts->have_posts()) {
    ?>
    <h2>Related Posts</h2>
    <ul>
      <?php
      while ($related_posts->have_posts()) {
        $related_posts->the_post();
        ?>
        <li>
          <a href="<?php the_permalink() ?>"
          rel="bookmark" title="<?php the_title_attribute(); ?>"><?php the_title(); ?></a>
        </li>
        <?php
      }
      ?>
    </ul>
    <?php
  }
  wp_reset_query();
}
?>

My version also uses a special truncation function I wrote to display a short excerpt of each post, not shown here. (And yes, eventually I am going to get syntax highlighting set up.)

6. Automatically Retrieve The First Image From Posts On Your Home Page

This one worked pretty well. I changed the function name to something a little less quirky, and I also added some code to verify that the image actually exists, instead of just looking for an <img src> and assuming everything’s OK. This involved inserting the following block of code into the function immediately before if (empty($first_img)) {:

// Check that file exists
if (!empty($first_img)) {
  // remove http/ https/ ftp
  $src = preg_replace("/^((ht|f)tp(s|):\/\/)/i", "", $first_img);
  // remove domain name from the source url
  $host = $_SERVER["HTTP_HOST"];
  $src = str_replace($host, "", $src);
  $host = str_replace("www.", "", $host);
  $src = str_replace($host, "", $src);
  if (!file_exists(ABSPATH . $src)) {
    unset($first_img);
  }
}

If some of that code looks familiar, that’s because I copied it straight out of the next item. If I were writing it myself, or bothering to rewrite it, I would swap out that slightly cumbersome-looking three lines of str_replace() with a single preg_replace() — although perhaps the original coder knows something I don’t, like that doing it this way is actually faster. It very well could be; I know regular expressions are significantly slower than straight-up string replacements.

7. Resize Images On The Fly

I’m using this in conjunction with all of the other items here, no big surprise. I left it mostly as-is, although I did make one small change. I know well the dangers of scaling images up — they look like crap, basically. But a little scaling up doesn’t hurt. At least, it’s much less glaring than having a big set of uniform-looking images marred by one image that’s slightly smaller than the rest. That happened to me as I was putting this together, so I tweaked the script to allow images to be enlarged up to 1.5 times their original size. In timthumb.php I changed lines 103-109 to be:

// don't allow new width or height to be more than 50% greater than the original
if( $new_width > $width * 1.5 ) {
  $new_width = $width * 1.5;
}
if( $new_height > $height * 1.5 ) {
  $new_height = $height * 1.5;
}

8. Get Your Most Popular Posts Without A Plug-In

There was something about the way this one was written that really bothered me. Maybe it’s just that I have a knee-jerk reaction to seeing SQL code appear directly in a page template. But mostly it was that this didn’t match #5 closely enough. Luckily I discovered along the way that regardless of whether you retrieved post data using WP_Query (which returns an object) or with the $wpdb->get_results() method (which returns an array), you can use the same post functions once you’ve called setup_postdata(). So I kept everything from this example up through that, and then I used my modified version of the output code from #5 and it worked like a charm.

One other thing I’d recommend changing: it’s kind of silly to have the if ($commentcount != 0) conditional in there. Much better to just put WHERE comment_count > 0 in the SQL. I also added a date range to the SQL, to keep the list dynamic. In my case, it’s only looking at the most popular posts over a 3-month range. More active blogs could shorten that time frame. My full query looks like this:

$popular = $wpdb->get_results("SELECT * FROM " . $wpdb->posts .
" WHERE post_date > '" . date('Y-m-d',mktime(0,0,0,date('n')-3,date('j'),date('Y'))) .
"' AND comment_count > 0 ORDER BY comment_count DESC LIMIT 4");

There may have been some other changes I made that were relevant here but I think I covered all of the major ones. The tips on the Smashing Magazine site were invaluable in kick-starting my overhaul of the home page. It was uncanny that I stumbled upon this page today, just as I was getting down to this task anyway. It saved me a ton of time. But it still kept me on my toes, since all of the so-called “hacks” required some additional hacks of my own to get them working, or at least, to get them working the way I wanted!

CakePHP paginator sorting problem solved!

I’ll keep this brief, because I need to get back to writing code, but I wanted to share the solution I found to a CakePHP problem that has been nagging me for a while and for which I had never found a simple resolution.

I’m using the $paginator->sort() method to create links in the column headers for tables of paginated search results in my CMS. The method works great, except for one small problem: I could never get it to reverse the order. Intuitively, with links like these, you should be able to click them once to sort in ascending order, and then click again to reverse into descending order. But the reverse was never working for me.

Some research showed that you can pass in all sorts of options to the method, but I wanted to avoid having to make a change like that to about a dozen views; plus, it just didn’t seem right — the method is supposed to do the reverse by default.

At last I have discovered the source of my trouble: you need to explicitly name the model in the parameter that defines the sort field. I’ve been getting more diligent about always naming my models explicitly, but back when I set up these paginated tables (which was about the first thing I did in writing the application), I wasn’t doing it yet. Adding in the model, the reverse order on a second click works perfectly. Here’s a before-and-after example to illustrate the problem and its resolution.

Before:

<?php echo $paginator->sort('Title', 'title'); ?>

After:

<?php echo $paginator->sort('Title', 'Article.title'); ?>

On upgrading WordPress (and WordPress plugins) automatically over SSH/SFTP

For the most part, I love managing my own server. Even though it requires digging into the muck of Linux configuration files with my bare hands (so to speak), and if it goes down, I have no one to blame (or call on for help) but myself, it’s great to have full control and flexibility.

One downside I discovered as a WordPress user, however, is that the super-slick automatic upgrade feature of WordPress was broken on my server. WordPress only supports FTP and the (as I see it) somewhat pointless FTPS. Insecure as it is, my old host supported regular ol’ FTP, and that made WordPress upgrades painless.

There’s no way I’m going to implement FTP on my own server. It’s easy enough to install the package at the command line (really, it is), but I just see no reason to open myself up to the security risks. Granted, there aren’t really that many security risks (beyond one very big one — intercepted passwords) with a well-configured FTP server. But I don’t care to investigate the steps necessary to ensure an FTP server is well-configured.

The obvious choice is to use SFTP/SSH, but at first it looked to me as if WordPress simply didn’t support it. But as I’ve learned (and since proven with my own server), WordPress does support it if your PHP installation has the proper extensions installed. And here’s a guide to get you started.

Once your PHP install is upgraded to support SSH connections, the option will automatically become available in the WordPress upgrade tools, and it works perfectly!