I spent 5 hours troubleshooting this WordPress problem so you don’t have to (starring: WooCommerce Action Scheduler)

Sorry for that “clickbaity” headline. I added the parenthetical so it might be at least marginally useful. Since my WordPress-related posts are always about how I solved a particularly weird or obscure WP issue, I usually consider their titles carefully. “What would I have googled to find a solution to this problem?” But honestly, I spent 5 hours on this yesterday partly because I wasn’t sure what to google. (And I use lowercase “google” as a generic for “conduct an Internet search”; I normally use DuckDuckGo.)

OK, so here’s the situation. This particular site is — among my normally very-low-traffic clients — one of the busiest I work on. It’s a WooCommerce site with hundreds of products and 20+ daily orders. (Yeah, 20+ orders a day is not huge, but on the scale I normally deal with, it’s a lot.)

This site runs on its own virtual private server, with 8 GB RAM and 4 vCPUs. Pretty substantial for a single site. And yet, for weeks it has been maxing out RAM and CPU resources. Not to the point where the site was in crisis mode that demanded my immediate attention, but it was frustratingly slow. Just slightly below the threshold of me dropping work on other new projects to try to fix this. (At this point I feel obliged to note that I did not actually build or maintain this site for its first couple of years of existence, so I don’t know its inner workings as well as I normally would. I just know it’s running way too many plugins and desperately needs some TLC I have not had time to give it.)

Yesterday things finally got to the breaking point. For me, at least. The client had contacted me about an unrelated issue, but as I was dealing with that, I got frustrated by seeing all of this inexplicable resource usage, so I had to address it.

As it happens, this post is actually a bit of a sequel to my last post, about getting Apache’s mod_status and mod_rewrite to play nicely on a WordPress site. About three weeks ago I finally got mod_status working on this site, and had planned to come back, when I had a chance, to investigate this issue.

If you are not familiar with mod_status, you should check it out. Apache is generally a bit of a “black box” but this lets you see exactly what’s happening with each thread — the requested URL, the requesting IP address, connection time, resource usage, etc.

I noticed an absurd number of threads were coming from the localhost and were requesting wp-admin/admin-ajax.php with a query string referencing WooCommerce’s Action Scheduler. But what to do with that information?

I’ll admit, this is where I wasted a bunch of time in fruitless searches, because I don’t know a lot about Action Scheduler. I read a few threads on the WordPress support forums and StackOverflow that kind of danced around the problem I was having but never really got at it.

Eventually I ended up in phpMyAdmin, scrutinizing the wp_actionscheduler_actions table, and trying to figure out where all of the wc_facebook_regenerate_feed actions were coming from. I used my old favorite Search-Replace-DB to try to find any instances of “facebook” in the database. (This was an utter failure, for reasons I can’t explain. But that failure was critical to why this took me so long to resolve.)

I went to Tools > Scheduled Actions in WP admin and discovered there were over 200,000 actions, although there were only about 70,000 (only!) showing up in wp_actionscheduler_actions. Mystery!

I went to wp_actionscheduler_actions again, saw that those wc_facebook_regenerate_feed actions had all been scheduled weeks ago, and decided to just chuck out the lot. I truncated the table, but within seconds it started filling up again with hundreds of wc_facebook_regenerate_feed actions, with the same weeks-old scheduled dates. Where were they coming from???

What was especially maddening to me about all of this was that I had already, weeks ago, determined that the plugin that had created these — Facebook for WooCommerce — had been causing some kind of trouble, and I had deactivated it. Yesterday I even went so far as to delete it. I scoured the theme code for references to Facebook. I looked in the file system for stray files that might be responsible. And as I mentioned before, I tried to search the database for any references to Facebook. I was getting nowhere.

Eventually I realized Search-Replace-DB was having problems, so I dove into phpMyAdmin directly and started searching individual fields, in individual tables, for “Facebook”. And that’s where I finally figured it out.

WordPress puts everything in wp_posts, and that’s a problem.

I’ve complained over the years about the database architecture in WordPress. Having built multiple custom CMSes in the years prior to when I finally, fully embraced WordPress in 2014, I have a fair bit of experience designing databases. And two things I learned in that experience were: 1) clearly define what your data tables are for, and 2) indexes make databases efficient. WordPress is awesome for many things, but it has far, far outgrown its original conception as blogging software. Custom Post Types and Custom Fields make it super-flexible, but shoving everything they create into wp_posts and wp_postmeta can create a disastrous situation.

Case in point, WooCommerce scheduled actions. In earlier iterations, those were custom posts! (As are, still, WooCommerce orders, which is totally f***ed up, if you ask me.) At some point Woo or Automattic realized scheduled actions don’t belong in wp_posts, so they created four new tables specifically for managing them. Plugins that use scheduled actions had to create new scheduled actions for migrating the old wp_posts scheduled actions into the new tables.

And that’s where I found myself. Through some curious set of circumstances with this particular site, which probably at some point included someone other than me disabling WP-Cron to try to fix some other problem, 200,000+ scheduled actions from the Facebook for WooCommerce plugin (in the wp_posts table) got queued up for migration to the new tables. And as quickly as I was deleting them from the new tables, Action Scheduler (which runs once a minute!!!) was dutifully refilling them.

(And obviously they were never actually running… perhaps because I had deactivated the plugin? Or because they were simply timing out? Who knows? But here’s something I see as a flaw with Action Scheduler: it should check to see if the plugin that scheduled an action is currently active, and if not, purge the action immediately.)

At last here was the fix. I had to run this SQL query in phpMyAdmin. (Proceed with caution! Don’t just use this code… look in your database for exactly what is causing problems and adjust accordingly.)

DELETE FROM `wp_posts` WHERE `post_title` = 'wc_facebook_regenerate_feed';

Note: I’m doing this from memory — and a glance back at my browser history from yesterday. I didn’t keep notes on exactly what the title was.

For a more generalized — and drastic — approach, you could also do this:

DELETE FROM `wp_posts` WHERE `post_type` = 'scheduled-action';

I’ll just conclude here with a nice little graph of the site’s CPU and RAM usage over the past 24 hours. It was 6 PM when I finally figured this out!

How to get Apache’s mod_status and mod_rewrite to play nicely on a WordPress site

Apache’s mod_status can be very handy for monitoring exactly what’s going on inside of Apache on a busy website, but it can be a bit difficult to set up, if your site runs something like WordPress that also relies heavily on Apache’s mod_rewrite.

Specifically even though I had set up mod_status according to the official instructions, and specifically had also added the code to the virtual hosts, I still found that trying to access a site’s /server-status URL was just redirecting me to the WordPress 404 error page.

Here’s the fix. Maybe there’s a “better” way, but this worked for me. I just needed to hijack the rewrite rules in the site’s .htaccess file.

If you’ve already got IP or Auth based access restrictions configured in the virtual host, you probably don’t need the RewriteCond line, but I prefer to err on the side of caution. I used my VPN’s IP address (masked as 9’s here, which of course is not a valid IP address)… you’ll want to fill in whatever IP address(es) you want to allow in.

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^(999\.999\.999\.999)$
RewriteRule ^server-status$ – [L]

Put this before the WordPress rewrite rules, or it won’t do any good. And of course this is missing the <IfModule mod_rewrite.c> wrapper you probably should include, but if you’re doing this you already know mod_rewrite is enabled, so I don’t bother.

SPF for dummies (i.e. me)

For a while I’ve known that (legitimate) outgoing email messages originating from my web server were occasionally not reaching their intended recipients. I also knew that there was a DNS change you could make to help prevent this problem, but I didn’t know any more about it and it was a marginal enough problem that I could just put it off.

Finally today I decided to deal with it. And I was (re)introduced to the SPF acronym. No, that’s not Sun Protection Factor, or Spray Polyurethane Foam, or even Single Point of Failure (although in my case perhaps that last one is accurate). No, it stands for Sender Policy Framework, and it’s an add-on to the core capabilities of DNS that provides a way to positively identify the originating servers of outgoing email messages.

My situation is simple: I have a domain name that needs to be able to send mail from either my mail server or my web server. Most of the tutorials I found for SPF were far too convoluted to address this simple arrangement. Then I found this post by Cyril Mazur which provided the very simple answer:

v=spf1 a mx ~all

Simply add the above as a new TXT record in your DNS zone file, and you should be set.

A follow-up on Apache not starting on my web server

About 6 weeks ago, I wrote about a problem I was having with Apache not starting with SSLEngine on. I ended the post somewhat ominously with the following:

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.

This morning, a little before 6 AM, that happened. I was awakened by notifications (with their attendant beeps and nightstand vibrations) on my iPhone that my web server was down. Great. Half-awake, I fired up my hosting provider’s handy iPhone app, tapped the “Hard Reboot” button, and tried to go back to sleep. Except, the notifications kept coming. Eventually I was awake enough to realize that the server was coming back up, but Apache wasn’t. Time to get up and deal with this problem from a real computer.

SSHed in, I tried manually starting Apache, and got this:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

What the crap? After spending a half hour visually scanning log and configuration files, to no avail, I decided I needed to try to find out what was running on port 80. This page was helpful in that regard. I ran the command lsof +M -i4 and found that, whaddayknow, Apache was running. Apparently. But I couldn’t shut it down, and I couldn’t restart it. There were no signs of any compromise of the system’s security, so I just chalked this up to some minor problem deeply buried somewhere in a configuration file that I have yet to track down (but which is probably my fault). At any rate, lsof gave me what I really wanted: the process ID that was listening on port 80. Time for the dreaded kill -9 command.

After that, I tried starting Apache again, and it worked… and, as I suspected, it did ask for the pass phrases again. But now, all is well. (Except for the nagging feeling of not knowing what caused this to happen in the first place. Stay tuned…)

My strange solution to Apache not starting on Ubuntu Linux server with SSLEngine on… (YMMV)

The situation: I’m running a web server on Ubuntu Linux using Apache 2. I have two sites on the server that need SSL. I obtained a second IP address (since you can only have one SSL certificate per IP address) and configured Apache accordingly. I was able to get regular old port 80 non-SSL pages to load just fine on virtual hosts configured to use both IP addresses.

I created my key files, got the certificates from the CA (GeoTrust, in this case), all that business. Put the files in the right places, configured the Apache files, all that jazz. Made sure mod_ssl was enabled, yes. All of that. Trust me, I did it. Don’t bother asking. And yet, whenever I tried to run Apache with SSL configured… nothing.

And I mean… nothing.

I’d restart Apache at the command line, and nothing. No error messages of any kind. But Apache wasn’t running. I checked all of the log files (and I mean all of the log files), nothing. DOA.

Eventually I tracked down the culprit as the SSLEngine on line in the Apache config file. With it in there, Apache wouldn’t start. Comment it out, Apache starts up just fine, but of course you don’t have SSL.

I’m using the arrangement of Apache config files as they’re installed in a default Ubuntu build. That means /etc/apache2/httpd.conf is actually empty, and most of its usual contents are in /etc/apache2/apache2.conf, with a few other settings dispersed into a number of adjacent files. There are some critical settings in /etc/apache2/ports.conf and then everything else is in the individual config files I’ve created for each site on the server, stored in the /etc/apache2/sites-available directory with symbolic links for the active ones in /etc/apache2/sites-enabled.

Well… that turned out to be the problem. I’m not sure why it matters, but I was putting the VirtualHost configurations for the SSL sites in the respective sites’ existing configuration files. But no… all of the SSL-related (port 443) <VirtualHost> blocks needed to be put in the 000-default file. That made all the difference.

Well, almost all the difference. My private key files are encrypted with pass phrases, and Apache needed me to enter them when starting up. But, funny thing… it didn’t ask me for them all right away. I had to fiddle around with starting and stopping it a couple of times (which I bothered to do because it still wasn’t running), but eventually it did ask me to enter the pass phrase for both sites, and after I did that, everything is working. Both SSL sites, all of my non-SSL sites, it all works.

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.