Noted for future reference: Fixing a slow-to-boot Linux server

I have a few Ubuntu Linux VPSes that were originally spun up on the then-latest-and-greatest 16.04 LTS. Over the past year I’ve been belatedly upgrading them to 20.04 LTS. Almost without exception, all now have a really irritating flaw: where they used to reboot almost instantly — making me capriciously run OS updates involving reboots at any time of day, even on servers with a bunch of sites on them, since it only meant a blip of about 5 to 7 seconds — they were now consistently taking 2 to 5 minutes to reboot. Yikes!

Poking around, I learned that cloud-init was timing out, causing that delay, but since systems administration is just a small sliver of the work I do, I never had a chance to investigate why it was happening or really what cloud-init was even for. I just resigned myself to having to do those reboots in the middle of the night when no one would notice.

Well, I finally decided I needed to get an answer, and I found it. If I’m reading this correctly, cloud-init is really only needed during the initial creation of the VPS, and can safely be disabled after that. So, let’s do it!

touch /etc/cloud/cloud-init.disabled

I’m pleased to say, it works perfectly. I ran it on a test mirror of my biggest server and it worked, so I then applied it to the live server and… capriciously rebooted it, right in the middle of the day!

Shush. It worked.

Update (August 3, 2022): Mayyyyybe it’s a bit more complicated than what I described above. On another server I had previously upgraded to 20.04, I pre-emptively ran this before an update that required a reboot. After the reboot, I could not connect to the server at all, other than through Digital Ocean’s direct access console. Thank goodness for that. It did not initially occur to me that this change might be why I couldn’t connect, but after trying a few other fixes without success, I deleted the new /etc/cloud/cloud-init.disabled file and rebooted, and everything came up just fine… and without any kind of delay on boot. Weird.
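A minimal sketch for confirming that cloud-init really is the slow part of boot, before and after a change like this one. It assumes a systemd-based host where systemd-analyze blame is available. The helper name cloud_init_blame is made up for illustration; all it does is filter the blame output down to cloud-init's own units.

```shell
#!/bin/sh
# Hypothetical helper: keep only cloud-init's units from the output of
# `systemd-analyze blame` (read on stdin), so their boot cost stands out.
cloud_init_blame() {
    grep -E 'cloud-(init|init-local|config|final)\.service' || true
}

# On a live server (assumes systemd-analyze is installed):
#   systemd-analyze blame | cloud_init_blame
# To undo the change from this post if networking does not come back:
#   rm /etc/cloud/cloud-init.disabled && reboot
```

If nothing is printed, cloud-init is probably not what is holding up the boot.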

How to execute a no-nonsense upgrade to PHP 7.4 on Ubuntu 16.04 LTS

Yeah, yeah. Ubuntu 16.04 LTS is getting pretty long in the tooth. Long-term support ends in less than a year.

But if you’re anything like me (I’m sorry), you’re managing multiple VPSes that are, at the moment, still running it. And now WordPress is giving all of your clients scary warnings about needing to upgrade their version of PHP. What to do?

I’ve distilled the process down to 11 lines that you can just copy-paste straight into the command line. It’s not entirely hands-off; there are a few steps where you’ll be asked to confirm whether you want to keep your existing configuration files (YES!) and such. And — very important — you’ll want to review the set of PHP-related packages I’ve got listed here to make sure they’re ones you need, and that they’re all the ones you need. If you’re not sure whether or not there are others you may want, I suggest running apt update and then apt-cache search php7.4 and reviewing the list of results before proceeding.

Now then… here we go. I’ll break it all down after the code sample.

CAVEAT EMPTOR: I’ve just run this series of commands on three servers and it seemed to work fine, but this code is provided AS IS and you’re on your own if anything gets screwed up.

This assumes you’re already in sudo mode. If not, start with a sudo -s and FEEL THE POWER.

apt update
apt -y install software-properties-common
add-apt-repository -y ppa:ondrej/php
add-apt-repository -y ppa:ondrej/apache2
apt update
apt -y dist-upgrade
apt -y autoremove
apt -y install php7.4 libapache2-mod-php7.4 php7.4-mysql php-imagick php7.4-cgi php7.4-cli php7.4-common php7.4-curl php7.4-gd php7.4-json php7.4-mbstring php7.4-opcache php7.4-soap php7.4-xml
a2dismod php7.0
a2enmod php7.4
service apache2 restart

OK, what are we doing here? Let’s break it down.

apt update

Updating our package cache. Gotta do this first, always.

apt -y install software-properties-common

You may already have this installed. It provides the add-apt-repository command used in the next steps, which is why the other articles I read had me install it first.

add-apt-repository -y ppa:ondrej/php
add-apt-repository -y ppa:ondrej/apache2

We are adding external package repositories created by Ondřej Surý that allow versions of Ubuntu Linux to install newer versions of PHP than what comes with the standard Canonical set.

apt update
apt -y dist-upgrade
apt -y autoremove

Gotta do this again, since we’ve added new repositories. We’re doing a full-blown update of any outdated packages in the OS, and using the -y switch means we’re not going to be asked to manually confirm before proceeding. Be careful!
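Since -y skips the confirmation prompt, it can be worth previewing what dist-upgrade would do first. Here is a small sketch, assuming apt-get's simulate mode (-s), which prints one line starting with Inst for each package it would install or upgrade. count_pending is a hypothetical helper name, not part of apt.

```shell
#!/bin/sh
# Hypothetical helper: count the "Inst ..." lines in simulated apt-get output
# read from stdin, i.e. the packages that would be installed or upgraded.
count_pending() {
    grep -c '^Inst ' || true
}

# Preview only; -s (simulate) changes nothing on the system:
#   apt-get -s dist-upgrade | count_pending
```

A surprisingly large number is a good cue to read the full simulated output before running the real thing.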

apt -y install php7.4 libapache2-mod-php7.4 php7.4-mysql php-imagick php7.4-cgi php7.4-cli php7.4-common php7.4-curl php7.4-gd php7.4-json php7.4-mbstring php7.4-opcache php7.4-soap php7.4-xml

This is the big one. We’re installing PHP 7.4 as well as a bunch of related packages we probably need. If you don’t know what all of these do, I encourage you to research them. You may not need them all. You may need others not included here. But these seem to do the trick for a typical WordPress setup.

a2dismod php7.0
a2enmod php7.4

Here we’re telling Apache to stop using PHP 7.0 and to use PHP 7.4 instead. This assumes you’re currently running PHP 7.0, which would be the case if you’re still on the default Ubuntu 16.04 LTS packages.

service apache2 restart

Let’s restart Apache and get that PHP 7.4 goodness! Hopefully everything works! But I suppose we should also be forward-thinking. The service command is a legacy compatibility wrapper (it is still available in Ubuntu 20.04, where it hands off to systemd), so you could use the more modern (but to my eye, decidedly less friendly) systemctl restart apache2 instead.

Postscript

One more thing… along the way you might have updated some packages that recommend a restart. If that’s the case, throw in one last command for fun:

reboot

Obviously if your server gets a ton of traffic you may not want to reboot in the middle of the day. But then you shouldn’t have been doing any of this in the middle of the day. The Digital Ocean VPSes I use typically reboot in less than 10 seconds, so I am never too hesitant to reboot at any time. Some of the other commands above, however, may shut down Apache or MySQL for a longer period (probably not more than a minute or two).

Post-postscript

This should also work more or less the same for any other version of Ubuntu you’re trying to keep fresh past its sell-by date. The main thing you might need to look at is the a2dismod php7.0 line. You’re probably running a different version of PHP. You can use php -v to see which version you’re running, and you can run ls /etc/php to see which version(s) you have installed.
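To adapt the package list to a different starting version, one rough approach is to list the PHP packages you already have and rewrite their names for the new version. The sketch below assumes the packages follow the phpX.Y-name pattern; map_php_packages is a hypothetical helper. Not every old package has a counterpart in the newer version, so treat the result as a starting point and check it against apt-cache search php7.4.

```shell
#!/bin/sh
# Hypothetical helper: rewrite package names that start with phpOLD so they
# start with phpNEW instead, dropping any line that is not such a package.
# Usage: map_php_packages OLD NEW   (e.g. 7.0 7.4), names read from stdin.
map_php_packages() {
    old=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape the dot for sed
    sed -n "s/^php${old}/php$2/p"
}

# List installed php7.0 packages and show their php7.4 counterparts:
#   dpkg-query -W -f='${Package}\n' 'php7.0*' | map_php_packages 7.0 7.4
```

Packages whose names do not start with php, such as libapache2-mod-php7.0, are skipped by this pattern and need to be handled by hand.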

I spent 5 hours troubleshooting this WordPress problem so you don’t have to (starring: WooCommerce Action Scheduler)

Sorry for that “clickbaity” headline. I added the parenthetical so it might be at least marginally useful. Since my WordPress-related posts are always about how I solved a particularly weird or obscure WP issue, I usually consider their titles carefully. “What would I have googled to find a solution to this problem?” But honestly, I spent 5 hours on this yesterday partly because I wasn’t sure what to google. (And I use lowercase “google” as a generic for “conduct an Internet search”; I normally use DuckDuckGo.)

OK, so here’s the situation. This particular site is — among my normally very-low-traffic clients — one of the busiest I work on. It’s a WooCommerce site with hundreds of products and 20+ daily orders. (Yeah, 20+ orders a day is not huge, but on the scale I normally deal with, it’s a lot.)

This site runs on its own virtual private server, with 8 GB RAM and 4 vCPUs. Pretty substantial for a single site. And yet, for weeks it has been maxing out RAM and CPU resources. Not to the point where the site was in crisis mode that demanded my immediate attention, but it was frustratingly slow. Just slightly below the threshold of me dropping work on other new projects to try to fix this. (At this point I feel obliged to note that I did not actually build or maintain this site for its first couple of years of existence, so I don’t know its inner workings as well as I normally would. I just know it’s running way too many plugins and desperately needs some TLC I have not had time to give it.)

Yesterday things finally got to the breaking point. For me, at least. The client had contacted me about an unrelated issue, but as I was dealing with that, I got frustrated by seeing all of this inexplicable resource usage, so I had to address it.

As it happens, this post is actually a bit of a sequel to my last post, about getting Apache’s mod_status and mod_rewrite to play nicely on a WordPress site. About three weeks ago I finally got mod_status working on this site, and had planned to come back, when I had a chance, to investigate this issue.

If you are not familiar with mod_status, you should check it out. Apache is generally a bit of a “black box” but this lets you see exactly what’s happening with each thread — the requested URL, the requesting IP address, connection time, resource usage, etc.

I noticed an absurd number of threads were coming from the localhost and were requesting wp-admin/admin-ajax.php with a query string referencing WooCommerce’s Action Scheduler. But what to do with that information?
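mod_status shows what is happening right now. A complementary way to get a number, sketched here, is to tally the same requests from the access log. This is not what I did at the time. It assumes Apache's common or combined log format, where the first field is the client IP and the seventh is the request path. count_local_ajax is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: from access-log lines on stdin, count requests to
# admin-ajax.php that came from the server itself (127.0.0.1 or ::1).
count_local_ajax() {
    awk '($1 == "127.0.0.1" || $1 == "::1") && $7 ~ /admin-ajax\.php/ { n++ }
         END { print n + 0 }'
}

# Assumed log location (Ubuntu's default for Apache):
#   count_local_ajax < /var/log/apache2/access.log
```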

I’ll admit, this is where I wasted a bunch of time in fruitless searches, because I don’t know a lot about Action Scheduler. I read a few threads on the WordPress support forums and StackOverflow that kind of danced around the problem I was having but never really got at it.

Eventually I ended up in phpMyAdmin, scrutinizing the wp_actionscheduler_actions table, and trying to figure out where all of the wc_facebook_regenerate_feed actions were coming from. I used my old favorite Search-Replace-DB to try to find any instances of “facebook” in the database. (This was an utter failure, for reasons I can’t explain. But that failure was critical to why this took me so long to resolve.)

I went to Tools > Scheduled Actions in WP admin and discovered there were over 200,000 actions, although there were only about 70,000 (only!) showing up in wp_actionscheduler_actions. Mystery!

I went to wp_actionscheduler_actions again, saw that those wc_facebook_regenerate_feed actions had all been scheduled weeks ago, and decided to just chuck out the lot. I truncated the table, but within seconds it started filling up again with hundreds of wc_facebook_regenerate_feed actions, with the same weeks-old scheduled dates. Where were they coming from???

What was especially maddening to me about all of this was that I had already, weeks ago, determined that the plugin that had created these — Facebook for WooCommerce — had been causing some kind of trouble, and I had deactivated it. Yesterday I even went so far as to delete it. I scoured the theme code for references to Facebook. I looked in the file system for stray files that might be responsible. And as I mentioned before, I tried to search the database for any references to Facebook. I was getting nowhere.

Eventually I realized Search-Replace-DB was having problems, so I dove into phpMyAdmin directly and started searching individual fields, in individual tables, for “Facebook”. And that’s where I finally figured it out.

WordPress puts everything in wp_posts, and that’s a problem.

I’ve complained over the years about the database architecture in WordPress. Having built multiple custom CMSes in the years prior to when I finally, fully embraced WordPress in 2014, I have a fair bit of experience designing databases. And two things I learned in that experience were: 1) clearly define what your data tables are for, and 2) indexes make databases efficient. WordPress is awesome for many things, but it has far, far outgrown its original conception as blogging software. Custom Post Types and Custom Fields make it super-flexible, but shoving everything they create into wp_posts and wp_postmeta can create a disastrous situation.

Case in point, WooCommerce scheduled actions. In earlier iterations, those were custom posts! (As are, still, WooCommerce orders, which is totally f***ed up, if you ask me.) At some point Woo or Automattic realized scheduled actions don’t belong in wp_posts, so they created four new tables specifically for managing them. Plugins that use scheduled actions had to create new scheduled actions for migrating the old wp_posts scheduled actions into the new tables.

And that’s where I found myself. Through some curious set of circumstances with this particular site, which probably at some point included someone other than me disabling WP-Cron to try to fix some other problem, 200,000+ scheduled actions from the Facebook for WooCommerce plugin (in the wp_posts table) got queued up for migration to the new tables. And as quickly as I was deleting them from the new tables, Action Scheduler (which runs once a minute!!!) was dutifully refilling them.

(And obviously they were never actually running… perhaps because I had deactivated the plugin? Or because they were simply timing out? Who knows? But here’s something I see as a flaw with Action Scheduler: it should check to see if the plugin that scheduled an action is currently active, and if not, purge the action immediately.)

At last here was the fix. I had to run this SQL query in phpMyAdmin. (Proceed with caution! Don’t just use this code… look in your database for exactly what is causing problems and adjust accordingly.)

DELETE FROM `wp_posts` WHERE `post_title` = 'wc_facebook_regenerate_feed';

Note: I’m doing this from memory — and a glance back at my browser history from yesterday. I didn’t keep notes on exactly what the title was.

For a more generalized — and drastic — approach, you could also do this:

DELETE FROM `wp_posts` WHERE `post_type` = 'scheduled-action';
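Before deleting anything, it may help to see what is actually sitting in wp_posts. The sketch below prints a read-only query that counts legacy scheduled actions by hook name. It assumes, as described above, that they are stored with post_type 'scheduled-action' and the hook in post_title. legacy_actions_sql and the default wp_ prefix are illustrative choices, so adjust them to your own install.

```shell
#!/bin/sh
# Hypothetical helper: print a read-only query that counts legacy scheduled
# actions in the posts table, grouped by hook name (stored in post_title).
# Usage: legacy_actions_sql [TABLE_PREFIX]   (prefix defaults to wp_)
legacy_actions_sql() {
    prefix=${1:-wp_}
    cat <<EOF
SELECT post_title, COUNT(*) AS total
FROM \`${prefix}posts\`
WHERE post_type = 'scheduled-action'
GROUP BY post_title
ORDER BY total DESC;
EOF
}

# Pipe it into the mysql client (the database name here is a placeholder):
#   legacy_actions_sql wp_ | mysql your_database
```

The same query can be pasted into phpMyAdmin. Whichever hook tops the list is the one to look at first.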

I’ll just conclude here with a nice little graph of the site’s CPU and RAM usage over the past 24 hours. It was 6 PM when I finally figured this out!

How to get Apache’s mod_status and mod_rewrite to play nicely on a WordPress site

Apache’s mod_status can be very handy for monitoring exactly what’s going on inside of Apache on a busy website, but it can be a bit difficult to set up, if your site runs something like WordPress that also relies heavily on Apache’s mod_rewrite.

Even though I had set up mod_status according to the official instructions, and had also added the code to the virtual hosts, trying to access a site’s /server-status URL just redirected me to the WordPress 404 error page.

Here’s the fix. Maybe there’s a “better” way, but this worked for me. I just needed to hijack the rewrite rules in the site’s .htaccess file.

If you’ve already got IP or Auth based access restrictions configured in the virtual host, you probably don’t need the RewriteCond line, but I prefer to err on the side of caution. I used my VPN’s IP address (masked as 9’s here, which of course is not a valid IP address)… you’ll want to fill in whatever IP address(es) you want to allow in.

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^(999\.999\.999\.999)$
RewriteRule ^server-status$ - [L]

Put this before the WordPress rewrite rules, or it won’t do any good. And of course this is missing the <IfModule mod_rewrite.c> wrapper you probably should include, but if you’re doing this you already know mod_rewrite is enabled, so I don’t bother.
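The dots in the IP address need to be escaped in the RewriteCond pattern, which is easy to get wrong by hand. Here is a small sketch that builds the line for a given address. rewrite_cond_for_ip is a hypothetical helper, and 203.0.113.7 is a reserved documentation address standing in for your own.

```shell
#!/bin/sh
# Hypothetical helper: print a RewriteCond line that matches exactly one
# IPv4 address, with the dots escaped so they are not regex wildcards.
rewrite_cond_for_ip() {
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'RewriteCond %%{REMOTE_ADDR} ^(%s)$\n' "$escaped"
}

# Example with a placeholder address:
rewrite_cond_for_ip 203.0.113.7
```

Paste the printed line in place of the 999 example, directly above the RewriteRule for server-status.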

Getting Google to remove fake hack URLs from its indexes for your site

As a web developer/systems admin, dealing with a hacked site is one of the most annoying parts of the job. Partly that’s on principle… you just shouldn’t have to waste your time on it. But also because it can just be incredibly frustrating to track down and squash every vector of attack.

Google adds another layer of frustration when they start labeling your site with a “This site may be hacked” warning.

A lot of times, this is happening because the hack invented new URLs under your domain that Google indexed, and for various reasons, Google may not remove these pages from its index after it crawls your thoroughly-cleaned site, even though those URLs are no longer there and are not in your sitemap.xml file. This issue may be exacerbated by the way your site handles redirecting users when they request a non-existent URL. Be sure your site is returning a 404 error in those cases… but even a 404 error may not be enough to deter Google from keeping a URL indexed, because the 404 might be temporary.

410 Gone

Enter the 410 Gone HTTP status. It differs from 404 in one key way. 404 says, “What you’re looking for isn’t here.” 410 says, “What you’re looking for isn’t here and never will be again, so stop trying!”

A quick way to find (some of) the pages on your site that Google has indexed is to head on over to Google (uh, yeah, like you need me to provide a link) and just do an empty site search, like this:

site:blog.room34.com

Look for anything that doesn’t belong. And if you find some things, make note of their URLs.

A better way of doing this is using Google Search Console. If you run a website, you really need to set yourself up on Google Search Console. Just go do it now. I’ll wait.

OK, welcome back.

Google Search Console lets you see URLs that it has indexed. It also provides helpful notifications, so if Google finds your site has been hacked, it will let you know, and even provide you with (some of) the affected URLs.

Now, look for patterns in those URLs.

Why look for patterns? To make the next step easier. You’re going to edit your site’s .htaccess file (assuming you’re using Apache, anyway… sorry I’m not 1337 enough for nginx), and set up rewrite rules to return a 410 status for these nasty, nasty URLs. And you don’t want to create a rule for every URL if you can avoid it.

When I had to deal with this recently, the pattern I noticed was that the affected URLs all had a query string, and each query string started with a key that was one of two things: either a 3-digit hexadecimal number, or the string SPID. With that observation in hand, I was able to construct the following code to insert into the .htaccess file:

# Force remove hacked URLs from Google
RewriteCond %{QUERY_STRING} ^([0-9a-f]{3})=
RewriteRule (.*) - [L,R=410]
RewriteCond %{QUERY_STRING} ^SPID=
RewriteRule (.*) - [L,R=410]

Astute observers (such as me, right now, looking back on my own handiwork from two months ago) may notice that these could possibly be combined into one. I think that’s true, but I also seem to recall that regular expressions work a bit differently in this context than I am accustomed to, so I kept it simple by… um… keeping it more complicated.

The first RewriteCond matches any query string that begins with a key consisting of a 3-digit hex number. The second matches any query string that begins with a key of SPID. Either way, the response is a 410 Gone status, and no content.
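It can be reassuring to try the patterns against a few query strings before touching .htaccess. The sketch below uses grep -E with the two conditions merged into one alternation. That is an assumption on my part: grep's extended regular expressions are not the same engine Apache uses, though for a pattern this simple they should agree. is_hacked_query is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a query string starts with a key that is
# either exactly three lowercase hex digits or the literal SPID.
is_hacked_query() {
    printf '%s\n' "$1" | grep -Eq '^([0-9a-f]{3}|SPID)='
}

# Try a few query strings before touching .htaccess:
for qs in 'a1f=x' 'SPID=42' 'page=2' 'add=1'; do
    if is_hacked_query "$qs"; then echo "410  $qs"; else echo "pass $qs"; fi
done

# After deploying, check a real URL (example.com is a placeholder):
#   curl -s -o /dev/null -w '%{http_code}\n' 'https://example.com/?SPID=1'
```

Note that add=1 is flagged too, because "add" happens to be three hex digits. If your site uses a legitimate query key like that as the first parameter, the rule would need to be narrowed.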

Make that change, then try to cajole Google into recrawling your site. (In my case it took multiple requests over several days before they actually recrawled, even though they’re “supposed” to do it every 48-72 hours.)

Good luck!

UNDERDOG of PERFECTION

a blog on technology, music and geek culture from room34
