Why you’ve stopped receiving emails from your website

Email is terrible. Just don’t use it.

OK, that’s not realistic. Unfortunately. So I’ll address the question and offer something of an explanation.

First off, you need to understand how email works. I don’t think a lot of people do. Email doesn’t just leave your device, shoot across the Internet, and land directly in the recipient’s inbox. There are servers involved on both ends. (A server is a specialized computer sitting in a large data center that runs software for purposes like this.)

When you hit “Send,” your message goes from your device to your sending mail server. Then your sending mail server looks at the domain name on the recipient’s email address — the part after @ — and figures out where that domain’s receiving mail server is. It sends the message to that server.
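
If you’re curious, you can watch that lookup happen yourself. The dig tool (from the dnsutils package on Debian/Ubuntu) will show you a domain’s published mail servers; example.com here is just a placeholder:

dig +short MX example.com    # lists the domain's receiving mail servers, with their priorities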

Then, when the recipient wants to check their email, their device connects to their receiving mail server, which sends over the new messages. (The way this works is a lot different now than it was in the early days of the Internet, with the switch from POP3 to IMAP, but if you don’t know what those acronyms mean, just be thankful and move on. It’s not really relevant to this post.)

All of this is relatively straightforward when you’re directly creating the email message in your mail app. In most cases, your own sending and receiving servers are one and the same. But it’s different when the email is coming from your website.

Next, you need to understand how email coming from a website works. When you’re getting an email such as an automated notification that someone has filled out a form on your website, who is “sending” the email? It’s not the site visitor who filled out the form. It’s the website itself. So the email doesn’t go through the visitor’s sending server. The website has to have its own sending server.

In the past (up until around early 2020), this was straightforward. The web server — yes, another specialized computer in a data center, this time the one where your website “lives” — would typically also run sending mail server software. The most common early software for this purpose was called, creatively, Sendmail. But Sendmail had some serious shortcomings that needed to be fixed, and its replacement was called, creatively, Postfix.

For years I would set up web servers running Postfix, and there was never any problem. A website user would submit a form, the site would generate a notification email and put it in the Postfix queue, Postfix would send it along to the recipient’s (i.e. the website administrator’s) receiving mail server, and the recipient would receive it.
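
If you’ve ever had to babysit one of these boxes, a couple of stock Postfix commands make that flow visible. A rough sketch, assuming the usual Debian/Ubuntu log location:

postqueue -p                 # show what's still sitting in the Postfix queue
tail -f /var/log/mail.log    # watch deliveries being handed off in real time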

So why did this stop working? In a word, spam. This kind of setup was very slick and easy to build. So easy, in fact, that hosting providers — especially the ones offering “Virtual Private Servers” where you can configure any software you want — became havens for spammers.

For years there were increasingly convoluted methods to validate these servers as legitimate senders. (Here are some more fun acronyms you don’t want to know: SPF, DKIM, DMARC.) But it was a cat-and-mouse game as spammers continually found ways around every new restriction.
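
If you’re morbidly curious what those look like in practice, they’re all just DNS TXT records. A quick illustration (example.com and the selector name are placeholders; every provider uses its own DKIM selector):

dig +short TXT example.com                        # SPF lives here, e.g. "v=spf1 include:_spf.example.net ~all"
dig +short TXT _dmarc.example.com                 # the DMARC policy record
dig +short TXT selector1._domainkey.example.com   # the DKIM public key, published under a selector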

Eventually, so many spammers were using hosting providers like Digital Ocean to send out their garbage that large mail providers like Gmail and Microsoft Office 365 decided it just wasn’t worth dealing with, and started automatically flagging any email that originated anywhere on Digital Ocean’s network (to cite one example) as spam.

Now, legitimate websites, sending only legitimate emails, are getting flagged as spam, solely because they happen to exist on the same network as spammers.

OK, so shouldn’t Digital Ocean (and other similar VPS providers) do something about this? Yes. Yes, they should. But instead they’ve decided to just throw up their hands and say, “you should not be using our network to send out email.” Other VPS providers like Linode actually just block port 25 altogether. (Again, if you don’t know what that is, just be happy and move on.)

Is there a solution? Yes. Stop using email. OK, I still know that’s not realistic. There is a solution, but it is cumbersome to set up. You need to configure your website to route outgoing emails through a real mail account on a real mail server. If you’re on WordPress, plugins like WP Mail SMTP and WPO365 can help — but bear in mind that this does mean connecting a real, actual email account to your website through these tools. (And an interesting side effect is that you’ll see these outgoing messages in the Sent folder in your mail app.)
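
If you want to sanity-check the mail account you’re about to wire up, the swaks testing tool (apt install swaks) can push a message through it from the command line. A rough sketch; the server, port, and addresses are all placeholders for your own provider’s settings:

swaks --to you@example.com --from site@example.com \
  --server smtp.example.com --port 587 --tls \
  --auth LOGIN --auth-user site@example.com
# swaks prompts for the password; if the message lands in your inbox, the credentials work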

Alternatively, you can use a service like Amazon SES or SendGrid. But however you choose to do it, there are extra configuration steps, extra technical knowledge required, and extra costs. What used to be straightforward and easy is now complicated and costly, and we have sleazy spammers and intransigent corporations to thank for it.

You’re never too experienced to make a boneheaded mistake, especially when the action that initiates it looks extremely benign

[Mock-up image: a big scary warning message]

That warning bar doesn’t exist. But if it did, I maybe wouldn’t have made the stupid mistake I made on Sunday morning.

The big mistake was actually the result of my hasty efforts to patch up a much smaller mistake. As it goes.

I had mistakenly deleted a bunch of WordPress themes on a multisite (but not Multisite; the difference is worth discussing in more detail at a later time) installation that Sara and I use for all of our blogs — including this one.

I didn’t want Sara to see that her themes had been trashed, so I quickly logged into my Digital Ocean account and attempted to restore a copy of the latest backup of the server. My intention was to spin up a new copy of the server, from the most recent backup, go in there to grab the theme files I had accidentally deleted, copy them into the live server, then destroy that temporary copy of the server.

Here are the options I was presented with:

Given that menu, what would you do? Never mind the fact that I’ve been a Digital Ocean client for over a decade and have dealt with this exact menu numerous times before. I should have known better. But I was in a hurry and not thinking clearly.

Did you select Restore Droplet? Congratulations. Just like me, you have now destroyed the live server.

Maybe all of this wouldn’t have been so bad, but the weekly backups on this particular server take place on Monday nights. Which means we were working from a 6-day-old backup. And wouldn’t you know it, both Sara and I (but especially Sara) had been unusually active on our blogs this past week.

So yeah, I clicked Restore Droplet, which immediately, without any additional warning or confirmation, completely and permanently wiped out the live version of the server and replaced it with the contents of that 6-day-old backup. A week’s worth of blog posts and site edits, vaporized with absolutely no possibility of recovery.

What was the correct action? As I already knew, it was to select Convert to snapshot, then navigate in the side menu to Snapshots and from there, create a new “Droplet” (Digital Ocean’s term for a virtual private server) from the snapshot. As I said, I’ve done this exact process many times over the past several years, when a client accidentally deleted something and I needed to help them recover it.
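
For what it’s worth, the same recovery can be scripted with Digital Ocean’s doctl CLI, which makes it a little harder to fat-finger the wrong menu item. A rough sketch; the image ID, size, and region are placeholders:

doctl compute image list-user                  # find the ID of the snapshot (or backup image) you want
doctl compute droplet create restore-temp \
  --image 123456789 --size s-2vcpu-4gb --region nyc3
# copy out the files you need from restore-temp, then destroy it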

Ugh.

Anyway… if this blog happens to have any keen observers, you/they may have noticed that the two most recent blog posts had disappeared. No, this was not intentional. And they are gone forever. Fun.

Update: According to Digital Ocean support, the Restore Droplet option does present an additional warning before continuing, but (in my opinion) it is not nearly aggressive enough to make you stop what you’re doing if you’re being hasty like I was.

Noted for future reference: Fixing a slow-to-boot Linux server

I have a few Ubuntu Linux VPSes that were originally spun up on the then-latest-and-greatest 16.04 LTS. Over the past year I’ve been belatedly upgrading them to 20.04 LTS. Almost without exception, they now have a really irritating flaw: where they used to reboot almost instantly — which let me capriciously run OS updates involving reboots at any time of day, even on servers hosting a bunch of sites, since it only meant a blip of about 5 to 7 seconds — they now consistently take 2 to 5 minutes to reboot. Yikes!

Poking around, I learned that cloud-init was timing out, causing that delay, but since systems administration is just a small sliver of the work I do, I never had a chance to investigate why it was happening or really what cloud-init was even for. I just resigned myself to having to do those reboots in the middle of the night when no one would notice.
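
If you want to see the same thing on your own slow-booting box, systemd will happily name the culprit:

systemd-analyze blame | head      # slowest units first; the cloud-init units were at the top for me
systemd-analyze critical-chain    # shows what the boot sequence was actually waiting on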

Well, I finally decided I needed to get an answer, and I found it. If I’m reading this correctly, cloud-init is really only needed during the initial creation of the VPS, and can safely be disabled after that. So, let’s do it!

touch /etc/cloud/cloud-init.disabled

I’m pleased to say, it works perfectly. I ran it on a test mirror of my biggest server and it worked, so I then applied it to the live server and… capriciously rebooted it, right in the middle of the day!

Shush. It worked.


Update (August 3, 2022): Mayyyyybe it’s a bit more complicated than what I described above. I went into another server I had previously upgraded to 20.04, and pre-emptively ran this before an update that required a reboot. After the reboot, I could not connect to the server at all, other than through Digital Ocean’s direct access console. Thank goodness for that. It did not initially occur to me that this change might be why I couldn’t connect, but after trying a few other fixes without success, I deleted the new /etc/cloud/cloud-init.disabled file and rebooted, and everything came up just fine… and without any kind of delay on boot. Weird.

How to get your upgrade from Ubuntu 16.04 LTS to 20.04 LTS to work properly

So, like me you’ve decided to procrastinate on the OS upgrade on your server running Ubuntu 16.04 LTS. Yes, support for it ended almost 6 months ago. Maybe you’ve even set up ESM (Extended Security Maintenance) to avoid the upgrade. But if you’re like me, you may also have discovered that you can no longer get new PHP versions from the “ondrej” repository once standard support ends. So you reluctantly acknowledge that it would be a very good idea to do the OS upgrade.

But it doesn’t work.

That’s OK, I’m here to help. A few months ago I scraped together tips from a few different resources (which I regrettably did not make note of for citation here), and came up with a rough script for myself that seems to work. I’m running it again today on another server, so I thought I’d consolidate it into a blog post here while I’m at it. Let’s go.

Before the 16.04 to 18.04 update

First off, caveat emptor. This is what I did, and it worked. If you do the same and it breaks something, don’t blame me. Also note that I am brazen and always run sudo -s before I get started. If you don’t like to do that, you may need to prefix some of these commands with sudo in order for them to work.

It’s probably also a good idea first to do a full backup of your server that you can restore from if things go totally off the rails.
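
On Digital Ocean that can be as simple as taking a snapshot before you start. A rough sketch with the doctl CLI; the Droplet ID is a placeholder, and Digital Ocean recommends powering off first for a consistent image:

doctl compute droplet-action snapshot 123456789 --snapshot-name pre-1604-upgrade --wait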

Yes, this is going to be a two-step upgrade. First we’ll upgrade from Ubuntu 16.04 LTS to 18.04 LTS. Then we can do the 18.04 LTS to 20.04 LTS upgrade to get ourselves current. Then we won’t have to think about this for another 3 1/2 years! Whew!

There’s some stuff to do before you run that do-release-upgrade command. First, edit /etc/apt/sources.list in your text editor of choice (I like nano) and add these lines at the end:

deb http://archive.ubuntu.com/ubuntu/ xenial main universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-security main universe multiverse

Save that. Then if you’re running ufw, which you should be, do this to give yourself a back door in case something goes wrong (assuming you’re running this update over ssh, which of course Ubuntu will warn you is a Bad Idea).

ufw allow 1022
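# (do-release-upgrade starts a fallback sshd on port 1022 during the upgrade, so this keeps that back door reachable)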


Hold on! Failure may be looming in the near future. Before doing anything else, try this:

apt-get update
apt-get upgrade

I almost always run apt-get dist-upgrade instead of just apt-get upgrade because I always assumed the former does everything the latter does, and more. Not so. I’ve run into a problem on some servers with messages that cloud-init was held back. My fix for that is to do this:

apt-mark unhold cloud-init
apt-get upgrade
reboot

When things come back, let’s do this:

apt-get dist-upgrade
apt-get autoremove

Then proceed as normal…


Now we’re ready to get started. Run these commands:

apt update
do-release-upgrade

Don’t walk away at this point… you’re going to have to answer some prompts along the way. In general I would always recommend keeping your existing versions of any files it asks you about. Once all of that is over, your server will reboot, and you should be able to log back into a fully functioning Ubuntu 18.04 LTS install. I would recommend testing whatever services you have running on the server, just to be sure everything is working properly, before you continue to the 20.04 LTS upgrade.

After the 16.04 to 18.04 update, before the 18.04 to 20.04 update

Run all of the regular updates, just to be sure you’re fully current. If you’re brazen like me you’ll tackle that with one command:

apt update; apt -y dist-upgrade; apt -y autoremove; reboot

Once you’re back up and running, throw this in. I don’t recall why I needed it now, and if you’re not using cURL you may not need it at all, but anyway, I did this here:

apt install libcurl4

Now, just to be safe, let’s back up the existing /etc/apt/sources.list file, because we’re going to replace its entire contents with this:

deb http://in.archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse
deb-src http://in.archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse

deb http://in.archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse
deb-src http://in.archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse

deb http://in.archive.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
deb-src http://in.archive.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse

deb http://in.archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse
deb-src http://in.archive.ubuntu.com/ubuntu/ focal-backports main restricted universe multiverse

deb http://archive.canonical.com/ubuntu focal partner
deb-src http://archive.canonical.com/ubuntu focal partner

(Looking just now at that in.archive.ubuntu.com domain, I’m not sure if that’s a mirror in India and you could just use archive.ubuntu.com, or what. Just an observation. Do as you will.)
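
And the backup step mentioned above is nothing fancy; a plain copy of the old file, made before you overwrite it, will do:

cp /etc/apt/sources.list /etc/apt/sources.list.bak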

Once you’ve saved those changes, run these commands:

apt update
apt install --reinstall ubuntu-keyring
do-release-upgrade

Again with the do-release-upgrade command you’ll need to follow the prompts, keeping your existing copies of config files when asked.

I’ve found that after running this, the server might not come back up on its own. I had to log into my Digital Ocean account after a few minutes and do a hard reboot, but once I did that, everything came back fine and I was on Ubuntu 20.04 LTS!

One additional note: one server I ran this on needed all of the PHP 7.4 libraries reinstalled for a reason I could not determine. Your exact setup may vary, but this is what I ran to resolve that issue:

apt install php7.4 php7.4-cgi php7.4-cli php7.4-common php7.4-curl php7.4-dev php7.4-gd php7.4-gmp php7.4-json php7.4-mysql php7.4-odbc php7.4-bcmath php7.4-bz2 php7.4-mbstring php7.4-soap php7.4-zip

And then I needed to edit the corresponding php.ini file with my preferred settings.

Slow server? Don’t overthink it. (And don’t forget what’s running on it.)

I’ve just spent the better part of a week troubleshooting server performance problems for one of my clients. They’re running a number of sites on a dedicated server, with plenty of RAM and CPU power. But lately the sites have been really slow, and the server has frequently run out of memory and started the dreaded process of thrashing.

Fearing that inefficient code in cms34 might be to blame, I spent a few days trying to optimize every last bit of code that I could, which did make a slight improvement but didn’t solve the problem.

Then I spent a few more days poring over the Apache configuration, trying to optimize the prefork settings and turning off unnecessary modules. Still, to no avail, although getting those prefork settings optimized, and thus getting Apache under control, did allow me to notice that MySQL was consuming CPU like mad, which I had previously overlooked.

Hmmm… that got me thinking. I fired up phpMyAdmin and took a look at the running processes. Much to my surprise, almost every MySQL process was devoted to an abandoned phpBB forum. Within moments I realized the forum must be the source of the trouble, which was confirmed when I found that it had over 500,000 registered users and several million posts, almost all of which were spam.
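
For the record, you don’t even need phpMyAdmin for that; the same view is one command away on the server itself:

mysql -e 'SHOW FULL PROCESSLIST;'    # every connection and the query it's currently running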

As soon as I discovered the problem, I was back in the Apache configuration, shutting down the forum. Then a quick restart of MySQL (and Apache, for good measure), and the sites were faster than I’ve seen them in months.
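
On a Debian/Ubuntu box that’s roughly a two-liner; the site config name here is a placeholder for whatever the forum’s vhost file is actually called:

a2dissite old-forum.conf     # disable the forum's virtual host
systemctl reload apache2     # pick up the change without dropping the other sites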

The moral of the story: if you have a web server that suddenly seems to be grinding to a halt, don’t spend days optimizing your code before first looking for an abandoned forum that’s been overrun by spammers.