A WordPress URL rewrite rule to phase out year/month folders in the Media Library

This is one of those fixes where it is probably worth me explaining the very specific use case I needed it for first, to serve as an example of exactly why anyone would need this, because on the surface it may seem pointless.

Picture this: Restaurant client. Table cards with QR codes linking directly to menu PDFs, so customers can look at the menu on their phones instead of using a physical print menu that has been handled by dozens of other people.

Note to the future: I’m not sure what the restaurant experience looks like in your world. What I’m describing may be ubiquitous for you now, or may be a complete head-scratcher. Assuming it’s safe for you to touch your head. For context, I am writing this in the midst of the 2020 COVID-19 pandemic.

Here’s the problem: Menus change. URLs referenced by a QR code do not. By default, WordPress automatically creates year and month subfolders inside wp-content/uploads and puts files in the folder for the year and month the post they’re attached to was created, or if you’re uploading directly into the Media Library, not attached to a post, then the year and month the file was uploaded.

So that means that the URLs embedded in my client’s QR codes contain 2020/09. But now it’s October, so if they upload a replacement file today, its URL will contain 2020/10 and the QR code will not work. I should note at this point that I do not like the default WordPress behavior of putting files into these subfolders, but I sometimes forget to turn off this setting when I’m creating a new site, or — as is the case here — I’m working on a site someone else originally set up.

My solution: Turn off year/month folders, so that any newly uploaded PDFs with the same filename will have the same URL. (Assuming the client deletes the old one first!)

You may be thinking, well, that’s great, if you had done this before the QR codes were created. Yes, exactly. That’s where this rewrite rule comes in.

When you turn off the year/month folder setting, it doesn’t move any existing files or change any code that links to them. This purely affects new uploads going forward. So what I need is a rewrite rule that will allow existing file URLs with the year/month path to continue working, while automatically removing that bit from the URL and trying to find the same file in the main uploads folder, if there’s no file at the year/month URL.

OK, here’s the code:

# Redirect file URLs from year/month subfolders to base uploads folder if not found
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /wp-content/uploads/
RewriteRule ^index\.php$ – [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([0-9]{4})/([0-9]{2})/(.*) /wp-content/uploads/$3 [L,R=301]
</IfModule>

This should go not in your main .htaccess file, but in an .htaccess placed inside your wp-content/uploads folder.

Let’s assess what’s going on here, line by line.

# Redirect file URLs from year/month subfolders to base uploads folder if not found

Just a comment so we remember what this is all about. You may think you’ll remember. But you probably won’t. Comments are your future friend.

<IfModule mod_rewrite.c> and </IfModule>

Apache configuration conditionals wrapper for all of our actions, to make sure this code doesn’t run if mod_rewrite isn’t enabled. Honestly I often leave this out because… come on, the entire site is going to be broken if mod_rewrite isn’t enabled.

RewriteEngine On

If you don’t know what this is about, RTFM. (The “F” is directed at the manual, not you. I hate the Apache documentation.)

RewriteBase /wp-content/uploads/

This is the reference point for the ^ later on. Needs to be the relative path of the uploads folder below your WordPress site’s base URL.

RewriteRule ^index\.php$ - [L]

Honestly we probably don’t need this line, as there shouldn’t be any index.php files inside your uploads folder anyway, but it just feels weird not to include it. This just says “don’t do any rewrites to the index.php file.”

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

These lines are very common in this type of rewrite instruction set, and in fact come straight from the default WordPress rewrite rules. They are saying, only apply the following rules to URLs that don’t match existing real files or directories under this path. This is critical to keep all of your existing Media Library URLs working.

RewriteRule ^([0-9]{4})/([0-9]{2})/(.*) /wp-content/uploads/$3 [L,R=301]

This is what we’re here for. Note we are using the magic of regular expressions to make this work. There are three parenthetical groupings, though we technically could eliminate the first two sets of parentheses and change $3 to $1, but I just like having the parentheses to help me keep things straight.

([0-9]{4}) is matching a 4-digit number, representing the year folder.

([0-9]{2}) is matching a 2-digit number, representing the month folder.

And (.*) is matching… anything, representing the filename. That’s the bit we want to reference in the replacement string, /wp-content/uploads/$3 which tells Apache to serve up the filename from the year/month URL directly out of the uploads folder itself.

The final bit, if you’re not familiar with rewrite rules, [L,R=301], just says this is the last rule the previous set of conditions applies to, and that it should return an HTTP 301 (permanent redirect) status along with the redirect, which is good SEO karma.

I spent 5 hours troubleshooting this WordPress problem so you don’t have to (starring: WooCommerce Action Scheduler)

Sorry for that “clickbaity” headline. I added the parenthetical so it might be at least marginally useful. Since my WordPress-related posts are always about how I solved a particularly weird or obscure WP issue, I usually consider their titles carefully. “What would I have googled to find a solution to this problem?” But honestly, I spent 5 hours on this yesterday partly because I wasn’t sure what to google. (And I use lowercase “google” as a generic for “conduct an Internet search”; I normally use DuckDuckGo.)

OK, so here’s the situation. This particular site is — among my normally very-low-traffic clients — one of the busiest I work on. It’s a WooCommerce site with hundreds of products and 20+ daily orders. (Yeah, 20+ orders a day is not huge, but on the scale I normally deal with, it’s a lot.)

This site runs on its own virtual private server, with 8 GB RAM and 4 vCPUs. Pretty substantial for a single site. And yet, for weeks it has been maxing out RAM and CPU resources. Not to the point where the site was in crisis mode that demanded my immediate attention, but it was frustratingly slow. Just slightly below the threshold of me dropping work on other new projects to try to fix this. (At this point I feel obliged to note that I did not actually build or maintain this site for its first couple of years of existence, so I don’t know its inner workings as well as I normally would. I just know it’s running way too many plugins and desperately needs some TLC I have not had time to give it.)

Yesterday things finally got to the breaking point. For me, at least. The client had contacted me about an unrelated issue, but as I was dealing with that, I got frustrated by seeing all of this inexplicable resource usage, so I had to address it.

As it happens, this post is actually a bit of a sequel to my last post, about getting Apache’s mod_status and mod_rewrite to play nicely on a WordPress site. About three weeks ago I finally got mod_status working on this site, and had planned to come back, when I had a chance, to investigate this issue.

If you are not familiar with mod_status, you should check it out. Apache is generally a bit of a “black box” but this lets you see exactly what’s happening with each thread — the requested URL, the requesting IP address, connection time, resource usage, etc.

I noticed an absurd number of threads were coming from the localhost and were requesting wp-admin/admin-ajax.php with a query string referencing WooCommerce’s Action Scheduler. But what to do with that information?

I’ll admit, this is where I wasted a bunch of time in fruitless searches, because I don’t know a lot about Action Scheduler. I read a few threads on the WordPress support forums and StackOverflow that kind of danced around the problem I was having but never really got at it.

Eventually I ended up in phpMyAdmin, scrutinizing the wp_actionscheduler_actions table, and trying to figure out where all of the wc_facebook_regenerate_feed actions were coming from. I used my old favorite Search-Replace-DB to try to find any instances of “facebook” in the database. (This was an utter failure, for reasons I can’t explain. But that failure was critical to why this took me so long to resolve.)

I went to Tools > Scheduled Actions in WP admin and discovered there were over 200,000 actions, although there were only about 70,000 (only!) showing up in wp_actionscheduler_actions. Mystery!

I went to wp_actionscheduler_actions again, saw that those wc_facebook_regenerate_feed actions had all been scheduled weeks ago, and decided to just chuck out the lot. I truncated the table, but within seconds it started filling up again with hundreds of wc_facebook_regenerate_feed actions, with the same weeks-old scheduled dates. Where were they coming from???

What was especially maddening to me about all of this was that I had already, weeks ago, determined that the plugin that had created these — Facebook for WooCommerce — had been causing some kind of trouble, and I had deactivated it. Yesterday I even went so far as to delete it. I scoured the theme code for references to Facebook. I looked in the file system for stray files that might be responsible. And as I mentioned before, I tried to search the database for any references to Facebook. I was getting nowhere.

Eventually I realized Search-Replace-DB was having problems, so I dove into phpMyAdmin directly and started searching individual fields, in individual tables, for “Facebook”. And that’s where I finally figured it out.

WordPress puts everything in wp_posts, and that’s a problem.

I’ve complained over the years about the database architecture in WordPress. Having built multiple custom CMSes in the years prior to when I finally, fully embraced WordPress in 2014, I have a fair bit of experience designing databases. And two things I learned in that experience were: 1) clearly define what your data tables are for, and 2) indexes make databases efficient. WordPress is awesome for many things, but it has far, far outgrown its original conception as blogging software. Custom Post Types and Custom Fields make it super-flexible, but shoving everything they create into wp_posts and wp_postmeta can create a disastrous situation.

Case in point, WooCommerce scheduled actions. In earlier iterations, those were custom posts! (As are, still, WooCommerce orders, which is totally f***ed up, if you ask me.) At some point Woo or Automattic realized scheduled actions don’t belong in wp_posts, so they created four new tables specifically for managing them. Plugins that use scheduled actions had to create new scheduled actions for migrating the old wp_posts scheduled actions into the new tables.

And that’s where I found myself. Through some curious set of circumstances with this particular site, which probably at some point included someone other than me disabling WP-Cron to try to fix some other problem, 200,000+ scheduled actions from the Facebook for WooCommerce plugin (in the wp_posts table) got queued up for migration to the new tables. And as quickly as I was deleting them from the new tables, Action Scheduler (which runs once a minute!!!) was dutifully refilling them.

(And obviously they were never actually running… perhaps because I had deactivated the plugin? Or because they were simply timing out? Who knows? But here’s something I see as a flaw with Action Scheduler: it should check to see if the plugin that scheduled an action is currently active, and if not, purge the action immediately.)

At last here was the fix. I had to run this SQL query in phpMyAdmin. (Proceed with caution! Don’t just use this code… look in your database for exactly what is causing problems and adjust accordingly.)

DELETE FROM `wp_posts` WHERE `post_title` = 'wc_facebook_regenerate_feed';

Note: I’m doing this from memory — and a glance back at my browser history from yesterday. I didn’t keep notes on exactly what the title was.

For a more generalized — and drastic — approach, you could also do this:

DELETE FROM `wp_posts` WHERE `post_type` = 'scheduled-action';

I’ll just conclude here with a nice little graph of the site’s CPU and RAM usage over the past 24 hours. It was 6 PM when I finally figured this out!

How to get Apache’s mod_status and mod_rewrite to play nicely on a WordPress site

Apache’s mod_status can be very handy for monitoring exactly what’s going on inside of Apache on a busy website, but it can be a bit difficult to set up, if your site runs something like WordPress that also relies heavily on Apache’s mod_rewrite.

Specifically even though I had set up mod_status according to the official instructions, and specifically had also added the code to the virtual hosts, I still found that trying to access a site’s /server-status URL was just redirecting me to the WordPress 404 error page.

Here’s the fix. Maybe there’s a “better” way, but this worked for me. I just needed to hijack the rewrite rules in the site’s .htaccess file.

If you’ve already got IP or Auth based access restrictions configured in the virtual host, you probably don’t need the RewriteCond line, but I prefer to err on the side of caution. I used my VPN’s IP address (masked as 9’s here, which of course is not a valid IP address)… you’ll want to fill in whatever IP address(es) you want to allow in.

RewriteEngine on
RewriteCond %{REMOTE_ADDR} ^(999\.999\.999\.999)$
RewriteRule ^server-status$ – [L]

Put this before the WordPress rewrite rules, or it won’t do any good. And of course this is missing the <IfModule mod_rewrite.c> wrapper you probably should include, but if you’re doing this you already know mod_rewrite is enabled, so I don’t bother.

SPF for dummies (i.e. me)

For a while I’ve known that (legitimate) outgoing email messages originating from my web server were occasionally not reaching their intended recipients. I also knew that there was a DNS change you could make to help prevent this problem, but I didn’t know any more about it and it was a marginal enough problem that I could just put it off.

Finally today I decided to deal with it. And I was (re)introduced to the SPF acronym. No, that’s not Sun Protection Factor, or Spray Polyurethane Foam, or even Single Point of Failure (although in my case perhaps that last one is accurate). No, it stands for Sender Policy Framework, and it’s an add-on to the core capabilities of DNS that provides a way to positively identify the originating servers of outgoing email messages.

My situation is simple: I have a domain name that needs to be able to send mail from either my mail server or my web server. Most of the tutorials I found for SPF were far too convoluted to address this simple arrangement. Then I found this post by Cyril Mazur which provided the very simple answer:

v=spf1 a mx ~all

Simply add the above as a new TXT record in your DNS zone file, and you should be set.

A follow-up on Apache not starting on my web server

About 6 weeks ago, I wrote about a problem I was having with Apache not starting with SSLEngine on. I ended the post somewhat ominously with the following:

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.

This morning, a little before 6 AM, that happened. I was awakened by notifications (with their attendant beeps and nightstand vibrations) on my iPhone that my web server was down. Great. Half-awake, I fired up my hosting provider’s handy iPhone app, tapped the “Hard Reboot” button, and tried to go back to sleep. Except, the notifications kept coming. Eventually I was awake enough to realize that the server was coming back up, but Apache wasn’t. Time to get up and deal with this problem from a real computer.

SSHed in, I tried manually starting Apache, and got this:

(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs

What the crap? After spending a half hour visually scanning log and configuration files, to no avail, I decided I needed to try to find out what was running on port 80. This page was helpful in that regard. I ran the command lsof +M -i4 and found that, whaddayknow, Apache was running. Apparently. But I couldn’t shut it down, and I couldn’t restart it. There were no signs of any compromise of the system’s security, so I just chalked this up to some minor problem deeply buried somewhere in a configuration file that I have yet to track down (but which is probably my fault). At any rate, lsof gave me what I really wanted: the process ID that was listening on port 80. Time for the dreaded kill -9 command.

After that, I tried starting Apache again, and it worked… and, as I suspected, it did ask for the pass phrases again. But now, all is well. (Except for the nagging feeling of not knowing what caused this to happen in the first place. Stay tuned…)