PHP script to blast through a directory and shrink all of the images

This scenario needs a bit of setup, because without knowing the context, it would be logical to say, “Wait, you really should do this a different way.” Trust me. I tried all of the other ways.

I have a client who manages a grant program, and the grant recipients go to my client’s WordPress website to submit receipts and photos via Gravity Forms, in order to get their reimbursements.

The thing is, when you have hundreds of people who each need to submit multiple (sometimes many, sometimes “how the hell could you possibly need to submit that many???”) photos that they’ve taken on their phones, you’re going to end up with thousands of unnecessarily large JPEGs filling up your server.

I didn’t know the client was even doing this until the disk was already near capacity, so I just needed to figure out a way to deal with more than 100 GB of disk usage. The client and their partner agency need to be able to continue accessing these files through the Gravity Forms admin, so I can’t move them somewhere else or even change the filenames — Gravity Forms’ gatekeeping download script wouldn’t be able to find them.

There are some Gravity Forms add-ons for processing file uploads, but they generally need to be put in place before people start uploading files. So I resorted to a brute-force, command line PHP script to get the job done.

Fortunately the GD library is great for manipulating images. Its PHP functions are perfect for opening JPEGs and PNGs, creating new ones, scaling them down, cranking up the compression, etc. These images don’t have to be hi-res and beautiful; they just need to be legible. So I can really go to town on them, reducing the file size of many of them by 90% or more.

Here’s my script, with some added comments. It’s a bit quick-and-dirty. The shrink.php script file has to be in the same directory as the images, and it doesn’t work recursively. This restriction was mainly just to get something cobbled together, but I actually see it as a benefit because if something goes wrong, you’re limiting your losses to a single directory.

Initially I had a minor error that would cause the script to continue to cut the dimensions of images in half if you ran it multiple times, which I only realized after I accidentally ran it twice on the same directory — oops. (I was thankful at that moment that the single-directory restriction exists!)

Anyway… here’s what the script does. It blasts through all of the files in the same directory as the script itself, looking only for PNG and JPEG images.

For each image, it creates a new version at the same aspect ratio, but with a width of 1600, 800, or 400 pixels — scaling down to the nearest size from whatever the original size was. (It doesn’t do anything with images under 400 pixels wide.)

Then it re-saves them with more extreme compression applied. 70% quality for JPEG, and… well… as much compression as PNG allows. (I don’t really know much about PNG compression.)

If it finds that it didn’t actually reduce the file size, it restores the original. Otherwise, the original is gone, and your disk space is reclaimed.

It provides some verbose output telling you how much it reduced each file, and then gives an overall total reduction in MB when it’s complete.

This script could be polished up more, but even as it is I think it’s a lot better than the confusing and error-riddled suggestion I found on StackOverflow (on which it’s very loosely based, combined with a healthy dose of consulting the official PHP documentation). Enjoy!

// Get the list of files from the current directory
$images = glob('*');

// We'll tally up our savings
$total = 0;

// Loop through all of the files and run the function on them
foreach ($images as $file) {
  $formats = array('png', 'jpg', 'jpeg');
  $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
  
  // We only want to process PNG or JPEG images
  if (in_array($ext, $formats)) {
    $total = $total + resize_image($file);
  }
}

// All done; tell us how we did
echo "=====================\n" .
     "SHRINK COMPLETE\n" .
     "TOTAL REDUCTION: " . round($total / 1024, 2) . " MB" .
     "\n\n";


// The resizing function (obviously)
function resize_image($file) {
  $formats = array('png', 'jpg', 'jpeg');
  $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
  
  // Yes this is a redundant format check, just to be safe
  if (in_array($ext, $formats)) {
  
    // Get the old file size for later reference
    $old_size = round(filesize($file) / 1024, 2);
    
    // Set our maximum allowed width
    $max_w = 1600;
    
    // Get the dimensions and aspect ratio of the original file
    // (bail out if GD can't read it, e.g. a corrupt or misnamed file)
    $info = getimagesize($file);
    if ($info === false) {
      echo "Skipping " . $file . "; it doesn't appear to be a valid image.\n";
      return 0;
    }
    list($old_w, $old_h) = $info;
    $ratio = $old_h / $old_w;
    
    // Leave images that are already narrower than our smallest target width alone
    if ($old_w < $max_w / 4) {
      return 0;
    }
    
    // Set the new dimensions
    // Shrink to $max_w, 1/2 of $max_w or 1/4 of $max_w, depending on original size
    // >= is important so we don't keep cutting the dimensions in half if we run it again!
    $new_w = ($old_w >= $max_w) ? $max_w : (($old_w >= $max_w / 2) ? $max_w / 2 : $max_w / 4);
    // Round to a whole pixel so GD gets an integer height
    $new_h = (int) round($new_w * $ratio);
    
    // Create our new image canvas
    $new_img = imagecreatetruecolor($new_w, $new_h);
    
    // Get the original image, minding the source format
    $old_img = ($ext == 'png') ? imagecreatefrompng($file) : imagecreatefromjpeg($file);
    
    // Bail out if GD couldn't open the file after all
    if ($old_img === false) {
      echo "Error reading " . $file . "; skipping.\n";
      return 0;
    }
    
    // Preserve transparency if we're dealing with a PNG
    if ($ext == 'png') {
      imagealphablending($new_img, false);
      imagesavealpha($new_img, true);
    }
    
    // Scale down the original image and copy it into the new image
    imagecopyresampled($new_img, $old_img, 0, 0, 0, 0, $new_w, $new_h, $old_w, $old_h);
    
    // Keep a backup of the old file, for the moment
    rename($file, $file . '_BAK');
    
    // Save the newly reduced PNG...
    if ($ext == 'png') {
      imagepng($new_img, $file, 9, PNG_ALL_FILTERS);
    }
    // ...or the newly reduced JPEG
    else {
      imagejpeg($new_img, $file, 70);
    }
    
    // Did it work?
    if (file_exists($file)) {
    
      // Get the size of the new file, and the difference
      $new_size = round(filesize($file) / 1024, 2);
      $diff = $old_size - $new_size;
      
      // Hold up -- the old one is smaller! We'll restore it
      if ($diff < 0) {
        echo "Unable to reduce size of " . $file . "; original file restored.\n";
        unlink($file);
        rename($file . '_BAK', $file);
        
        // We're returning the KB savings, which is 0 in this case
        return 0;
      }
      
      // We reduced the file successfully, let's report success and delete the backup
      else {
        $pct = round(($diff / $old_size) * 100, 2);
        unlink($file . '_BAK');
        echo "Shrunk " . $file . " from " . $old_size . " KB to " . $new_size . " KB (" . $pct . "% reduction)\n";
      }
    }
    
    // It didn't work; report the error and restore the backup
    else {
      echo "Error processing " . $file . "; original file restored.\n";
      rename($file . '_BAK', $file);
      return 0;
    }
    
    // Pass back the KB savings so we can calculate a grand total
    return $diff;
  }
  
  // We didn't do anything so return 0 to keep the running total going
  return 0;
}

P.S. As always, all code samples are provided as-is with no warranty. Don't blame me if you use this and it blows something up!

P.P.S. I finally bothered to install prism.js on here for this. Now I need to go back through and edit all of my older posts with code samples in them. Ugh.

A.I. smoke and mirrors

These days in tech, everything has to be A.I. Whatever that means.

When a term doesn’t really mean anything, you can literally make it mean anything.

At the level of the huge tech corporations, A.I. really means LLMs, which are essentially just massive prediction machines. They definitely can’t “think,” but the way they work is not entirely understood even by the people who built them, and is not understood at all by the average person, so their responses to our prompts land somewhere in our perception between “magic” and “human-like consciousness.” In reality they’re neither, but it doesn’t matter, as long as they promise massive profits at some point in the future… even if they require tremendous input of money and energy resources in the meantime.

And then at a lower level, companies that definitely do not have the resources of Google or Meta are nonetheless suddenly offering “A.I.” in just about everything they do.

But those of us who have been around tech as long as I have remember the early 2000s, when “wizards” started appearing everywhere. Wizards were really nothing more than step-by-step scripts with some simple branching logic. But they were dressed up with friendly on-screen text and whimsical designs in ways that gave the impression that the computer was interacting with you on almost human terms.

It’s clear today that all of this low-budget “A.I.” is really just wizards by another name: simple scripts with branching logic, dressed up all over again.

Nothing in this industry is real.

Below is a composite screenshot of the “conversation” I had with the so-called “AI Troubleshooter” from a hosting company that shall remain nameless. (Let’s just say… it’s more than a hosting company, but it’s not the “hostingest” company.)

It’s clear to me that there is not really anything more to this than a script that fires off when a 500 error gets logged. The script parses the log error, scans the site for common, well-known configuration issues, and applies standard corrections and/or resets the configurations to their defaults, then checks again for errors. The script is dressed up in exactly the same kind of “friendly,” quasi-human language of those classic “wizard” scripts from before “A.I.” became the buzzword du jour.

I’d insert the eyeroll emoji here except I have WordPress emoji disabled on this site, courtesy of my No Nonsense plugin. Oh God, I just did self-promotion. But I promise No Nonsense includes absolutely zero A.I. B.S.

Solution to the overflow-x: hidden; overflow-y: visible problem

File this one under “notes to my future self,” because I’m sure I’ll search for it again, and having a blog post written about it in my own words makes it much more likely that my own idiosyncratic search for it will turn up this post.

For reasons I don’t care to understand, the CSS overflow-x: hidden forces the browser to treat overflow-y as auto (and vice-versa). If you set one axis’s overflow to hidden and leave the other as visible, the visible one gets computed as auto instead, and there’s nothing you can do about it.

I have a situation where — possibly due to some other stupid mistake I made in the midst of a big code push 3 years ago — animations generated by a certain popular library cause the page to add an undesirable horizontal scroll. So I added overflow: hidden on the outer container for the page content.

But I also have this theme coded to support a semi-transparent header bar, and if that’s turned on, and the first element on the page is a cover image, I want the cover image to tuck behind the header bar, by way of negative top margin. Of course, if the container element has overflow: hidden applied, that negative top margin on the child element gets hidden.

So I tried changing overflow: hidden to overflow-x: hidden and added overflow-y: visible. Which, as I have seen, doesn’t work.

Enter CSS guru Chris Coyier. I don’t know why I didn’t find his post on this matter earlier, but I am relieved that I did, because there’s a very simple solution:

Don’t use overflow-x: hidden. Use overflow-x: clip instead. Then overflow-y: visible works, and everyone’s happy!
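In other words, something like this (a minimal sketch; .page-content is a made-up class name standing in for my theme’s actual outer container):

.page-content {
  overflow-x: clip;    /* clips horizontal overflow without forcing overflow-y to auto */
  overflow-y: visible; /* so the child's negative top margin can still show */
}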


Update (July 11, 2025): Then again, maybe not everyone’s happy. Specifically Safari users, which includes me. There’s some way that the Animate on Scroll code is tangled up in this, but with that removed (or in some cases with that not removed, but just depending on its settings), this still doesn’t work reliably in Safari, even though Safari does supposedly support clip now.

Anyway, I realized that for my purposes there’s a simpler solution. Just set overflow-x: hidden (or maybe clip but honestly I don’t care at this point) on the body tag itself, preventing any horizontal scrolling on the page. Then you can put overflow-y: visible on the element you want to exceed its vertical boundaries, and everything seems to work. At least today. I’m sure I’ll come back to this exact code again next week and it won’t be working, and I’ll have no idea why.
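So the version I’ve landed on looks roughly like this (again, the class names and the margin value are stand-ins, not my actual theme code):

body {
  overflow-x: hidden;  /* no horizontal scrolling anywhere on the page */
}

.page-content {
  overflow-y: visible; /* the content container no longer clips anything */
}

.cover-image {
  margin-top: -6rem;   /* tucks up behind the semi-transparent header bar */
}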

I’m getting too old for this shit.

Crazy WordPress tricks: Exporting ACF Flexible Content field data with MySQL

I have a client whose WordPress website is basically shutting down, and they need to keep some archives of the existing version of the site.

I’m making three different archives, for various purposes:

  1. Full backup. Of course, I’m doing this. A full backup of the database using mysqldump and a full archive of the site files using tar -cvzf. I like to do these things at the command line, and that’s what I’m doing here, but if I don’t have terminal access, I use the Updraft Plus plugin. This backup is not human-readable, but it allows us to spin up the site again on a new server in the future, if needed.
  2. Crawled static snapshot. Once again I’m at the command line, this time using wget to crawl the entire site and make a static copy we can just stick on a thumb drive and the client can navigate like it’s a live site. This is the most useful for day-to-day reference for the client, or if they need to show anyone what content existed on the site at the point when it was taken down. (Rough versions of the backup and crawl commands are sketched just after this list.)
  3. CSV export. This is, honestly, the clunkiest element, but it’s not without its uses. Yes, exporting content directly from the database into a spreadsheet means a lot of raw HTML. But it’s handy as a reference tool if the client needs to quickly find a bit of text that might have appeared anywhere on the site.
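For reference, here’s roughly what the first two archive steps look like at the command line. These are sketches rather than my exact commands: the database name, credentials, paths, and domain are all placeholders.

# Full backup: dump the database and archive the site files
mysqldump -u USERNAME -p DATABASE_NAME > DATABASE_NAME.sql
tar -cvzf site-files.tar.gz /var/www/SITE_DIRECTORY

# Crawled static snapshot of the live site
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.example.com/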

The tricky thing with the CSV export is, how do we go about that? That’s what this post is about.

My go-to tools for any WordPress import/export task are WP All Import and WP All Export. These are super flexible plugins that let you get really granular with selecting and organizing the data you’re working with, and they’re especially great in situations like where a client needs to go through and update a bunch of SEO content all at once. They can just blast through a spreadsheet instead of tediously editing individual pages and posts one-at-a-time.

I especially like the fact that there are add-ons that support Advanced Custom Fields. I make extensive use of ACF (and did even more so before Gutenberg matured). The old theme this client’s site runs on stores almost all of its content in ACF Flexible Content fields rather than in the standard WordPress content field.

But here’s where things get messy. The standard content is a single field (the post_content column) in the post’s main record in the wp_posts database table, whereas ACF stores each individual content field as a separate record in the wp_postmeta table. And a Flexible Content field isn’t just one ACF field; it’s an indefinitely large group of fields, depending on how complicated your page content is.

So there’s a one-to-many relationship between posts and Flexible Content records in the database. WP All Export makes it easy to pull in all of the Flexible Content subfields, but… yikes… it makes each one a separate column in the resulting CSV.

I ran WP All Export on this client’s content, and the CSV contained over 4,000 columns. Not at all easy to work with. I wondered, “Is there a way to concatenate all of the Flexible Content subfields into one column?” Granted, you’re going to lose the keys, but depending on your purposes — such as mine, here — that doesn’t matter. I just need the content, I want it in a single cell for each post, and I don’t care if there’s extra clutter in there too (e.g. booleans, key IDs for attached images, etc.). As long as the full text content is there, I’m good.

I quickly realized that, even if WP All Export does have a way to do this (although it seems like it doesn’t), it would probably take me less time to figure out a direct SQL query for the task rather than figuring out how to do it with WP All Export and custom PHP functions.

Yes, there is a way!

The key is the MySQL GROUP_CONCAT function. I can perform a subquery on the wp_postmeta table for my Flexible Content subfields and concatenate that into a single field in the query output.

Here’s my query. (Note: This is based directly on the example I found here, modified to be WordPress-specific.)

SET SESSION group_concat_max_len = 32767;
SELECT DISTINCT
  p.`ID`,
  p.`post_type`,
  p.`post_title`,
  p.`post_name`,
  p.`post_content`,
  (SELECT DISTINCT
    GROUP_CONCAT(m.`meta_value` SEPARATOR '|')
    FROM
      `wp_postmeta` m
    WHERE
      m.`post_id` = p.`ID`
      AND m.`meta_key` LIKE 'modular\_content\_%'
    GROUP BY
      m.`post_id`
  ) as `modular_content`,
  p.`post_excerpt`,
  p.`guid`,
  p.`post_status`,
  p.`post_date`
FROM
  `wp_posts` p
WHERE
  p.`post_type` IN ('page', 'post', 'tribe_events')
  AND p.`post_status` NOT IN ('inherit', 'revision', 'trash')
ORDER BY
  p.`post_type` ASC, p.`post_title` ASC;

What do you need to know about this?

First off, GROUP_CONCAT by default truncates the data at 1024 characters. The first SQL statement bumps that up to 32,767. Guess what, that’s probably not long enough, either, but since it’s the maximum character length of a cell in Excel, which is what my client will most likely be using to view the file, I went with it. As it happens, out of more than 1700 results returned, only 6 exceeded that length. If you’re not going to be using Excel, you can make the number much larger… just be sure to check if the spreadsheet application you’re using has its own per-cell character limit.

In my main query, I identified specific fields I wanted to include in the output, since there are some fields in the wp_posts table I don’t really care about here, such as ping_status or menu_order. But if you just want everything, you can select p.* and leave it at that.

The subquery where GROUP_CONCAT comes in has a few specifics to note: first, I chose the pipe character as my delimiter, but you could use whatever you want. Also, in the WHERE clause I’m specifying only rows that have a meta_key that starts with modular_content_. (The backslashes are to escape the underscores, which is a single-character wildcard in SQL.) That’s because my Flexible Content field’s key is modular_content and each of its subfields has that string prepended. (I included the trailing underscore because I don’t want modular_content itself, which just stores a number indicating how many subfields are in that particular instance.)

The main query’s WHERE clause has a couple of additional limits on the scope of the query. I only want pages and posts, plus The Events Calendar events, so I am limiting the post_type to those three. And I also don’t want to include posts that are revisions or in the trash, so I limit the post_status as well.

As always, the standard caveat: Running direct SQL queries can permanently alter your database. You should be pretty safe with this one since SELECT is a read-only action. But take anything here with a grain of salt, and don’t run code you found on the Internet in a production environment unless you’re sure you know what it’s going to do!

MySQL startup loops indefinitely and consumes a ton of CPU? This might be the fix

I’ve been having some issues with a particular server lately where it keeps going down. I probably should have given it serious attention sooner, but it’s a “personal” server (runs this site, my wife’s blogs, and a few other sites I’m hosting as a courtesy to friends), and I’ve had a lot going on lately.

This morning it seemed to be worse off. MySQL just wouldn’t start. My monitoring script, which fires off every 10 minutes so I don’t have to be on-call 24/7, was doing its best, but it just kept restarting MySQL in vain.
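(If you’re curious, the monitoring script is nothing fancy. Something along these lines, run every 10 minutes from cron or similar; the service names and log path here are illustrative, not my exact setup.)

#!/bin/bash
# Restart any monitored service that isn't currently active
for service in mysql apache2; do
  if ! systemctl is-active --quiet "$service"; then
    systemctl restart "$service"
    echo "$(date): restarted $service" >> /var/log/service-monitor.log
  fi
done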

Time to look into the issue. I found that MySQL was running, consuming over 100% CPU (it’s a multi-CPU machine so the maximum percentage is over 100), but nothing was loading.

Running systemctl status mysql.service showed this, which kind of surprised me:

Status: "Server startup in progress"

So, something was causing MySQL to get stuck in the startup process and never actually get up and running. I figured that usually means a corrupt database, which could be a nightmare, especially since I’d been ignoring this issue for a week or so. Having to restore from a week-or-more-old backup would be a minor inconvenience to me, but my wife writes a lot on her blog and she would not be happy to lose several days of work.

I needed to check the MySQL log file. At 17 GB — yikes — that meant using tail to look at just the last few hundred lines.
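Something along these lines (the log path is the Ubuntu default for MySQL; yours may differ):

tail -n 300 /var/log/mysql/error.log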

Here’s something interesting:

2025-06-08T14:33:14.651332Z 1
[ERROR] [MY-011899] [InnoDB] [FATAL] Unable to read page [page id:
space=0, page number=5] into the buffer pool after 100 attempts. The
most probable cause of this error may be that the table has been
corrupted. Or, the table was compressed with with an algorithm that is
not supported by this instance. If it is not a decompress failure, you
can try to fix this problem by using innodb_force_recovery. Please see
http://dev.mysql.com/doc/refman/8.0/en/ for more details. Aborting…
2025-06-08T14:33:14.651347Z 1 [ERROR] [MY-013183] [InnoDB] Assertion
failure: buf0buf.cc:4110:ib::fatal triggered thread 140583111841344
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.

I went straight to the last link to the MySQL docs, Forcing InnoDB Recovery. Since this server has a bunch of WordPress databases that all use InnoDB tables, that would — hopefully — be the solution.

It’s pretty simple, once you find the right configuration file to put this line of code in:

innodb_force_recovery = 1

My server is running Ubuntu, so the file I wanted (I had to hunt around a bit) was:

/etc/mysql/mysql.conf.d/mysqld.cnf

I added that line at the end of the file, ran systemctl start mysql and much to my surprise, after about 3 seconds, the command prompt returned, with no errors. I fired up Safari and checked out my site and… well, since I’m writing this here, you can guess the rest.

Of course, is this really a solution? I was hoping so. The name of the parameter sounds like it’s, y’know, going to fix any problems it encounters. But reading the documentation further, it looks like it is really designed just to bypass certain safety mechanisms in order to allow the system to run so you can do your own troubleshooting.

Unfortunately I’m not quite sure where to begin with this troubleshooting. There are over 30 databases on this server, so I’m looking at somewhere over 500 tables, any of which could be the culprit, and the log files don’t give any indication of which table — or even which database — is the source of the problem.

So, when in doubt, I like to start as simple as possible. Since innodb_force_recovery is supposed to be only a temporary setting and it limits certain functionality, I knew I would eventually have to turn it off again. Let’s just try that now and see what happens.

I commented out the line I had just added to the config file, tried restarting MySQL, and… it worked. I’m not sure if starting up with innodb_force_recovery did do something that cleaned up the problem, or if just using that setting to get past whatever was hanging things up before allowed the normal boot process to do some standard cleanup, but in any case, it seems to be working fine now.

But if I get another alert that things have gone down, I’m not going to wait a week to investigate this time, no matter how much more pressing work I have going on.


Update (June 12, 2025): Although the server hasn’t completely crashed again, I’ve been observing over the past couple of days that my monitoring script is still restarting services a couple of times a day, so there’s still a problem.

I realized that the most likely database to be having issues was the one running this site, since it was originally installed with WordPress 2.1 back in 2007, and the core tables were still running the MyISAM engine (instead of the modern standard InnoDB), and several tables also had odd character set collation settings. Those have been causing problems in recent months, such as the odd issue I had with a previous WordPress update where it was mangling image uploads. (Yes, that actually had to do with the data table structure!)

I decided to rebuild the database: the tl;dr is that I exported the data to my computer, dropped all of the tables in the live database, and then re-imported the data to create clean new tables.

The long version is, I did a bit more than that. Those old tables had several NOT NULL datetime fields where the default value was 0000-00-00 00:00:00 — a value that was acceptable in earlier versions of MySQL but is rejected under the strict SQL modes that are now the default. (A common replacement default is 1970-01-01 00:00:00, the start of the Unix epoch, although I went with 2000-01-01 00:00:00 as my new default.)

I also took this opportunity to change all of the tables to use the InnoDB engine and utf8mb4_unicode_520_ci as the collation. (Yes, I probably could have/should have used utf8mb4_0900_ai_ci but I went with the newest value I saw in other tables.)
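I made those changes as part of the export/re-import process, but expressed as straight SQL, the equivalent per-table alterations would look roughly like this (with wp_posts standing in for each affected table and post_date for each affected column):

ALTER TABLE wp_posts ENGINE = InnoDB;
ALTER TABLE wp_posts CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
ALTER TABLE wp_posts MODIFY post_date datetime NOT NULL DEFAULT '2000-01-01 00:00:00';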

I’ll need to give it a day or two to see if this resolves the issue. If not, I think it will mean that the principle is correct, but I just haven’t found the damaged database yet. So, I just have about 29 more to test after this!


Update (August 2, 2025, Ron Howard voiceover): It did not resolve the issue.

Well, it did for a while, but the same problem happened again two months later. So I haven’t really pinpointed the cause yet. I’m inclined to just encourage the client to ditch their VPS and move to managed hosting elsewhere.


Update (August 4, 2025): Now that it’s Monday and I’m no longer just in “weekend emergency support” mode, I took a fresh look at this. I’m not a database administration expert, but I think my folly the first time around was that I didn’t actually do anything after getting things back up and running. The MySQL documentation emphasizes using innodb_force_recovery to get the database running so you can dump the tables. I am hoping that just dumping the tables and then restoring from the dump will clear up whatever ticking time bomb was lurking in the database. Fortunately that’s super easy with a couple of shell commands:

mysqldump -u USERNAME -p DATABASE_NAME > DATABASE_NAME.sql
mysql -u USERNAME -p DATABASE_NAME < DATABASE_NAME.sql

The first (mysqldump) command exports the database contents to a .sql file. The second (mysql) command uses that .sql file to overwrite the contents of the database; by default the dump includes DROP TABLE IF EXISTS and CREATE TABLE statements for each table, so importing it really does rebuild the tables fresh from the file. (Running these two commands freed up around 200 MB of space on the disk, so it obviously did something significant.)