I have a client whose WordPress website is basically shutting down, and they need to keep some archives of the existing version of the site.
I’m making three different archives, for various purposes:
- Full backup. Of course, I’m doing this. A full backup of the database using
mysqldump
and a full archive of the site files usingtar -cvzf
. I like to do these things at the command line, and that’s what I’m doing here, but if I don’t have terminal access, I use the Updraft Plus plugin. This backup is not human-readable, but it allows us to spin up the site again on a new server in the future, if needed. - Crawled static snapshot. Once again I’m at the command line, this time using
wget
to crawl the entire site and make a static copy we can just stick on a thumb drive and the client can navigate like it’s a live site. This is the most useful for day-to-day reference for the client, or if they need to show anyone what content existed on the site at the point when it was taken down. - CSV export. This is, honestly, the clunkiest element, but it’s not without its uses. Yes, exporting content directly from the database into a spreadsheet means a lot of raw HTML. But it’s handy as a reference tool if the client needs to quickly find a bit of text that might have appeared anywhere on the site.
The tricky thing with the CSV export is, how do we go about that? That’s what this post is about.
My go-to tools for any WordPress import/export task are WP All Import and WP All Export. These are super flexible plugins that let you get really granular with selecting and organizing the data you’re working with, and they’re especially great in situations like where a client needs to go through and update a bunch of SEO content all at once. They can just blast through a spreadsheet instead of tediously editing individual pages and posts one-at-a-time.
I especially like the fact that there are add-ons that support Advanced Custom Fields. I make extensive use of ACF (and did even more so before Gutenberg matured). My old theme this client’s site runs on stores almost all of its content in ACF Flexible Content fields rather than in the standard WordPress content block.
But here’s where things get messy. The standard content block is a single field in the post’s main record in the wp_posts
database table. But ACF stores each individual content field as a separate record in the wp_postmeta
table. And a Flexible Content field isn’t just one ACF field; it’s an indefinitely large group of fields, depending on how complicated your page content is.
So there’s a one-to-many relationship between posts and Flexible Content records in the database. WP All Export makes it easy to pull in all of the Flexible Content subfields, but… yikes… it makes each one a separate column in the resulting CSV.
I ran WP All Export on this client’s content, and the CSV contained over 4,000 columns. Not at all easy to work with. I wondered, “Is there a way to concatenate all of the Flexible Content subfields into one column?” Granted, you’re going to lose the keys, but depending on your purposes — such as mine, here — that doesn’t matter. I just need the content, I want it in a single cell for each post, and I don’t care if there’s extra clutter in there too (e.g. booleans, key IDs for attached images, etc.). As long as the full text content is there, I’m good.
I quickly realized that, even if WP All Export does have a way to do this (although it seems like it doesn’t), it would probably take me less time to figure out a direct SQL query for the task rather than figuring out how to do it with WP All Export and custom PHP functions.
Yes, there is a way!
The key is the MySQL GROUP_CONCAT
function. I can perform a subquery on the wp_postmeta
table for my Flexible Content subfields and concatenate that into a single field in the query output.
Here’s my query. (Note: This is based directly on the example I found here, modified to be WordPress-specific.)
SET SESSION group_concat_max_len = 32767; SELECT DISTINCT p.`ID`, p.`post_type`, p.`post_title`, p.`post_name`, p.`post_content`, (SELECT DISTINCT GROUP_CONCAT(m.`meta_value` SEPARATOR '|') FROM `wp_postmeta` m WHERE m.`post_id` = p.`ID` AND m.`meta_key` LIKE 'modular\_content\_%' GROUP BY m.`post_id` ) as `modular_content`, p.`post_excerpt`, p.`guid`, p.`post_status`, p.`post_date` FROM `wp_posts` p WHERE p.`post_type` IN ('page', 'post', 'tribe_events') AND p.`post_status` NOT IN ('inherit', 'revision', 'trash') ORDER BY p.`post_type` ASC, p.`post_title` ASC;
What do you need to know about this?
First off, GROUP_CONCAT
by default truncates the data at 1024 characters. The first SQL statement bumps that up to 32,767. Guess what, that’s probably not long enough, either, but since it’s the maximum character length of a cell in Excel, which is what my client will most likely be using to view the file, I went with it. As it happens, out of more than 1700 results returned, only 6 exceeded that length. If you’re not going to be using Excel, you can make the number much larger… just be sure to check if the spreadsheet application you’re using has its own per-cell character limit.
In my main query, I identified specific fields I wanted to include in the output, since there are some fields in the wp_posts
table I don’t really care about here, such as ping_status
or menu_order
. But if you just want everything, you can select p.*
and leave it at that.
The subquery where GROUP_CONCAT
comes in has a few specifics to note: first, I chose the pipe character as my delimiter, but you could use whatever you want. Also, in the WHERE
clause I’m specifying only rows that have a meta_key
that starts with modular_content_
. (The backslashes are to escape the underscores, which is a single-character wildcard in SQL.) That’s because my Flexible Content field’s key is modular_content
and each of its subfields has that string prepended. (I included the trailing underscore because I don’t want modular_content
itself, which just stores a number indicating how many subfields are in that particular instance.)
The main query’s WHERE
clause has a couple of additional limits on the scope of the query. I only want pages and posts, plus The Events Calendar events, so I am limiting the post_type
to those three. And I also don’t want to include posts that are revisions or in the trash, so I limit the post_status
as well.
As always, the standard caveat: Running direct SQL queries can permanently alter your database. You should be pretty safe with this one since SELECT
is a read-only action. But take anything here with a grain of salt, and don’t run code you found on the Internet in a production environment unless you’re sure you know what it’s going to do!