Migrating primary keys in a CakePHP site’s database from GUIDs to integers

If you are a CakePHP developer, and find yourself in a situation where your database is using GUIDs but you want to switch to using integer-based primary keys instead, this post will describe how I dealt with that exact situation. (Actually, the post will do that even if you’re not a CakePHP developer; it just probably won’t be of any interest to you.)

Some unnecessary background details you can skip if you’re in a hurry

I may be the only CakePHP developer silly enough ever to have used GUIDs (those crazy pseudo-unique 36-character hexadecimal strings) instead of plain auto-increment integers as primary keys. Maybe not. In any case, it only took me one CakePHP project to realize the folly of the approach and I quickly abandoned GUIDs on my second CakePHP site.

Now the time has come, nearly four years later, to finally update that first CakePHP site to the latest version of cms34. There will surely be numerous challenges involved in the upgrade, given the extensive evolution the system has undergone in the intervening years. But without a doubt the biggest, or at least the first, of those challenges is getting the data converted from using GUIDs to integer-based primary keys.

The reason I used GUIDs in the first place was that I have a few multi-purpose cross-reference tables in the structure, and I wanted to just be able to refer to a single ID field as a foreign key, without having to keep track of both the table/model and an integer ID. Kind of silly, in retrospect, especially since I quickly realized how important it was to be able to easily identify which table the foreign key pointed to. At the time I was singularly focused on trying to keep the database as stripped-down basic as possible, to a self-defeating point.

But my early CakePHP learning curve is a bit beside the point here. The goal of this post is to document the process I am undertaking to migrate the site, for the benefit of anyone else who finds themselves in a similar situation. (Hopefully, for you, the idiot who used GUIDs was not yourself.)

Step 1: Preliminary set-up

First, a qualifier: the way I am doing this will result in a lot of gaps in the sequence of integers used in the individual tables after the migration. If the actual values of your integer primary keys matters to you, you may not want to take this approach. I can’t imagine a reason why the values would really matter though.

The idea in this step is, in essence, to convert each GUID in the system into a unique integer that can be used later in the update process. There’s no reason why the integers have to be unique across tables any longer; it’s just that having a definitive reference, in a single table, of how each GUID maps to an integer will make some of the later steps in the process easier. If you are compelled not to skip numbers in individual tables, you could add an extra field called table_id, and increment table-specific IDs in that field. But that would require extra code in the next step, and it just seems like an extra complication without any real benefits.

Do not do any of this on a live site. Set up a duplicate copy of your site somewhere else, and do all of this conversion work there. Also discourage users from updating the existing site while you’re doing the migration, if possible. This conversion can be set up as a repeatable process, however, so you could always do some test runs, and then once you have everything working reliably, get a fresh dump of the live database again, minimizing downtime.

In your copy database, create all of the tables for your old database and load in the data. Also set up empty tables for the new database (which hopefully doesn’t have overlapping table names; if it does, add a prefix to the table names for the old database). Once you have both the old and new tables in your database, and the old tables loaded with data, you’re ready to proceed.

Step 2: Build and populate a GUID map table

Assuming you’re using MySQL, this will take care of it:

CREATE TABLE IF NOT EXISTS `guid_map` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`table` varchar(255) NOT NULL,
`guid` char(36) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

If you have adequate permissions to create the table from within a PHP script, you may as well start building one, because you’ll be adding to it next anyway:

<?php
// Connect to database (replace values as appropriate)
$db_name = 'database'; // Set to your database name
$dbconn = mysql_connect('localhost','username','password',true);
$db = mysql_select_db($db_name);

// Build guid_map table
mysql_query("CREATE TABLE IF NOT EXISTS `guid_map` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `table` varchar(255) NOT NULL,
    `guid` char(36) NOT NULL,
    PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1"
);

Next, you’ll want to build a PHP script that will crank through all of your old tables, grab all of the GUIDs, and write them into this new table. Here’s a snippet of PHP to get the job done. You’ll need to decide how you want to run it; it assumes a database connection is already established, so you could just drop it in as an action in one of your controllers or you could build a stand-alone PHP script and add the necessary database connection code at the top.

Note: This works with direct SQL queries, not CakePHP’s model methods. That’s just how I did it. Also, when I build PHP utilities like this, I run them in a browser and like to dump status information as basic HTML. If you prefer to run this at the command line, or don’t care about a verbose dump of the details, you could modify/remove any lines here that contain HTML.

// Clear current GUID map data (makes this process repeatable)
mysql_query('TRUNCATE TABLE guid_map');
if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; exit; }
echo '<p>GUID map cleared.</p>';

// Get tables
$old_prefix = 'old_'; // Set to prefix for old table names
$rs = mysql_query('SHOW TABLES');
$tables = array();
if (!empty($rs)) {
    while ($row = mysql_fetch_array($rs)) {
        if (strpos($row[0],$old_prefix) === 0 && $row[0] != 'guid_map') {
            $tables[] = $row[0];
        }
    }
}

// Loop through tables, retrieving all GUIDs
foreach ((array)$tables as $table) {
    $sql = 'SELECT id FROM `' . $table . '`';
    $rs = mysql_query($sql);
    
    // Loop through rows, adding to GUID map
    $error_count = 0;
    $mapped_count = 0;
    if (mysql_num_rows($rs) > 0) {
        echo '<hr /><h3>Processing ' . mysql_num_rows($rs) . ' rows in table: ' . $table . '</h3>';
        while ($row = mysql_fetch_assoc($rs)) {
            $sql = 'INSERT INTO `guid_map` (`table`, `guid`) VALUES (\'' . mysql_real_escape_string($table) . '\', \'' . mysql_real_escape_string($row['id']) . '\');';
            mysql_query($sql);
            if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; }
            else { $mapped_count++; }
        }
        echo '<p>Mapped ' . $mapped_count . ' rows with ' . $error_count . ' error(s).</p>';
    }
    else {
        echo '<p>No data rows found in table: ' . $table . '</p>';
    }
}

Once you’ve run this script, your guid_map table will now contain a row for every row of every table in your database. And you now have an integer ID you can use for each of them. And what’s extra cool about it is that it’s repeatable. Run it as many times as you want. Get a new dump of data from the old site and run it again. Fun!

The next step is where things get a lot more complicated, and a lot more customized to your specific data structures. I’ll be using one of my actual data tables as an example, but bear in mind that your system is unlikely to match this schema exactly. This is just a demonstration to work from in building your own script.

Teaser: My least-readable blog post may be coming soon

Aside

I have a topic for an impending blog post that is likely to be my most arcane, geeky, unreadable post ever. Even if you know what I’m talking about. It will be the blog post equivalent of Metal Machine Music. Even I won’t be able to read it after it is written.

The topic: migrating the primary keys in a CakePHP site’s MySQL database from GUIDs to integers. I can’t wait.

File under “Don’t Make Me Do This Myself”: a comparison of “TTW” (Through The Web) WYSIWYG text editors

I really don’t want to have to spend time thinking about this, but there’s such a dearth of useful information out there on this topic — based on my searches of Google and Bing, which return little more than uncritical lists of 40+ different TTW text editors, usually displayed on such hideously designed or woefully outdated websites that I discount their validity on sight — that I feel compelled to step in.

The question today is TTW (through-the-web) WYSIWYG (what-you-see-is-what-you-get) text editors. If all of that sounds like 10 letters of gibberish to you, feel free to stop here. But if you’re a web developer, especially of the custom CMS variety, you’re certainly aware of the situation: how do you give users of your system a usable tool that allows them to easily edit site content without having to muck around directly with HTML? (That is, after all, kind of the whole point of a CMS.)

It’s something I’ve struggled with for over a decade. At one point I was actually rolling my own. But that’s a little more JavaScript than I care to deal with directly, and I long ago left the project of building a WYSIWYG editor to those who really love that kind of thing.

That puts me in a position where I need to select the best available option for a pre-built, drop-in WYSIWYG editor. Fortunately things have come a long way in this regard over the past decade. I’ve been — more or less happily — using TinyMCE to solve this particular problem for the past 3-plus years. But lately “less happily” has been outweighing “more,” and I’ve been exploring my options.

So far the only viable alternative I’ve found (or had recommended to me) is CKEditor. It’s the successor to one of the really early TTW WYSIWYG editors, FCKEditor, which I tried ages ago and never really liked.

Today I took a major step forward with cms34, my custom CMS, by setting up a configurable site option that allows users to select the editor of their choice: TinyMCE, CKEditor, or raw HTML. As inclined as I am to use raw HTML myself, I’m giving CKEditor a whirl for now.

So far I am inclined to say CKEditor is just the remedy I’ve been looking for to cure my TinyMCE malaise. As good as TinyMCE is, it just gets a little wonky sometimes. It especially seems to have trouble figuring out where to put closing tags when you’re switching between block elements, and especially when you’re inserting new content. I find myself often switching to the HTML pane to fix its quirks manually, but I can’t expect clients to do that.

My experience with CKEditor is still pretty limited at this point, but I have to say I really like how it’s set up for customizing the interface (which buttons to show, especially), in addition to its better handling of switching between elements than TinyMCE. They’re both pretty similar, actually, in how they’re configured, and in the overall user experience. But CKEditor has a little more polish, a little more flexibility. It almost feels like “TinyMCE done right,” although perhaps it’s too early for me to make such a proclamation.

So, that’s it for me, for now. The only two options in this realm that I really have any experience with. I know there are others out there. Some may even be good. Even better than TinyMCE or CKEditor. What’s your favorite?

In UX we trust: Netflix as a case study in how good search isn’t good enough

Last night, prompted by a Dan Benjamin tweet, I felt inclined to watch one of my favorite ’80s movies, Fletch. I own Fletch on DVD (two versions, in fact), but I didn’t feel like busting them out. I wanted to watch it on my iPhone in bed, so I decided to check the iTunes Store and Netflix.

Unsurprisingly, iTunes did have it, but only for purchase, not for rental. And I’m not inclined to pay $14.99 to buy a digital copy of a movie I already own twice over on disc. While on iTunes, I saw a recommendation for The Sting. Intrigued, since I have never seen it (gasp!), I considered it as a possible alternative, and was pleased to see iTunes had it as a $3.99 rental. But before dropping four bucks on it I decided to check its availability on Netflix.

I checked the iTunes Store first, because I have learned to assume Netflix won’t have what I want available for instant streaming. Or, more accurately, I have learned not to trust that Netflix will have what I want.

And that’s where the problem lies: I don’t trust Netflix. It’s not that I think they’re up to anything nefarious (it’s not the same as the distrust I have these days for Google, for instance). And it’s not even entirely that I have become jaded by past experience discovering just how woefully limited their selection of streaming content is.

It’s that I don’t trust their interface to really show me what’s available.

Why not? They have a search box, after all. I can just search for what I want. To the best of my knowledge, the search box works pretty well. If they have something, it comes up. If they don’t, it doesn’t.

As usual, I couldn’t keep my frustration to myself, so I took it to Twitter:

So, again, why don’t I trust Netflix? I’ve been pondering that question all morning, and I think I have it figured out. It’s because good search, alone, isn’t good enough. Search is open-ended. It’s also kind of like standing outside a locked door and whispering through the keyhole to someone inside. You know there’s a lot of stuff inside the room. You even know that the room probably holds things you want. But you can’t see for yourself what’s inside, and the person on the other side of the door is only answering yes-or-no questions.

Of course, Netflix does offer more than search. But on mobile devices like the iPhone and iPad, it doesn’t offer a whole lot more. Sure, it has recommendations. And you can browse by genre. But I don’t really give a crap about their “recommendations.” That’s the person behind the door, offering a little bit more information, but it’s far from flinging the door wide open. And browsing by genre is great, if you know what genre the movie is in. If it even fits a genre. (I honestly don’t know what genre I’d find The Sting in. Is it a comedy? Drama? Action? I don’t know enough about the film to find it by genre.) And once you’ve selected a genre, you’re dumped into an experience not unlike rummaging through the cutout bin at a record store. (And if you’re too young to understand that analogy, get off my lawn.)

I took a few screenshots on both the iPhone and iPad, as well as on the Netflix website on my computer, to demonstrate what I’m talking about.

First, search results:

Well, that’s lovely. No results. OK. Did I spell it wrong? Is it case-sensitive? Am I hallucinating and this movie never even existed? Is anybody out there?

OK, well… hmm. What should I do now? Maybe I should browse comedies.

Really… that’s where we start? Can I fine-tune my selection a bit? Sort them into some kind of meaningful order? No, why would I want to do that. I’ll just flip through all of these aimlessly until I find something I can tolerate for the next two hours… I mean no, wait. Help me find what I’m looking for, dammit!

Well, OK. I’ve used enough apps between the iPhone and iPad to understand that the experience is often stripped down on the iPhone due to the smaller screen. That’s understandable. What about if I do the same search on the iPad?

You’re joking, right?

Let’s back up a step and see what Netflix presents to the user when they first enter the “Comedies” genre:

Well… um… that’s… a little better… I guess… than the iPhone experience. This is actually pretty close to what you get when you visit Netflix on the Apple TV, as well, and is somewhat of an improvement — aesthetically, at least — over the old version of Netflix for iPad. But still… it’s just that person behind the door, or the cutout bin.

Open the damn door and let me see for myself what’s in there.

Not that I think this is an adequate solution, let me say that right away, but I decided as a last resort to see if the desktop web interface for Netflix offered a superior experience. Here, where Netflix acknowledges that DVDs do, in fact, exist, the results are a bit better:

Thank you for at least acknowledging that the movie I asked about exists. Thank you for telling me that it’s not available for instant viewing but is available on DVD. Would it be so hard to do that on the mobile apps? I recognize that DVDs are useless on an iPhone or iPad, but simply providing this information reassures the user that their search worked. Now I can move on with my life.

What about browsing? Will you finally just open the door and let me see what’s in the room? At long last, sir, will you please just do that?

Yes!

You may note here that not only am I (after a few extra clicks) able to get a simple, alphabetized list, I am even able to browse subgenres! Who knew such wonders existed?

Sadly, browsing by title within a subgenre is probably the best way to get at what may be an ulterior motive behind the limited browsing interface Netflix presents in its mobile apps, as expressed in my tweet last night: their selection of streaming movies kind of sucks. There are plenty of reasons for this, and I’m not going to criticize Netflix for the challenges involved in working out licensing deals to stream thousands of movies for a very small, flat monthly fee. Netflix is a pretty amazing thing, when you look at what cable TV was like when I was a kid. (What am I saying? Look at what cable TV is like right now!)

Ultimately, though, whether or not Netflix is deliberately hiding its poor selection behind a mediocre browsing interface, it still has a mediocre browsing interface. Who cares? you say. Just search for what you’re looking for. Have you been reading this at all? I reply. Search, no matter how good it is, by itself is not good enough. Users need to be able to get their bearings, see what’s inside the room. We need an understanding of the scope of information we’re dealing with in order to make a meaningful search, and to make sense of the results we’re given, when we can’t find what we’re looking for.

So, a couple of final thoughts on how all of this ranting translates into a meaningful lesson in UX (user experience):

1. Don’t just rely on having a search box as an excuse not to organize and display your content in an easy-to-browse way.

2. Give meaningful feedback when a search fails. Don’t just tell the user “no results.” That’s obvious. Help them out. Give suggestions for alternate searches. And if there’s anything relevant in your database about the user’s search terms, even if it’s not directly available to them in their current context, at least let them know as much.

P.S. As it turns out, Netflix has neither Fletch nor The Sting available for instant viewing. I ended up not watching anything last night, and played around with this synthesizer app instead.