Migrating primary keys in a CakePHP site’s database from GUIDs to integers

If you are a CakePHP developer, and find yourself in a situation where your database is using GUIDs but you want to switch to using integer-based primary keys instead, this post will describe how I dealt with that exact situation. (Actually, the post will do that even if you’re not a CakePHP developer; it just probably won’t be of any interest to you.)

Some unnecessary background details you can skip if you’re in a hurry

I may be the only CakePHP developer silly enough ever to have used GUIDs (those crazy pseudo-unique 36-character hexadecimal strings) instead of plain auto-increment integers as primary keys. Maybe not. In any case, it only took me one CakePHP project to realize the folly of the approach and I quickly abandoned GUIDs on my second CakePHP site.

Now the time has come, nearly four years later, to finally update that first CakePHP site to the latest version of cms34. There will surely be numerous challenges involved in the upgrade, given the extensive evolution the system has undergone in the intervening years. But without a doubt the biggest, or at least the first, of those challenges is getting the data converted from using GUIDs to integer-based primary keys.

The reason I used GUIDs in the first place was that I have a few multi-purpose cross-reference tables in the structure, and I wanted to just be able to refer to a single ID field as a foreign key, without having to keep track of both the table/model and an integer ID. Kind of silly, in retrospect, especially since I quickly realized how important it was to be able to easily identify which table the foreign key pointed to. At the time I was singularly focused on trying to keep the database as stripped-down basic as possible, to a self-defeating point.

But my early CakePHP learning curve is a bit beside the point here. The goal of this post is to document the process I am undertaking to migrate the site, for the benefit of anyone else who finds themselves in a similar situation. (Hopefully, for you, the idiot who used GUIDs was not yourself.)

Step 1: Preliminary set-up

First, a qualifier: the way I am doing this will result in a lot of gaps in the sequence of integers used in the individual tables after the migration. If the actual values of your integer primary keys matters to you, you may not want to take this approach. I can’t imagine a reason why the values would really matter though.

The idea in this step is, in essence, to convert each GUID in the system into a unique integer that can be used later in the update process. There’s no reason why the integers have to be unique across tables any longer; it’s just that having a definitive reference, in a single table, of how each GUID maps to an integer will make some of the later steps in the process easier. If you are compelled not to skip numbers in individual tables, you could add an extra field called table_id, and increment table-specific IDs in that field. But that would require extra code in the next step, and it just seems like an extra complication without any real benefits.

Do not do any of this on a live site. Set up a duplicate copy of your site somewhere else, and do all of this conversion work there. Also discourage users from updating the existing site while you’re doing the migration, if possible. This conversion can be set up as a repeatable process, however, so you could always do some test runs, and then once you have everything working reliably, get a fresh dump of the live database again, minimizing downtime.

In your copy database, create all of the tables for your old database and load in the data. Also set up empty tables for the new database (which hopefully doesn’t have overlapping table names; if it does, add a prefix to the table names for the old database). Once you have both the old and new tables in your database, and the old tables loaded with data, you’re ready to proceed.

Step 2: Build and populate a GUID map table

Assuming you’re using MySQL, this will take care of it:

`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`table` varchar(255) NOT NULL,
`guid` char(36) NOT NULL,

If you have adequate permissions to create the table from within a PHP script, you may as well start building one, because you’ll be adding to it next anyway:

// Connect to database (replace values as appropriate)
$db_name = 'database'; // Set to your database name
$dbconn = mysql_connect('localhost','username','password',true);
$db = mysql_select_db($db_name);

// Build guid_map table
mysql_query("CREATE TABLE IF NOT EXISTS `guid_map` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `table` varchar(255) NOT NULL,
    `guid` char(36) NOT NULL,
    PRIMARY KEY (`id`)

Next, you’ll want to build a PHP script that will crank through all of your old tables, grab all of the GUIDs, and write them into this new table. Here’s a snippet of PHP to get the job done. You’ll need to decide how you want to run it; it assumes a database connection is already established, so you could just drop it in as an action in one of your controllers or you could build a stand-alone PHP script and add the necessary database connection code at the top.

Note: This works with direct SQL queries, not CakePHP’s model methods. That’s just how I did it. Also, when I build PHP utilities like this, I run them in a browser and like to dump status information as basic HTML. If you prefer to run this at the command line, or don’t care about a verbose dump of the details, you could modify/remove any lines here that contain HTML.

// Clear current GUID map data (makes this process repeatable)
mysql_query('TRUNCATE TABLE guid_map');
if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; exit; }
echo '<p>GUID map cleared.</p>';

// Get tables
$old_prefix = 'old_'; // Set to prefix for old table names
$rs = mysql_query('SHOW TABLES');
$tables = array();
if (!empty($rs)) {
    while ($row = mysql_fetch_array($rs)) {
        if (strpos($row[0],$old_prefix) === 0 && $row[0] != 'guid_map') {
            $tables[] = $row[0];

// Loop through tables, retrieving all GUIDs
foreach ((array)$tables as $table) {
    $sql = 'SELECT id FROM `' . $table . '`';
    $rs = mysql_query($sql);
    // Loop through rows, adding to GUID map
    $error_count = 0;
    $mapped_count = 0;
    if (mysql_num_rows($rs) > 0) {
        echo '<hr /><h3>Processing ' . mysql_num_rows($rs) . ' rows in table: ' . $table . '</h3>';
        while ($row = mysql_fetch_assoc($rs)) {
            $sql = 'INSERT INTO `guid_map` (`table`, `guid`) VALUES (\'' . mysql_real_escape_string($table) . '\', \'' . mysql_real_escape_string($row['id']) . '\');';
            if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; }
            else { $mapped_count++; }
        echo '<p>Mapped ' . $mapped_count . ' rows with ' . $error_count . ' error(s).</p>';
    else {
        echo '<p>No data rows found in table: ' . $table . '</p>';

Once you’ve run this script, your guid_map table will now contain a row for every row of every table in your database. And you now have an integer ID you can use for each of them. And what’s extra cool about it is that it’s repeatable. Run it as many times as you want. Get a new dump of data from the old site and run it again. Fun!

The next step is where things get a lot more complicated, and a lot more customized to your specific data structures. I’ll be using one of my actual data tables as an example, but bear in mind that your system is unlikely to match this schema exactly. This is just a demonstration to work from in building your own script.

A follow-up on Apache not starting on my web server

About 6 weeks ago, I wrote about a problem I was having with Apache not starting with SSLEngine on. I ended the post somewhat ominously with the following:

I’m a little concerned that Apache is going to require manual input of these pass phrases again whenever it restarts (e.g. if the server reboots). I hope not, but for now I am at least able to move forward knowing it works at all.

This morning, a little before 6 AM, that happened. I was awakened by notifications (with their attendant beeps and nightstand vibrations) on my iPhone that my web server was down. Great. Half-awake, I fired up my hosting provider’s handy iPhone app, tapped the “Hard Reboot” button, and tried to go back to sleep. Except, the notifications kept coming. Eventually I was awake enough to realize that the server was coming back up, but Apache wasn’t. Time to get up and deal with this problem from a real computer.

SSHed in, I tried manually starting Apache, and got this:

(98)Address already in use: make_sock: could not bind to address
no listening sockets available, shutting down
Unable to open logs

What the crap? After spending a half hour visually scanning log and configuration files, to no avail, I decided I needed to try to find out what was running on port 80. This page was helpful in that regard. I ran the command lsof +M -i4 and found that, whaddayknow, Apache was running. Apparently. But I couldn’t shut it down, and I couldn’t restart it. There were no signs of any compromise of the system’s security, so I just chalked this up to some minor problem deeply buried somewhere in a configuration file that I have yet to track down (but which is probably my fault). At any rate, lsof gave me what I really wanted: the process ID that was listening on port 80. Time for the dreaded kill -9 command.

After that, I tried starting Apache again, and it worked… and, as I suspected, it did ask for the pass phrases again. But now, all is well. (Except for the nagging feeling of not knowing what caused this to happen in the first place. Stay tuned…)

You say you want a revolution? I say you want an iPad, even if you don’t think so yet

By now, lots of people have had lots to say about Apple’s latest revelation: the iPad. Opinions run the gamut from hating it (calling it a big iPod touch), to embracing it as a game-changing, revolutionary device (and calling it, essentially what Apple was working on all along).

The build-up

I’ve been thinking a lot about this device over the past couple of months, pretty much ever since I praised the litl. Even at that point, though I was skeptical that Apple was working on a tablet device, I suspected that, if they were, it would kill the litl. (And, somewhat presciently, I proposed an ideal target price of $499, which is exactly where the entry-level iPad is.)

A week earlier, I had also written about the debacle with the CrunchPad JooJoo. At that point I was still extremely skeptical that Apple was working on a tablet. At the time I wrote:

All the rumors say Apple’s tablet will be based on the iPhone OS, which seems more likely to me than a Mac OS tablet. But there’s too much about the iPhone “ecosystem” that just wouldn’t seem to translate to a larger tablet device, most obvious being the fixed display resolution. No way is Apple going to produce a device with a 10-inch screen and 480×320 resolution (even the original 1984 Macs had 512×384 displays), but by that same token, I don’t see the iPhone OS interface suddenly supporting multiple resolutions when there are over 100,000 apps all built around this one fixed resolution.

Of course now we know that Apple was working on a tablet, and it in fact does run all (well, most) iPhone apps. You can run them either at original size or pixel-doubled. But well before yesterday’s announcement, I had come to accept the general idea that Apple was working on a tablet device, that it would be based on the iPhone OS, and that, in many ways, it would resemble a “really big iPod touch.” At the same time, I also did come to believe that this was the device Apple had been aiming for all along; that the iPhone was just an intermediary step either to establish a market or simply to turn the huge amount of R&D that was going into this thing into a marketable product and bring some revenue into the project to help sustain the additional 3 years of work it required.

I’ve admitted before that up to the very minute of the iPhone’s unveiling I was denying that Apple would make a phone. I knew better than that this time around. The other half of that story is that, by the end of the keynote where the iPhone was revealed, I already desperately wanted one. I’m not so desperately craving an iPad (though I certainly would like one, especially based on first-hand reports of how amazing the user experience is), but that’s probably because I’m not the target of this device. But I know a lot of people who are.

What is a computer, and is that what you want?

Think about all of the things that a computer can do. And then think about all of the things you need to do to make a computer do the things you want it to do. Then think about all of the other things the computer can do that you have no use for. It’s all kind of a big headache, isn’t it? If you’re a “power user” like me — a programmer, a creator, a tinkerer — you’ll probably always want the flexibility, freedom, and power of a full-fledged computer. But think about people in your life who aren’t hardcore tech geeks. You probably have friends or family members who went kicking and screaming into the world of computing, most likely because it was the only reasonable way to access the Internet. The fact remains, plain and simple, that most computer users don’t need all of the things a computer can do; they don’t enjoy the hoops they have to jump through to get it to work; and, ultimately, they don’t understand their computers very well.

This is one of the expressed purposes of the litl: it’s a device techies can give to their non-techie relatives to do all of the basic computing tasks they want without constantly needing to call up the techie relative for support. But the litl falls short of that goal in two key ways: first, it is not designed to live in a household where there isn’t also a “real” computer; and second, at $699, it’s about $200 more than I think this kind of niche device should cost. There’s also an additional problem: it only offers WiFi, no 3G, so it’s not intended to leave the house — one of its creators even said as much in a comment here:

Why no 3G (at least not yet)? Wifi is still the best and most prevalent wireless networking technology at home where our device is intended to live (it’s not meant for road warriors). 3G has severe limitations when it comes to streaming video – wifi is superior here. Most home wifi is on all the time and the litl webbook is intended to remain on also. It’s a designer appliance for your home.

The litl is a designer appliance for your home. And yet it’s also supposed to be what you give Grandpa so he can look at the pictures of the grandkids that you’re posting on Flickr. You can’t have it both ways.

That dichotomy doesn’t exist with the iPad. The iPad has optional 3G, so you can take it anywhere (if you want), but most importantly, I can see the iPad existing by itself in a household that doesn’t own another computer.

Think about the things that most non-power users do with computers: They browse the web and send email. They listen to music and watch movies. They play games and read books. Occasionally they fire up Word (or whatever “lite” office suite shipped with their computers) to write a letter or do a little work from home. Guess what, the iPad does all of those things.

¡Viva la revoluçión!

I think the iPad revolutionizes the average consumer computer experience in a couple of ways. First, it eliminates all of the headaches of maintaining a computer or, for that matter, even learning how to interact with one. More on that in a minute. Second, it completely changes how computer users buy and install software. Everyone, even power users, hates the process of installing software. It’s tedious and slow, confusing and usually is presented in a convoluted and inconsistent way. But browsing the iPhone App Store is fun, and buying and installing software couldn’t be easier. The iPad brings that experience to general computing.

Back on the matter of maintenance and basic computer interaction: The biggest frustration I’ve had in providing support for friends and relatives who are not so computer savvy is the constant struggle they have with the basic interface of the computer. For people who haven’t devoted their careers to computing, like I have, the whole idiom of the graphical user interface (GUI) is perpetually confusing. What’s a window? What’s a cursor? What’s a menu? What’s a dialog box? What’s a scrollbar? I stare at a computer screen all day long; these things are as intuitive to me as the objects I deal with daily in the physical world. But that’s not the case for everyone. Even the process of using a mouse to move a cursor and interact with on-screen objects — arguably the most fundamental aspect of the GUI — is a level of abstraction a lot of users balk at. But the iPhone interface changed that. You acutally touch things with your fingers, move them around, pinch and stretch them. It’s fun, it’s intuitive, and it’s dead simple.

Sure, touchscreen interfaces have been around for years, and Windows-based touchscreen tablets have been available (if not exactly common for most of the last decade. But the GUI has essentially remained unchanged for over a quarter of a century. Just allowing a user to drag the on-screen cursor with their finger rather than with a mouse does not revolutionize the interface. Apple has reinvented the computer interface from the ground up with the iPhone and now the iPad. You were waiting for that inevitable revolution that would finally replace the GUI? Well, here it is.

And, for the geeks among us…

There’s one other, much more technical, reason why I am totally geeking out on the iPad though: HTML5. HTML5 is the “next generation” language of websites, promising new levels of interactivity and integration of multimedia into web pages that have up to this point been a tangled mess of proprietary and inconsistent plug-ins. HTML5 has been on the radar for years, but we web developers have had to drag our feet due to the glacial pace of adoption of new browsers. As long as a majority of users were still running Internet Explorer 6 — an ancient web browser that even Microsoft itself has by now denounced — our hands were tied regarding making full use of these new technologies. But the surging popularity of mobile devices, most importantly the iPhone but also Android-based smartphones, has opened up a huge new market where IE6 is irrelevant and HTML5-friendly browsers are the norm. Sure, you could use Firefox, Safari or Chrome (the only options for Mac users, and many smart Windows users have already made the switch), but here’s a brand new computing platform that brings all of these capabilities to a full-resolution (1024×768) screen.

Apple has wholeheartedly embraced HTML5 with the iTunes LP format, and it’s at the core of iBooks. Up to now, electronic books have typically been PDF-based, or some other, similar proprietary format. PDF is great, but it’s also an old format, and is fairly limited. HTML5 provides an easy way for content creators to enhance their presentations with fully-integrated audio and video, not to mention the interactive possibilities that CSS3, JavaScript, and offline data storage allow. This format makes it possible to create full, standalone applications as easily as creating a website (which, believe me, is easier than creating a full-fledged application in a traditional programming language). Sure, others have embraced this kind of web model, notably Palm, but only Apple has just the right mix of factors — market share, hardware/software integration, and, let’s face it, vision — to push something like this in the way to make it catch on.

You want it, you just don’t know what “it” is yet

There’s plenty of criticism of some of Apple’s practices — the iPad, like the iPhone, is a closed system; there’s DRM all over it; Apple is the gatekeeper for just about anything that goes in or out of the system. I can’t argue with those criticisms, other than to say, no one is forcing you to buy an Apple product. But those limitations are a trade-off for what Apple’s products offer: a uniquely integrated, incredibly polished, revolutionary experience. And, despite Apple’s lockdown of the top layers of the system, there’s openness at the core: Mac OS X and iPhone OS are based on an open source core and Apple is aggressively promoting the use of open standards like HTML5/CSS3 as the way to do things. Could it be more open? Of course it could. But then it wouldn’t be Apple. Open platforms are chaotic platforms. If you want to tinker with the system, or you just fundamentally believe in the principle of open software, then go get a Nexus One (and try to convince yourself that Google, deep down, believes in open systems too).

The arguments over open platforms could go on all day, but in the end I think it comes down to this: it has been Apple’s (and, largely, Steve Jobs’) vision for amazing — in Jobs’ long-echoed words, “insanely great” — technology devices that has driven these markets forward throughout the past decade. Do you think MP3 players would be where they are today if Apple hadn’t produced the iPod? It wasn’t the first MP3 player, but it fundamentally changed what an MP3 player is. Do you think we’d be talking about “apps” and that everyone would be carrying the Internet in their pockets if Apple hadn’t produced the iPhone? It wasn’t the first smartphone with Internet access, but again, it fundamentally changed what a smartphone is.

And now, the coup de grâce: the iPad. I won’t go so far as to say it changes what a computer is, on a fundamental level. But it creates something new: a consumer device that is a permanent replacement for consumer-grade computers. It’s what most people have wanted all along, but settled for a computer because what they wanted didn’t exist. Yet. And now it does.

The good, the bad, and the Apple

Ultimately, is it “all that”? I think the experience of using it, and what it represents for transforming the consumer computer industry, transcends what a list of its tech specs and features can convey. But in basic, concrete terms, it pretty much ended up being exactly what I was anticipating by now in terms of what it looks like, what it does, and how you interact with it. I was least surprised by its name — I had long suspected (though it’s easy to claim so after the fact) that it would be called the iPad, connotations of feminine hygiene products notwithstanding. There were, however, three things that genuinely surprised me, two good and one bad:

The good: 1) Price. I was hoping for, but not expecting, a price under $500. Granted, that’s just the entry-level model; they run as high as $829. But the fact that you can get an iPad for $499 is huge. 2) The A4 chip. I had absolutely no expectation that Apple would be developing a custom processor for this thing; it was not on my radar whatsoever. But from what I’ve read, this custom-built, highly-optimized chip is the key to the iPad’s blazing speed and overall awesomeness.

The bad: AT&T is the exclusive provider of 3G access. Seriously? This is a bit of a double-edged sword. I’m glad 3G is an option at all; the iPad easily could have shipped as a WiFi-only device, like the litl. But I, and many others, expected yesterday’s announcement to include Apple’s long-awaited untethering* from AT&T for the iPhone, and, needless to say, for the new iPad as well. Boo.

The Apple: No Flash. OK, this didn’t surprise me one bit, so I didn’t mention it above when I cited three surprises. But I needed to complete the pun I started in the header of this section, so here you go. No Flash. Never had it, never will. And like John Gruber (and for exactly the same reasons), I believe that’s a good thing. I recognize the seeming contradiction of criticizing Adobe for a closed system like Flash while praising Apple’s own closed systems, but there are some fundamental differences that, well, make all the difference. Apple’s closed systems are at the hardware and (locally-installed) software level. Adobe’s closed system is on the Internet — in the “cloud” in contemporary parlance. Adobe’s closed system is something that floats around out in the otherwise open, standards-based world of the Internet. It’s a way for Adobe to wall off part of the Internet in a bubble that it controls.

This is bad for a hardware maker like Apple, because as Gruber says, it prevents them from being able to fix problems caused by the fact that Adobe’s bubble isn’t sealed up quite as tightly as it should be (and, of course, it is a back door to allow people to bypass Apple’s systems). It’s also bad for us content creators because we’re beholden to Adobe to get our content online (in the form of having to buy Adobe’s high-priced software), and we’re dependent upon Adobe’s continued existence (and goodwill) to keep things running. What if Adobe goes out of business, or just abandons Flash? What happens to our Flash-based content then? HTML and JavaScript will never go out of business, because there’s no single corporate owner acting as gatekeeper over those technologies. And that is a fundamental difference between what Adobe is keeping closed vs. what Apple is.

* Pun intended, and kudos to you, geek that you are, for picking up on it.

Wheeler Kearns website launches

Wheeler Kearns ArchitectsI usually avoid mixing business and Blather, but I want to make an exception in this case. As of today, the new website for Wheeler Kearns Architects has gone live. Wheeler Kearns is an award-winning Chicago-based architecture firm. I love their clean, geometric designs and the open, welcoming spaces they create. I could spend hours looking through the vast archive of photos, sketches, models and renderings of their projects featured on the site.

But I have a special interest in this website as well, because I developed it. The project consisted of a custom, CakePHP-based CMS with a WYSIWYG editor (TinyMCE), extensive tagging and image management tools, and a polished user interface with lots of cool interactivity — fading slideshows, custom scrollbars, sliding navigation menus, etc. — courtesy of jQuery.