Migrating primary keys in a CakePHP site’s database from GUIDs to integers

If you are a CakePHP developer, and find yourself in a situation where your database is using GUIDs but you want to switch to using integer-based primary keys instead, this post will describe how I dealt with that exact situation. (Actually, the post will do that even if you’re not a CakePHP developer; it just probably won’t be of any interest to you.)

Some unnecessary background details you can skip if you’re in a hurry

I may be the only CakePHP developer silly enough ever to have used GUIDs (those crazy pseudo-unique 36-character hexadecimal strings) instead of plain auto-increment integers as primary keys. Maybe not. In any case, it only took me one CakePHP project to realize the folly of the approach and I quickly abandoned GUIDs on my second CakePHP site.

Now the time has come, nearly four years later, to finally update that first CakePHP site to the latest version of cms34. There will surely be numerous challenges involved in the upgrade, given the extensive evolution the system has undergone in the intervening years. But without a doubt the biggest, or at least the first, of those challenges is getting the data converted from using GUIDs to integer-based primary keys.

The reason I used GUIDs in the first place was that I have a few multi-purpose cross-reference tables in the structure, and I wanted to just be able to refer to a single ID field as a foreign key, without having to keep track of both the table/model and an integer ID. Kind of silly, in retrospect, especially since I quickly realized how important it was to be able to easily identify which table the foreign key pointed to. At the time I was singularly focused on trying to keep the database as stripped-down basic as possible, to a self-defeating point.

But my early CakePHP learning curve is a bit beside the point here. The goal of this post is to document the process I am undertaking to migrate the site, for the benefit of anyone else who finds themselves in a similar situation. (Hopefully, for you, the idiot who used GUIDs was not yourself.)

Step 1: Preliminary set-up

First, a qualifier: the way I am doing this will result in a lot of gaps in the sequence of integers used in the individual tables after the migration. If the actual values of your integer primary keys matters to you, you may not want to take this approach. I can’t imagine a reason why the values would really matter though.

The idea in this step is, in essence, to convert each GUID in the system into a unique integer that can be used later in the update process. There’s no reason why the integers have to be unique across tables any longer; it’s just that having a definitive reference, in a single table, of how each GUID maps to an integer will make some of the later steps in the process easier. If you are compelled not to skip numbers in individual tables, you could add an extra field called table_id, and increment table-specific IDs in that field. But that would require extra code in the next step, and it just seems like an extra complication without any real benefits.

Do not do any of this on a live site. Set up a duplicate copy of your site somewhere else, and do all of this conversion work there. Also discourage users from updating the existing site while you’re doing the migration, if possible. This conversion can be set up as a repeatable process, however, so you could always do some test runs, and then once you have everything working reliably, get a fresh dump of the live database again, minimizing downtime.

In your copy database, create all of the tables for your old database and load in the data. Also set up empty tables for the new database (which hopefully doesn’t have overlapping table names; if it does, add a prefix to the table names for the old database). Once you have both the old and new tables in your database, and the old tables loaded with data, you’re ready to proceed.

Step 2: Build and populate a GUID map table

Assuming you’re using MySQL, this will take care of it:

CREATE TABLE IF NOT EXISTS `guid_map` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`table` varchar(255) NOT NULL,
`guid` char(36) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1;

If you have adequate permissions to create the table from within a PHP script, you may as well start building one, because you’ll be adding to it next anyway:

<?php
// Connect to database (replace values as appropriate)
$db_name = 'database'; // Set to your database name
$dbconn = mysql_connect('localhost','username','password',true);
$db = mysql_select_db($db_name);

// Build guid_map table
mysql_query("CREATE TABLE IF NOT EXISTS `guid_map` (
    `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
    `table` varchar(255) NOT NULL,
    `guid` char(36) NOT NULL,
    PRIMARY KEY (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1"
);

Next, you’ll want to build a PHP script that will crank through all of your old tables, grab all of the GUIDs, and write them into this new table. Here’s a snippet of PHP to get the job done. You’ll need to decide how you want to run it; it assumes a database connection is already established, so you could just drop it in as an action in one of your controllers or you could build a stand-alone PHP script and add the necessary database connection code at the top.

Note: This works with direct SQL queries, not CakePHP’s model methods. That’s just how I did it. Also, when I build PHP utilities like this, I run them in a browser and like to dump status information as basic HTML. If you prefer to run this at the command line, or don’t care about a verbose dump of the details, you could modify/remove any lines here that contain HTML.

// Clear current GUID map data (makes this process repeatable)
mysql_query('TRUNCATE TABLE guid_map');
if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; exit; }
echo '<p>GUID map cleared.</p>';

// Get tables
$old_prefix = 'old_'; // Set to prefix for old table names
$rs = mysql_query('SHOW TABLES');
$tables = array();
if (!empty($rs)) {
    while ($row = mysql_fetch_array($rs)) {
        if (strpos($row[0],$old_prefix) === 0 && $row[0] != 'guid_map') {
            $tables[] = $row[0];
        }
    }
}

// Loop through tables, retrieving all GUIDs
foreach ((array)$tables as $table) {
    $sql = 'SELECT id FROM `' . $table . '`';
    $rs = mysql_query($sql);
    
    // Loop through rows, adding to GUID map
    $error_count = 0;
    $mapped_count = 0;
    if (mysql_num_rows($rs) > 0) {
        echo '<hr /><h3>Processing ' . mysql_num_rows($rs) . ' rows in table: ' . $table . '</h3>';
        while ($row = mysql_fetch_assoc($rs)) {
            $sql = 'INSERT INTO `guid_map` (`table`, `guid`) VALUES (\'' . mysql_real_escape_string($table) . '\', \'' . mysql_real_escape_string($row['id']) . '\');';
            mysql_query($sql);
            if (mysql_error()) { echo '<p>' . mysql_error() . '</p>'; }
            else { $mapped_count++; }
        }
        echo '<p>Mapped ' . $mapped_count . ' rows with ' . $error_count . ' error(s).</p>';
    }
    else {
        echo '<p>No data rows found in table: ' . $table . '</p>';
    }
}

Once you’ve run this script, your guid_map table will now contain a row for every row of every table in your database. And you now have an integer ID you can use for each of them. And what’s extra cool about it is that it’s repeatable. Run it as many times as you want. Get a new dump of data from the old site and run it again. Fun!

The next step is where things get a lot more complicated, and a lot more customized to your specific data structures. I’ll be using one of my actual data tables as an example, but bear in mind that your system is unlikely to match this schema exactly. This is just a demonstration to work from in building your own script.