How I migrated a client’s blog from Drupal to WordPress 3.0

I just finished migrating a client’s blog from Drupal to WordPress 3.0. When I took on the project, I assumed there was a straightforward way to migrate content from Drupal to WordPress, but I was mistaken. Fortunately, there is a way, but it involves directly mucking about with the Drupal and WordPress databases. Never fear! There are some tools to help.

After a bunch of research, I found the most comprehensive and up-to-date solution on Mike Smullin’s blog. It wasn’t that up-to-date, though, having been written over a year ago and targeted at WordPress 2.7. Some helpful comments on that post offered a few refinements, and I added a few of my own to accommodate the particular quirks of this client’s site.

Since Mike Smullin (and D’Arcy Norman and Dave Dash before him) was so kind to share his work, I thought I should do the same. And so, I present my further refined SQL script for migrating Drupal data to a WordPress 3.0 site.

# DRUPAL-TO-WORDPRESS CONVERSION SCRIPT

# Changelog

# 07.29.2010 – Updated by Scott Anderson / Room 34 Creative Services http://blog.room34.com/archives/4530
# 02.06.2009 – Updated by Mike Smullin http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/
# 05.15.2007 – Updated by D’Arcy Norman http://www.darcynorman.net/2007/05/15/how-to-migrate-from-drupal-5-to-wordpress-2/
# 05.19.2006 – Created by Dave Dash http://spindrop.us/2006/05/19/migrating-from-drupal-47-to-wordpress/

# This assumes that WordPress and Drupal are in separate databases, named 'wordpress' and 'drupal'.
# If your database names differ, adjust these accordingly.

# Empty previous content from WordPress database.
TRUNCATE TABLE wordpress.wp_comments;
TRUNCATE TABLE wordpress.wp_links;
TRUNCATE TABLE wordpress.wp_postmeta;
TRUNCATE TABLE wordpress.wp_posts;
TRUNCATE TABLE wordpress.wp_term_relationships;
TRUNCATE TABLE wordpress.wp_term_taxonomy;
TRUNCATE TABLE wordpress.wp_terms;

# If you're not bringing over multiple Drupal authors, comment out these lines and the other
# author-related queries near the bottom of the script.
# This assumes you're keeping the default admin user (user_id = 1) created during installation.
DELETE FROM wordpress.wp_users WHERE ID > 1;
DELETE FROM wordpress.wp_usermeta WHERE user_id > 1;

# TAGS
# Using REPLACE prevents script from breaking if Drupal contains duplicate terms.
REPLACE INTO wordpress.wp_terms
    (term_id, `name`, slug, term_group)
    SELECT DISTINCT
        d.tid, d.name, REPLACE(LOWER(d.name), ' ', '_'), 0
    FROM drupal.term_data d
    INNER JOIN drupal.term_hierarchy h
        USING(tid)
    INNER JOIN drupal.term_node n
        USING(tid)
    WHERE (1
         # This helps eliminate spam tags from import; uncomment if necessary.
         # AND LENGTH(d.name) < 50
    )
;

INSERT INTO wordpress.wp_term_taxonomy
    (term_id, taxonomy, description, parent)
    SELECT DISTINCT
        d.tid `term_id`,
        'post_tag' `taxonomy`,
        d.description `description`,
        h.parent `parent`
    FROM drupal.term_data d
    INNER JOIN drupal.term_hierarchy h
        USING(tid)
    INNER JOIN drupal.term_node n
        USING(tid)
    WHERE (1
         # This helps eliminate spam tags from import; uncomment if necessary.
         # AND LENGTH(d.name) < 50
    )
;

# POSTS
# Keeps private posts hidden.
INSERT INTO wordpress.wp_posts
    (id, post_author, post_date, post_content, post_title, post_excerpt,
    post_name, post_modified, post_type, `post_status`)
    SELECT DISTINCT
        n.nid `id`,
        n.uid `post_author`,
        FROM_UNIXTIME(n.created) `post_date`,
        r.body `post_content`,
        n.title `post_title`,
        r.teaser `post_excerpt`,
        IF(SUBSTR(a.dst, 11, 1) = '/', SUBSTR(a.dst, 12), a.dst) `post_name`,
        FROM_UNIXTIME(n.changed) `post_modified`,
        n.type `post_type`,
        IF(n.status = 1, 'publish', 'private') `post_status`
    FROM drupal.node n
    INNER JOIN drupal.node_revisions r
        USING(vid)
    LEFT OUTER JOIN drupal.url_alias a
        ON a.src = CONCAT('node/', n.nid)
    # Add more Drupal content types below if applicable.
    WHERE n.type IN ('post', 'page', 'blog')
;

# Fix post type; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-17826
# Add more Drupal content types below if applicable.
UPDATE wordpress.wp_posts
    SET post_type = 'post'
    WHERE post_type IN ('blog')
;

# Set all pages to "pending".
# If you're keeping the same page structure from Drupal, comment out this query
# and the new page INSERT at the end of this script.
UPDATE wordpress.wp_posts SET post_status = 'pending' WHERE post_type = 'page';

# POST/TAG RELATIONSHIPS
INSERT INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id)
    SELECT DISTINCT nid, tid FROM drupal.term_node
;

# Update tag counts.
UPDATE wp_term_taxonomy tt
    SET `count` = (
        SELECT COUNT(tr.object_id)
        FROM wp_term_relationships tr
        WHERE tr.term_taxonomy_id = tt.term_taxonomy_id
    )
;

# COMMENTS
# Keeps unapproved comments hidden.
# Incorporates change noted here: http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-32169
INSERT INTO wordpress.wp_comments
    (comment_post_ID, comment_date, comment_content, comment_parent, comment_author,
    comment_author_email, comment_author_url, comment_approved)
    SELECT DISTINCT
        nid, FROM_UNIXTIME(timestamp), comment, thread, name,
        mail, homepage, ((status + 1) % 2)
    FROM drupal.comments
;

# Update comments count on wp_posts table.
UPDATE wordpress.wp_posts
    SET `comment_count` = (
        SELECT COUNT(`comment_post_id`)
        FROM wordpress.wp_comments
        WHERE wordpress.wp_posts.`id` = wordpress.wp_comments.`comment_post_id`
    )
;

# Fix images in post content; uncomment if you're moving files from "files" to "wp-content/uploads".
# UPDATE wordpress.wp_posts SET post_content = REPLACE(post_content, '"/files/', '"/wp-content/uploads/');

# Fix taxonomy; http://www.mikesmullin.com/development/migrate-convert-import-drupal-5-to-wordpress-27/#comment-27140
UPDATE IGNORE wordpress.wp_term_relationships, wordpress.wp_term_taxonomy
    SET wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_taxonomy_id
    WHERE wordpress.wp_term_relationships.term_taxonomy_id = wordpress.wp_term_taxonomy.term_id
;

# OPTIONAL ADDITIONS — REMOVE ALL BELOW IF NOT APPLICABLE TO YOUR CONFIGURATION

# CATEGORIES
# These are NEW categories, not in Drupal. Add as many sets as needed.
INSERT IGNORE INTO wordpress.wp_terms (name, slug)
    VALUES
    ('First Category', 'first-category'),
    ('Second Category', 'second-category'),
    ('Third Category', 'third-category')
;

# Set category names to title case (in case term already exists [as a tag] in lowercase).
UPDATE wordpress.wp_terms SET name = 'First Category' WHERE name = 'first category';
UPDATE wordpress.wp_terms SET name = 'Second Category' WHERE name = 'second category';
UPDATE wordpress.wp_terms SET name = 'Third Category' WHERE name = 'third category';

# Add categories to taxonomy.
INSERT INTO wordpress.wp_term_taxonomy (term_id, taxonomy)
    VALUES
    ((SELECT term_id FROM wp_terms WHERE slug = 'first-category'), 'category'),
    ((SELECT term_id FROM wp_terms WHERE slug = 'second-category'), 'category'),
    ((SELECT term_id FROM wp_terms WHERE slug = 'third-category'), 'category')
;

# Auto-assign posts to category.
# You'll need to work out your own logic to determine strings/terms to match.
# Repeat this block as needed for each category you're creating.
INSERT IGNORE INTO wordpress.wp_term_relationships (object_id, term_taxonomy_id)
    SELECT DISTINCT p.ID AS object_id,
        (SELECT tt.term_taxonomy_id
        FROM wordpress.wp_term_taxonomy tt
        INNER JOIN wordpress.wp_terms t USING (term_id)
        WHERE t.slug = 'enter-category-slug-here'
        AND tt.taxonomy = 'category') AS term_taxonomy_id
    FROM wordpress.wp_posts p
    WHERE p.post_content LIKE '%enter string to match here%'
    OR p.ID IN (
        SELECT tr.object_id
        FROM wordpress.wp_term_taxonomy tt
        INNER JOIN wordpress.wp_terms t USING (term_id)
        INNER JOIN wordpress.wp_term_relationships tr USING (term_taxonomy_id)
        WHERE t.slug IN ('enter','terms','to','match','here')
        AND tt.taxonomy = 'post_tag'
    )
;

# Update category counts.
UPDATE wp_term_taxonomy tt
    SET `count` = (
        SELECT COUNT(tr.object_id)
        FROM wp_term_relationships tr
        WHERE tr.term_taxonomy_id = tt.term_taxonomy_id
    )
;

# AUTHORS
INSERT IGNORE INTO wordpress.wp_users
    (ID, user_login, user_pass, user_nicename, user_email,
    user_registered, user_activation_key, user_status, display_name)
    SELECT DISTINCT
        u.uid, u.mail, NULL, u.name, u.mail,
        FROM_UNIXTIME(created), '', 0, u.name
    FROM drupal.users u
    INNER JOIN drupal.users_roles r
        USING (uid)
    WHERE (1
        # Uncomment and enter any email addresses you want to exclude below.
        # AND u.mail NOT IN ('test@example.com')
    )
;

# Assign author permissions.
# Sets all authors to "author" by default; next section can selectively promote individual authors
INSERT IGNORE INTO wordpress.wp_usermeta (user_id, meta_key, meta_value)
    SELECT DISTINCT
        u.uid, 'wp_capabilities', 'a:1:{s:6:"author";s:1:"1";}'
    FROM drupal.users u
    INNER JOIN drupal.users_roles r
        USING (uid)
    WHERE (1
        # Uncomment and enter any email addresses you want to exclude below.
        # AND u.mail NOT IN ('test@example.com')
    )
;
INSERT IGNORE INTO wordpress.wp_usermeta (user_id, meta_key, meta_value)
    SELECT DISTINCT
        u.uid, 'wp_user_level', '2'
    FROM drupal.users u
    INNER JOIN drupal.users_roles r
        USING (uid)
    WHERE (1
        # Uncomment and enter any email addresses you want to exclude below.
        # AND u.mail NOT IN ('test@example.com')
    )
;

# Change permissions for admins.
# Add any specific user IDs to IN list to make them administrators.
# User ID values are carried over from Drupal.
UPDATE wordpress.wp_usermeta
    SET meta_value = 'a:1:{s:13:"administrator";s:1:"1";}'
    WHERE user_id IN (1) AND meta_key = 'wp_capabilities'
;
UPDATE wordpress.wp_usermeta
    SET meta_value = '10'
    WHERE user_id IN (1) AND meta_key = 'wp_user_level'
;

# Reassign post authorship.
UPDATE wordpress.wp_posts
    SET post_author = NULL
    WHERE post_author NOT IN (SELECT DISTINCT ID FROM wordpress.wp_users)
;

# VIDEO – READ BELOW AND COMMENT OUT IF NOT APPLICABLE TO YOUR SITE
# If your Drupal site uses the content_field_video table to store links to YouTube videos,
# this query will insert the video URLs at the end of all relevant posts.
# WordPress will automatically convert the video URLs to YouTube embed code.
UPDATE IGNORE wordpress.wp_posts p, drupal.content_field_video v
    SET p.post_content = CONCAT_WS('\n',post_content,v.field_video_embed)
    WHERE p.ID = v.nid
;

# IMAGES – READ BELOW AND COMMENT OUT IF NOT APPLICABLE TO YOUR SITE
# If your Drupal site uses the content_field_image table to store images associated with posts,
# but not actually referenced in the content of the posts themselves, this query
# will insert the images at the top of the post.
# HTML/CSS NOTE: The code applies a "drupal_image" class to the image and places it inside a <div>
# with the "drupal_image_wrapper" class. Add CSS to your WordPress theme as appropriate to
# handle styling of these elements. The <img> tag as written assumes you'll be copying the
# Drupal "files" directory into the root level of WordPress, NOT placing it inside the
# "wp-content/uploads" directory. It also relies on a properly formatted <base href="" /> tag.
# Make changes as necessary before running this script!
UPDATE IGNORE wordpress.wp_posts p, drupal.content_field_image i, drupal.files f
    SET p.post_content =
        CONCAT(
            CONCAT(
                '<div class="drupal_image_wrapper"><img src="files/',
                f.filename,
                '" class="drupal_image" /></div>'
            ),
            p.post_content
        )
    WHERE p.ID = i.nid
    AND i.field_image_fid = f.fid
    AND (
        f.filename LIKE '%.jpg'
        OR f.filename LIKE '%.jpeg'
        OR f.filename LIKE '%.png'
        OR f.filename LIKE '%.gif'
    )
;

# Fix post_name to remove paths.
# If applicable; Drupal allows paths (i.e. slashes) in the dst field, but this breaks
# WordPress URLs. If you have mod_rewrite turned on, stripping out the portion before
# the final slash will allow old site links to work properly, even if the path before
# the slash is different!
UPDATE wordpress.wp_posts
    SET post_name =
    REVERSE(SUBSTRING(REVERSE(post_name),1,LOCATE('/',REVERSE(post_name))-1))
;

# Miscellaneous clean-up.
# There may be some extraneous blank spaces in your Drupal posts; use these queries
# or other similar ones to strip out the undesirable tags.
UPDATE wordpress.wp_posts
    SET post_content = REPLACE(post_content,'<p> </p>','')
;
UPDATE wordpress.wp_posts
    SET post_content = REPLACE(post_content,'<p class="italic"> </p>','')
;

# NEW PAGES – READ BELOW AND COMMENT OUT IF NOT APPLICABLE TO YOUR SITE
# MUST COME LAST IN THE SCRIPT AFTER ALL OTHER QUERIES!
# If your site will contain new pages, you can set up the basic structure for them here.
# Once the import is complete, go into the WordPress admin and copy content from the Drupal
# pages (which are set to "pending" in a query above) into the appropriate new pages.
INSERT INTO wordpress.wp_posts
    (`post_author`, `post_date`, `post_date_gmt`, `post_content`, `post_title`,
    `post_excerpt`, `post_status`, `comment_status`, `ping_status`, `post_password`,
    `post_name`, `to_ping`, `pinged`, `post_modified`, `post_modified_gmt`,
    `post_content_filtered`, `post_parent`, `guid`, `menu_order`, `post_type`,
    `post_mime_type`, `comment_count`)
    VALUES
    (1, NOW(), NOW(), 'Page content goes here, or leave this value empty.', 'Page Title',
    '', 'publish', 'closed', 'closed', '',
    'slug-goes-here', '', '', NOW(), NOW(),
    '', 0, 'http://full.url.to.page.goes.here', 1, 'page', '', 0)
;

A few notes and suggestions:

  • Use at your own risk. I’m just sharing the code I used. In fact, it’s not even exactly the code I used, since I had to generalize a few things for mass consumption. I haven’t even tested this modified version. I make no guarantees as to its reliability or applicability to your particular implementation.
  • Download the script below; don’t just copy-paste the code from the block above. I had to apply some HTML formatting to make it look right, and this might result in bad characters that would break the script if run as-is above.
  • Always work with a test copy of your database before altering live site content. In theory this script can be run over and over again cleanly, but it will erase whatever data was already in the WordPress database, and there will be no way to get it back unless you have a backup.
  • Read through the code carefully, especially all comments before running the script. Most likely you will need to make some changes to the code.
  • Good luck! If you find bugs or develop further improvements, please comment below!

Download the Script

Download zip file... Drupal-to-WordPress Migration Script
drupal-to-wordpress.sql.zip • 4.8 KB

Migrating from CakePHP 1.2 to 1.3: My Story (Part One of… Possibly More than One)

For the past couple of years I’ve been working on a rapidly evolving CMS project called, not-so-creatively, cms34. It’s built on the open source CakePHP framework.

I love CakePHP. I’ve dabbled with a couple of other frameworks (notably, Zend Framework), and while CakePHP certainly isn’t perfect, I’ve found it by far the easiest to jump into and quickly get a powerful and reliable web application running.

When I started cms34 in late 2008, it was using CakePHP 1.2.0. Since then I’ve kept up with most of the minor point releases, upgrading as far as the current version, 1.2.7, as easily as simply swapping out the cake directory. And in that time, cms34 itself has gone through several major and minor revisions, currently up to what I’m calling version 3.2.

In preparation for version 3.3, I’ve decided to take the plunge and upgrade from CakePHP 1.2.x to the newly released CakePHP 1.3.0. Version 1.3 offers a number of improvements, but it also includes the kinds of changes typical of a major point release that mean an upgrade is not such a simple task. I’ve run into a few issues with the migration from 1.2.x to 1.3.x, many of which are not covered in the project’s official migration documentation, partly due to the incompleteness of the documentation, and partly due to some idiosyncrasies of my application that I wouldn’t expect the documentation to cover. But since I’m probably not in this boat alone, I thought I’d share some of the problems I encountered, and the solutions I found.

Getting started: RTFM

First things first. Download CakePHP 1.3.0. Be sure to get the release version (the one in the list with no hyphenated qualifiers after “1.3.0”). Unzip it and then follow the migration instructions to get started. For the most part this means swapping out the cake directory along with a couple of other files, and then going through all of the notes and making any necessary changes to your application.

In my own experience, I didn’t have to change much: I’ve been trying to keep up with the recommendations in the version 1.2 docs, so I haven’t been using any functionality that got dropped or deprecated. The writing has been on the wall for some time with most of these features and unless your application started with version 1.1 or 1.0, you’re probably not using any of these dead methods anyway.

A few specific issues I did need to address:

  1. Unlike with minor 1.2.x updates, there are a few files outside of the cake directory that need to be replaced. I found it best to just go through the entire release package and replace all files (except those I had modified) with their new versions.
  2. Update your config/core.php file, using the new version as a model. Mine had lots of changes, and I had stripped out all of the instructional comments to make the file less cumbersome to deal with. I wanted to keep those comments stripped out, so I just updated my file with what I needed, and then I kept the distribution’s version of the file in config as core.dist.php so I can refer back to it in the future if I run into any issues.
  3. There are some mixed messages in the documentation regarding the deprecation of AjaxHelper: the migration appendix says it’s deprecated but the main documentation doesn’t indicate that. I’m using it a fair amount and don’t want to have to rework all of that code, so for now I’m leaving it alone.
  4. JavascriptHelper is another matter, but since it appears there are perfect replacements for its functions now in HtmlHelper, I just did a straight-up find-and-replace, changing all instances of $javascript->link() to $html->script() and $javascript->codeBlock() to $html->scriptBlock(). I should note that so far I haven’t really tested any of this functionality to ensure that it’s still working, but no obvious errors have cropped up.

But wait, there’s more!

After following the migration docs, I was overly optimistic that things would just work. They didn’t. First up, I encountered this:

Notice (8): Undefined property: PagesController::$Session
[APP/controllers/app_controller.php, line 383]

Your line number will probably be different, but if you’re using sessions at all (and why wouldn’t you be?), you’ll get this error somewhere. It’s not mentioned in the migration docs, but I found out here that the Session component and helper are not instantiated automatically anymore. The solution is quite easy though. In app_controller.php just make sure that you add 'Session' to the arrays defined for var $components and var $helpers.

That was the biggest and most obvious problem I encountered. Once I had fixed that, I could at least get (some) pages to load, albeit with a few other issues. Namely, my theme was not being applied to the site. I’ll get to that in a minute though, because the problem I decided to fix next came in my CMS admin interface.

Pagination (even if you don’t want it)

I use the PaginatorHelper a lot, especially in the CMS admin. My CMS is modular: there’s a “module” (my term for any model-view-controller group) for pages, a module for blog posts, a module for uploaded files, a module for project porfolios, a module for the event calendar, etc. Each module is represented in the admin interface by a navigation tab, which when clicked takes the user to an index page with a paginated, tabular list of all of the records in the database for that module. In practice (on my own site — yes, I use my CMS to run my own site, of course!), it looks like this:

Unfortunately, the pagination was mysteriously hosed, with the following error:

Unsupported operand types in CAKE/libs/view/helpers/paginator.php
on line 616

What the…? Well, as it turns out, the methods $paginator->prev(), $paginator->next() and $paginator->numbers() have had changes to their input parameters that are currently not very well documented, in that the documentation doesn’t mention that they exist. They are mentioned in the API, but as usually happens when I refer to the API, I found it wanting, to the point that I resorted to digging into the source code directly to try to make sense of what these methods actually do.

In short, where you could previously pass these methods a URL as a string (as the second parameter for prev() and next() and the first for numbers()), now they expect arrays. And numbers(), in particular, really doesn’t like that parameter to be a string (because at one point it uses the += operator to merge two arrays, one of which is the first input parameter), resulting in the Unsupported operand type error.

Fortunately, in my case at least, I found that I didn’t need to pass the URL into any of these methods at all, and that just removing them altogether from the calls fixed the problem.

There was an ancillary problem once I got past the PHP error: inactive “Previous” and “Next” links were appearing on the page when there was no pagination. This was an unwanted addition, but I quickly realized this was an intentional change, and is well-documented. I was able quite easily to make them go away with a little CSS:

span.prev, span.next { display: none; }

With that minor crisis resolved, I could move back to the matter of the theme not showing up.

Changing themes

First, some background on my CMS: I’ve built the application so that I can have a single “core” system that works for numerous client websites, organized in a way that it’s easy to deploy updates across a number of separate sites. The system is also built to support multiple sites running on a single installation (by way of a Site model, with every piece of data in every other model having a site_id field keying the content to the proper site).

There are two key features of CakePHP that I rely on to make this all work: the look-and-feel (and site-specific view functionality) of each site is managed using Themes, and any site-specific controllers and views are handled with Plugins.

It is documented, unfortunately not in the migration appendix, that version 1.3 of CakePHP has changed how theme assets (static files like images, JavaScript and CSS) are organized. Now, instead of placing these in a directory under app/webroot/themed, you create a webroot directory inside the appropriate theme folder in app/views/themed. This is nice, in that now all of the files for a theme are in one location. It can cause a performance hit though, as these files are now being processed through PHP. You can still put the files under app/webroot to avoid this performance hit, but, annoyingly, the path has changed, as noted in the documentation. I decided to go with the new approach under app/views, as performance is not a major issue with the current implementations of my CMS, and the benefit of a single, consolidated directory for each theme is worth the trade-off. If performance does become an issue, though, at least now I know what to do about it.

More to come…?

After completing the above changes, I spent some time exploring my test application, both the user-facing site and the CMS admin interface, and so far I haven’t come across any other problems. Which is not to say I’m ready to roll this out to all of my clients tomorrow morning. I want to make the switch soon, as maintaining two separate code bases is a cumbersome task for a solo freelancer. But while things look good for now, this kind of major upgrade requires some solid testing and troubleshooting. If I run across any other issues, I’ll post them in Part Two.