Notes on a Migration with Images from Blogger to WordPress
I’ve recently reported a very large blog from Blogger to WordPress. The blog has been around for many years, with around a thousand posts – most of which contain at least one image embedded in.
The WordPress theme to be used was a gallery-style one (Arras) which displays the post image thumbnails properly, so there was a requirement to have these included in the import. However it is trickier than you might think to get these in from Blogger on a bulk-import basis. Individually associating all the images is not an option due the number of posts. I’ve not used a solution requiring programming, but writing a Python script to do this would be pretty straightforward. Instead I’ve used Excel to extract the URLs and build up SQL queries with the URLs in them, to insert into WordPress. These notes will hopefully prove useful to someone trying to do the same thing.
Assumptions (a lot of these can be worked around if you have the know-how): You have Excel, EditPad (a text editor), a WordPress.org 3.1.X install which allows you to install arbitrary plugins (i.e. if using WordPress Multisite, you are a super-admin), and that you can access the MySQL database backing your blog using phpMyAdmin.
1. Install the Blogger Import plugin in WordPress and use it to import your blog in.
The result should be a successfully imported blog. When viewing a post, the images even appear. However – these are just links to images on the Blogger server. The images themselves haven’t been pulled across.
2. Install the Hot Linked Image Cacher (HLIC) plugin and use it to pull in local copies of your images.
Now the images are locally stored, and linked to locally in the content of your posts – the plugin updates the URLs as well as copying the images across. However, the images are still not formally associated with the posts that link to them.
3. Use phpMyAdmin to export just the ID and post_content columns of the wp_posts table to a CSV file. Use the suppress CR/LF option and the pipe “|” delimiter.
4. Open the CSV file in EditPad (not Excel – as Excel will automatically assume that the commas in your posts are the field delimiters.) and change all commas in the document to spaces.
5. Copy the contents and paste them into a blank Excel document. Use the Text-to-Columns Import wizard to import in the data with the pipe delimiter.
6. Extract the (first) image URL from the post_content column. You can do this by adding another column and using a formula like this:
=MID($B823,FIND("HLIC/", $B823),41)
It’s OK to use 41 characters because the HLIC plugin always saves images with names this long.
7. Use a Filter (AutoFilter) to remove rows for which there is no image URL.
8. You need to create SQL statements, two for each image, structuring them using Excel columns. Post-associated images are considered by WordPress to be themselves posts, with a parent ID referencing the corresponding post ID – and the images also have entries in the post metadata table. Here’s two examples of the two statements you need for each image, I’ve used IDs from the 7xxxxx and 8xxxxx ranges on the assumption there are no existing posts with IDs this high.
insert into wp_posts values( 700002 ,1,NOW(),NOW(),"","","","inherit","open","open","","","","",NOW(),NOW(),"", 1654 ,"/files/ HLIC/2af1d48d3251d953b106a0bbf8f2f810.jpg ", 0,"attachment", "image/jpeg", 0); insert into wp_postmeta values( 800002 , 700002 ,
"_wp_attached_file", "HLIC/2af1d48d3251d953b106a0bbf8f2f810.jpg ");
insert into wp_posts values( 700003
,1,NOW(),NOW(),"","","","inherit","open","open","","","","",NOW(),NOW(),"", 1683 ,"/files/ HLIC/dd376f1a86ba3e833e866e7f03127712.jpg ", 0,"attachment", "image/jpeg", 0); insert into wp_postmeta values( 800003 , 700003 , "_wp_attached_file", "HLIC/dd376f1a86ba3e833e866e7f03127712.jpg ");
Elements in red are from the original database export. Elements in blue are sequential numbers.
I’ve assumed all the images are JPEGs – you’ll need to change the “image/jpeg” part of the SQL statement if this is not the case.
If using WordPress Multisite, you’ll need to use the appropriate wp_NN_ table prefix and also set the author number (1 above) appropriately.
9. Paste the statements into EditPad, remove all tabs, and then paste all the statements into phpMyAdmin and execute.
10. Finally you need to build in the attributes for each image, before they are seen. You can do this with another plugin called Regenerate Thumbnails. Note that this plugin depends on the URL that was specified in the post metadata table.
Manual tidying will still be needed – particularly for embedded YouTube videos and other content.
One more gotcha with the import was that the Blogger post tags were coming in as categories. As there were several hundred of these, that wasn’t very practical. So I ran the following bit of SQL to change them to be WordPress tags:
UPDATE wp_term_taxonomy SET taxonomy = replace(taxonomy,"category","post_tag")