Losing one’s data, losing one’s mind

Let me start this post with a couple of disclaimers:
1) This post is of no interest to anyone other than myself and possibly a few of my colleagues at UCL’s Centre for Advanced Spatial Analysis. It’s here as a memento of a couple of close-to-unbearable days in the office following an accidental blat of our project’s database. It will serve as a warning to us, and a guide in the unlikely event of us doing it again.

2) Persons and events in this blog post are fictional. Similarities to real persons or real events are entirely coincidental. That said, a few of the shouting, swearing episodes bear close resemblance to those seen round these parts recently…

It all started with an innocent bout of child-like curiosity: what does the new RTM of Windows 8 look like? How will it make our lives more exciting and more productive, as the expensive-looking YouTube ads have suggested?

We decided to run this little experiment on the server we use to host our project’s database, a natty little PostgreSQL install with PostGIS dealing with the spatial elements of our data.

To cut a long story short, we screwed up our Postgres installation because of a weirdness relating to 32-bit vs. 64-bit operating systems: our data refused to go into a 64-bit database, and Windows 8 seemingly refused to install a 32-bit instance of Postgres. We did, however, have the contents of the \data folder from the old Postgres installation, so we felt like we were in reasonably good shape.

The next piece of good news was that the data was restorable onto a Windows 7 computer with 32-bit Postgres installed. A sigh of relief went round the office like an OAP Mexican wave. We naively took a pg_dump of the database and ran pg_restore in the hope that it would all come out in the wash. After three days of a The Matrix-style, rapidly scrolling DOS window, the process was announced as complete, but we couldn’t find the resulting database anywhere. Closer inspection of the pg_restore documentation revealed a very important piece of information hidden away in the header text, which no one ever reads (the emphasis is mine):

pg_restore can operate in two modes. If a database name is specified, pg_restore connects to that database and restores archive contents directly into the database. Otherwise, a script containing the SQL commands necessary to rebuild the database is created and written to a file or standard output.

In short, the command either does what you want it to do, or something completely useless. In our case, the three days had been spent writing SQL commands to standard output (for which read the DOS prompt, with a buffer of just 40 lines) which could theoretically be used to piece the database back together line by line.
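To make the two modes concrete, here is a minimal Python sketch that builds the pg_restore argument list for each case. Python and the helper name `pg_restore_args` are our own illustrative choices, not anything from the PostgreSQL docs. `-d` (target database) and `-f` (output file) are real pg_restore switches. The file and database names are placeholders, and the sketch assumes pg_restore is on the PATH and that the default connection settings work.

```python
from typing import List, Optional


def pg_restore_args(archive: str, dbname: Optional[str] = None,
                    outfile: Optional[str] = None) -> List[str]:
    """Build a pg_restore command line (hypothetical helper).

    With dbname, pg_restore connects to that database and restores into it.
    Without it, pg_restore only produces an SQL script: written to outfile
    if one is given, otherwise to standard output.
    """
    args = ["pg_restore"]
    if dbname is not None:
        # Mode 1: restore directly into an existing database.
        args += ["-d", dbname]
    elif outfile is not None:
        # Mode 2a: write the SQL script to a file instead of the console.
        args += ["-f", outfile]
    # Mode 2b (neither option): the script goes to stdout, which is what bit us.
    args.append(archive)
    return args
```

Running `subprocess.run(pg_restore_args("olddb.backup", dbname="newdb"), check=True)` would restore straight into `newdb`, assuming that database already exists. Leaving out `dbname` reproduces our three days of scrolling text.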

So where to next? We attempted to rerun the pg_restore command with the -d switch naming a target database, so the output would go into the database rather than just to the screen, but were faced with screenfuls of errors such as
ERROR: could not access file "$libdir/postgis-1.5": No such file or directory
Oh god, quoth we, it’s looking for a version of PostGIS which is not supported by a 64-bit installation of Postgres. Being the eager beavers we are, we immediately set about trying to build PostGIS 1.5 as a 64-bit binary, before throwing our hands in the air and going for coffee and doughnuts. Surely there must be a better way.

At this point in the story the deadlock began to break: why not upgrade the version of PostGIS on the 32-bit Postgres that actually worked, so that pg_restore would be looking for the right library files? A Google search turned up a vital page about upgrading PostGIS to version 2.0. Unbelievably, the only way to do this is via what is ominously referred to as a “Hard Upgrade” (so called because it’s “hard” to work out how this could possibly be the officially recommended way of doing things).

Hard upgrading basically involves creating a special pg_dump with some important switches (-Fc for the custom archive format, -b to include large objects, -v for verbose output) and then feeding it through postgis_restore.pl, a Perl script included with PostGIS specifically for restoring a PostGIS database. So, off to install Perl for Windows 8…
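For reference, here is a small Python sketch of that special dump as an argument list, mirroring the pg_dump command we eventually used. The helper name `hard_upgrade_dump_args` is hypothetical. The host, port and user defaults are assumptions copied from our own setup (localhost, 5432, postgres). The switches themselves are standard pg_dump options.

```python
from typing import List


def hard_upgrade_dump_args(dbname: str, outfile: str, host: str = "localhost",
                           port: int = 5432, user: str = "postgres") -> List[str]:
    """Build the pg_dump call for a PostGIS hard upgrade (hypothetical helper).

    -Fc  custom-format archive, as the hard-upgrade instructions require
    -b   include large objects (blobs) in the dump
    -v   verbose progress output
    -f   write the archive to outfile rather than stdout
    """
    return [
        "pg_dump", "-h", host, "-p", str(port), "-U", user,
        "-Fc", "-b", "-v", "-f", outfile, dbname,
    ]
```

Passing the result to `subprocess.run(..., check=True)` should be equivalent to typing the command by hand, assuming pg_dump is on the PATH.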

With Perl installed and the instructions for hard upgrading under our belts, things were moving forward. We were now up against this unhelpful error message (x1000 in the error log):
ERROR: type "geometry" does not exist
What? Sounds worrying.

This was resolved when we realised that the new database we’d created to house the old data was not a PostGIS database, but merely a PostgreSQL one. To create a PostGIS database you need to do:

CREATE DATABASE NewDB
TEMPLATE=whateverYourPostGisTemplateDbIsCalled;

So, we had an empty PostGIS-enabled database and a dump file created using the magic command:
pg_dump -h localhost -p 5432 -U postgres -Fc -b -v -f "/somepath/olddb.backup" olddb
and we were ready to roll.

Some The Matrix-style scrolling DOS output, making us look like the ninjas we’re not.

All that was left was the dainty command
perl postgis_restore.pl dumpfile.dump | psql -h localhost -p 5432 -U postgres db_name 2> errors.txt
and the process began. More The Matrix-style scrolling DOS prompt (but this time for around ten minutes, rather than the three days under the old regime), and we finally had our 64-bit, PostGIS 2.0 database up, running and ready to use.
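The pipeline above can be sketched in Python with `subprocess`. In this sketch the Perl script's SQL output is piped into psql, and psql's stderr goes to a log file, just as `2> errors.txt` applies to the last command in the shell pipeline. The helper names are hypothetical. The sketch assumes perl and psql are on the PATH and that postgis_restore.pl sits in the current directory (in a PostGIS source tree it lives under utils/).

```python
import subprocess
from typing import List, Tuple


def restore_pipeline_args(dumpfile: str, dbname: str, host: str = "localhost",
                          port: int = 5432, user: str = "postgres"
                          ) -> Tuple[List[str], List[str]]:
    """Return both halves of: perl postgis_restore.pl DUMP | psql ... DB."""
    perl_cmd = ["perl", "postgis_restore.pl", dumpfile]
    psql_cmd = ["psql", "-h", host, "-p", str(port), "-U", user, dbname]
    return perl_cmd, psql_cmd


def run_restore(dumpfile: str, dbname: str, error_log: str = "errors.txt") -> int:
    """Pipe the Perl script's SQL into psql, logging psql's stderr to a file."""
    perl_cmd, psql_cmd = restore_pipeline_args(dumpfile, dbname)
    with open(error_log, "w") as errors:
        perl = subprocess.Popen(perl_cmd, stdout=subprocess.PIPE)
        psql = subprocess.Popen(psql_cmd, stdin=perl.stdout, stderr=errors)
        # Close our copy of the pipe so only psql holds the read end.
        perl.stdout.close()
        psql.wait()
        perl.wait()
    return psql.returncode
```

Note that psql's exit code alone will not tell you whether individual SQL statements failed, so errors.txt is still the place to look afterwards.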

Will we make the same mistakes again? Probably. Are we going to start backing up our database from now on? You bet…