Panel data in R for dummies

Since I’ve recently been in the business of making general rules, here’s another: the geniuses who can write brilliant R packages still write documentation like it was the 1990s. Anyone old enough to have lived through the horror of 1990s technical user manuals will know what I mean. Not for them the friendly Ikea-style quick start guide. These manuals usually started thus:

Chapter 1: Configuring the IDE/DMMA Bus Jumper.
Before connecting LPT or COMx peripherals to this device, set the IDE/DMMA Bus Jumper depending on the communication mode of your peripheral. Choosing this jumper setting incorrectly may result in unpredictable results which may damage this device.

So it is with the excellent panel data package for R, called plm. The documentation begins with the following sentence:

In plm the data argument may be an ordinary data.frame but, in this case, an argument called index has to be added to indicate the structure of the data. This can be NULL (the default value), it is then assumed that the first two columns contain the individual and the time index and that observations are ordered by individual and by time period

and continues in much the same vein.

Anyway, in the spirit of sparing my fellow researchers from ever having to read this stuff just to get started with panel data regression in R, here is a much-condensed version of what I’ve read, enough to get you going. Hurrah for me.

library(RPostgreSQL)
library(plm)
## Load the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

## Open a connection
con <- dbConnect(drv, host="xxx",dbname="xxx",user="xxx",password="xxx")

## Read the whole contents of a table into a data frame called rs
rs <- dbReadTable(con,"xxx_table_name")

## create a pdata.frame from the dataframe. This is a technicality allowing the plm package to work with the data.
paneldata <- pdata.frame(rs, index=c("category_variable_name","time_variable_name"))

## we can now 'do stuff' with the data in a panel sense: things like lagging/leading and calculating differences:
head(paneldata$year)
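## lag by one period (the lag order is an argument to lag(), not to head())
head(lag(paneldata$year, 1))
## lags 0, 1 and 2 side by side, one column per lag
head(lag(paneldata$year, 0:2))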
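
For anyone without a PostgreSQL database to hand, the same steps can be sketched on the Grunfeld data set that ships with plm. This is a minimal sketch rather than anything official: the object names pgr and fe are my own placeholders, and the fixed-effects regression at the end is just one plausible next step, assuming the usual investment equation on firm value and capital.

```r
library(plm)

## Grunfeld ships with plm: a balanced panel of firms observed yearly
data("Grunfeld", package = "plm")

## Declare the panel structure: individual index first, time index second
pgr <- pdata.frame(Grunfeld, index = c("firm", "year"))

## Report the panel dimensions (number of individuals, periods, balance)
pdim(pgr)

## Lags and differences are taken within each firm, so the first
## observation of every firm has no previous value and comes back as NA
head(lag(pgr$inv, 1))
head(lag(pgr$inv, 0:2))
head(diff(pgr$inv))

## A fixed-effects ("within") regression, purely as an illustration
fe <- plm(inv ~ value + capital, data = pgr, model = "within")
summary(fe)
```

One caveat: if dplyr is loaded after plm, its own lag() masks the one used here, in which case calling plm::lag() explicitly should avoid the clash.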