Lagging panel data in R

Posted on Thursday 16 August 2012 by roblevy

I never thought I’d say this, but Stata rules the roost at at least one thing: lagging a column in panel data. What does that mean?

Imagine you’re looking at test scores, and you think this year’s test score depends on last year’s (a sensible assumption perhaps). Your dataset would look something like this:

Year Name Score Previous Score  Score Before That
2010 Bob  30    .               .
2011 Bob  59    30              .
2012 Bob  63    59              30

The previous scores are calculated by ‘lagging’ the data by one and two periods (note that the dot represents a missing value. Stata has “.” and R has “NA”)
To lag data in Stata you simply type L.Score and for inclusion of lots of lags, L(0/3).Score will give you the score, last year’s score, the year before that AND the year before that one too. Amazing!

Now, as I search for a way to do this in R, imaging my horror on stumbling upon this syntactical nightmare. Being involved with computing gets you used to difference syntax systems. For example, I know that the SQL-like “WHERE Name=’Rob’ AND Home=’London’” looks like “AND(A$2$=’Rob’,A$3$=’London’) in Excel. But you get the feeling that, as computer languages beging to be able to claim some maturity of design, some kind of basic tenets of language syntax would begin to emerge. (Note that, were Excel to be designed today, there’s no way the developers would opt for the AND(A,B) tomfoolery.)
So, to lag in R, I have to do this:

plm(Name ~ Score, lag(Score=2,c(2,3)))

Try getting your head around that! Score for readability of code? Nought! Score for being able to remember how to do this next time I want to? Nought point nought!

Ah, the joys of community-developed software.

Econometrics, R, R development, r studio