Big Data Problems have been around longer than you think

The Strata Conference is in town and one presentation that caught my eye was titled The Great Railway Caper: Big Data in 1955. John Graham-Cumming from CloudFlare gives a great overview of why some Big Data problems have been around since the early days of computing, when computers filled entire rooms.

Back in 1955 the Government tasked a team of people, including Roger Coleman, with calculating the distances between nodes on the British Rail system using LEO, the first computer built for a commercial company, owned by Lyons. They had to compute more than 12 million distances between the 5,550 stations in the United Kingdom using a computer with the rough equivalent of a 500Hz CPU and 2 kilobytes of RAM. To give you an idea of the size of the RAM on this mainframe, 2K is roughly enough to store 2,048 characters, and that doesn't account for the space needed to hold the program while it runs. The problems don't stop there: this was before graph theory entered the computing toolkit and before shortest-path algorithms existed, pre-dating the standard, accepted algorithmic solutions by about five years.
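For context, the now-standard approach to this problem (a shortest-path algorithm such as Dijkstra's, published a few years after the LEO project) makes the task look almost trivial on modern hardware. Here is a minimal sketch in Python; the station names and mileages in the toy network are invented purely for illustration.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from `source` to every reachable node.

    `graph` maps a station name to a list of (neighbour, miles) pairs.
    """
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, already found a shorter route
        for neighbour, miles in graph.get(node, []):
            candidate = d + miles
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

# Toy rail network with made-up distances, purely for illustration.
rail = {
    "London":    [("Reading", 36), ("Cambridge", 55)],
    "Reading":   [("London", 36), ("Bristol", 82)],
    "Cambridge": [("London", 55), ("York", 150)],
    "Bristol":   [("Reading", 82)],
    "York":      [("Cambridge", 150)],
}

print(shortest_distances(rail, "London"))
```

Running this from every station gives the full station-to-station distance table; the hard part in 1955 was doing anything like it without the algorithm, the memory, or the clock speed.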

John draws some parallels between today's big data challenges and the ones these pioneers had to deal with: renting time from the Lyons company (today, renting time on Amazon's EC2), the difficulty of storing output (reams of punch cards vs. hard drive space), and memory constraints (2K was the limit, supposedly because there wasn't enough mercury in the earth to build a larger memory).

The talk is very entertaining and provides insight into what turns out to be a very old problem in big data processing. A very interesting watch indeed.