My StrataConf highlights

Lots went on at the geologically named, but not geologically inclined, Strata Conference in London. Here are my highlights:

George Dyson was one of the keynote speakers on the first morning. The son of the British–American mathematician Freeman Dyson, George is an author and historian of science and computing. He talked about the history of data storage, from tally sticks, through the 53 kB of global digital storage in 1953, to the present day. His talk was fascinating.

Simon Rogers was one of several speakers from the Guardian newspaper, one of the most progressive and online-friendly news outlets in the world. The paper has a host of strategies for putting data first:

  • Their data and viz geeks sit in the middle of the newsroom
  • They built their own software library for data viz, Miso
  • They share the data behind every story on their Datablog

Duncan Irving from Teradata gave the audience a glimpse of the big data that geoscientists wield, as I alluded to yesterday. Teradata does data warehousing, but with high-tech extras such as distributed storage and level-of-detail layers. I was intrigued by one of the technologies he talked about: SQL on Hadoop. This sounds like gobbledygook, but here's the (possibly horribly misunderstood) gist: store statistical attributes of a massive seismic volume in a database, and you can then query them. "Show me all the traces with such-and-such seismic facies."
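To make that gist a bit more concrete, here is a toy sketch of the idea, not Teradata's system. It assumes SQLite (Python's built-in `sqlite3`) as a stand-in for whatever SQL-on-Hadoop engine sits underneath a real deployment. The table name, columns, synthetic sine-wave traces, and the RMS-threshold "facies" rule are all hypothetical, chosen only to show the pattern: compute per-trace attributes once, store them in a table, then ask questions in SQL instead of rescanning the whole volume.

```python
import math
import sqlite3


def rms(samples):
    """Root-mean-square amplitude of one trace."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def build_attribute_table(n_ilines=4, n_xlines=5, n_samples=100, period=20):
    """Compute per-trace attributes from synthetic traces and load them into SQLite.

    Hypothetical setup: each trace is a sine wave whose amplitude grows with
    crossline number, standing in for a real seismic volume.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trace_attributes ("
        " iline INTEGER, xline INTEGER, rms_amp REAL, facies TEXT,"
        " PRIMARY KEY (iline, xline))"
    )
    rows = []
    for il in range(n_ilines):
        for xl in range(n_xlines):
            gain = 1.0 + xl  # lateral amplitude trend (made up)
            samples = [gain * math.sin(2 * math.pi * k / period) for k in range(n_samples)]
            amp = rms(samples)
            # Hypothetical facies rule: call high-RMS traces "bright".
            facies = "bright" if amp > 2.5 else "dim"
            rows.append((il, xl, amp, facies))
    conn.executemany("INSERT INTO trace_attributes VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def traces_with_facies(conn, facies):
    """'Show me all the traces with such-and-such seismic facies.'"""
    return conn.execute(
        "SELECT iline, xline, rms_amp FROM trace_attributes"
        " WHERE facies = ? ORDER BY rms_amp DESC, iline, xline",
        (facies,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_attribute_table()
    for iline, xline, amp in traces_with_facies(conn, "bright"):
        print(f"iline {iline}, xline {xline}: RMS {amp:.3f}")
```

The expensive step (reading every sample) happens once when the table is built; after that, each query touches only a few numbers per trace, which is presumably what makes the approach attractive at the scale of a real seismic survey.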

Hjalmar Gislason from DataMarket, whose recent products include Energy Portal, gave us his best practices for publishing data:

  • Use simple formats, like CSV
  • Aim for at least 3 stars in Tim Berners-Lee's 5-star scheme for open data
  • Be consistent across the datasets you publish
  • Put unique IDs everywhere, especially on tables and columns
  • Provide FAQs and clear feedback channels for users
  • Be clear about the license terms of the data
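Several of these points can be followed mechanically. Below is a minimal sketch, not Hjalmar's or DataMarket's tooling, that writes records to plain CSV with a unique ID on every row, a fixed column order, and a small metadata record giving each column an ID and stating the license and a feedback contact. The function name, the `row_id` scheme, the ID formats, and the example license and contact values are all hypothetical.

```python
import csv
import io
import json


def publish_csv(records, columns, dataset_id, license_id, contact):
    """Return (csv_text, metadata_json) for a list of dict records.

    Hypothetical conventions: every row gets a stable ID derived from the
    dataset ID, columns are always written in the same order, and missing
    values become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["row_id"] + columns, lineterminator="\n")
    writer.writeheader()
    for i, rec in enumerate(records, start=1):
        row = {"row_id": f"{dataset_id}-{i:06d}"}
        row.update({c: rec.get(c, "") for c in columns})  # ignore extra keys
        writer.writerow(row)

    metadata = {
        "dataset_id": dataset_id,
        "columns": [{"id": f"{dataset_id}.{c}", "name": c} for c in columns],
        "license": license_id,
        "feedback": contact,
    }
    return buf.getvalue(), json.dumps(metadata, indent=2)


if __name__ == "__main__":
    # Toy records, invented purely for illustration.
    text, meta = publish_csv(
        [{"well": "A-1", "depth_m": 1200}, {"well": "B-2"}],
        ["well", "depth_m"],
        dataset_id="wells",
        license_id="CC-BY-4.0",
        contact="data@example.com",
    )
    print(text)
    print(meta)
```

Keeping the data itself in CSV and the license and column descriptions in a separate, machine-readable file is one way to stay simple while still being explicit about terms and structure.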

Ben Goldacre, author and bad science crimefighter, gave a keynote on the second day. Almost vibrating with energy, he described how the most basic bias-fighting tool in medicine — randomized controlled trials — might be applied to improving government services (Haynes et al., 2012, Test, learn, adapt). 

At the end of the two days, I had the usual feeling of fullness, fatigue, and anticlimax... but also the inspired, impatient, creative energy that I hope for from events. The consistency of the themes was encouraging: data wants to be free, visualization is necessary but insufficient, reproducibility is core, and stories drive us. These are ideas we embrace. They're at the heart of the quiet revolution going on in the world, but perhaps not yet at the heart of our subsurface professional communities.

Photo by flickr user bjelkeman.