Two decades of geophysics freedom

This year is the 20th anniversary of the release of Seismic Un*x as free software. It is six years since the first open software workshop at EAGE. And it is one year since the PTTC open source geoscience workshop in Houston, where I first met Karl Schleicher, Joe Dellinger, and a host of other open source advocates and developers. The EAGE workshop on Friday looked back on all of this, surveyed the current landscape, and looked forward to an ever-increasing rate of invention and implementation of free and open geophysics software.

Rather than attempting any deep commentary, here's a rundown of the entire day:

Joe Dellinger, one of the industry's best-known open source advocates and high-performance computing gurus, kicked off with a quick history lesson. Starting with Berkeley UNIX and mod.sources, he reminisced about the births of SEPlib, USP, and SU in the early–mid 1980s, the open source release of SU in 1992, and the industry turmoil that resulted in FreeUSP and CPseis, among others, in the early 2000s. He also mentioned that, while the goal of the last meeting (in 2006) had been unification, what had resulted was awareness and proliferation—and he expected this to continue. 

Victoria Stodden, a well-known computational statistician and openness advocate, continued with a passionate cry for reproducibility, the principle at the heart of openness in science. Her supervisor at Stanford, David Donoho, was a champion of the geophysicist Jon Claerbout, and is committed to really reproducible research:

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

Victoria has a fascinating background, from statistics to computation to law to science policy in the US. She has collected plenty of data about openness in science, mainly from statistics and the life sciences, and published some Really Useful Stuff like the Reproducible Research Standard [PDF].

Karl Schleicher did a great job of organizing the day. He had judiciously mixed traditional talks with short poster adverts — 5-minute spiels on the dozen or so posters and demos around the room. We heard from

  • Bjørn Olofson, with an update on his terrific SeaSeis project: "Simple tasks should be simple"
  • Nick Tanushev, from ZTerra: "The best documentation is copious examples"
  • Chuck Mosher, ConocoPhillips brain-box and creator of the JavaSeis data description protocol
  • Ricardo Biloti, describing his GêBR project (pron. gee-bee-ar)
  • Man-whose-name-I-missed describing how BotoSeis is focusing on being an SU interface for mortals
  • Karl Schleicher, from UT Austin, pointing everyone to his open data page at wiki.seg.org

Charles Jones continued with a whirlwind tour of BG Group's adventures in open source, and repeatedly shared the company's perspective throughout the day. BG is an important contributor to the core of OpendTect, and puts it on every desktop. The seismic processing group also uses open software and tools in their big data, big compute realm.

Helène Huck of dGB Earth Sciences updated the crowd on OpendTect, the open source seismic interpretation tool which continues to grow and spread. She shared some of the reasons why the project has been such a success: spending time on UX and documentation, building academic relationships, nurturing the community, maintaining the plug-in architecture, and using an agile development process.

Matt Hall of Agile Geoscience was next, and gave an update on our own various projects, including Modelr, a project so new I have not even had time to write about it here yet. I won't cram it in here, but will write something up in the next week or two.

Right after lunch, Sergey Fomel of UT Austin pointed to some examples of what reproducibility means. Madagascar is committed to reproducibility, though some remarked that this may be at the expense of improvements in functionality and usability. He was followed by more poster introductions.

John Stockwell moved us along with tales of the 26 years since Seismic Un*x, aka SU, started. He continued with an overview of how seismic signal processing is taught today at Colorado School of Mines — perhaps a pattern for other teachers of geophysics to adopt. Don't miss the new(ish) SU wiki!

Bob Clapp of Stanford gave SEPlib a similar treatment, emphasizing some processing methods on the rise in their labs: parallel (of course), GPU, interactive processing (see my recent plea for more of this). Again, reproducibility is at the heart of everything: every non-drawn figure in every paper or report must be reproducible... though some run for weeks.

Bill Symes of Rice continued with a look at signal processing there, especially wavefield processing and inversion. Their goal is to create high-level toolboxes for the modeling and optimization problems at the core of inversion.

Didrik Pinte of Enthought closed the session by cataloging the various open source components of Enthought's essential Python distribution: Chaco, Mayavi, Traits/Enaml, and ETS. The parallel/GPU themes continued too, and Didrik finished with a reminder about the SciPy conference next month.

At the end of the day were two mini-sessions of the sort that most conferences would benefit from: lightning talks and discussion. There was plenty of energy in the room, and the usual spectrum of opinions and experiences. At one point, I counted over 60 people in the room (side note: only four women, and only 2.5 geologists!), so I think over 70 people came through the room at some point. The community is thriving, and if it can find a virtual forum in which to continue the conversation, I know we'll see some amazing things.