The venue for TRANSFORM

Last time I told you a bit about what to expect at the TRANSFORM unconference we’re hosting in May. But I haven’t really told you about the venue yet, and it’s one of the best bits.

We’re hosting the event at the Château de Rosay, near Rouen in France. This is a large house in a small village. It is completely self-contained: we can sleep there, eat there, work there, relax there. There’s room for about 45 people or so. The place looks spectacular:

A few people have said to me that they don’t feel like they could contribute much to a conversation about open source subsurface software… but this unconference is absolutely for anyone. If you are doing science or engineering underground, and if you are interested in the technology we use to do this, you can contribute.

Some of the things we’ll be talking about:

  • Which open tools exist, and can any of them be rescued from disuse?

  • Who is developing these tools and what kind of support do they need?

  • How can we make it easier for anybody to contribute to these projects?

  • What can we do right now that will improve the open stack the most?

All the place needs is a few subsurface scientists and engineers with latops, then it’s perfect! I hope you can join us there.

CdR-Taille_ori-WIDE.jpg

TRANSFORM 2019

A new unconference about subsurface software

What's happening at TRANSFORM?

Last week, I laid out the case for naming and focusing on an open subsurface stack. To this end, we’re hosting TRANSFORM, an unconference, in May. At TRANSFORM, we’ll be mapping out the present state of things, imagining the future, and starting to build it together. You’re invited.

This week, I want to tell you a bit more about what’s happening at the unconference.

BYOS: Bring Your Own Session

We’ll be using an unconference model. If you come to the event, I ask you to prepare a 45 to 60 minute ‘slot’. You can do whatever you like in your slot, the only requirements are that it’s somewhat aligned with the theme (rocks, computers, and openness), and that it produces something tangible. For example:

  • Start with a short presentation, maybe two, then hold a discussion. Capture the debate.

  • Hold a brainstorming session, generating ideas for new technology. Record the ideas.

  • Host a short sprint around a piece of existing software, checking code into GitHub.

  • Research the available open tools for a particular workflow or file type. Report back.

Really, anything is possible. There’s no need to propose topics ahead of time (but please feel free to discuss them in the #transform channel on the Software Underground). We’ll be gathering all the topics and organizing the schedule for Monday, Tuesday and Wednesday on Sunday evening and Monday morning. It’s just-in-time conferencing!

After the unconference, then the sprints

By the end of Wednesday, we should have a very good idea of what’s in the open subsurface stack, and what is missing. On Thursday and Friday, we’ll have the opportunity to build things. In small team, we can take on all sorts of things:

  • Improving the documentation of a project.

  • Writing tutorials or course material for existing tools.

  • Writing tests for an old or new project.

  • Adding functionality to an old project, or even starting a new project.

By the end of Friday, we should have a big pile of new stuff to play with, and lots of new threads to follow after the event.

Here’s a first-draft, high-level view of the schedule so far…

Transform_schedule_prelim.png

The open subsurface stack

Two observations:

  1. Agile has been writing about open source software for geology and geophysics for several years now (for example here in 2011 and here in 2016). Progress is slow. There are lots of useful tools, but lots of gaps too. Some new tools have appeared, others have died. Conclusion: a robust and trusted open stack is not going to magically appear.

  2. People — some of them representing large corporations — are talking more than ever about industry collaboration. Open data platofrms are appearing all over the place. And several times at the DigEx conference in Oslo last week I heard people talk about open source and open APIs. Some organizations, notably Equinor, seem to really mean business. Conclusion: there seems to be a renewed appetite for open source subsurface software.

A quick reminder of what ‘open’ means; paraphrasing The Open Definition and The Open Source Definition in a sentence:

Open data, content and code can be freely used, modified, and shared by anyone for any purpose.

The word ‘open’ is being punted around quite a bit recently, but you have to read the small print in our business. Just as OpenWorks is not ‘open’ by the definition above, neither is OpenSpirit (remember that?), nor the Open Earth Community. (I’m not trying to pick on Halliburton but the company does seem drawn to the word, despite clearly not quite understanding it.)

The conditions are perfect

Earlier I said that a robust and trusted ‘stack’ (a collection of software that, ideally, does all the things we need) is not going to magically appear. What do I mean by ‘robust and trusted’? It goes far beyond ‘just code’ — writing code is the easy bit. It means thoroughly tested, carefully documented, supported, and maintained. All that stuff takes work, and work takes people and time. And people and time mean money.

Two more observations:

  1. Agile has been teaching geocomputing like crazy — 377 people in the last year. In our class, the participants install a lot of Python libraries, including a few from the open subsurface stack: segyio, lasio, welly, and bruges. Conclusion: a proto-stack exists already, hundreds of users exist already, and some training and support exist already.

  2. The Software Underground has over 1200 members (you should sign up, it’s free!). That’s a lot of people that care passionately about computers and rocks. The Python and machine learning communies are especially active. Conclusion: we have a community of talented scientists and developers that want to get good science done.

So what’s missing? What’s stopping us from taking open source subsurface tech to the next level?

Nothing!

Nothing is stopping us. And I’ve reached the conclusion that we need to provide care and feeding to this proto-stack, and this needs to start now. This is what the TRANSFORM 2019 unconference is going to be about. About 40 of us (you’re invited!) will spend five days working on some key questions:

  • What libraries are in the Python ‘proto-stack’? What kind of licenses do they have? Who are the maintainers?

  • Do we need a core library for the stack? Something to manage some basic data structures, units of measure, etc.

  • What are we calling it, who cares about it, and how are we going to work together?

  • Who has the capacity to provide attention, developer time, existing code, or funds to the stack?

  • Where are the gaps in the stack, and which ones need to be filled first?

We won’t finish all this at the unconference. But we’ll get started. We’ll produce a lot of ideas, plans, roadmaps, GitHub issues, and new code. If that sounds like fun to you, and you can contribute something to this work — please come. We need you there! Get more info and sign up here.


Read the follow-up post >>> What’s happening at TRANSFORM?


Thumbnail photo of the Old Man of Hoy by Tom Bastin, CC-BY on Flickr.

The London hackathon

At the end of November I reported on the projects at the Oil & Gas Authority’s machine learning hackathon in Aberdeen. This post is about the follow-up event at London Olympia.


Like the Aberdeen hackathon the previous weekend, the theme was ‘machine learning’. The event unfolded in the Apex Room at Olympia, during the weekend before the PETEX conference. The venue was excellent, with attentive staff and top-notch catering. Thank you to the PESGB for organizing that side of things.

Thirty-eight digital geoscientists spent the weekend with us, and most of them also took advantage of the bootcamp on Friday; at least a dozen of those had not coded at all before the event. It’s such a privilege to work with people on their skills at these events, and to see them writing their own code over the weekend.

Here’s the full list of projects from the event…


Sweet spot hunting

Sweet Spot Sweat Shop: Alan Wilson, Geoff Chambers, Marco van der Linden, Maxim Kotenev, Rowan Haddad.

Project: We’ve seen a few people tackling the issue of making decisions from large numbers of realizations recently. The approach here was to generate maps of various outputs from dynamic modeling and present these to the user in an interactive way. The team also had maps of sweet spots, as determined by simulation, and they attempted to train models to predict these sweetspots directly from the property maps. The result was a unique and interesting exploration of the potential for machine learning to augment standard workflows in reservoir modeling and simulation. Project page. GitHub repo.

sweetspot_prediction.png

An intelligent dashboard

Dash AI: Vincent Penasse, Pierre Guilpain.

Project: Vincent and Pierre believed so strongly in their project that they ran with it as a pair. They started with labelled production history from 8 wells in a Pandas dataframe. They trained some models, including decision trees and KNN classifiers, to recognizedata issues and recommend required actions. Using skills they gained in the bootcamp, they put a flask web app in front of these to allow some interaction. The result was the start of an intelligent dashboard that not only flagged issues, but also recommended a response. Project page.

This project won recognition for impact.

DashAI-team.jpg

Predicting logs ahead of the bit

Team Mystic Bit: Connor Tann, Lawrie Cowliff, Justin Boylan-Toomey, Patrick Davies, Alessandro Christofori, Dan Austin, Jeremy Fortun.

Project: Thinking of this awesome demo, I threw down the gauntlet of real-time look-ahead prediction on the Friday evening, and Connor and the Mystic Bit team picked it up. They did a great job, training a series of models to predict a most likely log (see right) as well as upper and lower bounds. In the figure, the bit is currently at 1770 m. The model is shown the points above this. The orange crosses are the P90, P50 and P10 predictions up to 40 m ahead of the bit. The blue points below 1770 m have not yet been encountered. Project page. GitHub repo.

This project won recognition for best execution.

MysticBit_log-pred.png

The seals make a comeback

Selkie Se7en: Georgina Malas, Matthew Gelsthorpe, Caroline White, Karen Guldbaek Schmidt, Jalil Nasseri, Joshua Fernandes, Max Coussens, Samuel Eckford.

Project: At the Aberdeen hackathon, Julien Moreau brought along a couple of satellite image with the locations of thousands of seals on the images. They succeeded in training a model to correctly identify seal locations 80% of the time. In London, another team of almost all geologists picked up the project. They applied various models to the task, and eventually achieved a binary prediction accuracy of over 97%. In addition, the team trained a multiclass convolutional neural network to distinguish between whitecoats (pups), moulted seals (yearlings and adults), double seals, and dead seals.

Impressive stuff; it’s always inspiring to see people operating way outside their comfort zone. Project page.

selkie-seven.png

Interpreting the language of stratigraphy

The Lithographers: Gijs Straathof, Michael Steventon, Rodolfo Oliveira, Fabio Contreras, Simon Franchini, Malgorzata Drwila.

Project: At the project bazaar on Friday (the kick-off event at which we get people into teams), there was some chat about the recent paper on lithology prediction using recurrent neural networks (Jiang & James, 2018). This team picked up the idea and set out to reproduce the results from the paper. In the process, they digitized lithologies from one of the Posiedon wells. Project page. GitHub repo.

This project won recognition for teamwork.

Lithographers_team_logs.png

Know What You Know

Team KWYK: Malcolm Gall, Thomas Stell, Sebastian Grebe, Marco Conticini, Daniel Brown.

Project: There’s always at least one team willing to take on the billions of pseudodigital documents lying around the industry. The team applied latent semantic analysis (a standard approach in natural language processing) to some of the gnarlier documents in the OGA’s repository. Since the documents don’t have labels, this is essentially an unsupervised task, and therefore difficult to QC, but the method seemed to be returning useful things. They put it all in a nice web app too. Project page. GitHub repo.

This project won recognition for Most Value.


A new approach to source separation

Cocktail Party Problem: Song Hou, Fai Leung, Matthew Haarhoff, Ivan Antonov, Julia Sysoeva.

Project: Song, who works at CGG, has a history of showing up to hackathons with very cool projects, and this was no exception. He has been working on solving the seismic source separation problem, more generally known as the cocktail party problem, using deep learning… and seems to have some remarkable results. This is cool because the current deblending methods are expensive. At the hackathon he and his team looked for ways to express the uncertainty in the deblending result, and even to teach a model to predict which parts of the records were not being resolved with acceptable signal:noise. Highly original work and worth keeping an eye on.

cocktail-party-problem.jpg

A big Thank You to the judges: Gillian White of the OGTC joined us a second time, along with the OGA’s own Jo Bagguley and Tom Sandison from Shell Exploration. Jo and Tom both participated in the Subsurface Hackathon in Copenhagen earlier this year, so were able to identify closely with the teams.

Thank you as well to the sponsors of these events, who all deserve the admiration of the community for stepping up so generously to support skill development in our industry:

oga-sponsors.png

That’s it for hackathons this year! If you feel inspired by all this digital science, do get involved. There are computery geoscience conversations every day over at the Software Underground Slack workspace. We’re hosting a digital subsurface conference in France in May. And there are lots of ways to get started with scientific computing… why not give the tutorials at Learn Python a shot over the holidays?

To inspire you a bit more, check out some more pictures from the event…

90 years of seismic exploration

Today is an important day for applied geoscience. For one thing, it’s St Barbara’s Day. For another, 4 December is the anniversary of the first oil discovery drilled on seismic reflection data.

During World War 1 — thanks to the likes of Reginald Fessenden, Lawrence Bragg, Andrew McNaughton, William Sansome and Ludger Mintrop — acoustics emerged as a method of remote sensing. After the war, enterprising scientists looked for commercial applications of the technology. The earliest geophysical patent application I can find is Fessenden’s 1917 award for the detection of orebodies in mines, and Mintrop applied for a surface-based method in 1920, but the early patents pertained to refraction and diffraction experiments. The first reflection patent, US Patent no. 1,843,725, was filed on 1 May 1929 by John Clarence Karcher… almost 6 months after the discovery well was completed.

It’s fun to read the patent. It begins

This invention related to methods of and apparatus for determining the location and depth of geological formations beneath the surface of the earth and particularly to the determination of geological folding in these sub-surface formations. This invention has special application in the location of anticlines, faults and other structure favorable to the accumulation of petroleum.

Figures 4 and 5 show what must be the first ever depiction of shot gathers:

Figure 5 from Karcher’s patent, ‘Determination of subsurface formations’. It illustrates the arrivals of different wave modes at the receivers.

Karcher was born in Dale, Indiana, but moved to Oklahoma when he was five. He later studied electrical engineering and physics at the University of Oklahoma. Along with William Haseman, David Ohearn, and Irving Perrine, Karcher formed the Geological Engineering Company. Early tests of the technology took place in the summer of 1921 near Oklahoma City, and the men spent the next several years shooting commercial refraction surveys around Texas and Oklahoma — helping discover dozens of saltdome-related fields — and meanwhile trying to perfect the reflection experiment. During this period, they were competing with Mintrop’s company, Seismos.

The first well

In 1925, Karcher formed a new company — Geophysical Research Corporation, GRC, now part of Sercel — with Everette Lee DeGolyer of Amerada Petroleum Corporation and money from the Viscount Cowdray (owner of Pearson, now a publishing company, but originally a construction firm). Through this venture, Karcher eventually prevailed in the race to prove the seismic reflection method. From what I can tell, HB Peacock and/or JE Duncan successfully mapped the structure of the Ordovician Viola limestone, which overlies the prolific Simpson Group. On 4 December 1928, Amerada completed No. 1 Hallum well near Maud, Oklahoma.

The locations (as best I Can tell) of the first test of reflection seismology, the first seismic section, and the first seismic survey that led to a discovery. The map also shows where Karcher grew up; he went to university in Norman, south of Oklahoma City..

st-barbara-wusel007-CC-BY-SA.png

Serial entrepreneur

Karcher was a geophysical legend. After Geophysical Research Corporation, he co-founded Geophysical Service Incorporated (GSI) which was the origin of Texas Instruments and the integrated circuit. And he founded several explorations companies after that. Today, his name lives on in the J. Clarence Karcher Award that SEG gives each year to one or more stellar young geophysicists.

It seems appropriate that the oil discovery fell on the feast of St Barbara, the patron saint of miners and armorers and all who deal in explosives, but also of mathematicians and geologists. If you have a bottle near you this evening, raise a glass to St Barbara and the legion of geophysicists that have made seismic reflection such a powerful tool today.


Source material

The Scottish hackathon

On 16−18 November the UK Oil & Gas Authority (OGA) hosted its first hackathon, with Agile providing the format and technical support. This followed a week of training the OGA provided — again, through Agile — back in September. The theme for the hackathon was ‘machine learning’, and I’m pretty sure it was the first ever geoscience hackathon in the UK.

Thirty-seven digital geoscientists participated in the event at Robert Gordon University; most of them appear below. Many of them had not coded at all before the bootcamp on Friday, so a lot of people were well outside their comfort zones when we sat down on Saturday. Kudos to everyone!

The projects included the usual mix of seismic-based tasks, automated well log picking, a bit of natural language processing, some geospatial processing, and seals (of the mammalian variety). Here’s a rundown of what people got up to:


Counting seals on Scottish islands

Seal Team 6: Julien Moreau, James Mullins, Alex Schaaf, Balazs Kertesz, Hassan Tolba, Tom Buckley.

Project: Julien arrived with a cool dataset: over 6000 seals located on two large TIFFs images of Linga Holm, an island off Stronsay in the Orkneys. The challenge: locate the seals automatically. The team came up with a pipeline to generate HOG descriptors, train a support vector machine on about 20,000 labelled image tiles, then scan the large TIFFs to try to identify seals. Shown here is the output of one such scan, with a few false positive and false negatives. GitHub repo.

This project won the Most Impact award.

seals_test_image.png

Automatic classification of seismic sections

Team Seis Class: Jo Bagguley, Laura Bardsley, Chio Martinez, Peter Rowbotham, Mike Atkins, Niall Rowantree, James Beckwith.

Project: Can you tell if a section has been spectrally whitened? Or AGC’d? This team set out to attempt to teach a neural network the difference. As a first step, they reduced it to a binary classification problem, and showed 110 ‘final’ and 110 ‘raw’ lines from the OGA ESP 2D 2016 dataset to a convolutional neural net. The AI achieved an accuracy of 98% on this task. GitHub repro.

This project won recognition for a Job Well Done.


Why do get blocks relinquished?

Team Relinquishment Surprise: Tanya Knowles, Obiamaka Agbaneje, Kachalla Aliyuda, Daniel Camacho, David Wilkinson (not pictured).

Project: Recognizing the vast trove of latent information locked up in the several thousand reports submitted to the OGA. Despite focusing on relinquishment, they quickly discovered that most of the task is to cope with the heterogeneity of the dataset, but they did manage to extract term frequencies from the various Conclusions sections, and made an ArcGIS web app to map them.

relinquishment_team.jpg

Recognizing reflection styles on seismic

Team What’s My Seismic? Quentin Corlay, Tony Hallam, Ramy Abdallah, Zhihua Cui, Elia Gubbala, Amechi Halim.

Project: The team wanted to detect the presence of various seismic facies in a small segment of seismic data (with a view to later interpreting entire datasets). They quickly generated a training dataset, then explored three classifiers: XGBoost, Google’s AutoML, and a CNN. All of the methods gave reasonable results and were promising enough that the team vowed to continue investigating the problem. Project website. GitHub repo.

This project won the Best Execution award.

whats-my-seismic.png

Stretchy-squeezey well log correlation

Team Dynamic Depth Warping: Jacqueline Booth, Sarah Weihmann, Khaled Muhammad, Sadiq Sani, Rahman Mukras, Trent Piaralall, Julio Rodriguez.

Project: Making picks and correlations in wireline data is hard, partly because the stratigraphic signal changes spatially — thinning and thickening, and with missing or extra sections. To try to cope with this, the team applied a dynamic time (well, depth) warping algorithm to the logs, then looking for similar sections in adjacent wells. The image shows a target GR log (left) with the 5 most similar sections. Two, maybe four, of them seem reasonable. Next the team planned to incorporate more logs, and attach probabilities to the correlations. Early results looked promising. GitHub repo.


Making lithostrat picks

Team Marker Maker: Nick Hayward, Frédéric Ramon, Can Yang, Peter Crafts, Malcolm Gall

Project: The team took on the task of sorting out lithostratigraphic well tops in a mature basin. But there are speedbumps on the road to glory, e.g. recognizing which picks are lithological (as opposed to chronological), and which pick names are equivalent. The team spent time on various subproblems, but there’s a long road ahead.

This project won recognition for a Job Well Done.

marker-maker.jpg

Alongside these projects, Rob and I floated around trying to help, and James Beckwith hacked on a cool project of his own for a while — Paint By Seismic, a look at unsupervised classification on seismic sections. In between generating attributes and clustering, he somehow managed to help and mentor most of the other teams — thanks James!

Thank you!

Thank you to The OGA for these events, and in particular to Jo Bagguley, whose organizational skills I much appreciated over the last few weeks (as my own skills gradually fell apart). The OGA’s own Nick Richardson, the OGTC’s Gillian White, and Robert Gordon Universty’s Eyad Elyan acted as judges.

These organizations contributed to the success of these events — please say Thank You to them when you can!

oga-sponsors.png

I’ll leave you with some more photos from the event. Enjoy!

TRANSFORM 2019

DSC_6548.jpg

Yesterday I announced that we’re hatching a new plan. The next thing. Today I want to tell you about it.

The project has the codename TRANSFORM. I like the notion of transforms: functions that move you from one domain to another. Fourier transforms. Wavelet transforms. Digital subsurface transforms. Examples:

  • The transformative effect of open source software on subsurface science. Open source accelerates our work!

  • The transformative effect of collaborative, participatory events on the community. We can make new things!

  • The transformative effect of training on ourselves and our peers. Lots of us have new superpowers!

Together, we’ve built the foundation for a new, open software platform.

A domain shift

We think it’s time to refocus the hackathons as sprints — purposefully producing a sustainable, long-lasting, high quality, open source software stack that we can all use and combine into new tools, whether open or proprietary, free or commercial.

We think it’s time to bring a full-featured unconference into the mix. The half-day ‘unsessions’ open too many paths, and leave too few explored. We need more time — to share research, plan software projects, and write code.

Together, we can launch a new era in scientific computing for the subsurface.

At the core of this new era core is a new open-source software stack, created, maintained, and implemented by a community of scientists and organizations passionate about its potential.

Sign up!

Here’s the plan. We’re hosting an unconference from 5 to 11 May 2019, with full days from Monday to Friday. The event will take place at the Château de Rosay, near Rouen, France. It will be fully residential and fully catered. We have room for about 45 participants.

The goal is to lay down a road map for designing, funding, and building an open source software stack for subsurface. In the coming days and weeks, we will formulate the plan for the week, with input from the Software Underground. We want to hear from you. Propose a session! Host a sprint! Offer a bounty! There are lots of ways to get involved.

Map data: GeoBasis-DE / BKG / Google, photo: Chateauform. Click to enlarge.

If you want to be part of this effort, as a developer, an end-user, or a sponsor, then we invite you to join us.

The unconference fee will be EUR 1000, and accommodation and food will be EUR 1500. The student fees will be EUR 240 and EUR 360. There will be at least 5 bursaries of EUR 1000 available.

For the time being, we will be accepting early commitments, with a deposit of EUR 400 to secure a place (students wishing to register now should get in touch). Soon, you will be able to sign up online… we are working on a smooth process. In the meantime, click here to register your interest, share ideas for content, or sign up by paying a deposit.

Thanks for reading. We look forward to figuring this out together.


I’m delighted to be able to announce that we already have support from Dell EMC. Thanks as ever to David Holmes for his willingness to fund experiments!


In the US or Canada? Don’t despair! There will be a North American edition in Quebec in late September.

The next thing

Over the last several years, Agile has been testing some of the new ways of collaborating, centered on digital connections:

2010-2019-timeline.png
  • It all started with this blog, which started in 2010 with my move from Calgary to Nova Scotia. It’s become a central part of my professional life, but we’re all about collaboration and blogs are almost entirely one-way, so…

  • In 2011 we launched SubSurfWiki. It didn’t really catch on, although it was a good basis for some other experiments and I still use it sometimes. Still, we realized we had to do more to connect the community, so…

  • In 2012 we launched our 52 Things collaborative, open access book series. There are well over 5000 of these out in the wild now, but it made us crave a real-life, face-to-face collaboration, so…

  • In 2013 we held the first ‘unsession’, a mini-unconference, at the Canada GeoConvention. Over 50 people came to chat about unsolved problems. We realized we needed a way to actually work on problems, so…

  • Later that year, we followed up with the first geoscience hackathon. Around 15 or so of us gathered in Houston for a weekend of coding and tacos. We realized that the community needed more coding skills, so…

  • In 2014 we started teaching a one-day Python course aimed squarely at geoscientists. We only teach with subsurface data and algorithms, and the course is now 5 days long. We now needed a way to connect all these new hackers and coders, so…

  • In 2014, together with Duncan Child, we also launched Software Underground, a chat room for discussing topics related to the earth and computers. Initially it was a Google Group but in 2015 we relaunched it as an open Slack team. We wanted to double down on scientific computing, so…

  • In 2015 and 2016 we launched a new web app, Pick This (returning soon!), and grew our bruges and welly open source Python projects. We also started building more machine learning projects, and getting really good at it.

Growing and honing

We have spent the recent years growing and honing these projects. The blog gets about 10,000 readers a month. The sixth 52 Things book is on its way. We held two public unsessions this year. The hackathons have now grown to 60 or so hackers, and have had about 400 participants in total, and five of them this year already (plus three to come!). We have also taught Python to 400 geoscientists, including 250 this year alone. And the Software Underground has over 1000 members.

In short, geoscience has gone digital, and we at Agile are grateful and excited to be part of it. At no point in my career have I been more optimistic and energized than I am right now.

So it’s time for the next thing.

The next thing is starting with a new kind of event. The first one is 5 to 11 May 2019, and it’s happening in France. I’ll tell you all about it tomorrow.

Reproducibility Zoo

repro-zoo-main-banner.png

The Repro Zoo was a new kind of event at the SEG Annual Meeting this year. The goal: to reproduce the results from well-known or important papers in GEOPHYSICS or The Leading Edge. By reproduce, we meant that the code and data should be open and accessible. By results, we meant equations, figures, and other scientific outcomes.

And some of the results are scary enough for Hallowe’en :)

What we did

All the work went straight into GitHub, mostly as Jupyter Notebooks. I had a vague goal of hitting 10 papers at the event, and we achieved this (just!). I’ve since added a couple of other papers, since the inspiration for the work came from the Zoo… and I haven’t been able to resist continuing.

The scene at the Repro Zoo. An air of quiet productivity hung over the booth. Yes, that is Sergey Fomel and Jon Claerbout. Thank you to David Holmes of Dell EMC for the picture.

The scene at the Repro Zoo. An air of quiet productivity hung over the booth. Yes, that is Sergey Fomel and Jon Claerbout. Thank you to David Holmes of Dell EMC for the picture.

Here’s what the Repro Zoo team got up to, in alphabetical order:

  • Aldridge (1990). The Berlage wavelet. GEOPHYSICS 55 (11). The wavelet itself, which has also been added to bruges.

  • Batzle & Wang (1992). Seismic properties of pore fluids. GEOPHYSICS 57 (11). The water properties, now added to bruges.

  • Claerbout et al. (2018). Data fitting with nonstationary statistics, Stanford. Translating code from FORTRAN to Python.

  • Claerbout (1975). Kolmogoroff spectral factorization. Thanks to Stewart Levin for this one.

  • Connolly (1999). Elastic impedance. The Leading Edge 18 (4). Using equations from bruges to reproduce figures.

  • Liner (2014). Long-wave elastic attentuation produced by horizontal layering. The Leading Edge 33 (6). This is the stuff about Backus averaging and negative Q.

  • Luo et al. (2002). Edge preserving smoothing and applications. The Leading Edge 21 (2).

  • Yilmaz (1987). Seismic data analysis, SEG. Okay, not the whole thing, but Sergey Fomel coded up a figure in Madagascar.

  • Partyka et al. (1999). Interpretational aspects of spectral decomposition in reservoir characterization.

  • Röth & Tarantola (1994). Neural networks and inversion of seismic data. Kudos to Brendon Hall for this implementation of a shallow neural net.

  • Taner et al. (1979). Complex trace analysis. GEOPHYSICS 44. Sarah Greer worked on this one.

  • Thomsen (1986). Weak elastic anisotropy. GEOPHYSICS 51 (10). Reproducing figures, again using equations from bruges.

As an example of what we got up to, here’s Figure 14 from Batzle & Wang’s landmark 1992 paper on the seismic properties of pore fluids. My version (middle, and in red on the right) is slightly different from that of Batzle and Wang. They don’t give a numerical example in their paper, so it’s hard to know where the error is. Of course, my first assumption is that it’s my error, but this is the problem with research that does not include code or reference numerical examples.

Figure 14 from Batzle & Wang (1992). Left: the original figure. Middle: My attempt to reproduce it. Right: My attempt in red, overlain on the original.

This was certainly not the only discrepancy. Most papers don’t provide the code or data to reproduce their figures, and this is a well-known problem that the SEG is starting to address. But most also don’t provide worked examples, so the reader is left to guess the parameters that were used, or to eyeball results from a figure. Are we really OK with assuming the results from all the thousands of papers in GEOPHYSICS and The Leading Edge are correct? There’s a long conversation to have here.

What next?

One thing we struggled with was capturing all the ideas. Some are on our events portal. The GitHub repo also points to some other sources of ideas. And there was the Big Giant Whiteboard (below). Either way, there’s plenty to do (there are thousands of papers!) and I hope the zoo continues in spirit. I will take pull requests until the end of the year, and I don’t see why we can’t add more papers until then. At that point, we can start a 2019 repo, or move the project to the SEG Wiki, or consider our other options. Ideas welcome!

IMG_20181017_163926.jpg

Thank you!

The following people and organizations deserve accolades for their dedication to the idea and hard work making it a reality. Please give them a hug or a high five when you see them.

  • David Holmes (Dell EMC) and Chance Sanger worked their tails off on the booth over the weekend, as well as having the neighbouring Dell EMC booth to worry about. David also sourced the amazing Dell tech we had at the booth, just in case anyone needed 128GB of RAM and an NVIDIA P5200 graphics card for their Jupyter Notebook. (The lights in the convention centre actually dimmed when we powered up our booths in the morning.)

  • Luke Decker (UT Austin) organized a corps of volunteer Zookeepers to help manage the booth, and provided enthusiasm and coding skills. Karl Schleicher (UT Austin), Sarah Greer (MIT), and several others were part of this effort.

  • Andrew Geary (SEG) for keeping things moving along when I became delinquent over the summer. Lots of others at SEG also helped, mainly with the booth: Trisha DeLozier, Rebecca Hayes, and Beth Donica all contributed.

  • Diego Castañeda got the events site in shape to support the Repro Zoo, with a dashboard showing the latest commits and contributors.

Café con leche

At the weekend, 28 digital geoscientists gathered at MAZ Café in Santa Ana, California, to sprint on some open geophysics software projects. Teams and individuals pushed pull requests — code contributions to open source projects — left, right, and centre. Meanwhile, Senah and her team at MAZ kept us plied with coffee and horchata, with fantastic food on the side.

Because people were helping each other and contributing where they could, I found it a bit hard to stay on top of what everyone was working on. But here are some of the things I heard at the project breakdown on Sunday afternoon:

Gerard Gorman, Navjot Kukreja, Fabio Luporini, Mathias Louboutin, and Philipp Witte, all from the devito project, continued their work to bring Kubernetes cluster management to devito. Trying to balance ease of use and unlimited compute turns out to be A Hard Problem! They also supported the other teams hacking on devito.

Thibaut Astic (UBC) worked on implementing DC resistivity models in devito. He said he enjoyed the expressiveness of devito’s symbolic equation definitions, but that there were some challenges with implementing the grad, div, and curl operator matrices for EM.

Vitor Mickus and Lucas Cavalcante (Campinas) continued their work implementing a CUDA framework for devito. Again, all part of the devito project trying to give scientists easy ways to scale to production-scale datasets.

That wasn’t all for devito. Alongside all these projects, Stephen Alwon worked on adapting segyio to read shot records, Robert Walker worked on poro-elastic models for devito, and Mohammed Yadecuri and Justin Clark (California Resources) contributed too. On the second day, the devito team was joined by Felix Hermann (now Georgia Tech), with Mengmeng Yang, and Ali Siakoohi (both UBC). Clearly there’s something to this technology!

Brendon Hall and Ben Lasscock (Enthought) hacked on an open data portal concept, based on the UCI Machine Learning Repository, coincidentally based just down the road from our location. The team successfully got some examples of open data and code snippets working.

Jesper Dramsch (Heriot-Watt), Matteo Niccoli (MyCarta), Yuriy Ivanov (NTNU) and Adriana Gordon and Volodymyr Vragov (U Calgary), hacked on bruges for the weekend, mostly on its documentation and the example notebooks in the in-bruges project. Yuriy got started on a ray-tracing code for us.

Nathan Jones (California Resources) and Vegard Hagen (NTNU) did some great hacking on an interactive plotting framework for geoscience data, based on Altair. What they did looked really polished and will definitely come in useful at future hackathons.

All in all, an amazing array of projects!

This event was low-key compared to recent hackathons, and I enjoyed the slightly more relaxed atmosphere. The venue was also incredibly supportive, making my life very easy.

A big thank you as always to our sponsors, Dell EMC and Enthought. The presence of the irrepressible David Holmes and Chris Lenzsch (both Dell EMC), and Enthought’s new VP of Energy, Charlie Cosad, was greatly appreciated.

sponsors.svg.png

We will definitely be revisiting the sprint concept in the future einmal ist keinmal, as they say. Devito and bruges both got a boost from the weekend, and I think all the developers did too. So stay tuned for the next edition!