Subsurface Hackathon project round-up, part 1

The dust has settled from the Hackathon in Paris two weeks ago. Been there, done that, came home with the T-shirt.

In the same random order they presented their 4-minute demos to our panel of esteemed judges, I present a (very) abbreviated round-up of what the teams made together over the course of the weekend. With the exception of a few teams who managed to spontaneously nucleate before the hackathon, most of these teams were comprised of people who had never met each other before the event.

Just let that sink in for a second: teams of mostly mutual strangers built 13 legit machine-learning-based geoscience applications in one weekend. 


Log Healer      

Log Healer

 

 

An automated well log management system

Team Un-well Loggers: James Wanstall (Glencore), Niket Doshi (Teradata), Joseph Taylor (Teradata), Duncan Irving (Teradata), Jane McConnell (Teradata).

Tech: Kylo (NiFi, HDFS, Hive, Spark)

If you're working with well logs, and if you've got lots of them, you've almost certainly got gaps or inaccuracies from curve to curve and from well to well. The team's scalable, automated well-log file management system Log Healer computes missing logs and heals broken ones. Amazing.


An early result from Team Janus. The image on the left is ground truth, that on the right is predicted. Many of the features are present. Not bad for v0.1!

An early result from Team Janus. The image on the left is ground truth, that on the right is predicted. Many of the features are present. Not bad for v0.1!

Meaningful cross sections from well logs

Team Janus: Daniel Buse, Johannes Camin, Paul Gabriel, Powei Huang, Fabian Kampe (all from GiGa Infosystems)

The team built an elegant machine learning workflow to attack the very hard problem of creating geologically realistic cross-section from well logs. The validation algorithm compares pixels to score the result. 


Think Section's mindblowing photomicrograph labeling tool can also make novel camouflage patterns.

Think Section's mindblowing photomicrograph labeling tool can also make novel camouflage patterns.

Paint-by-numbers on digital thin sections

Team Think Section: Diego Castaneda (Agile*), Brendon Hall (Enthought), Roeland Nieboer (Fugro), Jan Niederau (RWTH Aachen), Simon Virgo (RWTH Aachen)

Tech: Python (Scikit Learn, Scikit Image, Flask, NumPy, SciPy, Pandas), AWS for hosting app & Jupyter server.

Description: Mineral classification and point-counting on thin sections can be an incredibly tedious and time consuming task. Team Think Section trained a model to segregate, classify, and label mineral grains in 200GB of high-resolution multi-polarization-angle photomicrographs.


Team Classy's super-impressive shot gather seismic event Detection technology. Left: synthetic gather. Middle: predicted labels. Right: truth.

Team Classy's super-impressive shot gather seismic event Detection technology. Left: synthetic gather. Middle: predicted labels. Right: truth.

Event detection on seismic shot gathers

Team Classy: Princy Ikotoko Ndong (EOST), Anna Lim (NTNU), Yuriy Ivanov (NTNU), Song Hou (CGG), Justin Gosses (Valador).

Tech: Python (NumPy, Matplotlib), Jupyter notebooks.

The team created an AI which identifies and labels different events on a shot gather image. It can find direct waves, reflections, multiples or coherent noise. It uses a support vector machine for classification, and is simple and fast. 


model2seismic: An entirely new way to do modeling and inversion. Take note: the neural network that made this image knows no physics.

model2seismic: An entirely new way to do modeling and inversion. Take note: the neural network that made this image knows no physics.

Forward and inverse modeling without the physics

Team GANsters - Lukas Mosser (Imperial), Wouter Kimman (Meridian), Jesper Dramsch (Copenhagen), Alfredo de la Fuente (Wolfram), Steve Purves (Euclidity)

Tech: PyNoddy, homegrown Python ML tools.

The GANsters created a deep-learning image-translation-based seismic inversion and forward modelling system. I urge you to go and look at their project on model2seismic. If it doesn't give you goosebumps, you are geophysically inert.


Team Pick Pick Log

Team Pick Pick Log

Machine learning for for stratigraphic interpretation

Team Pick Pick LOG - Antoine Vanbesien (EOST), Fidèle Degni (Mines St-Étienne), Massinissa Mesbahi (Pau), Natsuki Gunji (Mines St-Étienne), Cédric Menut (EOST).

This team of data science and geoscience undergrads attacked an automated stratigraphic interpretation task. They used supervised learning to determine lithology from well logs in Alberta's Athabasca play, then attempted to teach their AI to pick stratigraphic tops. Impressive!


Pretty amazing, huh? The power of the hackathon to bring a project from barely-even-an-idea to actual-working-code is remarkable! And we're not even halfway through the teams: tomorrow I'll describe the other seven projects. 

Machine learning meets seismic interpretation

Agile has been reverberating inside the machine learning echo chamber this past week at EAGE. The hackathon's theme was machine learning, Monday's workshop was all about machine learning. And Matt was also supposed to be co-chairing the session on Applications of machine learning for seismic interpretation with Victor Aare of Schlumberger, but thanks to a power-cut and subsequent rescheduling, he found himself double-booked so, lucky me, he invited me to sit in his stead. Here are my highlights, from the best seat in the house.

Before I begin, I must mention the ambivalence I feel towards the fact that 5 of the 7 talks featured the open-access F3 dataset. A round of applause is certainly due to dGB Earth Sciences for their long time stewardship of open data. On the other hand, in the sardonic words of my co-chair Victor Aarre, it would have been quite valid if the session was renamed The F3 machine learning session. Is it really the only quality attribute research dataset our industry can muster? Let's do better.

Using seismic texture attributes for salt classification

Ghassan AlRegib ruled the stage throughout the session with not one, not two, but three great talks on behalf of himself and his grad students at Georgia Institute of Technology (rather than being a show of bravado, this was a result of problems with visas). He showed some exciting developments in shallow learning methods for predicting facies in seismic data. In addition to GLCM attributes, he also introduced a couple of new (to me anyway) attributes for salt classification. Namely, textural gradient and a thing he called seismic saliency, a metric modeled after the human visual system describing the 'reaction' between relative objects in a 3D scene. 

Twelve Seismic attributes used for multi-attribute salt-boundary classification. (a) is RMS Amplitude, (B) to (M) are TEXTURAL attributes. See abstract for details. This figure is copyright of Ghassan AlRegib and licensed CC-BY-SA by virtue of being generated from the F3 dataset of dGB and TNO.

Ghassan also won the speakers' lottery, in a way. Due to the previous day's power outage and subsequent reshuffle, the next speaker in the schedule was a no-show. As a result, Ghassan had an extra 20 minutes to answer questions. Now for most speakers that would be a public-speaking nightmare, but Ghassan hosted the onslaught of inquiring minds beautifully. If we hadn't had to move on to the next next talk, I'm sure he could have entertained questions all afternoon. I find it fascinating how unpredictable events like power outages can actually create the conditions for really effective engagement. 

Salt classification without using attributes (using deep learning)

Matt reported on Anders Waldeland's work a year ago, and it was interesting to see how his research has progressed, as he nears the completion of his thesis. 

Anders successfully demonstrated how convolutional neural networks (CNNs) can classify salt bodies in seismic datasets. So, is this a big deal? I think it is. Indeed, Anders's work seems like a breakthough in seismic interpretation, at least of salt bodies. To be clear, I don't think this means that it is time for seismic interpreters to pack up and go home. But maybe we can start looking forward to spending our time doing less tedious things than picking complex salt bodies.  

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

He trained a CNN on one manually labeled slice of a 3D cube and used the network to automatically classify the full 3D salt body (on the right in the figure). Conventional algorithms for salt picking, such as that used by AlRegib (see above), typically rely on seismic attributes to define a feature space. This requires professional insight and judgment, and is prone to error and bias. Nicolas Audebert mentioned the same shortcoming in his talk in the workshop Matt wrote about last week. In contrast, the CNN algorithm works directly on the seismic data, learning the most discriminative filters on its own, no attributes needed

Intuition training

Machine learning isn't just useful for computing in the inverse direction such as with inversion, seismic interpretation, and so on. Johannes Amtmann showed us how machine learning can be useful for ranking the performance of different clustering methods using forward models. It was exciting to see: we need to get back into the habit of forward modeling, each and every one of us. Interpreters build synthetics to hone their seismic intuition. It's time to get insanely good at building forward models for machines, to help them hone theirs. 

There were so many fascinating problems being worked on in this session. It was one of the best half-day sessions of technical content I've ever witnessed at a subsurface conference. Thanks and well done to everyone who presented.


Machine learning and analytics in geoscience

We're at EAGE in Paris. I'm sitting in a corner of the exhibition because the power is out in the main hall, so all the talks for the afternoon have been postponed. The poor EAGE team must be beside themselves, I feel for them. (Note to future event organizers: white boards!)

Yesterday Diego, Evan, and I — along with lots of hackathon participants — were at the Data Science for Geosciences workshop, an all-day machine learning fest. The session was chaired by Cyril Agut (Total), Marianne Cuif-Sjostrand (Total), Florence Delprat-Jannaud (IFPEN), and Noalwenn Dubos-Sallée (IFPEN), and they had assembled a good programme, with quite a bit of variety.

Michel Lutz, Group Data Officer at Total, and adjunct at École des Mines de Saint-Étienne, gave a talk entitled, Data science & application to geosciences: an introduction. It was high-level but thoughtful, and such glimpses into large companies are always interesting. The company seems to have a mature data science strategy, and a well-developed technology stack. Henri Blondelle (AgileDD) asked about open data at the end, and Michel somewhat sidestepped on specifics, but at least conceded that the company could do more in open source code, if not data.

Infrastructure, big data, and IoT

Next we heard a set of talks about the infrastructure aspect of big (really big) data.

Alan Smith of Luchelan told the group about some negative experiences with Hadoop and seismic data (though it didn't seem to me that his problems were insoluble since I know of several projects that use it), and the realization that sometimes you just need fast infrastructure and custom software.

Hadi Jamali-Rad of Shell followed with an IoT story from the field. He had deployed a large number of wireless seismic sensors around a village in Holland, then tested various aspects of the communication system to answer questions like, what's the packet loss rate when you collect data from the nodes? What about from a balloon stationed over the site?

Duncan Irving of Teradata asked, Why aren't we [in geoscience] doing live analytics on 100PB of live data like eBay? His hypothesis is that IT organizations in oil and gas failed to keep up with key developments in data analytics, so now there's a crisis of sorts and we need to change how we handle our processes and culture around big data. 

Machine learning

We shifted gears a bit after lunch. I started with a characteristically meta talk about how I think our community can help ensure that our research and practice in this domain leads to good places as soon as possible. I'll record it and post it soon.

Nicolas Audebert of ONERA/IRISA presented a nice application of a 3D convolutional neural network (CNN) to the segmentation and classification of hyperspectral aerial photography. His images have between about 100 and 400 channels, and he finds that CNNs reduce error rates by up to about 50% (compared to an SVM) on noisy or complex images. 

Henri Blondelle of Agile Data Decisions talked about his experience of the CDA's unstructured data challenge of 2016. About 80% of the dataset is unstructured (e.g. folders of PDFs and TIFFs), and Henri's vision is to transform 80% of that into structured data, using tools like AgileDD's IQC to do OCR and heuristic labeling. 

Irina Emelyanova of CSIRO provided another case study: unsupervised e-facies prediction using various types of clustering, from K-means to some interesting variants of self-organizing maps. It was refreshing to see someone revealing a lot of the details of their implementation.

Jan Limbeck, a research scientist at Shell wrapped up the session with an overview of Shell's activities around big data and machine learning, as they prepare for exabytes. He mentioned the Mauricio Araya-Polo et al. paper on deep learning in seismic shot gathers in the special March issue of The Leading Edge — clearly it's easiest to talk about things they've already published. He also listed a lot of Shell's machine learning projects (frac optimization, knowledge graphs, reservoir simulation, etc), but there's no way to know what state they are in or what their chances of success are. 

As well as all the 9 talks, there were 13 posters, about a third of which were on infrastructure stuff, with the rest providing more case studies. Unfortunately, I didn't get the chance to look at them in any detail, but I appreciated the organizers making time for discussion around the posters. If they'd also allowed more physical space for the discussion it could have been awesome.

Analytics!

After hearing about Mentimeter from Chris Jackson I took the opportunity to try it out on the audience. Here are the results, I think they are fairly self-explanatory... 

I also threw in the mindmap I drew at the end as a sort of summary. The vertical axis represents something like'abstraction' or 'time' (in a workflow sense) and I think each layer depends somewhat on those beneath it. It probably makes sense to no-one but me.

Breakout!

It seems clear that 2017 is the breakout year for machine learning in petroleum geoscience, and in petroleum in general. If your company or institution has not yet gone beyond "watching" or "thinking about" data science and machine learning, then it is falling behind by a little more every day, and it has been for at least a year. Now's the time to choose if you want to be part of what happens next, or a victim of it.

Le grand hack!

It happened! The Subsurface Hackathon drew to a magnificent close on Sunday, in an intoxicating cloud of code, creativity, coffee, and collaboration. It will take some beating.

Nine months in gestation, the hackathon was on a scale we have not attempted before. Total E&P joined us as co-organizers and made this new reach possible. They also let us use their amazing Booster — a sort of intrapreneurship centre — which was perfect for the event. Their team (thanks especially to Marine and Caroline!) did an amazing job of hosting, as well as providing several professionals from their subsurface software (thanks Jonathan and Yannick!) and data science teams (thanks Victor and David!). Arnaud Rodde and Frédéric Broust, who had to do some organization hacking of their own to make something as weird as a hackathon happen, should be proud of their teams.

Instead of trying to describe the indescribable, here are some photos:

BY THE NUMBERS

16 hours of code
13 teams
62 hackers
44 students
4 robots
568 croissants
0 lost-time incidents

I won't say much about the projects for now. The diversity was high — there were projects in thin section photography, 3D geological modeling, document processing, well log prediction, seismic modeling and inversion, and fault detection. All of the projects included some kind of machine learning, and again there was diversity there, including several deep learning applications. Neural networks are back!

Feel the buzz!

If you are curious, Gram and I recorded a quick podcast and interviewed a few of the teams:

It's going to take a few days to decompress and come down from the high. In a couple of weeks I'll tell you more about the projects themselves, and we'll edit the photos and post the best ones to Flickr (and in the meantime there are a few more pics there already). 

Thank you to the sponsors!

Last thing: we couldn't have done any of this without the support of Dell EMC. David Holmes has been a rock for the hackathon project over the last couple of years, and we appreciate his love of community and code! Thank you too to Duncan and Jane at Teradata, Francois at NVIDIA, Peter and Jon at Amazon AWS, and Gram at Sandstone for all your support. Dear reader: please support these organizations!


Looking forward to EAGE

Evan, Diego and I are flying to Paris today for the EAGE Conference and Exhibition. It's exciting. We're excited. 

But the excitement starts before the conference. The Subsurface Hackathon is this weekend!

My diary

Even the hackathon excitement starts before the weekend, because tomorrow, Friday, we're running the hacker's bootcamp — a sort of short course appetizer for the hackathon. We have about 25 geoscientists coming to the Booster TOTAL (an event space at TOTAL's La Défense offices) to get some hands-on practice with Python and the latest in machine learning tools. It's especially exciting because we'll also have engineers from NVIDIA on hand to help with the coaching. The idea is to help people hit the ground running when the hackathon starts on Saturday.

After that, on Saturday and Sunday,  it's the hackathon itself. We have no fewer than 60 geoscientists and engineers registered for this breakout event. They're coming to the Booster to work on a wide array of machine learning ideas for the subsurface. It's going to be epic. You can read all about what happens next week, I promise. 

Then on Monday it's the Data Science for Geoscience workshop, at which I'm giving a keynote. Since I'm far from possessing expertise, I'm using it as a chance to get people jazzed about helping make the coming AI revolution in geoscience a positive experience. I'm really looking forward to it.

The conference itself starts on Tuesday. In the afternoon I'm co-chairing a session on machine learning (have you spotted the theme yet?) in seismic interpretation, along with Victor Aare of Schlumberger. It will be awesome to see what kind of progress our community is making in this field — it's fun to imagine what seismic interpretation might be like in a few years. There are so many fascinating problems to work on! Here are the talks in that session:

On Wednesday we'll be taking in some more talks and posters, then in the afternoon I'm reprising my keynote talk at IFPEN, a subsurface research institute in the Bois de Boulogne. I've never been there before, although I have met a few IFP scientists before. I'm looking forward to it very much. 

It all ends for us on Thursday. Evan and Diego fly home and I'm off to Cambridge (the old one in the fens, not the one in Massachusetts) for a few days with family (and bookshops). Until then, expect much blogging!


Going to EAGE?

If you're reading this and would like to meet up with us at Agile or some of the Software Underground crowd — the friendliest bunch of coding geoscientists you could hope for — let's plan to meet at the end of the workshop, at the workshop location. Look for the Software Underground shirts.

News and updates and a sandwich

Plans for the hackathon in Paris in June are well underway. We now have two major sponsors: Dell EMC and now Total E&P too will be supporting the event with generous funding. Bolstered by this, I've set a goal of getting 50 participants in the event. Imagine that!

If you would like to help us reach this goal, please consider printing out some of these posters (right) and putting them up in your place of work or study >> hi-res PDF << It should even be readable in black & white, if that's your only option.

You can find links to everything you need to know about the event at agilescientific.com/paris.

Le grand sandwich délicieux

The hackathon is really just the filling in a delicious Parisian sandwich of geocomputing goodness. The bread at the bottom is the Hacker Bootcamp on 9 June. The filling is the hackathon weekend... and the final piece is the EAGE workshop on machine learning. Convened by geoscientists at Total and IFP, it should be a great day of knowledge sharing and discussion. I can't wait.

11 days to go!

There are only 11 days left to take part in the SEG Machine Learning contest, in which you are challenged to predict lithologies in two wells, given some wireline logs and lithologies in several other nearby wells. Everything you need to get started, even if you've never tried anything like this before, is right here. See Brendon Hall's TLE article for more deets.

The radio show for geo-nerds

Undersampled Radio is still going strong. We just recorded episode 32 today. Last week's chat with Prof Chris Jackson (Imperial College London) — who's embarking on a GSA lecture tour this year — was a real cracker, check it out:

The other thing you need to know about Chris is that he's started writing his blog again. It's awesome, of course, and you should probably just go and read it now...