FORCE ML Hackathon: project round-up

The FORCE Machine Learning Hackathon last week generated hundreds of new relationships and nine new projects, including seven new open source tools. Here’s the full run-down, in no particular order…

Predicting well rates in real time

Team Virtual Flow Metering: Nils Barlaug, Trygve Karper, Stian Laagstad, Erlend Vollset (all from Cognite) and Emil Hansen (AkerBP).

Tech: Cognite Data Platform, scikit-learn. GitHub repo.

Project: An engineer from AkerBP brought a problem: testing the rate from a well lowers the pressure and so briefly reduces production, costing about $10k per day. The team investigated whether they could instead predict the rate from other known variables, thereby reducing the number of expensive tests.

This project won the Most Commercial Potential award.

The predicted flow rate (blue) compared to the true flow rate (orange). The team used various models, from multilinear regression to boosted trees.
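The team's code used scikit-learn. As a dependency-free sketch of the simplest model they mention, here is multilinear regression by ordinary least squares. The input variables (a wellhead pressure and a choke opening) and the numbers are hypothetical, not AkerBP data, and the little solver stands in for something like scikit-learn's `LinearRegression`.

```python
# Hypothetical virtual flow meter: rate ~ b0 + b1*pressure + b2*choke,
# fitted by ordinary least squares via the normal equations (pure Python).

def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(features, targets):
    """Return [intercept, w1, w2, ...] minimising the squared prediction error."""
    rows = [[1.0] + list(f) for f in features]  # column of ones for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, feature):
    """Predicted rate for one set of input measurements."""
    return coef[0] + sum(w * x for w, x in zip(coef[1:], feature))


# Made-up data generated from rate = 50 + 2*pressure + 300*choke.
measurements = [(p, c) for p in (100.0, 120.0, 140.0, 160.0) for c in (0.2, 0.5, 0.8)]
rates = [50 + 2 * p + 300 * c for p, c in measurements]
coef = fit_linear(measurements, rates)
print([round(x, 3) for x in coef])
```

In a real virtual flow meter you would fit on periods covered by well tests and predict in between; swapping in boosted trees changes the model, not the workflow.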

Reinforcement learning tackles interpretation

Team Gully Attack: Steve Purves, Eirik Larsen, JB Bonas (all Earth Analytics), Aina Bugge (Kalkulo), Thormod Myrvang (NTNU), Peder Aursand (AkerBP).

Tech: keras-rl. GitHub repo.

Project: Deep reinforcement learning has proven adept at learning, and winning, games, and at other tasks including image segmentation. The team tried training an agent to pick the gully channels in the Parihaka 3D seismic volume, and also tried some other automatic interpretation approaches.

The agent learned something, but in the end it did not prevail. The team learned lots, and did prevail!

This project won the Most Creative Idea award.

Early in training, the learning agent wanders around the image (top left). After an hour of training, the agent tends to stick to the gullies (right).
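The team used keras-rl, i.e. deep reinforcement learning with a neural network. To show the basic idea without one, the sketch below swaps in plain tabular Q-learning on a hypothetical 5x5 "image" containing a single vertical gully, assuming a reward of 1 for every step that lands on a gully pixel. The grid, the rewards and the hyperparameters are all illustrative.

```python
import random

# Hypothetical 5x5 image: 1 marks a gully pixel (column 2), 0 is background.
IMAGE = [[1 if c == 2 else 0 for c in range(5)] for _ in range(5)]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one pixel (clamped at the edges); reward 1 for landing on a gully pixel."""
    r, c = state
    dr, dc = ACTIONS[action]
    r = min(max(r + dr, 0), 4)
    c = min(max(c + dc, 0), 4)
    return (r, c), float(IMAGE[r][c])


def greedy(q, state):
    """Index of the action with the highest Q-value (first one wins ties)."""
    return max(range(len(ACTIONS)), key=lambda i: q[state][i])


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(5) for c in range(5)}
    for _ in range(episodes):
        state = (rng.randrange(5), rng.randrange(5))  # random starting pixel
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy(q, state)  # exploit
            nxt, reward = step(state, a)
            # Q-learning update towards reward + discounted best value of the next pixel.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q


def rollout(q, start, steps=10):
    """Follow the learned greedy policy and return the pixels visited."""
    path, state = [], start
    for _ in range(steps):
        state, _ = step(state, greedy(q, state))
        path.append(state)
    return path


q = train()
print(rollout(q, (0, 0)))
```

After training, the greedy agent walks to the gully and then moves along it, a toy version of the behaviour in the figure.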

A new kind of AVO crossplot?

Team ASAP: Per Avseth (Dig), Lucy MacGregor (Rock Solid Images), Lukas Mosser (Imperial), Sandeep Shelke (Emerson), Anders Draege (Equinor), Jostein Heredsvela (DEA), Alessandro Amato del Monte (ENI).

Tech: t-SNE, UMAP, VAE. GitHub repo.

Project: If you were trying to come up with a new approach to AVO analysis, these are the scientists you'd look for. The idea was to reduce the dimensionality of the input traces, first with t-SNE and UMAP, then with a VAE. This produced a new 2D space in which interesting clusters could be probed, chiefly by processing synthetics with known variations (e.g. in thickness or porosity).

This project won the Best In Show award. Look out for the developments that come from this work!

Top: Illustration of the variational autoencoder, which reduces the input data (top left) into some abstract representation — a crossplot, essentially (top middle) — and can also reconstruct the data, but without the features that did not discriminate between the datasets, effectively reducing noise (top right).

The lower image shows the interpreted crossplot (left) and the implied distribution of rock properties (right).
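t-SNE, UMAP and VAEs are all nonlinear, and each needs a library. As a simpler linear baseline (not the team's method), principal component analysis also maps each trace to a point in a 2D space. This sketch finds the top two components by power iteration with deflation; the "traces" in the usage example are hypothetical vectors, not real gathers.

```python
import math


def centre(rows):
    """Subtract the mean of each sample position (column) from every trace."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mu[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def top_eigenvector(cov, iters=500):
    """Power iteration: multiply by cov and renormalise until the direction settles."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary start; must not be orthogonal to the answer
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # nothing left to find
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, lam


def pca_2d(traces):
    """Project each trace onto its top two principal components: a 2D 'crossplot'."""
    centred = centre(traces)
    cov = covariance(centred)
    d = len(cov)
    pcs = []
    for _ in range(2):
        v, lam = top_eigenvector(cov)
        pcs.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    coords = [[sum(p[j] * r[j] for j in range(d)) for p in pcs] for r in centred]
    return coords, pcs


# Made-up 4-sample "traces" that vary along two hidden directions.
dir_a, dir_b = (1.0, 1.0, 1.0, 1.0), (1.0, -1.0, 1.0, -1.0)
traces = [[a * dir_a[j] + b * dir_b[j] for j in range(4)]
          for a in range(-3, 4) for b in (-1, 0, 1)]
coords, pcs = pca_2d(traces)
```

A linear projection like this is a useful sanity check before reaching for the nonlinear methods, which can separate clusters that PCA would overlap.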

Acquiring seismic with crayons

Team: Jesper Dramsch (Technical University of Denmark), Thilo Wrona (University of Bergen), Victor Aare (Schlumberger), Arno Lettman (DEA), Alf Veland (NPD).

Tech: pix2pix GAN (TensorFlow). GitHub repo.

Project: Not everything that looks like a toy is a toy. The team spent a few hours drawing cartoons of small seismic sections, then re-trained the pix2pix GAN on them. The result: an app (try it!) that turns sketches into seismic!

This project won the People’s Choice award.

A sketch of a salt diapir penetrating geological layers (left) and the inferred seismic expression, generated by the neural network. In principle, the model could also be trained to work in the other direction.

Extracting show depths and confidence from PDFs

Team: Florian Basier (Emerson), Jesse Lord (Kadme), Chris Olsen (ConocoPhillips), Anne Estoppey (student), Kaouther Hadji (Accenture).

Project: A couple of decades ago, the last great digital revolution gave us PDFs. Lots of PDFs. But these pseudodigital documents still need to be wrangled into Proper Data. This team took on that project, trying in particular to extract both the depth of a show, and the confidence in its identification, from well reports.

This project won the Best Presentation award.

Kaouther Hadji (left), Florian Basier, Jesse Lord, and Anne Estoppey (right).
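The post doesn't say how the team parsed the reports. As a hedged sketch of one small piece of the problem, assuming the text has already been pulled out of the PDF, a regular expression can pick out a show depth and a qualifying word, and a lookup table (entirely hypothetical here) can turn that word into a rough confidence score.

```python
import re

# Hypothetical mapping from qualifying words in well reports to a confidence score.
CONFIDENCE = {"good": 0.9, "fair": 0.6, "weak": 0.4, "poor": 0.3, "trace": 0.2, "possible": 0.3}
DEFAULT_CONFIDENCE = 0.5  # used when no qualifying word precedes the show

SHOW_RE = re.compile(
    r"(?P<quality>good|fair|weak|poor|trace|possible)?\s*"  # optional qualifier
    r"(?:oil|gas|hydrocarbon)\s+shows?\b"                   # the show itself
    r"[^.]*?"                                               # stay within one sentence
    r"(?P<depth>\d{3,5}(?:\.\d+)?)\s*m\b",                  # a depth in metres
    re.IGNORECASE,
)


def extract_shows(text):
    """Return (depth_m, confidence) pairs for each show mentioned in free text."""
    results = []
    for match in SHOW_RE.finditer(text):
        quality = (match.group("quality") or "").lower()
        results.append((float(match.group("depth")), CONFIDENCE.get(quality, DEFAULT_CONFIDENCE)))
    return results


report = ("Fair oil shows were observed at 2450 m MD. "
          "Trace gas show at 3120.5 m. Hydrocarbon show noted near 1800 m.")
print(extract_shows(report))
```

Real reports are far messier (units in feet, depth ranges, negations like "no shows"), which is exactly why this is a hard problem.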

Grain size and structure from core images

Team: Eirik Time, Xiaopeng Liao, Fahad Dilib (all Equinor), Nathan Jones (California Resource Corp), Steve Braun (ExxonMobil), Silje Moeller (Cegal).

Tech: scikit-learn, scikit-image. GitHub repo.

Project: One of the many teams composed of professionals from all over the industry — it’s amazing to see this kind of collaboration. The team did a great job of breaking the problem down, going after what they could and getting some decent results. An epic task, but so many interesting avenues — we need more teams on these problems!

The pipeline was as ambitious as it looks. But this is a hard problem that will take some time to get good at. Kudos to this team for starting to dig into it and for making amazing progress in just 2 days.
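One classic building block for grain size is connected-component labelling: segment the image into grain and non-grain pixels, label each connected grain, and convert its area to an equivalent diameter. scikit-image provides this as `skimage.measure.label` and `regionprops`; the pure-Python sketch below assumes a binary mask is already available and uses a hypothetical pixel size.

```python
import math
from collections import deque


def grain_areas(mask):
    """Label 4-connected regions of 1s in a binary mask; return each region's area in pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from this seed pixel.
                queue, area = deque([(r, c)]), 0
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas


def equivalent_diameters(areas, pixel_size_mm):
    """Diameter of the circle with the same area as each grain, in millimetres."""
    return [2 * math.sqrt(a / math.pi) * pixel_size_mm for a in areas]


# Hypothetical segmented mask: 1 = grain, 0 = matrix/pore.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(equivalent_diameters(grain_areas(mask), pixel_size_mm=0.1))
```

The hard part in practice is the segmentation step that produces the mask, especially where grains touch.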

Learning geological age from bugs

Team: David Wade (Equinor), Per Olav Svendsen (Equinor), Bjoern Harald Fotland (Schlumberger), Tore Aadland (University of Bergen), Christopher Rege (Cegal).

Tech: scikit-learn (random forest). GitHub repo.

Project: The team used DEX files from five wells from the recently released Volve dataset from Equinor. The goal was to learn to predict geological age from biostratigraphic species counts. They made substantial progress — and highlighted what a great resource Volve will be as the community explores it and publishes results like these.

David Wade and Per Olav Svendsen of Equinor (top), and some results (bottom).
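The team used a random forest. To show the shape of the problem (a vector of species counts in, an age label out), here is a simpler classifier swapped in for illustration: k-nearest neighbours with cosine similarity. The species counts and age labels below are made up, not Volve data.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two count vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_age(sample, training, k=3):
    """Majority vote among the k training samples with the most similar species counts."""
    ranked = sorted(training, key=lambda item: cosine(sample, item[0]), reverse=True)
    votes = Counter(age for _, age in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical counts of three species per sample, with an age label.
training = [
    ([30, 2, 0], "Paleocene"), ([25, 5, 1], "Paleocene"), ([28, 1, 2], "Paleocene"),
    ([1, 20, 15], "Eocene"), ([0, 18, 22], "Eocene"), ([2, 25, 10], "Eocene"),
]
print(predict_age([27, 3, 1], training))
```

Cosine similarity ignores the absolute abundance and compares the mix of species, which suits counts that depend on sample size.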

Lost in 4D space!

Team: Andres Hatloey, Doug Hakkarinen, Mike Brhlik (all ConocoPhillips), Espen Knudsen, Raul Kist, Robin Chalmers (all Cegal), Einar Kjos (AkerBP).

Tech: scikit-learn (random forest regressor). GitHub repo.

Project: Another cross-industry collaboration. In their own words, the team set out to “identify trends between 4D seismic and well measurements in order to calculate reservoir pressures and/or thickness between well control”. They were motivated by real data from Valhall, and did a great job making sense of a lot of real-world data. One nice innovation: using the seismic quality as a weighting factor to try to understand the role of uncertainty. See the team’s presentation.
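The team's exact weighting scheme isn't described here. As a minimal sketch of the idea, assuming each seismic sample comes with a quality score between 0 and 1, weighted least squares lets poor-quality points count for less. In scikit-learn the same idea is available through the `sample_weight` argument of `RandomForestRegressor.fit`.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; each point counts in proportion to its weight."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical: a 4D attribute (x) against a well-derived quantity (y),
# weighted by a made-up seismic quality score. The last point is noisy and low quality.
attribute = [0.0, 1.0, 2.0, 3.0, 4.0]
measured = [2.0, 5.0, 8.0, 11.0, 100.0]
quality = [1.0, 1.0, 1.0, 1.0, 0.0]
print(weighted_linear_fit(attribute, measured, quality))
```

With a weight of 0 the noisy point is ignored entirely; intermediate weights shrink its influence rather than removing it.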


Clustering reveals patterns in 4D maps

Team: Tetyana Kholodna, Simon Stavland, Nithya Mohan, Saktipada Maity, Jone Kristoffersen Bakkevig (all CapGemini), Reidar Devold Midtun (ConocoPhillips).

Project: The team worked on real 4D data from an operating field. Reidar provided a lot of maps computed from multiple seismic attributes, with groups of maps representing different reservoir layers and thirteen different time-lapse acquisitions. So… a lot of maps. The team attempted to correlate 4D effects across all of these dimensions: attributes, layers, and production time. Reidar, the only geoscientist on a team of data scientists, also provided one of the quotes of the hackathon: "I'm the geophysicist, and I represent the problem".
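The post doesn't name the clustering method. As one plausible sketch, k-means groups map locations whose attribute vectors look alike; here each point is a hypothetical vector of attribute values (say, one per layer or survey). The initialisation is deliberately naive; a real run would use something like k-means++.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new
    return labels, centroids


# Hypothetical map cells, each described by two 4D attribute changes.
cells = [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(cells, k=2)
print(labels)
```

Mapping the labels back onto the map grid gives a classified map, which is where the geophysicist's interpretation comes back in.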


That’s it for the FORCE Hackathon for 2018. I daresay there may be more in the coming months and years. If they can build on what we started last week, I think more remarkable things are on the way!


One more thing…

I mentioned the UK hackathons last time, but I went and forgot to include the links to the events. So here they are again, in case you couldn’t find them online…

What are you waiting for? Get signed up and tell your friends!

Machine learning goes mainstream

At our first machine-learning-themed hackathon, in New Orleans in 2015, we had fifteen hackers. Times were hard in the industry. Few were willing or able to come out and play. Well, it's now clear that times have changed! After two epic ML hacks last year (in Paris and Houston), at which we hosted about 115 scientists, it's clear this year is continuing the trend. Indeed, by the end of 2018 we expect to have welcomed at least 240 more digital scientists to hackathons in the US and Europe.

Conclusion: something remarkable is happening in our field.

The FORCE hackathon

Last Tuesday and Wednesday, Agile co-organized the FORCE Machine Learning Hackathon in Stavanger, Norway. FORCE is a cross-industry geoscience organization, coordinating subsurface meetings and research. The event preceded a 1-day symposium on the same theme: machine learning in geoscience. And it was spectacular.

Get a flavour of the spectacularness in Alessandro Amato’s beautiful photographs:

Fifty geoscientists and engineers spent two days at the Norwegian Petroleum Directorate (NPD) in Stavanger. Our hosts were welcoming, accommodating, and generous with the waffles. As usual, we gently nudged the participants into teams, and encouraged them to define projects and find data to work on. It always amazes me how smoothly this potentially daunting task goes; I think this says something about the purposefulness and resourcefulness of our community.

Here’s a quick run-down of the projects:

  • Biostrat! Geological ages from species counts.

  • Lost in 4D Space. Pressure drawdown prediction.

  • Virtual Metering. Predicting wellhead pressure in real time.

  • 300 Wells. Extracting shows and uncertainty from well reports.

  • AVO ML. Unsupervised machine learning for more geological AVO.

  • Core Images. Grain size and lithology from core photos.

  • 4D Layers. Classification engine for 4D seismic data.

  • Gully Attack. Strat trap picking with deep reinforcement learning.

  • sketch2seis. Turning geological cartoons into seismic with pix2pix.

I will do a complete review of the projects in the coming few days, but notice the diversity here. Five of the projects straddle geological topics, and five are geophysical. Two or three involve petroleum engineering issues, while two or three move into sed/strat. We saw natural language processing. We saw random forests. We saw GANs, VAEs, and deep reinforcement learning. In terms of input data, we saw core photos, PDF reports, synthetic seismograms, real-time production data, and hastily assembled label sets. In short — we saw everything.

Takk skal du ha

Many thanks to everyone who helped the event come together:

  • Peter Bormann, the mastermind behind the symposium, was instrumental in making the hackathon happen.

  • Grete Block Vargle (AkerBP) and Pernille Hammernes (Equinor) kept everyone organized and inspired.

  • Tone Helene Mydland (NPD) and Soelvi Amundrud (NPD) made sure everything was logistically honed.

  • Eva Halland (NPD) supported the event throughout and helped with the judging.

  • Alessandro Amato del Monte (Eni) took some fantastic photos — as seen in this post.

  • Diego Castaneda and Rob Leckenby helped me on the Agile side of things, and helped several teams.

And a huge thank you to the sponsors of the event — too many to name, but here they all are:


There’s more to come!

If you’re reading this thinking, “I’d love to go to a geoscience hackathon”, and you happen to live in or near the UK, you’re in luck! There are two machine learning geoscience hackathons coming up this fall:

Don’t miss out! Get signed up and we’ll see you there.

What should national data repositories do?

Right now there's a conference happening in Stavanger, Norway: National Data Repository 2017. My friend David Holmes of Dell EMC, a long-time supporter of Agile's recent hackathons and general geocomputing infrastructure superhero, is there. He's giving a talk, I think, and chairing at least one session. He asked a question today on Software Underground:

If anyone has any thoughts or ideas as to what the regulators should be doing differently now is a good time to speak up :)

My response

For me it's about raising their aspirations. Collectively, they are sitting on one of the most valuable (or invaluable) datasets in the world, comparable to Hubble, or the LHC. Better yet, the data are (in most cases) already open and they actually want to share them. And the community (us) is better tooled than ever, and perhaps also more motivated, to get cracking. So the possibility is there to see a revolution in subsurface science and exploration (in the broadest sense of the word) and my challenge to them is:

Can they now create the conditions for this revolution in earth science?

Some things I think they can do right now:

  • Properly fund the development of an open data platform. I'll expand on this topic below.
  • Don't get too twisted off on formats (go primitive), platforms (pick one), licenses (go generic), and other busy work that committees love to fret over. Articulate some principles (e.g. public first, open source, small footprint, no lock-in, componentize, no single provider, let-users-choose, or what have you), and stay agile. 
  • Lobby NOCs and IOCs hard to embrace integrated and high-quality open data as an advantage that society, as well as industry, can share in. It's an important piece in the challenge we face to modernize the industry. Not so that it can survive for survival's sake, but so that it can serve society for as long as it's needed. 
  • Get involved in the community: open up their processes and collaborate a lot more with the technical societies — like show up and talk about their programs. (How did I not hear about the CDA's unstructured data challenge — a subject I'm very much into — till it was over? How many other potential participants just didn't know about it?)

An open data platform

The key piece here is the open data platform. Here are the features I'd like to see of such a platform:

  • Optimized for users, not the data provider, hosting provider, or system administrator.
  • Clear rights: well-known, documented, obvious, clearly expressed open licenses for re-use.
  • Meaningful levels of access that are free of charge for most users and most use cases.
  • Access for humans (a nice mappy web interface) with no awkward or slow registration processes.
  • Access for machines (a nice API, perhaps even a couple of libraries expressing it).
  • Tools for query, discovery, and retrieval; ideally with user feedback paths ('more like this, less like that').
  • Ways to report, or even fix, problems in the data. This relieves you of "the data's not ready" procrastination.
  • Good documentation of all of this, ideally in a wiki or something that people can improve.
  • Support for a community of users and developers that want to do things with the data.
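The "more like this, less like that" feedback path has a well-known textbook form, Rocchio relevance feedback: nudge the query vector towards items the user liked and away from those they rejected. A minimal sketch, assuming each dataset in a catalogue is described by a hypothetical feature vector (here: has logs, has core, has seismic) and using illustrative weights:

```python
def rocchio(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query vector towards liked items and away from disliked ones."""
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    pos, neg = centroid(liked), centroid(disliked)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank(query, catalogue):
    """Order catalogue items (name, vector) by dot-product score against the query."""
    def score(vector):
        return sum(a * b for a, b in zip(query, vector))

    return sorted(catalogue, key=lambda item: score(item[1]), reverse=True)


# Hypothetical catalogue: [has_logs, has_core, has_seismic] for each dataset.
catalogue = [("A", [1, 0, 0]), ("B", [1, 1, 0]), ("C", [0, 1, 0]), ("D", [1, 0, 1])]
query = [1.0, 0.0, 0.0]                                   # "wells with logs"
new_query = rocchio(query, liked=[[1, 1, 0]], disliked=[[1, 0, 1]])
print([name for name, _ in rank(new_query, catalogue)])
```

The same vectors could come from metadata, text, or even learned embeddings of the data themselves; the feedback loop stays the same.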

Building this platform is not trivial. There is massive file storage, database back end, web front end, licensing, and so on. Then there's the community of developers and users to engage and support. It will take years, and never be finished. It sounds hard... but people are doing it. Prototypes for seismic data exist already, and there are countless models in other verticals (just check out the Registry of Research Data Repositories, or look at the list on PLOS). 

The contract to build data infrastructure is often awarded to the likes of Schlumberger, Halliburton or CGG. In theory, these companies have the engineering depth to pull it off (though this too is debatable, especially in today's web-first, native-never world). But they completely lack the culture required: there's no corporate understanding of what 'open' means. So the model is broken in subtle but fatal ways and the whole experiment fails.

I'm excited to hear what comes out of this conference. If you're there, please tell!