July 31, 2018

Results from the AAPG Machine Learning Unsession

July 31, 2018/ Matt Hall

Back in May, I co-hosted a different kind of conference session — an 'unsession' — at the AAPG Annual Conference and Exhibition in Salt Lake City, Utah. It was successful in achieving its main goal, which was to show the geoscience community and AAPG organizers a new way of collaborating, networking, and producing tangible outcomes from conference sessions.

It also succeeded in drawing out hundreds of ideas and questions around machine learning in geoscience. We have now combed over what the roughly 120 people produced that afternoon, written it up in a Google Doc (right), and are presenting some highlights right here in this post.

The unsession had three phases:

Exploring current and future skills for geoscientists.
Asking about the big questions in machine learning in geoscience.
Digging into some of those questions.

Let's look at each one in turn.

Current and future skills

As an icebreaker, we asked everyone to list three skills they have that set them apart from others in their teams or organizations — their superpowers, if you will. They wrote these on green Post-It notes. We also asked for three more skills they didn't have today, but wanted to acquire in the next decade or so. These went on orange Post-Its. We were especially interested in those skills that felt intimidating or urgent. The 8 or 10 people at each table then shared these with each other, by way of introducing themselves.

The skills are listed in this Google Sheets document.

Unsurprisingly, the most common 'skills I have' were around geoscience: seismic interpretation, seismic analysis, stratigraphy, engineering, modeling, sedimentology, petrophysics, and programming. And computational methods dominated the 'skills I want' category: machine learning, Python, coding or programming, deep learning, statistics, and mathematics.

We followed this up with a more general question: How would you rate the industry's preparedness for this picture of the future, as implied by the skill gap we've identified? People could replace 'industry' with whatever similar-scale institution felt meaningful to them. As shown (right), this resulted in a bimodal distribution; apparently there are two ways to think about the future of applied geoscience. This may merit more investigation with a more thorough survey.

Get the histogram data.

Big questions in ML

After the icebreaker, we asked the tables to respond to a big question:

What are the most pressing questions in applied geoscience that can probably be tackled with machine learning?

We realized that this sounds a bit 'hammer looking for a nail', but justified asking the question this way by drawing an analogy with other important new tools of the past, such as well logging, 3D seismic, or sequence stratigraphy. The point is that we have this powerful new (to us) set of tools; what are we going to look at first? At this point, we wanted people to brainstorm, without applying constraints like time or money.

This yielded approximately 280 ideas, all documented in the Google Sheet. Once the problems had been captured, the tables rotated so that each team walked to a neighboring table, leaving all their problems behind... and adopting new ones. We then asked them to score the new problems on two axes: scope (local vs global problems) and tractability (easy vs hard problems). This provided the basis for each table to choose one problem to take to the room for voting (each person had 9 votes to cast). This filtering process resulted in the following list:

How do we communicate error and uncertainty when using machine learning models and solutions? 85 votes.
How do we account for data integration, integrity, and provenance in our models? 78 votes.
How do we revamp the geoscience curriculum for future geoscientists? 71 votes.
What does guided, searchable, legacy data integration look like? 68 votes.
How can machine learning improve seismic data quality, or provide assistive technology on poor data? 65 votes.
How does the interpretability of machine learning model predictions affect their acceptance? 54 votes.
How do we train a model to assign value to prospects? 51 votes.
How do we teach artificial intelligences foundational geology? 45 votes.
How can we implement automatic core description? 42 votes.
How can we contain bad uses of AI? 40 votes.
Is self-steering well drilling possible? 21 votes.

I am paraphrasing most of those, but you can read the originals in the Google Sheet data harvest.

Exploring the questions

In the final stage of the afternoon, we took the top 6 questions from the list above and dug into them a little deeper. Tables picked their way through our Solution Sketchpads (specially updated for machine learning problems) to help them navigate the problems. Clearly, these questions were too big to make much progress on in the hour or so left in the day, but the point was to sound out some ideas, identify some possible actions, and connect with others interested in working on the problem.

One of the solution sketches, for the 'Revamp the geoscience curriculum' problem, is shown here (right). The team discussed the problem animatedly for an hour.

This team included — among others — an academic geostatistician, an industry geostatistician, a PhD student, a DOE geophysicist, an SEC geologist, and a young machine learning brainbox. Amazingly, this kind of diversity was typical of the tables.

See the rest of the solution sketches in Flickr.

That's it! Many thanks to Evan Bianco for the labour of capturing and digitizing the data from the event. Thanks also to AAPG for the great photos, and for granting them an open license. And thank you to my co-chairs Brendon Hall and Yan Zaretskiy of Enthought, and all the other folks who helped make the event happen — see the Productive chaos post for details.

To dig deeper, look for the complete write up in Google Docs, and the photos in Flickr.

Just a reminder... if it's Python and machine learning skills you want, we're running a Summer School in downtown Houston the week of 13 August. Come along and get your hands on the latest in geocomputing methods. Suitable for beginners or intermediate programmers.

Don't miss out! Find out more or register now.

July 27, 2018

Visualization in Copenhagen, part 2

July 27, 2018/ Matt Hall

In Part 1, I wrote about six of the projects teams contributed at the Subsurface Hackathon in Copenhagen in June. Today I want to tell you about the rest of them.

A data exploration tool

Team GeoClusterFu...n: Dan Stanton (University of Leeds), Filippo Broggini (ETH Zürich), Francois Bonneau (Nancy), Danny Javier Tapiero Luna (Equinor), Sabyasachi Dash (Cairn India), Nnanna Ijioma (geophysicist).

Tech: Plotly Dash. GitHub repo.

Project: The team set out to build an interactive web app — a totally new thing for all of them — to make interactive plots from data in a CSV. They ended up with the basis of a useful tool for exploring geoscience data. Project page.

Four sixths of the GeoClusterFu...n team cluster around a laptop.

AR outcrop on your phone

Team SmARt_OGs: Brian Burnham (University of Aberdeen), Tala Maria Aabø (Natural History Museum of Denmark), Björn Wieczoreck, Georg Semmler and Johannes Camin (GiGa Infosystems).

Tech: ARKit/ARCore, WebAR, Firebase. GitLab repo.

Project: Björn and his colleagues from GiGa Infosystems have been at all the European hackathons. This time, he knew he wanted to get virtual outcrops on mobile phones. He found a willing team, and they got it done! Project page.

Three views from the SmARt_OGs' video. See the full version.

Rock clusters in latent space

The Embedders: Lukas Mosser (Imperial College London), Jesper Dramsch (Technical University of Denmark), Ben Fischer (PricewaterhouseCoopers), Harry McHugh (DUG), Shubhodip Konar (Cairn India), Song Hou (CGG), Peter Bormann (ConocoPhillips).

Tech: Bokeh, scikit-learn, Multicore-TSNE. GitHub repo.

Project: There has been a lot of recent interest in the t-SNE algorithm as a way to reduce the dimensionality of complex data. The team explored its application to subsurface data, and found promising applications. Web page. Project page.

The Embedders built a web app to cluster the data in an LAS file. The clusters (top left) emerge in the embedding generated by the t-SNE algorithm.

Fully mixed reality

Team Hands On GeoLabs: Will Sanger (Western Geco), Chance Sanger (Houston Museum of Fine Arts), Pierre Goutorbe (Total), Fernando Villanueva (Institut de Physique du Globe de Paris).

Project: Starting with the ambitious goal of combining the mixed reality of the Meta AR gear with the mixed reality of the GemPy sandbox, the team managed to display and interact with some seismic data in the AR headset, which allows interaction with simple hand gestures. Project page.

The team demonstrate the Meta AR headset.

Huge grids over the web

Team Grid Vizards: Fabian Kampe, Daniel Buse, Jonas Kopcsek, Paul Gabriel (all from GiGa Infosystems)

Tech: three.js. GitHub repo.

Project: Paul and his team wanted to visualize hundreds of millions or billions of grid cells — all in the browser. They ended up with about 20 million points working very smoothly, and impressed everyone. Project page.

Interpreting RGB displays for spec decomp

Team: Florian Smit (Technical University of Denmark), Gijs Straathof (SGS), Thomas Gazzola (Total), Julien Capgras (Total), Steve Purves (Euclidity), Tom Sandison (Shell)

Tech: Python, react.js. GitHub repos: Client. Backend.

Project: Spectral decomposition is still a mostly qualitative tool, especially the interpretation of RGB-blended displays. This team set out to make intuitive, attractive forward models of the spectral response of wells. This should help interpret seismic data, and perhaps make more useful RGB displays too. Intriguing and promising work. Project page.
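For readers new to RGB blends: three frequency-band magnitudes from a spectral decomposition drive the red, green, and blue channels of one display. The sketch below is a minimal, hypothetical illustration in plain Python, not the team's code. It assumes each band arrives as a flat list of magnitudes for the same samples, and that each band is scaled independently to 0 to 255, which is only one of several possible normalizations.

```python
def normalize(values):
    """Scale a list of magnitudes linearly to integers in the range 0-255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat band carries no contrast; render it as zero intensity.
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def rgb_blend(low, mid, high):
    """Combine three frequency-band magnitude lists into (R, G, B) tuples.

    `low`, `mid`, and `high` are hypothetical band names, e.g. magnitudes
    at a low, middle, and high frequency for the same samples.
    """
    if not (len(low) == len(mid) == len(high)):
        raise ValueError("all three bands must have the same length")
    r, g, b = normalize(low), normalize(mid), normalize(high)
    return list(zip(r, g, b))
```

Scaling each band separately maximizes contrast in every channel, but it also means equal colours in two different displays need not imply equal magnitudes; a shared scale across bands is the alternative if absolute comparison matters.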

That's it for another year! Twelve new geoscience visualization projects — ten of them open source. And another fun, creative weekend for 63 geoscientists — all of whom left with new connections and new skills. All this compressed into one weekend. If you haven't experienced a hackathon yet, I urge you to seek one out.

I will leave you with two videos — and an apology. We are so focused on creating a memorable experience for everyone in the room, that we tend to neglect the importance of capturing what's happening. Early hackathons only had the resulting blog post as the document of record, but lately we've been trying to livestream the demos at the end. Our success has been, er, mixed... but they were especially wonky this time because we didn't have livestream maestro Gram Ganssle there. So, these videos exist, and are part of the documentation of the event, but they barely begin to convey the awesomeness of the individuals, the teams, or their projects. Enjoy them, but next time — you should be there!

July 25, 2018

Visualization in Copenhagen, part 1

July 25, 2018/ Matt Hall

It's finally here! The round-up of projects from the Subsurface Hackathon in Copenhagen last month. This is the first of two posts presenting the teams and their efforts, in the same random order the teams presented them at the end of the event.

Subsurface data meets Pokemon Go

Team Geo Go: Karine Schmidt, Max Gribner, Hans Sturm (all from Wintershall), Stine Lærke Andersen (University of Copenhagen), Ole Johan Hornenes (University of Bergen), Per Fjellheim (Emerson), Arne Kjetil Andersen (Emerson), Keith Armstrong (Dell EMC).

Project: With Pokemon Go as inspiration, the team set out to prototype a geoscience visualization app that placed interactive subsurface data elements into a realistic 3D environment.

Visualizing blind spots in data

Team Blind Spots: Jo Bagguley (UK Oil & Gas Authority), Duncan Irving (Teradata), Laura Froelich (Teradata), Christian Hirsch (Aalborg University), Sean Walker (Campbell & Walker Geophysics).

Tech: Flask, Bokeh, AWS for hosting app. GitHub repo.

Project: Data management always comes up as an issue in conversations about geocomputing, but few are bold enough to tackle it head on. This team built components for checking the integrity of large amounts of raw data, before passing it to data science projects. Project page.

Sean, Laura, and Christian. Jo and Duncan were out doing research. Note the kanban board in the background — agile all the way!
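The Blind Spots components aren't reproduced here, but the kind of integrity check they describe might look like this minimal, hypothetical sketch. It tests one well-log curve for a length mismatch, non-increasing depths, and null samples. The function name and the particular checks are assumptions; -999.25 is the customary null value in LAS files.

```python
LAS_NULL = -999.25  # conventional null value in LAS well-log files


def check_curve(depths, values, null=LAS_NULL):
    """Return a list of human-readable problems found in one log curve."""
    problems = []
    if len(depths) != len(values):
        problems.append(f"length mismatch: {len(depths)} depths, {len(values)} values")
    # Measured depth should increase strictly down the well.
    for i in range(1, len(depths)):
        if depths[i] <= depths[i - 1]:
            problems.append(f"depth not increasing at index {i}")
            break  # report the first occurrence only
    # Count samples flagged as missing.
    n_null = sum(1 for v in values if v == null)
    if n_null:
        pct = 100 * n_null / len(values)
        problems.append(f"{n_null} null sample(s) ({pct:.0f}%)")
    return problems
```

An empty list means the curve passed; anything else can be logged and the curve held back before it reaches a data science project.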

Volume uncertainties visualization

Team Fortuna: Natalia Shchukina (Total), Behrooz Bashokooh (Shell), Tobias Staal (University of Tasmania), Robert Leckenby (now Agile!), Graham Brew (Dynamic Graphics), Marco van Veen (RWTH Aachen).

Tech: Flask, Bokeh, Altair, Holoviews. GitHub repo.

Project: Natalia brought some data with her: lots of surface grids. The team built a web app to compute uncertainty sections and maps, then display them dynamically and interactively — eliciting audible gasps from the room. Project page.

The Fortuna app: Probability of being in the zone (left) and entropy (right). Cross-sections are shown at the top, maps on the bottom.
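Displays like these can be built from a stack of surface realizations. A minimal sketch in plain Python, assuming each realization supplies a top and a base depth for the zone at a given map location (this is an illustration of the idea, not the Fortuna team's code): the probability is the fraction of realizations that put a given depth inside the zone, and the binary entropy is highest, at 1 bit, where that probability is 0.5.

```python
import math


def zone_probability(depth, tops, bases):
    """Fraction of realizations in which `depth` lies inside the zone.

    `tops` and `bases` hold the zone's top and base depth at one location,
    one value per realization, with depth increasing downwards.
    """
    inside = sum(1 for t, b in zip(tops, bases) if t <= depth <= b)
    return inside / len(tops)


def binary_entropy(p):
    """Shannon entropy, in bits, of an in-zone versus out-of-zone outcome."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

Evaluating both functions at every depth sample along a line gives a probability section and an entropy section; doing it at one depth across every map node gives the maps.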

Differences and similarities with RGB blends

Team RGBlend: Melanie Plainchault and Jonathan Gallon (Total), Per Olav Svendsen, Jørgen Kvalsvik and Max Schuberth (Equinor).

Tech: Python, Bokeh. GitHub repo.

Project: One of the more intriguing ideas of the hackathon was not so much a fancy visualization technique as a novel way of producing a visualization: differencing three images and visualizing the differences in RGB space. It reminded me of an old blog post about the spot-the-difference game. Project page.

The differences (lower right) between three time-lapse seismic amplitude maps.
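The team's exact method isn't described here. One simple way to show differences among three images in RGB space is to send each map to its own colour channel using a single shared scale: where the maps agree, the pixel is grey; where they differ, it takes on a colour cast. The sketch below assumes three co-registered maps flattened to lists of equal length; the function names are hypothetical.

```python
def difference_rgb(map_a, map_b, map_c):
    """Encode three co-located maps as RGB so that agreement appears grey.

    All three maps share one min and max, so a sample with the same value
    in every map gets R == G == B; any colour cast shows where (and in
    which map) the values disagree.
    """
    all_values = list(map_a) + list(map_b) + list(map_c)
    lo, hi = min(all_values), max(all_values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant maps

    def scale(v):
        return round(255 * (v - lo) / span)

    return [(scale(a), scale(b), scale(c)) for a, b, c in zip(map_a, map_b, map_c)]


def greyness(rgb):
    """Spread between the largest and smallest channel; 0 means no difference."""
    return max(rgb) - min(rgb)
```

The shared scale is the key design choice: scaling each map independently would turn a uniform shift between surveys into false grey, hiding exactly the change a time-lapse comparison is looking for.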

Augmented reality geological maps

Team AR Sandbox: Simon Virgo (RWTH Aachen), Miguel de la Varga (RWTH Aachen), Fabian Antonio Stamm (RWTH Aachen), Alexander Schaaf (University of Aberdeen).

Tech: GemPy. GitHub repo.

Project: I don't have favourite projects, but if I did, this would be it. The GemPy group had already built their sandbox when they arrived, but they extended it during the hackathon. Wonderful stuff. Project page.

Magic box of sand: Sculpting a landscape (left), and the projected map (right). You can't even imagine how much fun it was to play with.

Augmented reality seismic wavefields

Team Sandbox Seismics: Yuriy Ivanov (NTNU Trondheim), Ana Lim (NTNU Trondheim), Anton Kühl (University of Copenhagen), Jean Philippe Montel (Total).

Tech: GemPy, Devito. GitHub repo.

Project: This team worked closely with Team AR Sandbox, but took it in a different direction. They read the velocity from the surface of the sand, then used Devito to simulate a seismic wavefield propagating across the model, and projected that wavefield onto the sand. See it in action in my recent Code Show post. Project page.

Yuriy Ivanov demoing the seismic wavefield moving across the sandbox.
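Devito does the heavy lifting in the real project. To show the idea in miniature, here is a hypothetical plain-Python sketch: sand heights are mapped linearly to velocities (the mapping and the 1500 to 4000 m/s range are assumptions), and a second-order finite-difference scheme advances the 2D acoustic wave equation one step at a time, using the update u_next = 2 u_curr − u_prev + (v Δt)² ∇²u_curr. It is far slower than Devito's generated code and only meant to show the update rule.

```python
import math


def height_to_velocity(height, h_min, h_max, v_min=1500.0, v_max=4000.0):
    """Map sand-surface heights linearly onto velocities in m/s.

    The linear mapping and default velocity range are illustrative
    assumptions, not values from the Sandbox Seismics project.
    """
    if h_max == h_min:
        raise ValueError("h_max must differ from h_min")
    scale = (v_max - v_min) / (h_max - h_min)
    return [[v_min + (h - h_min) * scale for h in row] for row in height]


def wave_step(u_prev, u_curr, vel, dt, dx):
    """Advance the 2D constant-density acoustic wave equation by one step.

    Second-order finite differences in time and space; edge cells stay at
    zero, which acts as a crude reflecting boundary.
    """
    v_max = max(max(row) for row in vel)
    if v_max * dt / dx > 1 / math.sqrt(2):
        raise ValueError("time step too large for stability (2D CFL limit)")
    ny, nx = len(u_curr), len(u_curr[0])
    u_next = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Five-point Laplacian of the current wavefield.
            lap = (u_curr[j][i + 1] + u_curr[j][i - 1]
                   + u_curr[j + 1][i] + u_curr[j - 1][i]
                   - 4.0 * u_curr[j][i]) / dx ** 2
            u_next[j][i] = (2.0 * u_curr[j][i] - u_prev[j][i]
                            + (vel[j][i] * dt) ** 2 * lap)
    return u_next
```

Calling `wave_step` repeatedly, shifting `u_curr` into `u_prev` each time and injecting a source wavelet at one cell, produces the animation frames; projecting each frame back onto the sand is the part the real installation adds.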

Pretty cool, right? As usual, all of these projects were built during the hackathon weekend, almost exclusively by teams that formed spontaneously at the event itself (I think one team was self-contained from the start). If you didn't notice the affiliations of the participants — go back and check them out; I think this might have been an unprecedented level of collaboration!

Next time we'll look at the other six projects. [UPDATE: Next post is here.]

Before you go, check out this awesome video Wintershall made about the event. A massive thank you to them for supporting the event and for recording this beautiful footage — and for agreeing to share it under a CC-BY license. Amazing stuff!

July 19, 2018

Lots of news!

July 19, 2018/ Matt Hall

I can't believe it's been a month since my last post! But I've now recovered from the craziness of the spring — with its two hackathons, two conferences, two new experiments, as well as the usual courses and client projects — and am ready to start getting back to normal. My goal with this post is to tell you all the exciting stuff that's happened in the last few weeks.

Meet our newest team member

There's a new Agilist! Robert Leckenby is a British–Swiss geologist with technology tendencies. Rob has a PhD in Dynamic characterisation and fluid flow modelling of fractured reservoirs, and has worked in various geoscience roles in large and small oil & gas companies. We're stoked to have him in the team!

Rob lives near Geneva, Switzerland, and speaks French and several other human languages, as well as Python and JavaScript. He'll be helping us develop and teach our famous Geocomputing course, among other things. Reach him at robert@agilescientific.com.

Geocomputing Summer School

We have trained over 120 geoscientists in Python so far this year, but most of our training is in private classes. We wanted to fix that, and offer the Geocomputing class back for anyone to take. Well, anyone in the Houston area :) It's called Summer School, it's happening the week of 13 August, and it's a 5-day crash course in scientific Python and the rudiments of machine learning. It's designed to get you a long way up the learning curve. Read more and enroll.

A new kind of event

We have several more events happening this year, including hackathons in Norway and in the UK. But the event in Anaheim, right before the SEG Annual Meeting, is going to be a bit different. Instead of the usual Geophysics Hackathon, we're going to try a sprint around open source projects in geophysics. The event is called the Open Geophysics Sprint, and you can find out more here on events.agilescientific.com.

That site — events.agilescientific.com — is our new events portal, and our attempt to stay on top of the community events we are running. Soon, you'll be able to sign up for events on there too (right now, most of them are still handled through Eventbrite), but for now it's at least a place to see everything that's going on. Thanks to Diego for putting it together!

June 20, 2018

Code Show version 1.0

June 20, 2018/ Matt Hall

Last week we released Code Show version 1.0. In a new experiment, we teamed up with Total and the European Association of Geoscientists and Engineers at the EAGE Annual Conference and Exhibition in Copenhagen. Our goal was to bring a little of the hackathon to as many conference delegates as possible. We succeeded in reaching a few hundred people over the three days, making a lot of new friends in the process. See the action in this Twitter Moment.

What was on the menu?

The augmented reality sandbox that Simon Virgo and his colleagues brought from the University of Aachen. The sandbox displayed both a geological map generated by the GemPy 3D implicit geological modeling tool, as well as a seismic wavefield animation generated by the Devito modeling and inversion project. Thanks to Yuriy Ivanov (NTNU) and others in his hackathon team for contributing the seismic modeling component.

SandboxSeismics in action at the #EAGEAnnual2018 with @endo_simon @EvanBianco @agilegeo #GemPy pic.twitter.com/f1Lr8fNfaP
— Yuriy Ivanov (@veryYuri) June 14, 2018

Demos from the Subsurface Hackathon. We were fortunate to have lots of hackathon participants make time for the Code Show. Graham Brew presented the uncertainty visualizer his team built; Jesper Dramsch and Lukas Mosser showed off their t-SNE experiments; Florian Smit and Steve Purves demoed their RGB explorations; and Paul Gabriel shared the GiGa Infosystems projects in AR and 3D web visualization. Many thanks to those folks and their teams.

AR and VR demos by the Total team. Dell EMC provided HTC Vive and Meta 2 kits, with Dell Precision workstations, for people to try. They were a lot of fun, provoking several cries of disbelief and causing at least one person to collapse in a heap on the floor.

Python demos by the Agile team. Dell EMC also kindly provided lots more Dell Precision workstations for general use. We hooked up some BBC micro :bit microcontrollers, Microsoft Azure IoT DevKits, and other bits and bobs, and showed anyone who would listen what you can do with a few lines of Python. Thank you to Carlos da Costa (University of Edinburgh) for helping out!

Tech demos by engineers from Intel and INT. Both companies are very active in visualization research and generously spent time showing visitors their technology.

v 2.0 next year... maybe?

The booth experience was new to us. Quite a few people came to find us, so it was nice to have a base, rather than cruising around as we usually do. I'd been hoping to get more people set up with Python on their own machines, but this may be too in-depth for most people in a trade show setting. Most were happy to see some new things and maybe tap out some Python on a keyboard.

Overall, I'd call it a successful experiment. If we do it next year in London, we have a very good idea of how to shape an even more engaging experience. I think most visitors enjoyed themselves this year, though; if you were one of them, we'd love to hear from you!

June 17, 2018

Big open data... or is it?

June 17, 2018/ Matt Hall

Huge news for data scientists and educators. Equinor, the company formerly known as Statoil, has taken a bold step into the open data arena. On Thursday last week, it 'disclosed' all of its subsurface and production data for the Volve oil field, located in the North Sea.

What's in the data package?

A lot! The 40,000-file package contains 5 TB of data; that's 5,000 GB!

This collection is substantially larger, both deeper and broader, than any other open subsurface dataset I know of. Most excitingly, Equinor has released a broad range of data types, from reports to reservoir models: 3D and 4D seismic, well logs and real-time drilling records, and everything in between. The only slight problem is that the seismic data are bundled in very large files at the moment; we've asked for them to be split up.

Questions about usage rights

Regular readers of this blog will know that I like open data. One of the cornerstones of open data is access, and there's no doubt that Equinor have done something incredible here. It would be preferable not to have to register at all, but free access to this dataset — which I'm guessing cost more than USD500 million to acquire — is an absolutely amazing gift to the subsurface community.

Another cornerstone is the right to use the data for any purpose. This involves the owner granting certain privileges, such as the right to redistribute the data (say, for a class exercise) or to share derived products (say, in a paper). I'm almost certain that Equinor intends the data to be used this way, but I can't find anything actually granting those rights. Unfortunately, if they aren't explicitly granted, the only safe assumption is that you cannot share or adapt the data.

For reference, here's the language in the CC-BY 4.0 licence:

Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:

reproduce and Share the Licensed Material, in whole or in part; and
produce, reproduce, and Share Adapted Material.

You can dig further into the requirements for open data in the Open Data Handbook.

The last thing we need is yet another industry dataset with unclear terms, so I hope Equinor attaches a clear licence to this dataset soon. Or, better still, just uses a well-known licence such as CC-BY (this is what I'd recommend). This will clear up the matter and we can get on with making the most of this amazing resource.

More about Volve

The Volve field was discovered in 1993, but not developed until 15 years later. It produced oil and gas for just over 8.5 years, starting on 12 February 2008 and ending on 17 September 2016, though about half of the production came in the first 2 years (see below). The facility was the Maersk Inspirer jack-up rig, standing in 80 m of water, with an oil storage vessel in attendance. Gas was piped to Sleipner A. In all, the field produced 10 million Sm³ (63 million barrels) of oil, so it is small by most standards, with a peak rate of 56,000 barrels per day.

Volve production over time in standard m³ (i.e. at 15°C and 1 atm). Multiply by 6.29 for barrels.
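A quick check of the numbers above, using the 6.29 bbl per Sm³ factor from the caption and the start and end dates of production given in the text:

```python
from datetime import date

SM3_TO_BBL = 6.29  # barrels per standard cubic metre (factor from the caption)

total_oil_sm3 = 10e6        # total oil produced, Sm³
peak_bbl_per_day = 56_000   # peak oil rate, barrels per day
start, end = date(2008, 2, 12), date(2016, 9, 17)

days = (end - start).days   # subtracting dates gives a timedelta
print(f"Total oil:  {total_oil_sm3 * SM3_TO_BBL / 1e6:.1f} million bbl")
print(f"Peak rate:  {peak_bbl_per_day / SM3_TO_BBL:,.0f} Sm³/d")
print(f"Production: {days} days, about {days / 365.25:.2f} years")
```

This prints 62.9 million bbl, 8,903 Sm³/d, and 3140 days (about 8.60 years), consistent with the rounded figures in the text.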

The production was from the Jurassic Hugin Formation, a shallow-marine sandstone with good reservoir properties, at a depth of about 3000 m. The top reservoir depth map from the discovery report in the data package is shown here. (I joined Statoil in 1997, not long after this report was written, and the sight of this page brings back a lot of memories.)

The top reservoir depth map from the discovery report. The Volve field (my label) is the small closure directly north of Sleipner East, with 15/9-19 well on it.

Get the data

To explore the dataset, you must register in the 'data village', which Equinor has committed to maintaining for 2 years. It only takes a moment. You can get to it through this link.

Let us know in the comments what you think of this move, and do share what you get up to with the data!

June 13, 2018

Visualize this!

June 13, 2018/ Matt Hall

The Copenhagen edition of the Subsurface Hackathon is over! For three days during the warmest June in Denmark for over 100 years, 63 geoscientists and programmers cooked up hot code in the Rainmaking Loft, one of the coolest, and warmest, coworking spaces you've ever seen. As always, every one of the participants brought their A game, and the weekend flew by in a blur of creativity, coffee, and collaboration. And croissants.

Pierre enjoying the Meta AR headset that Dell EMC provided.

Our sponsors have always been unusually helpful and inspiring, pushing us to get more audacious, but this year they were exceptionally engaged and proactive. Dell EMC, in the form of David and Keith, provided some fantastic tech for the teams to explore; Total supported Agile throughout the organization phase, and Wintershall kindly arranged for the event to be captured on film — something I hope to be able to share soon. See below for the full credit roll!

During the event, twelve teams dug into the theme of visualization and interaction. As in Houston last September, we started the event on Friday evening, after the Bootcamp (a full day of informal training). We have a bit of process to form the teams, and it usually takes a couple of hours. But with plenty of pizza and beer for fuel, the evening flew by. After that, it was two whole days of coding, followed by demos from all of the teams and a few prizes. Check out some of the pictures:

Thank you very much to everyone that helped make this event happen! Truly a cast of thousands:

David Holmes of Dell EMC for unparalleled awesomeness.
The whole Total team, but especially Frederic Broust, Sophie Segura, Yannick Pion, and Laurent Baduel...
...and also Arnaud Rodde for helping with the judging.
The Wintershall team, especially Andreas Beha, who also acted as a judge.
Brendon Hall of Enthought for sponsoring the event.
Carlos Castro and Kim Saabye Pedersen of Amazon AWS.
Mathias Hummel and Mahendra Roopa of NVIDIA.
Eirik Larsen of Earth Science Analytics for sponsoring the event and helping with the judging.
Duncan Irving of Teradata for sponsoring, and sorting out the T-shirts.
Monica Beech of Ikon Science for participating in the judging.
Matthias Hartung of Target for acting as a judge again.
Oliver Ranneries, plus Nina and Eva of Rainmaking Loft.
Christopher Backholm for taking such great photographs.

Finally, some statistics from the event:

63 participants, including 8 women (still way too few, but 100% better than 4 out of 63 in Paris)
15 students plus a handful of post-docs.
19 people from petroleum companies.
20 people from service and technology companies, including 7 from GiGa Infosystems!
1 no-show, which I think is a new record.

I will write a summary of all the projects in a couple of weeks when I've caught my breath. In the meantime, you can read a bit about them on our new events portal. We'll be steadily improving this new tool over the coming weeks and months.

That's it for another year... except we'll be back in Europe before the end of the year. There's the FORCE Hackathon in Stavanger in September, then in November we'll be in Aberdeen and London running some events with the Oil and Gas Authority. If you want some machine learning fun, or are looking for a new challenge, please come along!

Simon Virgo (centre) and his colleagues in Aachen built an augmented reality sandbox, powered by their research group's software, GemPy. He brought it along and three teams attempted projects based on the technology. Above, some of the participants are having a scrum meeting to keep their project on track.

UPDATED on 27 July

Check out the projects:

June 08, 2018

Looking forward to Copenhagen

June 08, 2018/ Matt Hall

We're in Copenhagen for the Subsurface Bootcamp and Hackathon, which start today, and the EAGE Annual Conference and Exhibition, which starts next week. Walking around the city yesterday, basking in warm sunshine and surrounded by sun-giddy Scandinavians, it became clear that Copenhagen is a pretty special place, where northern Europe and southern Europe seem to have equal influence.

The event this weekend promises to be the biggest hackathon yet. It's our 10th, so I think we have the format figured out. But it's only the third in Europe, the theme — Visualization and interaction — is new for us, and most of the participants are new to hackathons so there's still the thrill of the unknown!

Many thanks to our sponsors for helping to make this latest event happen! Support these organizations: they know how to accelerate innovation in our industry.

New events for UK

By the way, we just announced two new hackathons, one in London and one in Aberdeen, for the autumn. They are happening just before PETEX, the PESGB petroleum conference; find out more here. You can skill up for these events at some new courses, also just announced. The UK Oil and Gas Authority is offering our Intro to Geocomputing and Machine Learning class for free — apply here for a place. The courses are oversubscribed, so be sure to tell the OGA why you should get a place!

Code Show

There is a lot of other stuff happening at the EAGE exhibition this year — the HPC area, a new start-up area, and a digital transformation area which I hope is as bold as it sounds. Here's the complete schedule and some highlights:

WS02 Data Integration in Geoscience - Perspectives for Computational Methods, although it only contains 4 talks so I'm not sure if that means it will be short, or contain a lot of discussion (which would be cool).
Seismic Interpretation I - Automation through AI, Machine Learning, Deep Learning, with an accompanying poster session. Evan reported on this session last year.
Geothermal Solutions I (Dedicated Session) and Geothermal Solutions II — we always enjoy geothermal sessions. And geothermal is hot right now (heh, no but seriously, it is).
Computational Geoscience and Data/information Management, including the talk Digitalization in subsurface learning, which sounds interesting but apparently you can't read abstracts online so who knows.

There's lots of other stuff, of course (EAGE has the most varied programme of any subsurface conference), but these are the sessions I'd be at if I had time to go to any sessions this year. But I won't, because the hackathon is not all that's happening! Next week, starting on Tuesday, we're conducting a new experiment with the Code Show. In partnership with EAGE and Total, this is our attempt to bring some of the hackathon experience to everyone at EAGE. We'll be showing people the projects from the hackathon, talking to them about programming, and helping them get started on their own coding adventure. So if you're at EAGE, swing by Booth #1830 and say hi.

May 31, 2018

Weekend worship in Salt Lake City

May 31, 2018/ Evan Bianco

The Salt Lake City hackathon — only the second we've done with a strong geology theme — is a thing of history, but you can still access the event page to check out who showed up and who did what. (This events page is a new thing we launched in time for this hackathon; it will serve as a public document of what happens at our events, in addition to being a platform for people to register, sponsor, and connect around our events.)

In true seat-of-the-pants hackathon style, we managed to set up an array of webcams and microphones to record the finale. The demos are the icing on the cake. Teams were selected at random and given four minutes each to wow the crowd. Here is the video, followed by a summary of what each team got up to...

Unconformist.ai

Didi Ooi (University of Bristol), Karin Maria Eres Guardia (Shell), Alana Finlayson (UK OGA), Zoe Zhang (Chevron). The team used machine learning to automate the mapping of unconformities in subsurface data. One of the trickiest parts is building up a catalog of data-model pairs for GANs to train on. Instead of relying on thousands or hundreds of thousands of human-made seismic interpretations, the team generated training images by programmatically labelling pixels on synthetic data as being either above (white) or below (black) the unconformity. Project page. Slides.
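To make the idea of programmatic labelling concrete, here's a minimal Python sketch of one way to generate a synthetic image/label pair for an angular unconformity. It is not the team's code: the striped "bed" pattern, the function name synthetic_unconformity, and all of its parameters are illustrative assumptions. Flat beds sit above the surface, dipping beds are truncated below it, and the label image is 1 (white) above the unconformity and 0 (black) below it.

```python
import numpy as np

def synthetic_unconformity(nz=64, nx=64, uc_depth=30, dip=0.5, period=8.0,
                           noise=0.1, seed=None):
    """Toy (image, label) pair for an angular unconformity (hypothetical helper).

    uc_depth: unconformity depth in samples, a scalar or one value per trace.
    Above the surface the beds are flat; below it they dip with slope `dip`.
    label is 1 (white) above the unconformity and 0 (black) below it.
    """
    rng = np.random.default_rng(seed)
    z, x = np.mgrid[0:nz, 0:nx]           # depth and lateral sample indices
    above = z < np.asarray(uc_depth)      # broadcasts a per-trace depth across rows
    # Bed "phase" depends on depth only above, on depth and position below
    phase = np.where(above, z, z - dip * x)
    image = np.sin(2 * np.pi * phase / period)
    image = image + noise * rng.standard_normal(image.shape)  # add random noise
    label = above.astype(np.uint8)
    return image, label
```

Looping over random dips, depths, and seeds would yield as many training pairs as you like, which is the main appeal of synthetic labels: no interpreter time is needed.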

Outcrops Gee Whiz

Thomas Martin (soon Colorado School of Mines), Zane Jobe (Colorado School of Mines), Fabien Laugier (Chevron), and Ross Meyer (Colorado School of Mines). The team wrote some programs to evaluate facies variability along drone-derived digital outcrop models. They did this by processing UAV point cloud data in Python and classifying different rock facies using weathering profiles, local cliff-face morphology, and rock colour variations as attributes. This research will help in the development of drone-assisted 3D scanning to automate facies boundary mapping and rock characterization. Repo, Slides.
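Classifying facies from point-cloud attributes boils down to grouping rows of a per-point feature table. Here's a hedged sketch using a plain NumPy k-means; that choice is mine for illustration, since the source doesn't say which classifier the team used. It assumes the point cloud has already been reduced to an (n_points, n_features) array, for example RGB colour plus any weathering or morphology attributes you compute; the function name kmeans and its parameters are hypothetical.

```python
import numpy as np

def kmeans(features, k, n_iter=100, seed=0):
    """Cluster per-point attributes into k groups (hypothetical helper).

    features: (n_points, n_features) array, ideally standardized.
    Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random point, then keep
    # adding the point farthest from all centres chosen so far.
    centers = [features[rng.integers(len(features))]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance)
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if empty)
        new = np.array([features[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Standardizing each attribute first (subtract the mean, divide by the standard deviation) stops colour channels measured in 0–255 from swamping attributes on smaller scales. Unsupervised clusters still need a geologist to say which facies each one represents.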

Jet Loggers

Eirik Larsen and Dimitrios Oikonomou (Earth Science Analytics), and Steve Purves (Euclidity). This team of European geoscientists, with their circadian clocks all out of whack, investigated whether a language of stratigraphy can be extracted from the rock record and, if so, whether it can be used as another tool for classifying rocks. They applied natural language processing (NLP) to an alphabetic encoding of well logs as a means to assist or augment the labour-intensive tasks of classifying stratigraphic units and picking tops. Slides.
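One simple way to turn a well log into "text" is to bin its values, give each bin a letter, and then treat runs of letters as words. The sketch below assumes fixed bin edges and overlapping n-grams. It's a guess at the flavour of the approach, not the team's actual encoding, and both function names are hypothetical.

```python
import numpy as np
from collections import Counter

def log_to_text(values, edges, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Encode a log as a string: one letter per sample, chosen by value bin."""
    bins = np.digitize(values, edges)  # bin index 0..len(edges) for each sample
    return "".join(alphabet[int(b)] for b in bins)

def ngram_counts(text, n=3):
    """Count overlapping n-letter 'words' in the encoded log."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

For example, log_to_text([10, 50, 90, 50, 10], edges=[30, 70]) gives "abcba". N-gram counts over a depth window could then act as a bag-of-words feature vector for clustering or classifying stratigraphic intervals.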

Book Cliffs Bandits

Tom Creech (ExxonMobil) and Jesse Pisel (Wyoming State Geological Survey). The team started out munging datasets from the Book Cliffs. Unfortunately, they really did not have perfect, ready-to-go data there, and by the time they pivoted to some workable open data from Alaska, their team name had already become a thing. The goal was to build a tool to assist with lithology and stratigraphic correlation. They settled on change-point detection using Bayesian statistics, which they used to build richer feature sets to test whether it could produce more robust automatic stratigraphic interpretation. Repo and presentation.
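To give a feel for Bayesian change-point detection, here's a minimal sketch of the simplest case: a single shift in the mean of a log with Gaussian noise of known standard deviation σ. With a flat prior on each segment mean, the means integrate out analytically. For a split into segments of lengths n₁ and n₂ with within-segment sums of squared deviations S₁ and S₂, the log marginal likelihood is −½ log n₁ − ½ log n₂ − (S₁ + S₂)/(2σ²) plus a constant. A uniform prior over positions then gives the posterior. The source doesn't describe the team's model, so the single-change assumption, the known σ, and the function name are illustrative.

```python
import numpy as np

def changepoint_posterior(x, sigma=1.0):
    """Posterior over the position of a single change in mean (hypothetical helper).

    Assumes Gaussian noise with known std `sigma`, a flat prior on each
    segment mean (integrated out), and a uniform prior over positions.
    Entry k is the probability that the change falls between samples k-1 and k;
    entry 0 is zero because both segments must contain data.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    logp = np.full(n, -np.inf)
    for k in range(1, n):
        left, right = x[:k], x[k:]
        s = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        # Log marginal likelihood, dropping terms that do not depend on k
        logp[k] = -0.5 * np.log(len(left)) - 0.5 * np.log(len(right)) - s / (2 * sigma**2)
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return p / p.sum()
```

On a synthetic gamma-ray-like log made of 50 samples around 60 followed by 50 samples around 100 (σ = 5), the posterior should peak at or very near index 50. The whole posterior curve, not just its peak, is the kind of thing that could be stacked into a richer feature set.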

A channel runs through it

Nam Pham (UT Austin), Graham Brew (Dynamic Graphics), Nathan Suurmeyer (Shell). Because morphologically realistic 3D synthetic seismic data is scarce, this team wrote a Python program that takes seismic horizon interpretations from real data, then constructs richer training datasets for building an AI that can automatically delineate geological entities in the subsurface. The pixels enclosed by any two horizons are labelled with ones; pixels outside this region are labelled with zeros. This work was in support of Nam's thesis research, which uses the SegNet architecture and aims to extract not only major channel boundaries in seismic data, but also the internal channel structure and variability – details that many seismic interpreters, armed even with state-of-the-art attribute toolboxes, would be unable to resolve. Project page and code.
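The labelling rule described above (ones between two horizons, zeros elsewhere) is nearly a one-liner with NumPy broadcasting. This sketch assumes horizons are given as integer sample indices per trace, that the time or depth axis comes last, that top ≤ base everywhere, and that both picks count as inside. The function name is hypothetical. A real tool would also need to handle interpolation of picks and actual survey geometry.

```python
import numpy as np

def label_between_horizons(top, base, n_samples):
    """Binary labels: 1 between two horizon picks (inclusive), 0 elsewhere.

    top, base: sample indices of the upper and lower horizon, one per trace
    (1D for a 2D line, 2D for a 3D survey). Assumes top <= base.
    Returns an array of shape top.shape + (n_samples,), time axis last.
    """
    top = np.asarray(top)[..., np.newaxis]
    base = np.asarray(base)[..., np.newaxis]
    t = np.arange(n_samples)  # sample index along each trace
    return ((t >= top) & (t <= base)).astype(np.uint8)
```

Because the comparison broadcasts, the same function labels a 2D line from 1D horizon arrays or a 3D volume from 2D horizon grids.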

GeoHacker

Malcolm Gall (UK OGA), Brendon Hall and Ben Lasscock (Enthought). Innovation happens when hackers have the ability to try things... but they also need data to try things out on. There is a massive shortage of geoscience datasets that have been staged and curated for machine learning research. Team GeoHacker's project wasn't a project per se, but a platform aimed at the sharing, distribution, and long-term stewardship of geoscience data benchmarks: a place where we can take geoscience data and put it online in a form that is ready to use for machine learning. The subsurface realm is swimming with disparate data types across a dizzying range of length scales, and indeed community efforts may be the only way to prove machine learning's usefulness and keep the hype in check. It's not only about being open, online, and accessible. Good datasets, like good software, need to be hosted by individuals, properly documented, enriched with tutorials and getting-started guides, not to mention properly funded. Website.

Petrodict

Mark Mlella (Univ. Louisiana, Lafayette), Matthew Bauer (Anschutz Exploration), Charley Le (Shell), Thomas Nguyen (Devon). Petrodict is a machine-learning-driven, cloud-based lithology prediction tool that takes petrophysical measurements (well logs) and gives back lithology. Users upload a triple combo log to the app, and the app returns that same log with volumetric fractions for its various lithologic or mineralogical constituents. For training, the team selected several dozen wells that had both elemental capture spectroscopy (ECS) logs, from a premium tool that is run in only a small fraction of wells, and triple combo measurements, and used them to build a model for predicting lithology. Repo.
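At its core, a tool like this is a supervised regression from triple-combo readings to the mineral volume fractions that ECS logs provide. As a hedged sketch, here's the simplest possible version: a linear least-squares model, with predictions clipped at zero and renormalized so that each sample's fractions sum to one. The source doesn't say what model the team used. The linear model, the post-processing, and the function names are all assumptions for illustration.

```python
import numpy as np

def _design(logs):
    # Log readings plus a constant column for the intercept
    logs = np.asarray(logs, dtype=float)
    return np.column_stack([logs, np.ones(len(logs))])

def fit_lithology_model(logs, fractions):
    """Least-squares coefficients mapping log readings to volume fractions.

    logs: (n_samples, n_logs), e.g. GR, RHOB, NPHI, RT from a triple combo.
    fractions: (n_samples, n_minerals) targets, e.g. ECS-derived volumes.
    """
    coef, *_ = np.linalg.lstsq(_design(logs), fractions, rcond=None)
    return coef

def predict_fractions(coef, logs):
    """Predict fractions, clip negatives to zero, and renormalize rows to sum to one."""
    f = np.clip(_design(logs) @ coef, 0.0, None)
    return f / np.maximum(f.sum(axis=1, keepdims=True), 1e-12)
```

In practice you'd fit on the ECS-bearing wells and predict in the triple-combo-only wells. Real rock physics is rarely this linear, so a nonlinear regressor would be the obvious next step.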

Seismizor

George Hinkel, Vivek Patel, and Alex Waumann (all from University of Texas at Arlington). Earthquakes are hard. This team of computer science undergraduate students drove in from Texas to spend their weekend with all the other geo-enthusiasts. What problem in subsurface oil and gas did they identify as being important, interesting, and worthy of their relatively unvested attention? They took on the problem of induced seismicity, testing whether machine learning and analytics can be used to predict the likelihood that injected wastewater from fracking will cause an earthquake like the ones that have been making news in Oklahoma. The majority of this team's time was spent doing what all good scientists do: understanding the physical system they were trying to investigate, unabashedly pulling a number of the more geomechanically inclined hackers from neighbouring teams and peppering them with questions. Induced seismicity is indeed a complex phenomenon, but George's realization that "we massively overestimated the availability of data" struck a chord, I think, with the judges and the audience. Another systemic problem. The dynamic earth, incredible in its complexity and forces, coupled with the fascinating and politically charged technologies we use for drilling and fracking, might be one of the hardest problems for machine learning to attack in the subsurface.

AAPG next year is in San Antonio. If it runs, the hackathon will be on 18–19 May. Mark your calendar and stay tuned!

May 25, 2018

Productive chaos

May 25, 2018/ Matt Hall

Wednesday was a good day.

Over 150 participants came to Room 251 for all or part of the first 'unsession' at the AAPG Annual Conference and Exhibition in Salt Lake City. I was one of the hosts of the event, and emceed the afternoon.

In a nutshell, it was awesome. I have facilitated unsessions before, but this event was on a new scale. Twelve tables of 8–10 seats — covered in sticky notes, stickers, coloured pens, and large sheets of paper — quickly filled up. Together, we burned about 10 person-weeks of human productivity, raising the temperature in the room by several degrees in the process.

Diversity means good conversation

On the way in, people self-identified as mostly software (blue name tags) or mostly soft rocks (red), as a non-serious way to get a handle on how many data scientists we had vs how many people are focused on the rocks themselves — without, I hope, any kind of value judgment. The ratio was about 1:2.

As people continued to drift in, we counted people identifying with various categories, to get a very rough idea of who was in the room. The results are shown here. In addition, I counted 24 women present at the start. Part of the point here is to introduce participants to each other, but there's another purpose too. AAPG, like many scientific organizations, is grappling with diversity today. Like others, it needs to do much better. A small part of the solution is, I think, to name it and measure how we're doing at every opportunity. It's one way to pay more attention.

Harder to capture is the profound level of job diversity. People responsible for billion-dollar budgets sat with graduate students, AAPG medal winners with SEC executives. We even had a venture capitalist and a physician.

Look at all these lovely people:

Tangible and intangible output

At the start of the session, I told the room I wanted to fill the walls with things we made, in other words with data. We easily achieved this, producing a survey of the skills geoscientists will need in the future, hundreds of high-value machine learning tasks in geoscience, a ranked list of the most interesting of these, and even some problem analysis of a few of them. None of this was definitive, but I hope it will provide grist for the mill of future conversations about machine learning in geoscience.

As well as these tangible products, each person in the room walked away with new connections and new ideas — about machine learning, about collaboration, and about what scientific meetings can be like.

Acknowledgments

A lot of people contributed to making this event happen.

My unsession co-chairs, Brendon Hall and Yan Zaretskiy of Enthought, spent several hours on the phone with me over the last few weeks, shaping the content and flow of an event that was a bit, er, fuzzy.

We seeded the tables with some of the Software Underground crowd who were in town for the hackathon and AAPG. This ensures that there's no failure case: twelve people are definitely coming. And in the unlikely event that 100 people come, there are twelve allies to manage some of the chaos. Heartfelt thanks to the table hosts:

Didi Ooi of the University of Bristol
Graham Ganssle of Expero
Lisa Stright of Colorado State University
Thomas Martin of Colorado School of Mines
Tom Creech of ExxonMobil
David Holmes of Dell EMC
Steve Purves of Euclidity
Diego Castaneda of Agile
Evan Bianco of Agile

Jenny Cole of SEG came along to observe the session, and I appreciated her enthusiastic help as it became clear we were in for more than the usual amount of entropy in the room. Theresa Curry of AAPG did an amazing job getting the venue set up, providing refreshments, and ensuring the photographers were there to capture some of the action. The ACE 2018 organizing committee, especially Zane Jobe and Lauren Birgenheier, did their part by agreeing to support including such a weird-sounding thing in the program.

Finally, thank you to the 100+ scientists who came to the event, not knowing at all what to expect. It was a privilege to receive your enthusiastic participation and thoughtful contributions. Let's do it again some time!

We will digitize the ideas and products of the unsession over the coming weeks. They will be released under an open license. Watch this space for updates.

If you're interested in the methodology we use for these events, check out Proceedings of an unsession in CSEG Recorder, November 2013. If you'd like help running an event like this, get in touch.
