Results from the AAPG Machine Learning Unsession

Click here to visit the Google Doc write-up

Back in May, I co-hosted a different kind of conference session — an 'unsession' — at the AAPG Annual Conference and Exhibition in Salt Lake City, Utah. It was successful in achieving its main goal, which was to show the geoscience community and AAPG organizers a new way of collaborating, networking, and producing tangible outcomes from conference sessions.

It also succeeded in drawing out hundreds of ideas and questions around machine learning in geoscience. We have now combed over what the 120 people (roughly) produced on that afternoon, written it up in a Google Doc (right), and present some highlights right here in this post.

Click here to visit the Flickr photo album.

The unsession had three phases:

  1. Exploring current and future skills for geoscientists.

  2. Asking about the big questions in machine learning in geoscience.

  3. Digging into some of those questions.

Let's look at each one in turn.


skills_blog.jpg

Current and future skills

As an icebreaker, we asked everyone to list three skills they have that set them apart from others in their teams or organizations — their superpowers, if you will. They wrote these on green Post-It notes. We also asked for three more skills they didn't have today, but wanted to acquire in the next decade or so. These went on orange Post-Its. We were especially interested in those skills that felt intimidating or urgent. The 8 or 10 people at each table then shared these with each other, by way of introducing themselves.

The skills are listed in this Google Sheets document.

Unsurprisingly, the most common 'skills I have' were around geoscience: seismic interpretation, seismic analysis, stratigraphy, engineering, modeling, sedimentology, petrophysics, and programming. And computational methods dominated the 'skills I want' category: machine learning, Python, coding or programming, deep learning, statistics, and mathematics.

We followed this up with a more general question: How would you rate the industry's preparedness for this picture of the future, as implied by the skill gap we've identified? People could replace 'industry' with whatever similar-scale institution felt meaningful to them. As shown (right), this resulted in a bimodal distribution: apparently there are two ways to think about the future of applied geoscience. This may merit more investigation with a more thorough survey.

Get the histogram data.

preparedness_histogram.png

Big questions in ML

After the icebreaker, we asked the tables to respond to a big question:

What are the most pressing questions in applied geoscience that can probably be tackled with machine learning?

We realized that this sounds a bit 'hammer looking for a nail', but justified asking the question this way by drawing an analogy with other important new tools of the past: well logging, 3D seismic, or sequence stratigraphy. The point is that we have this powerful new (to us) set of tools; what are we going to look at first? At this point, we wanted people to brainstorm, without applying constraints like time or money.

This yielded approximately 280 ideas, all documented in the Google Sheet. Once the problems had been captured, the tables rotated so that each team walked to a neighboring table, leaving all their problems behind... and adopting new ones. We then asked them to score the new problems on two axes: scope (local vs global problems) and tractability (easy vs hard problems). This provided the basis for each table to choose one problem to take to the room for voting (each person had 9 votes to cast). This filtering process resulted in the following list:

  1. How do we communicate error and uncertainty when using machine learning models and solutions? 85 votes.

  2. How do we account for data integration, integrity, and provenance in our models? 78 votes.

  3. How do we revamp the geoscience curriculum for future geoscientists? 71 votes.

  4. What does guided, searchable, legacy data integration look like? 68 votes.

  5. How can machine learning improve seismic data quality, or provide assistive technology on poor data? 65 votes.

  6. How does the interpretability of machine learning model predictions affect their acceptance? 54 votes.

  7. How do we train a model to assign value to prospects? 51 votes.

  8. How do we teach artificial intelligences foundational geology? 45 votes.

  9. How can we implement automatic core description? 42 votes.

  10. How can we contain bad uses of AI? 40 votes.

  11. Is self-steering well drilling possible? 21 votes.

I am paraphrasing most of those, but you can read the originals in the Google Sheet data harvest.
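
If you want to play with the voting numbers yourself, here is a minimal Python sketch. It only uses the vote counts from the list above and the 9-votes-per-person rule; the short labels are further paraphrases made up for this example, and the "minimum number of voters" figure assumes nobody cast more than their 9 votes. It is an illustration, not part of the official write-up.

```python
# Vote counts copied from the ranked list above.
# The short labels are illustrative paraphrases, not the original wording.
votes = {
    "Communicating error and uncertainty": 85,
    "Data integration, integrity, provenance": 78,
    "Revamping the geoscience curriculum": 71,
    "Guided, searchable legacy data": 68,
    "Improving seismic data quality": 65,
    "Interpretability and acceptance": 54,
    "Assigning value to prospects": 51,
    "Teaching AIs foundational geology": 45,
    "Automatic core description": 42,
    "Containing bad uses of AI": 40,
    "Self-steering well drilling": 21,
}

VOTES_PER_PERSON = 9  # each participant had 9 votes to cast

total = sum(votes.values())

# Fraction of all votes that each question received.
shares = {question: n / total for question, n in votes.items()}

# Share of votes captured by the six questions taken forward.
top6_share = sum(sorted(votes.values(), reverse=True)[:6]) / total

# Lower bound on the number of voters (ceiling division),
# assuming nobody cast more than VOTES_PER_PERSON votes.
min_voters = -(-total // VOTES_PER_PERSON)

for question, n in sorted(votes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{n:3d}  {shares[question]:5.1%}  {question}")

print(f"Total votes: {total}")
print(f"Top 6 share: {top6_share:.1%}")
print(f"Minimum number of voters: {min_voters}")
```

The same dictionary could be swapped for the full Google Sheet export if you want to look at all of the ideas rather than just the finalists.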


Exploring the questions

In the final stage of the afternoon, we took the top 6 questions from the list above, and dug into them a little deeper. Tables picked their way through our Solution Sketchpads — especially updated for machine learning problems — to help them navigate the problems. Clearly, these questions were too enormous to make much progress in the hour or so left in the day, but the point here was to sound out some ideas, identify some possible actions, and connect with others interested in working on the problem.

One of the solution sketches is shown here (right), for the 'Revamp the geoscience curriculum' problem. They discussed the problem animatedly for an hour.

This team included — among others — an academic geostatistician, an industry geostatistician, a PhD student, a DOE geophysicist, an SEC geologist, and a young machine learning brainbox. Amazingly, this kind of diversity was typical of the tables.

See the rest of the solution sketches in Flickr.


That's it! Many thanks to Evan Bianco for the labour of capturing and digitizing the data from the event. Thanks also to AAPG for the great photos, and for granting them an open license. And thank you to my co-chairs Brendon Hall and Yan Zaretskiy of Enthought, and all the other folks who helped make the event happen — see the Productive chaos post for details.

To dig deeper, look for the complete write-up in Google Docs, and the photos in Flickr.


Just a reminder... if it's Python and machine learning skills you want, we're running a Summer School in downtown Houston the week of 13 August. Come along and get your hands on the latest in geocomputing methods. Suitable for beginners or intermediate programmers.

Don't miss out! Find out more or register now.

Productive chaos

Wednesday was a good day.

Over 150 participants came to Room 251 for all or part of the first 'unsession' at the AAPG Annual Conference and Exhibition in Salt Lake City. I was one of the hosts of the event, and emceed the afternoon.

In a nutshell, it was awesome. I have facilitated unsessions before, but this event was on a new scale. Twelve tables of 8–10 seats — covered in sticky notes, stickers, coloured pens, and large sheets of paper — quickly filled up. Together, we burned about 10 person-weeks of human productivity, raising the temperature in the room by several degrees in the process.

Diversity means good conversation

On the way in, people self-identified as mostly software (blue name tags) or mostly soft rocks (red), as a non-serious way to get a handle on how many data scientists we had vs how many people are focused on the rocks themselves — without, I hope, any kind of value judgment. The ratio was about 1:2.

As people continued to drift in, we counted people identifying with various categories, to get a very rough idea of who was in the room. The results are shown here. In addition, I counted 24 women present at the start. Part of the point here is to introduce participants to each other, but there's another purpose too. AAPG, like many scientific organizations, is grappling with diversity today. Like others, it needs to do much better. A small part of the solution is, I think, to name it and measure how we're doing at every opportunity. It's one way to pay more attention.

Harder to capture is the profound level of job diversity. People responsible for billion-dollar budgets sat with graduate students, AAPG medal winners with SEC executives. We even had a venture capitalist and a physician.

Look at all these lovely people:

Tangible and intangible output

At the start of the session, I told the room I wanted to fill the walls with things we made — with data. We easily achieved this, producing a survey of the skills geoscientists will need in the future, hundreds of high-value machine learning tasks in geoscience, a ranked list of the most interesting of these, and even some problem analysis of some of them. None of this was definitive, but I hope it will provide grist for the mill of future conversations about machine learning in geoscience.

As well as these tangible products, each person in the room walked away with new connections and new ideas — about machine learning, about collaboration, and about what scientific meetings can be like.

Acknowledgments

A lot of people contributed to making this event happen.

My unsession co-chairs, Brendon Hall and Yan Zaretskiy of Enthought, spent several hours on the phone with me over the last few weeks, shaping the content and flow of an event that was a bit, er, fuzzy.

We seeded the tables with some of the Software Underground crowd who were in town for the hackathon and AAPG. This ensures that there's no failure case: twelve people are definitely coming. And in the unlikely event that 100 people come, there are twelve allies to manage some of the chaos. Heartfelt thanks to the table hosts:

  • Didi Ooi of the University of Bristol
  • Graham Ganssle of Expero
  • Lisa Stright of Colorado State University
  • Thomas Martin of Colorado School of Mines
  • Tom Creech of ExxonMobil
  • David Holmes of Dell EMC
  • Steve Purves of Euclidity
  • Diego Castaneda of Agile
  • Evan Bianco of Agile

Jenny Cole of SEG came along to observe the session and I appreciated her enthusiastic help as it became clear we were in for more than the usual amount of entropy in the room. Theresa Curry of AAPG did an amazing job getting the venue set up, providing refreshments, and ensuring the photographers were there to capture some of the action. The ACE 2018 organizing committee, especially Zane Jobe and Lauren Birgenheier, did their part by agreeing to support including such a weird-sounding thing in the program.

Finally, thank you to the 100+ scientists that came to the event, not knowing at all what to expect. It was a privilege to receive your enthusiastic participation and thoughtful contributions. Let's do it again some time!


We will digitize the ideas and products of the unsession over the coming weeks. They will be released under an open license. Watch this space for updates.

If you're interested in the methodology we use for these events, check out Proceedings of an unsession in CSEG Recorder, November 2013. If you'd like help running an event like this, get in touch.

The developer's mind

Humbled by the aura of the legendary Cavendish Labs in the building next door, I refrain from expressing the full extent of my awe and reverence for this special place. "Sure", I think, "it's no big deal. Let's get on with it". I came to Cambridge to collaborate with Pietro Berkes. He's building Canopy Geo at Enthought. We spent the day spiking, apparently. Working shoulder to shoulder with Pietro was nearly as responsive as dictating a vision to a painter and watching it emerge before my eyes. He's darn good. During my visit, I took notice of some characteristics and guiding principles that top developers, such as himself and his colleagues, bring to their work.

On whiteboarding

The best way to be understood, to connect, or to teach, is to do it one-on-one in front of a whiteboard. It is fitting that all of the walls of their office space are whiteboard walls. Old marks, wiped clean but still visible, show remnant algorithms sketched out and stacked upon each other. A well-worn workshop, where writing on the walls is the cultural norm. And for electronic communication? Some deliberately check email only three times a day: first thing in the morning, midday, and mid-afternoon. Any more often would be disruptive to their flow. Email is the enemy of real work, but instant messaging can be a good productivity tool.

On discipline

To build something that is extensible takes a good deal of thoughtfulness and discipline. Code will survive long after the project is over and the programmer has moved on. This doesn't just mean leaving an adequate documentation trail behind you, but also building a solid foundation that others can contribute to. Being Agile, it turns out, although not the only choice, also takes discipline and diligence in order to be effective.

On ownership and responsibility

Authority is not given, responsibility is taken. Many of the best developers define themselves by the authorship of code and libraries. So attribution is not only necessary politeness, it is a direct line of communication. What body of work would you stand up and speak for? Someone may find a bug at 9:00 am in a different time zone. Will they wait till 2:00 pm to hear from you? Somehow, this decentralized system of self-appointed responsibility just works. The longevity of emotional and intellectual labour, particularly in an open source setting, is a fascinating concept. The work becomes more relevant because the developer never stops caring for it. You can change projects, you can change languages, you can change companies, but your work never leaves you. If that notion excites you, you are making an impact.

The developer knows that prowess is earned by execution. They thrive in an accepted sub-culture of meritocracy: largely free of politics, organizational hierarchies, and other social drama that get in the way of the real work. With a mind cleared to deal with essential tasks, what emerges is the ego of an artist and a creator with the potential to act on it. "Now that we can build anything, what do we do next?"  

Proceedings of an unsession

Two weeks ago today Evan and I hosted a different kind of session at the Canada GeoConvention. It was an experiment in collaboration and integration, and I'm happy to say it exceeded our expectations. We will definitely be doing it again, so if you were there, or even if you weren't, any and all feedback will help ensure the dial goes to 11.

One of the things we wanted from the session was evidence. Evidence of conversation, innovation, and creative thinking. We took home a great roll of paper and sticky notes, and have now captured it all in SubSurfWiki, along with notes from the event. You are invited to read and edit. Be bold! And please share the link...

  ageo.co/unsession

The video from the morning is in the editing suite right now: watch for that too.

We have started a write-up of the morning. If you came to the session, please consider yourself a co-author: your input and comments are welcome. You might be unaccustomed to editing a community document, but don't be shy; that's what it's there for.

We want to share two aspects of the event on the blog. First, the planning and logistics of the session — a cheatsheet for when we (or you!) would like to repeat the experience. Second, the outcomes and insights from it — the actual content. Next time: planning an unsession.