May 19, 2021

How can technical societies support openness?

May 19, 2021/ Matt Hall

There’s an SPE conference on openness happening this week. Around 60 people paid the $400 registration fee — does that seem like a lot for a virtual conference? — and it’s mostly what you’d expect: talks and panel discussions. But there’s 20 minutes per day for open discussion, and we must be grateful for small things! For sure, it is always good to see the technical societies pay attention to open data, open source code, and open access content.

But what really matters is action, and in my breakout room today I asked about SPE’s role in raising the community’s level of literacy around openness. Someone asked in turn what sorts of things the organization could do. I said my answer needed to be written down 😄 so here it is.

To save some breath, I’m going to use the word openness to talk about open access content, open source code, and open data. And when I say ‘open’, I mean that something meets the Open Definition. In a nutshell, this states:

“Open data and content can be freely used, modified, and shared by anyone for any purpose”

Remember that ‘free’ here means many things, but not necessarily ‘free of charge’.

So that we don’t lose sight of the forest for the trees, my advice boils down to this: I would like to see all of the technical societies understand and embrace the idea that openness is an important way for them to increase their reach, improve their accessibility, become more equitable, increase engagement, and better serve their communities of practice.

No, ‘increase their revenue’ is not on that list. Yes, if they do those things, their revenue will go up. (I’ve written about the societies’ counterproductive focus on revenue before.)

Okay, enough preamble. What can the societies do to better serve their members? I can think of a few things:

Advocate for producers of the open content and technology that benefits everyone in the community.
Help member companies understand the role openness plays in innovation and help them find ways to support it.
Take a firm stance on expectations of reproducibility for journal articles and conference papers.
Provide reasonable, affordable options for authors to choose open licences for their work (and such options must not require a transfer of copyright).
When open access papers are published, be clear about the licence. (I could not figure out the licence on the current most read paper in SPE Journal, although it says ‘open access’.)
Find ways to get well-informed legal advice about openness to members (this advice is hard to find; most lawyers are not well informed about copyright law, never mind openness).
Offer education on openness to members.
Educate editors, associate editors, and meeting convenors on openness so that they can coach authors, reviewers, and contributors.
Improve peer review machinery to better support the review of code and data submissions.
Highlight exemplary open research projects, and help project maintainers improve over time. (For example, what would it take to accelerate MRST’s move to an open language? Could SPE help create those conditions?)
Recognize that open data benchmarks are badly needed and help organize labour around them.
Stop running data science contests that depend on proprietary data.
Put an open licence on PetroWiki. I believe this was Apache’s intent when they funded it, hence the open licences on AAPG Wiki and SEG Wiki. (Don’t get me started on the missed opportunity of the SEG/AAPG/SPE wikis.)
Allow more people from more places to participate in events, with sympathetic pricing, asynchronous activities, recorded talks, etc. It is completely impossible for a great many engineers to participate in this openness workshop.
Organize more events around openness!

I know that SPE, like the other societies, has some way to go before they really internalize all of this. That’s normal — change takes time. But I’m afraid there is some catching up to do. The petroleum industry is well behind here, and none of this is really new — I’ve been banging on about it for a decade and I think of myself as a newcomer to the openness party. Jon Claerbout and Paul de Groot must be utterly exhausted by the whole thing!

The virtual conference this week is an encouraging step in the right direction, as are the recent SPE datathons (notwithstanding what I said about the data). Although it’s a late move — making me wonder if it’s an act of epiphany or of desperation — I’m cautiously encouraged. I hope the trend continues and picks up pace. And I’m looking forward to more debate and inspiration as the week goes on.

May 18, 2021

Projects from the Geothermal Hackathon 2021

May 18, 2021/ Matt Hall

The second Geothermal Hackathon happened last week. Timed to coincide with the Geosciences virtual event of the World Geothermal Congress, our 2-day event brought about 24 people together in the famous Software Underground Chateau (I’m sorry if I missed anyone!). For comparison, last year we were 13 people, so we’re going in the right direction! Next time I hope we’re as big as one of our ‘real world’ events — maybe we’ll even be able to meet up in local clusters.

Here’s a rundown of the projects at this year’s event:

Induced seismicity at Espoo, Finland

Alex Hobé, Mohsen Bazagan and Matteo Niccoli

Alex’s original workflow for creating dynamic displays of microseismic events was to create thousands of static images then stack them into a movie, so the first goal was something more interactive. On Day 1 Alex built a Plotly widget with a time zoomer/slider in a Jupyter Notebook. On day 2 he and Matteo tried Panel for a dynamic 3D plot. Alex then moved the data into LLNL Visit for fully interactive 3D plots. The team continues to hack on the idea.

Fluid inclusions at Coso, USA

Diana Acero-Allard, Jeremy Zhao, Samuel Price, Lawrence Kwan, Jacqueline Floyd, Brendan, Gavin, Rob Leckenby and Martin Bentley

Diana had the idea of a gas analysis case study for Coso Field, USA. The team’s specific goal was to develop visualization tools for interpretation of fluid inclusion gas data to identify fluid types, regions of permeability, and geothermal processes. They had access to analyses from 29 wells, requiring the usual data science workflow: find and load the data, clean the data, make some visualizations and maps, and finally analyse the permeability. GitHub repo here.

Utah FORGE data pipeline

Andrea Balza, Evan Bianco, and Diego Castañeda

Andrea was driven to dive into the Utah FORGE project. Navigating the OpenEI data portal was a bit hit-and-miss, having to download files to get into ZIP files and so on (this is a common issue with open data repositories). The team eventually figured out how to programmatically access the files to explore things more easily — right from a Jupyter Notebook. Their code works for any data on the OpenEI site, not just Utah FORGE, so it’s potentially a great research tool. GitHub repo here.
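
To give a flavour of what ‘programmatic access’ to zipped files can look like, here is a minimal sketch using only the Python standard library. It is not the team’s code: the function names are hypothetical, no real OpenEI URL is assumed, and the demo builds a tiny made-up archive in memory so that it runs offline.

```python
import io
import urllib.request
import zipfile


def fetch_bytes(url):
    """Download a resource into memory. Not called in the demo below."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_zip_members(zip_bytes):
    """Return the names of the files inside a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


def read_zip_member(zip_bytes, name, encoding="utf-8"):
    """Read one text file out of a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(name).decode(encoding)


# Build a small stand-in archive (made-up contents) instead of downloading one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("well_58-32/temperature.csv", "depth_m,temp_C\n0,15\n100,18\n")
    archive.writestr("README.txt", "Hypothetical demo archive.")
demo_bytes = buffer.getvalue()

members = list_zip_members(demo_bytes)
csv_text = read_zip_member(demo_bytes, "well_58-32/temperature.csv")
print(members)
print(csv_text)
```

With a real portal you would pass the bytes from fetch_bytes() to the other two functions, which lets you peek inside an archive from a notebook without unpacking anything to disk.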

Pythonizing a power density estimation tool

Irene Wallis, Jan Niederau, Hannah Wood, Will Middlebrook, Jeff Jex, and Bill Cummings

Like a lot of cool hackathon projects, this one started with a spreadsheet that Bill created to simplify the process of making power density estimates for geothermal fields under some statistical assumptions. Such a clear goal always helps focus the mind and the team put together some Python notebooks and then a Streamlit app — which you can test-drive here! From this solid foundation, the team has plenty of plans for new directions to take the tool. GitHub repo here.
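
For readers wondering what a power density estimate ‘under some statistical assumptions’ might involve, here is a rough Monte Carlo sketch. It is not Bill’s spreadsheet or the team’s app: I am assuming capacity is simply area times power density, that both are lognormally distributed, and every parameter value is a made-up placeholder.

```python
import random


def simulate_capacity(n_trials=10_000, seed=42):
    """Monte Carlo estimate of capacity (MW) = area (km2) x power density (MW/km2).

    The lognormal parameters are hypothetical placeholders, not field data.
    """
    rng = random.Random(seed)
    capacities = []
    for _ in range(n_trials):
        area_km2 = rng.lognormvariate(1.0, 0.5)        # assumed distribution
        density_mw_km2 = rng.lognormvariate(2.5, 0.4)  # assumed distribution
        capacities.append(area_km2 * density_mw_km2)
    return capacities


def percentile(values, p):
    """Return the value at the p-th percentile using nearest-rank lookup."""
    ordered = sorted(values)
    index = round(p / 100 * (len(ordered) - 1))
    return ordered[index]


capacities = simulate_capacity()
p10, p50, p90 = (percentile(capacities, p) for p in (10, 50, 90))
print(f"10th / 50th / 90th percentile capacity: {p10:.1f} / {p50:.1f} / {p90:.1f} MW")
```

These are plain percentiles of the simulated distribution; resource estimators often quote P90 as the value exceeded with 90% probability, which would flip the labels.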

Computing boiling point for depth

Thorsten Hörbrand, Irene Wallis, Jan Niederau and Matt Hall

Irene identified the need for a Python tool to generate boiling-point-for-depth curves, accommodating various water salinities and chemistries. As she showed during her recent TRANSFORM tutorial (which you must watch!), so-called BPD curves are an important part of geothermal well engineering. The team produced some scripts to compute various scenarios, based on corrections in the IAPWS standards and using the PHREEQC aqueous geochemistry modeling software. GitHub repo here.
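
To show the basic shape of the calculation, here is a deliberately crude sketch of a boiling-point-for-depth curve for pure water. It is not the team’s code and it swaps in simpler ingredients: the Antoine equation (with the commonly quoted coefficients for roughly 99 to 374 °C) stands in for the IAPWS formulation, liquid density is held constant at an assumed 900 kg/m3, and salinity and chemistry are ignored entirely.

```python
import math

G = 9.81                # gravitational acceleration, m/s2
PA_PER_MMHG = 133.322   # pascals per millimetre of mercury


def saturation_temperature_c(pressure_pa):
    """Boiling temperature of pure water (deg C) from the Antoine equation.

    Uses the commonly quoted coefficients for about 99-374 deg C, with
    pressure in mmHg. This is a stand-in, not the IAPWS formulation.
    """
    a, b, c = 8.14019, 1810.94, 244.485
    pressure_mmhg = pressure_pa / PA_PER_MMHG
    return b / (a - math.log10(pressure_mmhg)) - c


def bpd_curve(max_depth_m, step_m=10.0, surface_pressure_pa=101_325.0,
              density_kg_m3=900.0):
    """Return (depth_m, temperature_C) pairs for a boiling hydrostatic column.

    Assumes a constant liquid density, which is a crude simplification:
    real water gets less dense as it heats up.
    """
    curve = []
    n_steps = int(max_depth_m / step_m)
    for i in range(n_steps + 1):
        depth = i * step_m
        pressure = surface_pressure_pa + density_kg_m3 * G * depth
        curve.append((depth, saturation_temperature_c(pressure)))
    return curve


curve = bpd_curve(max_depth_m=1000.0)
for depth, temperature in curve[::25]:
    print(f"{depth:6.0f} m  {temperature:6.1f} deg C")
```

A more careful version would update the density at each depth step from the temperature above it, and that is where proper steam tables and geochemistry tools earn their keep.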

A big Thank You to all of the hackers that came along to this virtual event. Not quite the same as a meatspace hackathon, admittedly, but Gather.town + Slack was definitely an improvement over Zoom + Slack. At least we have an environment in which people can arrive and immediately get a sense of what is happening in the event. When you realize that people at the tables are actually sitting in Canada, the US, the UK, Switzerland, South Africa, and Auckland — it’s clear that this could become an important new way to collaborate across large distances.

Do check out all these awesome and open-source projects — and check out the #geothermal channel in the Software Underground to keep up with what happens next. We’ll be back in the future — perhaps the near future! — with more hackathons and more geothermal technology. Hopefully we’ll see you there! 🌋

March 16, 2021

Transformation in 2021

March 16, 2021/ Matt Hall

Virtual conferences have become — for now — the norm. In many ways they are far better than traditional conferences: accessible to all, inexpensive to organize and attend, asynchronous, recorded, and no-one has to fly 5,000 km to deliver a PowerPoint. In other ways, they fall short, for example as a way to meet new collaborators or socialize with old ones. As face-to-face meetings become a possibility again this summer, smart organizations will figure out ways to get the best of both worlds.

The Software Underground is continuing its exploration of virtual events next month with the latest edition of the TRANSFORM festival of the digital subsurface. In broad strokes, here’s what’s on offer:

The Subsurface Hackathon, starting on 16 April — all are welcome, including those new to programming.
20 free & awesome tutorials, covering topics from Python to R, geothermal wells to seismic, and even reservoir simulation! And of course there’s a bit of machine learning and physics-based modeling in there too. Look forward to content from scientists in North & South America, Norway, Nigeria, and New Zealand.
Lightning talks from 24 members of the community — would you like to do one?
Birds of a Feather community meet-ups, a special Xeek challenge, and other special events.
The Annual General Meeting of the Software Underground, where we’ll adopt our by-law and appoint the board.

Detailed schedule

We’ll even try to get at that tricky “hang out with other scientists” component, because we will have a virtual Gather.town world in which to hang out and hack, chat, or watch the livestreams.

If last year’s event is anything to go by, we can expect fantastic tutorial content, innovative hackathon projects, and great conversation between at least 750 digital geoscientists and engineers. (If you missed TRANSFORM 2020, don’t worry — all the content from last year is online and free forever, so it’s not too late to take part! Check it out.)

Registering for TRANSFORM

Registration is free, or pay-what-you-like. In other words, if you have funding or expenses for conferences and training, there’s an option to pay a small amount. But anyone can attend TRANSFORM free of charge. Thank you to the event sponsors, Studio X, for making this possible. (I will write about Studio X at a later date — they are doing some really cool things in the digital subsurface.)

To register for any part of TRANSFORM — even if you just want to come to the hackathon or a tutorial — click this button and complete the process on the Software Underground website. It’s a ‘pay what you like’ event, so there are 3 registration options with different prices — these are just different donation amounts. They don’t change anything about your registration.

I hope we see you at TRANSFORM. In the meantime, please jump into the Software Underground Slack and get involved in the conversations there. (You can also catch up on recent Software Underground highlights in the new series of blog posts.)

March 02, 2021

The hot rock hack is back

March 02, 2021/ Matt Hall

Last year we ran the first ever Geothermal Hackathon. As with all things, we started small, but energetic: fourteen of us worked on six projects. Topics ranged from project management to geological mapping to natural language processing. It was a fun two days not thinking about coronavirus.

This year we’ll be meeting up on Thursday 13 and Friday 14 May, starting right after the Geoscience Virtual Event of the World Geothermal Congress. Everyone is invited — geoscientists, engineers, data nerds, programmers. No experience of geothermal is necessary, just creativity and curiosity.

Learn more & register

Projects are already being discussed on the Software Underground; here are some of the ideas:

Data-munging project for Utah FORGE, especially well 58-32.
Update the Awesome list Thomas Martin started last year.
Implementing classic, or newly published, equations and algorithms from the literature.

I expect the preceding WGC event will spark some last-minute projects too. But for the time being, you’re welcome to add or vote on ideas on the event page. What tools or visualizations would you find useful?

Build some digital geo skills

📣 If you’re looking to build up your coding skills before the hackathon — or for a research project or an idea at work — join us for a Python class. We teach the fundamentals of Python, NumPy and matplotlib using geological and geophysical examples and geo-familiar datasets. There are two classes coming up in May (Digital Geology) and June (Digital Geophysics).

Check out training classes

February 02, 2021

Future proof

February 02, 2021/ Matt Hall

Last week I wrote about the turmoil many subsurface professionals are experiencing today. There’s no advice that will work for everyone, but one thing that changed my life (ok, my career at least) was learning a programming language. Not only because programming computers is useful and fun, but also because of the technology insights it brings. Whether you’re into data management or machine learning, workflow automation or just being a more rounded professional, there really is no faster way to build digital muscles!

Six classes

We have six public classes coming up in the next few weeks. But there are thousands of online and virtual classes you can take — what’s different about ours? Here’s what I think:

All of the instructors are geoscientists, and we have experience in sedimentology, geophysics, and structural geology. We’ve been programming in Python for years, but we remember how it felt to learn it for the first time.
We refer to subsurface data and typical workflows throughout the class. We don’t use abstract or unfamiliar examples. We focus 100% on scientific computing and data visualization. You can get a flavour of our material from the X Lines of Python blog series.
We want you to be self-sufficient, so we give you everything you need to start being productive right away. You’ll walk away with the full scientific Python stack on your computer, and dozens of notebooks showing you how to do all sorts of things from loading data to making a synthetic seismogram.

Let’s look at what we have on offer.

Upcoming classes

We have a total of 6 public classes coming up, in two sets of three classes: one set with timing for North, Central and South America, and one set with timing for Europe, Africa, and the Middle East. Here they are:

Intro to Geocomputing, 5 half-days, 15–19 Feb — 🌎 Timing for Americas — 🌍 Timing for Europe & Africa — If you’re just getting started in scientific computing, or are coming to Python from another language, this is the class for you. No prerequisites.
Digital Geology with Python, 4 half-days, 22–25 Feb — 🌍 Timing for Europe & Africa — A closer look at geological workflows using Python. This class is for scientists and engineers with some Python experience.
Digital Geophysics with Python, 4 half-days, 22–25 Feb — 🌎 Timing for Americas — We get into some geophysical workflows using Python. This class is for quantitative scientists with some Python experience.
Machine Learning for Subsurface, 4 half-days in March — 🌎 Timing for Americas (1–4 Mar) — 🌍 Timing for Europe & Africa (8–11 Mar) — The best way into machine learning for earth scientists and subsurface engineers. We give you everything you need to manage your data and start exploring the world of data science and machine learning.

Follow the links above to find out more about each class. We have space for 14 people in each class. You’ll find pricing options for students and those currently out of work. If you are in special circumstances, please get in touch — we don’t want price to be a barrier to these classes.

In-house options

If you have more than about 5 people to train, it might be worth thinking about an in-house class. That way, the class is full of colleagues learning things together — they can speak more openly and share more freely. We can also tailor the content and the examples to your needs more easily.

Get in touch if you want more info about this approach.

January 18, 2021

Openness is a two-way street

January 18, 2021/ Matt Hall

Last week the Data Analysis Study Group of the SPE Gulf Coast Section announced a new machine learning contest (I’m afraid registration is now closed, even though the contest has not started yet). The task is to predict shear-wave sonic from other logs, similar to the SPWLA PDDA contest last year. This is a valuable problem in the subsurface, because the shear sonic log is essential for computing elastic properties of rocks and therefore in predicting rock and fluid properties or processing seismic. Indeed, TGS have built a business on predicted logs with their ARLAS product. There’s money in log prediction!

The task looks great, but there’s one big problem: the dataset is not open.

Why is this a problem?

Before answering that, let’s look at some context.

What’s a machine learning contest?

Good question. Typically, an organization releases a dataset (financial timeseries, Netflix viewer data, medical images, or whatever). They invite people to predict some valuable property (when to sell, which show to recommend, how to treat the illness, or whatever). And they pick the best, measured against known labels on a hidden dataset.
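
The ‘measured against known labels on a hidden dataset’ step can be sketched in a few lines. This is only an illustration: I am assuming root-mean-square error as the metric (a common choice for regression tasks, but contests vary), and the team names and numbers are made up.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predictions and the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and labels must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def rank_submissions(submissions, hidden_labels):
    """Score every submission and return (team, score) pairs, best first."""
    scores = {team: rmse(pred, hidden_labels) for team, pred in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up labels that only the organizer would see.
hidden_labels = [2.0, 2.5, 3.0, 3.5]

# Made-up predictions from three hypothetical teams.
submissions = {
    "team_a": [2.1, 2.4, 3.2, 3.4],
    "team_b": [2.0, 2.5, 3.0, 3.6],
    "team_c": [1.0, 2.0, 4.0, 5.0],
}

leaderboard = rank_submissions(submissions, hidden_labels)
for team, score in leaderboard:
    print(f"{team}: RMSE = {score:.3f}")
```

The key point is that participants never see hidden_labels; they only see their score, which is what keeps the leaderboard honest.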

Kaggle is one of the largest platforms hosting such challenges, and they often attract thousands of participants — competing for large prizes. TGS ran a seismic salt-picking contest on the platform, attracting almost 74,000 submissions from 3220 teams with a $100k prize purse. Other contests are more grass-roots, like the one I ran with Brendon Hall in 2016 on lithology prediction, and like this SPE contest. It’s being run by a team of enthusiasts without a lot of resources from SPE, and the prize purse is only $1000 — representing about 3 hours of the fully loaded G&A of an oil industry professional.

What has this got to do with reproducibility?

Contests that award a large prize in return for solving a hard problem are essentially just a kind of RFP-combined-with-consulting-job. It’s brutally inefficient: hundreds or even thousands of people spend hours on the problem for free, and a handful are financially rewarded. These contests attract a lot of attention, but I’m not that interested in them.

Community-oriented events like this SPE contest — and the recent FORCE one that Xeek hosted — are more interesting and I believe they are more impactful. They have lots of great outcomes:

Lots of people have fun working on a hard problem and connecting with each other.
Solutions are often shared after, or even during, the contest, so that everyone learns and grows their toolbox.
A new open dataset that might even become a much-needed benchmark for the task in hand.
Researchers can publish what they did, or do later. (The SEG ML contest tutorial and results article have 136 citations between them, largely from people revisiting the dataset to show new solutions.)

A lot of new open-source machine learning code is always exciting, but if the data is not open then the work is by definition not reproducible. It seems especially unfair — cheeky, even — to ask participants to open-source their code, but to keep the data proprietary. For sure TGS is interested in how these free solutions compare to their own product.

Well, life’s not fair. Why is this a problem?

The data is being shared with the contest participants on the condition that they may not share it. In other words it’s proprietary. That means:

Participants are encumbered with the liability of a proprietary dataset. Sure, TGS is sharing this data in good faith today, but who knows how future TGS lawyers will see it after someone accidentally commits it to their GitHub repo? TGS is a billion-dollar company; they will win a legal argument with you. (Having said that, there’s no NDA or anything, just a checkbox in a form. I don’t know how binding it really is… but I don’t want to be the one that finds out.)
Participants can’t publish reproducible papers on their own work. They can publish classic oil-industry, non-reproducible work — I did this thing but no-one can check it because I can’t give you the data — but do we really need more of that? (In the contest introductory Zoom, someone asked about publishing plots of the data. The answer: “It should be fine.” Are we really still this naive about data?)

If anyone from TGS is reading this and thinking, “Come on, we’re not going to sue anyone — we’re not GSI! — it’s fine :)” then my response is: Wonderful! In that case, why not just formalize everything by releasing the data under an open licence — preferably Creative Commons Attribution 4.0? (Unmodified! Don’t make the licensing mistakes that Equinor and NAM have made recently.) That way, everyone knows their rights, everyone can safely download the data, and the community can advance. And TGS looks pretty great for contributing an awesome dataset to the subsurface machine learning community.

I hope TGS decides to release the data with an open licence. If they don’t, it feels like a rather one-sided deal to me. And with the arrangement as it stands, there’s no way I would enter this contest.

May 08, 2020

The hot rock hack happened

May 08, 2020/ Matt Hall

I was excited about the World Geothermal Congress this year. (You remember conferences — big, expensive, tiring lecture-marathons that scientists used to go to. But sometimes they were fun.)

Until this year, the WGC has only happened every 5 years and we missed the last one because it was in Australia… and the 2023 edition (it’s moving to a 3-year cycle) will be in China. So this year’s event, just a stone’s throw away in Iceland, was hotly anticipated.

And it still is, because now it will be next May. And we’ll be doing a hackathon there! You should come, get it in your calendar: 27 and 28 May 2021.

Meanwhile, this year… we moved our planned hackathon online. For the record, here’s what happened at the first Geothermal Hackathon.

Logistics: Timezones are tricky

There’s no doubt, the biggest challenge was the rotation of the earth (though admittedly it has other benefits). I believe the safest way to communicate times to a global audience is UTC, so I’ll stick to that here. It’s not ideal for anyone (except Iceland, appropriately enough in this case) but it reduces errors. We started at 0600 UTC and went until about 2100 UTC each day; about 15 hours of fun. I did check in briefly at 0000 UTC on each morning (my evening), in case anyone from New Zealand showed up, but no-one did.
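
If you want to double-check what a UTC time means for attendees, the conversion is easy to script. This is a minimal sketch with some loud assumptions: the date is a placeholder, and I use fixed offsets (UTC+2, an assumed UTC-3 for Atlantic Canada, and an assumed UTC+12 for New Zealand) rather than real time zone rules, so daylight saving is ignored.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets keep this dependency-free. Real zones have daylight saving
# rules, so treat these offsets as assumptions for illustration only.
OFFSETS = {
    "UTC+2 hosts": timezone(timedelta(hours=2)),
    "Atlantic Canada (assumed UTC-3)": timezone(timedelta(hours=-3)),
    "New Zealand (assumed UTC+12)": timezone(timedelta(hours=12)),
}


def local_times(utc_moment, offsets):
    """Convert one timezone-aware UTC datetime to a local time string per label."""
    return {
        label: utc_moment.astimezone(tz).strftime("%Y-%m-%d %H:%M")
        for label, tz in offsets.items()
    }


# Placeholder date; only the 0600 UTC start time comes from the text above.
start = datetime(2020, 5, 1, 6, 0, tzinfo=timezone.utc)
schedule = local_times(start, OFFSETS)
for label, local in schedule.items():
    print(f"{label}: {local}")
```

Always build the datetime with tzinfo=timezone.utc as above; a naive datetime would be interpreted as local time on whatever machine runs the script.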

Rob Leckenby and Martin Bentley, both in the UTC+2 zone, handled the early morning hosting, with me, Evan and Diego showing up a few hours later (we’re all in Canada, UTC–a few). This worked pretty well even though, as usual, the hackers were all perfectly happy and mostly self-sufficient whether we were there or not.

Technology-wise, we met up on Zoom, which was good for the start and the end of the day, and also for getting the attention of others in between (many people left the audio open, one ear to the door, so to speak.) Alongside Zoom we used the Software Underground’s Slack. As well as the #geothermal channel, each project had a channel — listed below — which meant that each project could have a separate video meetup at any time, as well as text-based chat and code-sharing. It was a good combination.

Let’s have a look at the hacks.

Six projects

An awesome list for geothermal — #geothermal-awesome — Thomas Martin (Colorado School of Mines), with some input from me and others, made a great start on an ‘awesome list’ document for geothermal, with a machine learning emphasis. He lists papers, tools, and open data. You can read (or contribute to!) the document here.

Collaboration tools for geothermal teams — #geothermal-collaboration-tools — Alex Hobé (Uppsala) and Valentin Métraux (GEO2X), with input from Martin Bentley and others, had a clear vision for the event: Alex wanted to map out the flow of data and interpretations between professionals in a geothermal project. I’ve seen similar projects get nowhere near as far in 2 months as Alex got in 2 days. The team used Holoviews and NetworkX to make some nice graphics.

GEOPHIRES web app — #geothermal-geophires — Marko Gauk (SeisWare) wanted to get into web apps at the event, and he succeeded! He built a web-based form for submitting jobs to a server running GEOPHIRES v2, a ‘full field’ geothermal project modeling tool. You can check out his app here.

Geothermal Natural Language Processing — #geothermal-nlp — Mohammad ‘Jabs’ Aljubran (Stanford), Friso (Denver), along with Rob and me, did some playing with the Stanford geothermal bibliographic database. Jabs and Friso got a nice paper recommendation engine working, while Rob and I managed to do automatic geolocation on the articles — and Jabs turned this into some great maps. Repo is here. Coming soon: a web app.

Experiments with porepy — #geothermal-porepy — Luisa Zuluaga, Daniel Coronel, and Sam got together to see what they could do with porepy, a porous media simulation tool, especially aimed at modeling fractured and deformable rocks.

Radiothermic map of Nova Scotia — #geothermal-radiothermic — Evan Bianco downloaded some open data for Nova Scotia, Canada, to see if he could implement this workflow from Beamish and Busby. But the data turned out to be unscaled (among other things), and therefore probably impossible to use for quantitative purposes. At least he made progress on a nice map.
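
For the curious, here is roughly how a paper recommendation engine like the one in the NLP project can work. I don’t know which method Jabs and Friso used; this sketch uses TF-IDF weighting and cosine similarity, one common approach, written with only the standard library. The titles are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        vectors.append({
            term: (count / total) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(index, vectors, top_n=3):
    """Return (document index, similarity) pairs most similar to one document."""
    scores = [(i, cosine(vectors[index], vec))
              for i, vec in enumerate(vectors) if i != index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Made-up titles standing in for a bibliographic database.
titles = [
    "Fracture permeability in geothermal reservoirs",
    "Permeability and fracture networks in a geothermal field",
    "Drilling cost trends for deep wells",
    "Machine learning for seismic event detection",
]
vectors = tfidf_vectors(titles)
recommendations = recommend(0, vectors, top_n=2)
for i, score in recommendations:
    print(f"{score:.2f}  {titles[i]}")
```

On a real database you would feed in abstracts rather than titles, and probably drop common stop words first.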

All in all it was a fun couple of days. You can’t beat a hackathon for leaving behind emails and to-do lists for a day.

January 23, 2020

The hacks are back

January 23, 2020/ Matt Hall

We ran the first geoscience hackathon over 7 years ago in Houston. Since then we’ve hosted another 26 subsurface hackathons — that’s 175 projects, and over 900 hackers. Last year, 10 of the 11 hackathons that Agile* facilitated were in-house.

This is exciting. It means that grass-roots, creative, high-speed collaboration and technology development is possible inside large corporations. But it came at the cost of reducing our public events… and we want to bring the hackathon experience to everyone!

So this year, as well as helping execute a dozen or so in-house hackathons, we’ll be running and supporting more public hackathons too. So if you’ve been waiting for a chance to learn to code or try a social coding event, or just hang out with a lot of nerdy geoscientists and engineers — here’s your chance!

May: Geothermal Hackathon

The first event of the year is a new one for us. We’ll be at the World Geothermal Congress in Reykjavik, Iceland, in the last week of April. The second weekend, 2 and 3 May, we’ll be running a hackathon on machine learning for geothermal subsurface applications. Iceland is only a short flight from the rest of Europe and many places in North America, so if you fancy something completely different, this is for you! Find out more and sign up.

[An earlier version of this post had the event on the previous weekend.]

June: Subsurface Hackathon (USA)

We’re back in Houston in June! The AAPG ACE is there — clashing with EAGE unfortunately — and we’ll be holding a (completely unrelated) hackathon on the weekend before: 5 to 7 June. Enthought is hosting the event in their beautiful new Houston digs, and Dell EMC is there too as a major sponsor. The theme is Tools… It’s going to be a big one! Find out more and sign up.

We are running two public Python classes before this event. Check them out.

June: Amstel Hack (Europe)

The brilliant Filippo Broggini (ETHZ) is running a European hackathon again this year, again right before EAGE — and therefore the same weekend as the Houston event: 6 and 7 June. The event is being hosted at Shell’s Technology Centre in Amsterdam, and is guaranteed to be awesome. If you’re going to EAGE, it’s a no-brainer. Find out more and sign up.

We are also running a public Python class before this event. Check it out.

That’s it for now… I hope you can come to one of these events. If you’re just starting out on your technology journey, have no fear — these events are friendly and welcoming. If you can’t make any of them, don’t worry: there will be more in the autumn, so stay tuned. Or, if you want help making one happen at your company, get in touch.

To get email alerts about new hackathon events, sign up here. No spam, we promise.

January 08, 2020

Learn to code in 2020

January 08, 2020/ Matt Hall

Happy New Year! I hope 2020 is going well so far and that you have audacious plans for the new decade.

Perhaps among your plans is learning to code — or improving your skills, if you’re already on the way. As I wrote in 2011, programming is more than just writing code: it’s about learning a new way to think, not just about data but about problems. It’s also a great way to quickly raise your digital literacy — something most employers value more each year. And it’s fun.

We have three public courses planned for 2020. We’re also planning some public hackathons, which I’ll write about in the next week or three. Meanwhile, here’s the lowdown on the courses:

Lausanne in March

Rob Leckenby will be teaming up with Valentin Metraux of Geo2X to teach this 3-day class in Lausanne, Switzerland. We call it Intro to Geocomputing and it’s 100% suitable for beginners and people with less than a year or so of experience in Python. By the end, you’ll be able to read and write Python, write functions, read files, and run Jupyter Notebooks. More info here.

Amsterdam in June

If you can’t make it to Lausanne, we’ll be repeating the Intro to Geocomputing class in Amsterdam, right before the Software Underground’s Amstel Hack hackathon event (and then the EAGE meeting the following week). Check out the Software Underground Slack — look for the #amstel-hack-2020 channel — to find out more about the hackathon. More info here.

Houston in June

There’s also a chance to take the class in the US. The week before AAPG (which clashes with EAGE this year, which is very weird), we’ll be teaching not one but two classes: Intro to Geocomputing, and Intro to Machine Learning. You can take either one, or both — but be aware that the machine learning class assumes you know the basics of Python and NumPy. More info here.

In-house options

We still teach in-house courses (last year we taught 37 of them!). If you have more than about 5 people to train, then in-house is probably the way to go; we’d be delighted to work with you to figure out the best curriculum for your team.

Most of our classes fall into one of the following categories:

Beginner classes like the ones described above, usually 3 days.
Machine learning classes, like the Houston class above, usually 2 or 3 days.
Other more advanced classes built around engineering skills (object-oriented programming, testing, packaging, and so on), usually 3 days.
High-level digital literacy classes for middle to upper management, usually 1 day.

We also run hackathons and design sprints for teams that are trying to solve tricky problems in the digital subsurface, but those are another story…

Get in touch if you want more info about any of these.

Whatever you want to learn in 2020, give it everything you have. Schedule time for it. The discipline will pay off. If we can help or support you somehow, please let us know — above all, we want you to succeed.

October 11, 2019

FORCE ML 2019: project round-up

October 11, 2019/ Matt Hall

The FORCE Machine Learning Hackathon and Symposium were a great success again this year (read all about last year). Kudos to Peter Bormann of ConocoPhillips Norge, who put the programme together; the events were held over 3 days at the NPD in Stavanger, Norway. Here’s a round-up of the projects.

A visualization of how human-generated rock descriptions were distributed with respect to porosity measured from the core plug.

from.cr.dscrptn.to.clssfctn

The team took up Peter’s challenge of translating abbreviated core descriptions (hence the strange team name) into something useful. Overall, the pipeline was clean > translate > classify. Cleaning was required to deal with a lot of ‘as above’ and other expediencies. As a first pass for translation, they tried simply substituting complete words for abbreviations: sandstone for ss, limestone for ls, and so on, but had more success with a bidirectional LSTM.
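To make the 'first pass' concrete, here is a minimal sketch of whole-word abbreviation substitution. It is not the team's code: the lookup table holds only the two abbreviations mentioned above, and the `expand` function name is made up for illustration.

```python
import re

# Illustrative lookup table with only the two abbreviations named in the post;
# a real table would need many more entries.
ABBREVIATIONS = {
    "ss": "sandstone",
    "ls": "limestone",
}


def expand(description: str) -> str:
    """Replace whole-word abbreviations with their full form."""
    def lookup(match: re.Match) -> str:
        word = match.group(0)
        # Case-insensitive lookup; unknown words pass through unchanged.
        return ABBREVIATIONS.get(word.lower(), word)

    # Matching whole runs of letters means 'ss' inside 'glass' is left alone.
    return re.sub(r"[A-Za-z]+", lookup, description)


print(expand("Ss, grey, with thin ls stringers"))
```

A lookup like this cannot resolve abbreviations whose meaning depends on context, which is presumably part of why a sequence model such as the bidirectional LSTM did better.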

Find it clean it analyse it

Given a pile of undifferentiated well files containing over 40,000 curves including LAS and DLIS, the team wanted to find and analyse image log data, especially FMIs. They successfully read the data they wanted with the new dlisio library from Equinor, then threw some texture analysis at it after interpolating across the data gaps and resampling to 360 bins. They then applied a k-means clustering with 6 clusters, to find some key textures in the data. GitHub repo.
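The preprocessing step might look something like the sketch below. It assumes each row of the image log is a ring of azimuthal samples, with NaN marking the gaps between pads; the function names are hypothetical and this is not the team's implementation (see their repo for that). It only covers gap filling and resampling, not the texture analysis or clustering.

```python
import math


def fill_gaps(row):
    """Linearly interpolate across NaN gaps in a circular row of samples."""
    n = len(row)
    known = [i for i, v in enumerate(row) if not math.isnan(v)]
    if not known:
        raise ValueError("row has no valid samples")
    filled = list(row)
    for i in range(n):
        if not math.isnan(row[i]):
            continue
        # Nearest valid neighbours on each side, wrapping around the borehole.
        left = max((k for k in known if k < i), default=known[-1] - n)
        right = min((k for k in known if k > i), default=known[0] + n)
        t = (i - left) / (right - left)
        filled[i] = (1 - t) * row[left % n] + t * row[right % n]
    return filled


def resample(row, n_bins=360):
    """Resample a circular row to n_bins samples by linear interpolation."""
    n = len(row)
    out = []
    for b in range(n_bins):
        pos = b * n / n_bins  # Fractional index into the original row.
        i = int(pos)
        t = pos - i
        out.append((1 - t) * row[i] + t * row[(i + 1) % n])
    return out


row = [0.0, math.nan, 2.0, math.nan]
print(fill_gaps(row))
print(len(resample(fill_gaps(row))))
```

With every row on a common 360-bin grid, rows from different tools and wells can be stacked into one array for the texture and clustering steps.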

Just Surf

Using a synthetic dataset, the team (mostly coders from Emerson) set out to use convolutional deep neural networks to check if the structural model seems sensible, quantify the uncertainty, and validate the gridding algorithm used. The team brought 100 realizations for each map, and tried various combinations of single realizations and statistics from the cohort. They found that transfer learning on ResNet-50 did better than training from scratch. They said they looked forward to building on the work to produce tools for quality assurance, and they hope to use seismic data next time.

Siamese seismic

The team applied a Siamese network, normally used on human faces, to the problem of classifying 3D seismic facies. The method is semi-supervised: the network is trained on the entire dataset, with some labeled subimages. This establishes a latent space (a 3D latent space of the F3 seismic data is shown to the right) with semantically meaningful norms (i.e. distance between points means something useful), in which clusters can be found. Classification on unseen subimages is done in the latent space. The team almost had an app working, and also produced the start of a new open dataset of labels for the F3 seismic volume. The team was rewarded with a prize for innovation. GitHub repo.

Lost Frequencies

This team formed spontaneously at the Tuesday meetup when it looked like there might not be any seismic projects! They set out to estimate attenuation using neural networks. This involved learning to pick maximum frequency from the peak frequency plus the seismic trace. They found that a 1D CNN did best out of all the methods they tried, and that including well logs somehow would likely improve the result quite a bit.

Rock Pandas

A screenshot from the app the team built. Each circle is a collection of documents that can be filtered dynamically.

Geolocalizing documents is a much-needed task in any pile of PDF files. This team got lots of documents from Peter, with the goal to put them on a map. The characteristically diverse team extracted keywords from an NPD corpus, with preprocessing and regular expressions for well names and so on. They built a nice-looking slippy map app allowing a user to click on a well or field entity, and see the documents associated with the location. Documents hitting multiple keywords were tagged on many entities. The Rock Pandas team won the coveted People's Choice Award, for making a great start on a hard problem, and producing a working app in limited time. GitHub repo.
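As a rough illustration of the regular-expression step, the sketch below pulls Norwegian-style well names (quadrant/block-well, such as 15/9-19) out of document text and indexes documents by the names they mention. The pattern is a simplification I've assumed for the example: it ignores sidetrack suffixes and other variants, and the document texts are made up. It is not the team's code.

```python
import re
from collections import defaultdict

# Simplified, assumed pattern: quadrant/block-number, optionally with a
# single-letter platform slot, e.g. 15/9-19 or 34/10-A-12.
WELL_NAME = re.compile(r"\b\d{1,4}/\d{1,2}-(?:[A-Z]-)?\d{1,3}\b")


def index_by_well(documents):
    """Map each well name to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in WELL_NAME.findall(text):
            index[name].add(doc_id)
    return dict(index)


# Made-up example documents.
docs = {
    "a.pdf": "Final well report for 34/10-A-12 and 15/9-19.",
    "b.pdf": "Core from 15/9-19 was described.",
}
for well, doc_ids in sorted(index_by_well(docs).items()):
    print(well, sorted(doc_ids))
```

A document mentioning several wells ends up under each of them, which matches the behaviour described above for documents hitting multiple keywords.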

Core team

In a reprise of a project last year, the team set out to get grain size from core photos. But then they thought: why not cut out the middle man and go straight for reservoir parameters? So they tried to get permeability from core photos. Using simple models, they got an accuracy of 60% with linear regression, and 69% with a neural network. Although they had some glitches in their approach (using porosity and not using depth, for example), they built a first pipeline for an interesting problem.

Some Unsupervised team members clustering around a problem.

Somehow Unsupervised

Unsupervised learning has been a theme in a couple of previous hackathons (Copenhagen and FORCE 2018), and it was good to see another iteration of these exciting ideas. The team used the very nice Geolink dataset. After filtering out poor quality data (based on caliper and local statistics), the team applied dimensionality reduction methods like UMAP and t-SNE (these do a similar job to PCA, but they are nonlinear and often better at preserving local structure) to reduce the dataset to just 2 dimensions, allowing them to make lots of crossplots. Coloring points by lithology, sand type, GR, or fluid type allowed them to look at all sorts of trends and patterns. The team won a prize for the amount of ground they covered and the attractive plots. GitHub repo.

Rock Stars

The Rock Stars took on Peter’s Make me that rock project. He wants an app which provides plausible rock properties and uncertainty for any location, depth, and formation on the Norwegian shelf. This gigantic team (12 of them!) decided to cluster the data first, then build a model for each cluster. They built an app which could indeed provide porosity and permeability given a location and depth. That such a huge team managed to converge on anything was an achievement, and they won a prize for taking on a tough project and getting a good way into it.

That’s it for this year! Thanks to all the participants for a fun week, and thank you to the sponsors (below) for supporting the event. Hope to see you in 2020.

More pictures from the event. Thanks to Alex Schaaf and the others that took photos.
