March 06, 2018

Jounce, Crackle and Pop

March 06, 2018/ Matt Hall

I saw this T-shirt recently, and didn't get it. (The joke or the T-shirt.)

It turns out that the third derivative of displacement $x$ with respect to time $t$ — that is, the derivative of acceleration $\mathbf{a}$ — is called 'jerk' (or sometimes, boringly, jolt, surge, or lurch) and is measured in units of m/s³.

So far, so hilarious, but is it useful? It turns out that it is. Since the force $\mathbf{F}$ on a mass $m$ is given by $\mathbf{F} = m\mathbf{a}$, you can think of jerk as being proportional to the rate of change of force (for constant mass, the rate of change of force is mass times jerk). The lurch you feel at the onset of a car's acceleration? That's jerk. The designers of transport systems and rollercoasters manage it daily.

$$ \mathrm{jerk,}\ \mathbf{j} = \frac{\mathrm{d}^3 x}{\mathrm{d}t^3}$$

Here's a visualization of velocity (green line) of a Tesla Model S driving in a parking lot. The coloured stripes show the acceleration (upper plot) and the jerk (lower plot). Notice that the peaks in jerk correspond to changes in acceleration.
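If you want to try this on your own data, here is a minimal sketch of estimating the successive derivatives numerically. It assumes a signal sampled at known times and leans on NumPy's `gradient` function; the name `time_derivatives` is made up for this illustration, and this is not necessarily how the notebook does it.

```python
import numpy as np

def time_derivatives(x, t, n=4):
    """Return the first n time derivatives of x(t), estimated by finite differences."""
    derivatives = []
    y = np.asarray(x, dtype=float)
    for _ in range(n):
        y = np.gradient(y, t)  # Central differences in the interior, one-sided at the ends.
        derivatives.append(y)
    return derivatives

# Toy check: x = t**3 has a constant jerk of 6 and zero jounce (away from the edges).
t = np.linspace(0, 1, 101)
velocity, acceleration, jerk, jounce = time_derivatives(t**3, t)
```

Each pass of differentiation amplifies noise and smears the edge samples, so real measurements will probably need smoothing first.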

The snap you feel at the start of the lurch? That's jounce — the fourth derivative of displacement and the derivative of jerk. Eager et al (2016) wrote up a nice analysis of these quantities for the examples of a trampolinist and roller coaster passenger. Jounce is sometimes called snap... and the next two derivatives are called crackle and pop.

What about momentum?

If the momentum $\mathbf{p}$ of a mass $m$ moving at a velocity $\mathbf{v}$ is $m\mathbf{v}$ and $\mathbf{F} = m\mathbf{a}$, what is mass times jerk? According to the physicist Philip Gibbs, who investigated the matter in 1996, it's called yank:

“Momentum equals mass times velocity.
Force equals mass times acceleration.
Yank equals mass times jerk.
Tug equals mass times snap.
Snatch equals mass times crackle.
Shake equals mass times pop.”

There are jokes in there, help yourself.

What about integrating?

Clearly the integral of jerk is acceleration, and that of acceleration is velocity, the integral of which is displacement. But what is the integral of displacement with respect to time? It's called absement, and it's a pretty peculiar quantity to think about. In the same way that an object with linearly increasing displacement has constant velocity and zero acceleration, an object with linearly increasing absement has constant displacement and zero velocity. (Constant absement at zero displacement gives rise to the name 'absement': an absence of displacement.)

Integrating displacement over time might be useful: the area under the displacement curve for a throttle lever could conceivably be proportional to fuel consumption for example. So absement seems to be a potentially useful quantity, measured in metre-seconds.
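To make that concrete, here is a small sketch of turning a displacement record into absement with the trapezoidal rule. The function name and the throttle numbers are invented for illustration only.

```python
import numpy as np

def cumulative_integral(y, t):
    """Running integral of y with respect to t, using the trapezoidal rule."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)   # Area of each trapezoid.
    return np.concatenate([[0.0], np.cumsum(steps)])

# Hypothetical throttle lever held 0.02 m from rest for 10 s.
t = np.linspace(0, 10, 101)
displacement = np.full_like(t, 0.02)
absement = cumulative_integral(displacement, t)   # In metre-seconds.
```

With constant displacement the absement grows linearly, reaching 0.2 m·s after 10 s in this toy case. Applying the same function again would give absity, and so on.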

Integrate absement and you get absity (a play on 'velocity'). Keep going and you get abseleration, abserk, and absounce. Are these useful quantities? I don't think so. A quick look at them all — for the same Tesla S dataset I used before — shows that the loss of detail from multiple cumulative summations makes for rather uninformative transformations:

You can reproduce the figures in this article with the Jupyter Notebook Jerk_jounce_etc.ipynb. Or you can launch a Binder right here in your browser and play with it there, without installing a thing!

References

David Eager et al (2016). Beyond velocity and acceleration: jerk, snap and higher derivatives. Eur. J. Phys. 37 065008. DOI: 10.1088/0143-0807/37/6/065008

Amarashiki (2012). Derivatives of position. The Spectrum of Riemannium blog, retrieved on 4 Mar 2018.

The dataset is from Jerry Jongerius's blog post, The Tesla (Elon Musk) and
New York Times (John Broder) Feud. I have no interest in the 'feud', I just wanted a dataset.

The T-shirt is from Chummy Tees; the image is their copyright and used here under Fair Use terms.

The vintage Snap, Crackle and Pop logo is copyright of Kellogg's and used here under Fair Use terms.

January 31, 2018

This year's social coding events

January 31, 2018/ Matt Hall

If you've always wondered what goes on at our hackathons, make 2018 the year you find out. There'll be plenty of opportunities. We'll be popping up in Salt Lake City, right before the AAPG annual meeting, then again in Copenhagen, before EAGE. We're also running events at the AAPG and EAGE meetings. Later, in the autumn, we'll be making some things happen around SEG too.

If you just want to go sign up right now, head to the Events page. If you want more deets first, read on.

Salt Lake City in May: machine learning and stratigraphy

This will be one of our 'traditional' hackathons. We're looking for 7 or 8 teams of four to come and dream up, then hack on, new ideas in geostatistics and machine learning, especially around the theme of stratigraphy. Not a coder? No worries! Come along to the bootcamp on Friday 18 May and acquire some new skills. Or just show up and be a brainstormer, tester, designer, or presenter.

Thank you to Earth Analytics for sponsoring this event. If you'd like to sponsor it too, check out your options. The bottom line is that these events cost about $20,000 to put on, so we appreciate all the help we can get.

It doesn't stop with the hackathon demos on Sunday. At the AAPG ACE, Matt is part of the team bringing you the Machine Learning Unsession on Wednesday afternoon. If you're interested in the future of computation and geoscience, come along and be heard. It wouldn't be the same without you.

Copenhagen in June: visualization and interaction

After events in Vienna in 2016 and Paris in 2017, we're looking forward to being back in Europe in June. The weekend before the EAGE conference, we'll be hosting the Subsurface Hackathon once again. Partnering with Dell EMC and Total E&P, as last year, we'll be gathering 60 eager geoscientists to explore data visualization, from plotting to virtual reality. I can't wait.

In the EAGE Exhibition itself, we're cooking up something else entirely. The Codeshow is a new kind of conference event, mixing coding tutorials with demos from the hackathon and even some mini-hackathon projects to get you started on your own. It's 100% experimental, just the way we like it.

Anaheim in October: something exciting

We'll be at SEG in Anaheim this year, in the middle of October. No idea what exactly we'll be up to, but there'll be a hackathon for sure (sign up for alerts here). And tacos, lots of those.

You can get tickets to most of these events on the Events page. If you have ideas for future events, or questions about them, drop us a line or leave a comment on this post!

I'll leave you with a short and belated look at the hackathon in Paris last year...

A quick look at the Subsurface Hackathon in Paris, June 2017.

December 20, 2017

2017 retrospective

December 20, 2017/ Matt Hall

Another year pulls on its winter boots and prepares to hurry through the frigid night to wherever old years go to die. From a purely Agile point of view, putting aside all the odious nonsense going on in the world for a moment, it was a good year here at Agile, and I hope it was for you too. If not — if you were unduly affected by any of the manifold calamities in 2017 — then we wish you the best and hope life bounces back with renewed vigour in 2018.

A reproducible festive card for you, made from a well log and a bunch of random numbers. Make your own.

It's that time when I like to self-indulgently glance back over the last twelve months — both on the blog and elsewhere in the Agile universe. Let's start with the blog...

The most popular posts

We should top 52 posts this year (there's just something about the number 52). Some of them do little more than transmit news, events and such, but we try to bring you entertainment and education too. Just no sport or weather. These were our most visited posts this year:

Machine learning meets seismic interpretation — Evan's highlights from an analytics session at EAGE in Paris.
Machine learning and analytics in geoscience — what happened at the analytics workshop at EAGE in Paris.
Attribution is not permission — all about that time when Elsevier published a horrible book about reservoir analysis.
No more rainbows! — last week's post about rainbow colourmaps broke all our traffic records.
The new reality — how the future is not going to be like the past. Not in petroleum geoscience anyway.

As usual though, the most popular page on the site is k is for wavenumber, the 2012 post that keeps on giving. The other perennials are Well tie workflow, What is anisotropy? and What is SEG Y?

Engagement

We love getting comments! Most people tend to chime in via Twitter or LinkedIn, but we get quite a few on the blog. Indeed, the posts listed above got more than 60 comments between them. The following were the next most commented upon:

Organizing spreadsheets — how to make a spreadsheet more useful by making it more like a database.
Isn't everything on the internet free? — a follow-up post to Attribution is not permission (see above).
More precise SEG-Y and SEG-Y Rev 2 again — a pair of posts looking at the new SEG-Y standard.

Where is everybody?

Houston (about 6.6% of you)
Calgary (4.8%)
London (3.3%)
Perth (1.8%)
Moscow (1.3%)
Stavanger (1.2%)
Rio de Janeiro (1.1%)
Kuala Lumpur (1.0%)
Paris (1.0%)
Aberdeen (0.9%)

Work

We're fortunate to have had a good year at Agile. I won't beat our drum too hard, but here's a bit of what we've been up to:

We're doing a machine learning project on GPR interpretation.
We finished a machine learning lithology prediction project for Canstrat.
Matt did more seep and DHI mapping on Canada's Atlantic margin.
It was a good year for hackathons, with over 100 people taking part in 2017.
Agile Libre brought out a new book, 52 More Things... Palaeontology.
We hired awesome data scientist Diego Castañeda (right) full time.

Thank you

Last but far from least — thank you. We appreciate your attention, one of the most precious resources you have. We love writing useful-and/or-interesting stuff, and are lucky to have friends and colleagues who read it and push us to do more, and a bit better than before. It would be a chore if it wasn't for your readership.

All the best for this Yuletide season, and for a peaceful New Year. Cheers!

November 30, 2017

The post of Christmas present

November 30, 2017/ Matt Hall

It's nearly the end of another banner year for humanity, which seems determined as ever to destroy the good things it has achieved. Here's hoping certain world 'leaders' have their Scrooge moments sooner rather than later.

One positive thing we can all do is bring a little more science into the world. And I don't just mean for the scientists you already know. Let's infect everyone we can find! Maybe your niece will one day detect a neutron star collision in the Early Cretaceous, or your small child's intuition for randomness will lead to more breakthroughs in quantum computing.

Build a seismic station

There's surely no better way to discover the wonder of waves than to build a seismometer. There are at least a couple of good options. I built a single component 10 Hz Raspberry Shake myself; it was easy to do and, once hooked up to Ethernet, the device puts itself online and starts streaming data immediately.

The Lego seismometer kit (above right) looks like a slightly cheaper option, and you might want to check that they can definitely ship in time for Xmas, but it's backed by the British Geological Survey so I think it's legit. And it looks very cool indeed.

Everyone needs a globe!

As I mentioned last year, I love globes. We have several at home and more at the office. I don't yet have a Moon globe, however, so I've got my eye on this Replogle edition, NASA approved apparently ("Yup, that's the moon alright!"), and not too pricey at about USD 85.

They seem to be struggling to fill orders, but I can't mention globes without mentioning Little Planet Factory. These beautiful little 3D-printed worlds can be customized in all sorts of ways (clouds or no clouds, relief or smooth, etc), and look awesome in sets.

The good news is that you can pick up LPF's little planets direct from Shapeways, a big 3D printing service provider. They aren't lacquered, but until LPF get back on track, they're the next best thing.

Geology as a lifestyle

Brenda Houston likes minerals. A lot. She's made various photomicrographs into wallpaper and fabrics (below, left), and they are really quite awesome. Especially if you always wanted to live inside a geode.

OK, some of them might make your house look a bit... Bond-villainy.

If you prefer the more classical imagery of geology, how about this Ancient Dorset duvet cover (USD 120) by De la Beche?

I love this tectonic pewter keychain (below, middle) — featuring articulated fault blocks, and tiny illustrations of various wave modes. And it's under USD 30.

A few months ago, Mark Tingay posted on Twitter about his meteorite-faced watch (below, right). Turns out it's a thing (of course it's a thing) and you can drop substantial sums of money on such space-time trinkets. Like $235,000.

Algorithmic puzzles and stuff

These are spectacular: randomly generated agate-like jigsaw puzzles. Every one is different! Even the shapes of the wooden pieces are generated with maths. They cost about USD 95, and come from Boston-based Nervous System. The same company has lots of other rock- and fossil-inspired stuff, like ammonite jewellery (from about USD 50) and some very cool coasters that look a bit like radiolarians (USD 48 for 4).

There's always books

You can't go wrong with books. These all just came out, and just might appeal to a geoscientist. And if these all sound a bit too much like reading for work, try the Atlas of Beer instead. Click on a book to open its page at Amazon.com.

The posts of Christmas past

If by any chance there aren't enough ideas here, or you are buying for a very large number of geoscientists, you'll have to dredge through the historical listicles of yesteryear — 2011, 2012, 2013, 2014, 2015, or 2016. You'll find everything there, from stocking stuffers to Triceratops skulls.

The images in this post are all someone else's copyright and are used here under fair use guidelines. I'm hoping the owners are cool with people helping them sell stuff!

November 15, 2017

x lines of Python: Let's play golf!

November 15, 2017/ Matt Hall

Normally in the x lines of Python series, I'm trying to do something useful in as few lines of code as possible, but — and this is important — without sacrificing clarity. Code golf, on the other hand, tries solely to minimize the number of characters used, and to heck with clarity. This might, and probably will, result in rather obfuscated code.

So today in x lines, we set x = 1 and see what kind of geophysics we can express. Follow along in the accompanying notebook if you like.

A Ricker wavelet

One of the basic building blocks of signal processing and therefore geophysics, the Ricker wavelet is a compact, pulse-like signal, often employed as a source in simulation of seismic and ground-penetrating radar problems. Here's the equation for the Ricker wavelet:

$$ A = (1-2 \pi^2 f^2 t^2) e^{-\pi^2 f^2 t^2} $$

where $A$ is the amplitude at time $t$, and $f$ is the centre frequency of the wavelet. Here's one way to translate this into Python, more or less as expressed on SubSurfWiki:

import numpy as np 
def ricker(length, dt, f):
    """Ricker wavelet at frequency f Hz, length and dt in seconds.
    """
    t = np.arange(-length/2, length/2, dt)
    y = (1.0 - 2.0*(np.pi**2)*(f**2)*(t**2)) * np.exp(-(np.pi**2)*(f**2)*(t**2))
    return t, y

That is already pretty terse at 261 characters, but there are lots of obvious ways, and some non-obvious ways, to reduce it. We can get rid of the docstring (the long comment explaining what the function does) for a start. And use the shortest possible variable names. Then we can exploit the redundancy in the repeated appearance of $\pi^2f^2t^2$... eventually, we get to:

def r(l,d,f):import numpy as n;t=n.arange(-l/2,l/2,d);k=(n.pi*f*t)**2;return t,(1-2*k)/n.exp(k)

This weighs in at just 95 characters. Not a bad reduction from 261, and it's even not too hard to read. In the notebook accompanying this post, I check its output against the version in our geophysics package bruges, and it's legit:

The 95-character Ricker wavelet in green, with the points computed by the function in BRuges.
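Even without bruges, a quick sanity check is to compare the golfed function with the longer version from earlier in the post. This sketch uses only those two functions; the parameter values (a 128 ms, 25 Hz wavelet sampled at 1 ms) are arbitrary choices for the example.

```python
import numpy as np

def ricker(length, dt, f):
    """Ricker wavelet at frequency f Hz, length and dt in seconds."""
    t = np.arange(-length/2, length/2, dt)
    y = (1.0 - 2.0*(np.pi**2)*(f**2)*(t**2)) * np.exp(-(np.pi**2)*(f**2)*(t**2))
    return t, y

# The golfed one-liner, copied as is.
def r(l,d,f):import numpy as n;t=n.arange(-l/2,l/2,d);k=(n.pi*f*t)**2;return t,(1-2*k)/n.exp(k)

t_long, y_long = ricker(0.128, 0.001, 25)
t_golf, y_golf = r(0.128, 0.001, 25)

# Both should agree to within floating-point rounding.
same = np.allclose(t_long, t_golf) and np.allclose(y_long, y_golf)
print(same)
```

Dividing by `exp(k)` instead of multiplying by `exp(-k)` saves a character and only changes the result at the level of rounding error.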

What else can we do?

In the notebook for this post, I run through some more algorithms for which I have unit-tested examples in bruges:

The Ormsby wavelet, reduced from 1545 characters in bruges to 168 characters. (Not bad, considering it took me 111 characters just to express the mathematical equation in the wiki!)
The 4-term Aki-Richards equation, weighing in at only 179 characters (9.8% of the 1828 characters in bruges).
The exact Zoeppritz solution for a PP reflection, at 386 characters.

To give you some idea of why we don't normally code like this, here's what the Aki–Richards solution looks like:

def r(a,c,e,b,d,f,t):import numpy as n;w=f-e;x=f+e;y=d+c;p=n.pi*t/180;s=n.sin(p);return w/x-(y/a)**2*w/x*s**2+(b-a)/(b+a)/n.cos((p+n.arcsin(b/a*s))/2)**2-(y/a)**2*(2*(d-c)/y)*s**2

A bit hard to debug! But there is still some point to all this — I've found I've had to really understand Python's order of mathematical operations, and find different ways of doing familiar things. Playing code golf also makes you think differently about repetition and redundancy. All good food for developing the programming brain.
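For comparison, here is one possible ungolfed reading of that one-liner. It keeps exactly the same arithmetic, but the argument meanings (P-wave velocity, S-wave velocity and density for the upper and lower layers, then incidence angle in degrees) are my assumption from the argument order, and the name `aki_richards` is just a label for this sketch, not the bruges implementation.

```python
import numpy as np

# The golfed version, copied as is.
def r(a,c,e,b,d,f,t):import numpy as n;w=f-e;x=f+e;y=d+c;p=n.pi*t/180;s=n.sin(p);return w/x-(y/a)**2*w/x*s**2+(b-a)/(b+a)/n.cos((p+n.arcsin(b/a*s))/2)**2-(y/a)**2*(2*(d-c)/y)*s**2

def aki_richards(vp1, vs1, rho1, vp2, vs2, rho2, theta1):
    """Same arithmetic as r(), spelled out. Argument meanings are assumed."""
    theta1 = np.radians(theta1)
    theta2 = np.arcsin(vp2 / vp1 * np.sin(theta1))   # Snell's law for the transmitted P-wave.
    meantheta = (theta1 + theta2) / 2

    drho_term = (rho2 - rho1) / (rho2 + rho1)        # Density contrast over twice the mean.
    dvp_term = (vp2 - vp1) / (vp2 + vp1)             # Vp contrast over twice the mean.
    dvs_term = 2 * (vs2 - vs1) / (vs2 + vs1)         # Vs contrast over the mean.
    k = ((vs1 + vs2) / vp1)**2 * np.sin(theta1)**2   # Shared factor in two of the terms.

    return drho_term - k * drho_term + dvp_term / np.cos(meantheta)**2 - k * dvs_term

# Made-up rock properties, angles from 0 to 35 degrees.
args = (3000.0, 1500.0, 2400.0, 3300.0, 1700.0, 2500.0, np.arange(0, 40, 5))
print(np.allclose(r(*args), aki_richards(*args)))
```

The long version is several times the size, but each term can be inspected and tested on its own, which is rather the point.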

Do have a play with the notebook, which you can even run in Microsoft Azure, right in your browser! Give it a try. (You'll need an account to do this. Create one for free.)

Many thanks to Jesper Dramsch and Ari Hartikainen for helping get my head into the right frame of mind for this silliness!

May 31, 2017

Unweaving the rainbow

May 31, 2017/ Matt Hall

Last week at the Canada GeoConvention in Calgary I gave a slightly silly talk on colourmaps with Matteo Niccoli. It was the longest, funnest, and least fruitful piece of research I think I've ever embarked upon. And that's saying something.

Freeing data from figures

It all started at the Unsession we ran at the GeoConvention in 2013. We asked a roomful of geoscientists, 'What are the biggest unsolved problems in petroleum geoscience?'. The list we generated was topped by Free the data, and that one topic alone has inspired several projects, including this one.

Our goal: recover digital data from any pseudocoloured scientific image, without prior knowledge of the colourmap.

I subsequently proffered this challenge at the 2015 Geophysics Hackathon in New Orleans, and a team from Colorado School of Mines took it on. Their first step was to plot a pseudocoloured image in (red, green, blue) space, which reveals the colourmap and brings you tantalizingly close to retrieving the data. Or so it seems...
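To show why it seems so close, here is a sketch of the much easier version of the problem, where the colourmap is already known: each pixel is matched to its nearest colourmap entry in RGB space. Everything here is an illustrative assumption, including the made-up colourmap and the function names; it is not the Mines team's method, and it sidesteps the hard part, which is not knowing the colourmap.

```python
import numpy as np

def make_colourmap(n=256):
    """A made-up blue-to-red colourmap as an (n, 3) array of RGB values in [0, 1]."""
    u = np.linspace(0, 1, n)
    return np.stack([u, np.sin(np.pi * u), 1 - u], axis=1)

def apply_colourmap(data, cmap):
    """Pseudocolour data in [0, 1] by looking up the nearest colourmap index."""
    idx = np.round(np.clip(data, 0, 1) * (len(cmap) - 1)).astype(int)
    return cmap[idx]

def recover_data(rgb, cmap):
    """Invert the colouring by nearest-neighbour search in RGB space."""
    pixels = rgb.reshape(-1, 3)
    dist2 = ((pixels[:, None, :] - cmap[None, :, :])**2).sum(axis=-1)
    idx = dist2.argmin(axis=1)
    return (idx / (len(cmap) - 1)).reshape(rgb.shape[:-1])

cmap = make_colourmap()
data = np.random.default_rng(0).random((16, 16))   # Synthetic 'image' data.
recovered = recover_data(apply_colourmap(data, cmap), cmap)
```

In this idealized case the only loss is the quantization to 256 colours. Real figures add compression, annotation, shading and colourmaps that cross themselves in RGB space, which is where the fun starts.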

Here's our talk:

March 16, 2017

The quick green forsterite jumped over the lazy dolomite

March 16, 2017/ Matt Hall

The best-known pangram — a sentence containing every letter of the alphabet — is probably

“The quick brown fox jumps over the lazy dog.”

There are lots of others of course. If you write like James Joyce, there are probably an infinite number of others. The point is to be short, and one of the shortest, with only 29 letters (!), even has a geological flavour:

“Sphinx of black quartz, judge my vow.”

I know what you're thinking: Cool, but what's the shortest set of mineral names that uses all the letters of the alphabet? What logophiliac geologist would not wonder the same thing?

Well, we posed this question in the most recent "Riddle me this" segment on the Undersampled Radio podcast. This blog post is my solution.

The set cover problem

Finding pangrams in a list of words amounts to solving the classical set cover problem:

“Given a set of elements $\{U_1, U_2,\ldots , U_n\}$ (called the ‘universe’) and a collection $S$ of $m$ sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of $S$ whose union equals (or ‘covers’) the universe.”

Our universe is the alphabet, and our $S$ is the list of $m$ mineral names. There is a slight twist in our case: the set cover problem wants the smallest subset of $S$ — the fewest members. But in this problem, I suspect there are several 4-word solutions (judging from my experiments), so I want the smallest total size of the members of the subset. That is, I want the fewest total letters in the solution.

The solution

The set cover problem was shown to be NP-complete in 1972. What does this mean? It means that it's easy to tell if you have an answer (do you have all the letters of the alphabet?), but the only way to arrive at a solution is — to oversimplify massively — by brute force. (If you're interested in this stuff, this edition of the BBC's In Our Time is one of the best intros to P vs NP and complexity theory that I know of.)

Anyway, the point is that if we find a better way than brute force to solve this problem, then we need to write a paper about it immediately, claim our prize, collect our turkey, then move to a sunny tax haven with good water and double-digit elevation.

So, this could take a while: there are over 95 billion ways to draw 3 words from my list of 4600 mineral names. If we need 4 minerals, there are 400 trillion combinations... and a quick calculation suggests that my laptop will take a little over 50 years to check all the combinations.
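These counts are easy to reproduce with the standard library. The sketch below assumes exactly 4600 names. The figures quoted above are roughly in line with the number of ordered draws; if the order of the words doesn't matter, the search space shrinks by a factor of k! (6 for three words, 24 for four).

```python
import math

n_minerals = 4600   # Assumed size of the mineral list.

for k in (3, 4):
    ordered = math.perm(n_minerals, k)     # Draws where word order matters.
    unordered = math.comb(n_minerals, k)   # Distinct sets of k words.
    print(f"{k} words: {ordered:.3g} ordered draws, {unordered:.3g} unordered sets")
```

Note that `math.perm` and `math.comb` need Python 3.8 or later.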

Can't we speed it up a bit?

Brute force is one thing, but we don't need to be brutish about it. Maybe we can think of some strategies to give ourselves a decent chance:

The list is alphabetically sorted, so randomize the list before searching. (I did this.)
Guess some 'useful' minerals and ensure that you get to them. (I did this too, with quartz.)
Check there are at least 26 letters in the candidate words, and (if it's only records we care about) no more than 44, because I have a solution with 45 letters (see below).
We could sort the list into word length order. That way we search shorter things first, so we should get shorter lists (which we want) earlier.
My solution does not depend much on Python's set type. Maybe we could do more with set theory.
Before inspecting the last word in each list, we could make sure it contains at least one letter that's so far missing.
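A few of these ideas can be sketched together: sorting by word length, the letter budget, leaning on Python's sets, and checking that the last word adds something new. This is a toy illustration under my own assumptions, not the notebook's code; the function name `shortest_pangram` and the ten-word list are invented for the example.

```python
from itertools import combinations
from string import ascii_lowercase

ALPHABET = frozenset(ascii_lowercase)

def shortest_pangram(words, k, max_letters=None):
    """Brute-force the k-word pangram with the fewest total letters."""
    words = sorted(words, key=len)               # Shorter candidates first.
    letters = {w: frozenset(w) for w in words}
    best = None
    for combo in combinations(words, k):
        total = sum(len(w) for w in combo)
        # Letter budget: at least 26, and no more than the current record allows.
        if total < 26 or (max_letters is not None and total > max_letters):
            continue
        seen = frozenset().union(*(letters[w] for w in combo[:-1]))
        # Skip unless the last word contributes at least one missing letter.
        if not (letters[combo[-1]] - seen):
            continue
        if seen | letters[combo[-1]] >= ALPHABET:
            best, max_letters = combo, total - 1  # Only strictly shorter answers from now on.
    return best

words = ['sphinx', 'of', 'black', 'quartz', 'judge', 'my', 'vow', 'the', 'lazy', 'dog']
print(shortest_pangram(words, k=7))
```

On ten words this is instant. On thousands of mineral names it is still brute force, so the pruning only helps at the margins.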

The best solution I've come up with so far has 45 letters, so there's plenty of room for improvement:

'quartz', 'kvanefjeldite', 'abswurmbachite', 'pyroxmangite'
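Checking a candidate is the easy part. Here is a minimal sketch that confirms these four names cover the alphabet and counts the letters; the variable names are my own.

```python
from string import ascii_lowercase

solution = ['quartz', 'kvanefjeldite', 'abswurmbachite', 'pyroxmangite']

letters = set(''.join(solution))
missing = set(ascii_lowercase) - letters       # Empty if the set is a pangram.
total = sum(len(word) for word in solution)    # Total letters used.

print(sorted(missing), total)
```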

My solution is in this Jupyter Notebook. Please put me out of my misery by improving on it.

March 09, 2017

Two new short courses in Calgary

March 09, 2017/ Matt Hall

We're running two one-day courses in Calgary for the CSPG Spring Education Week. One of them is a bit... weird, so I thought I'd try to explain what we're up to.

Both classes run from 8:30 till 4:00, and both of them cost just CAD 425 for CSPG members.

Get introduced to Python

The first course is Practical programming for geoscientists. Essentially a short version of our 2 to 3 day Creative geocomputing course, we'll take a whirlwind tour through the Python programming language, then spend the afternoon looking at some basic practical projects. It might seem trivial, but leaving with a machine fully loaded with all the tools you'll need, plus a long list of resources and learning aids, is worth the price of admission alone.

If you've always wanted to get started with the world's easiest-to-learn programming language, this is the course you've been waiting for!

Hashtag geoscience

This is the weird one. Hashtag geoscience: communicating geoscience in the 21st century. Join me, Evan, Graham Ganssle (my co-host on Undersampled Radio) — and some special guests — for a one-day sci comm special. Writing papers and giving talks is all so 20th century, so let's explore social media, blogging, podcasting, open access, open peer review, and all the other exciting things that are happening in scientific communication today. These tools will not only help you in your job, you'll find new friends, new ideas, and you might even find new work.

I hope a lot of people come to this event. For one, it supports the CSPG (we're not getting paid, we're on expenses only). Secondly, it'll be way more fun with a crowd. Our goal is for everyone to leave burning to write a blog, record a podcast, or at least create a Twitter account.

One of our special guests will be young-and-famous geoscience vlogger Dr Chris. Coincidentally, we just interviewed him on Undersampled Radio. Here's the uncut video version; audio will be on iTunes and Google Play in a couple of days:

March 07, 2017

Unearthing gold in Toronto

March 07, 2017/ Matt Hall

I just got home from Toronto, the mining capital of the world, after an awesome weekend hacking with Diego Castañeda (a recent PhD grad in astrophysics who is working with us) and Anneya Golob (another astrophysicist and Diego's partner). Given how much I bang on about hackathons, it might surprise you to know that this was the first hackathon I have properly participated in, without having to order tacos or run out for more beer every couple of hours.

Participants being briefed by one of the problem sponsors on the first evening.

What on earth is Unearthed?

The event (read about it) was part of a global series of hackathons organized by Unearthed Solutions, a deservedly well-funded non-profit based in Australia that is seeking to disrupt every single thing in the natural resources sector. This was their fourteenth event, but their first in Canada. Remarkably, they got 60 or 70 hackers together for the event, which I know from my experience organizing events takes a substantial amount of work. Avid readers might remember us mentioning them before, especially in a guest post by Jelena Markov and Tom Horrocks in 2014.

A key part of Unearthed's strategy is to engage operating companies in the events. Going far beyond mere sponsorship, Barrick Gold sent several mentors to the event, the Chief Innovation Officer Michelle Ash, as well as two judges, Ed Humphries (head of digital transformation) and Iain Allen (head of digital mining). Barrick provided the challenge themes, as well as data and vivid descriptions of operational challenges. The company was incredibly candid with the participants, and should be applauded for its support of what must have felt like a pretty wild idea.

Team Auger Effect: Diego and Anneya hacking away on Day 2.

What went down?

It's hard to describe a hackathon to someone who hasn't been to one. It's like trying to describe the Grand Canyon, ice climbing, or a 1985 Viña Tondonia Rioja. It's always fun to see and hear the reactions of the judges and other observers that come for the demos in the last hours of the event: disbelief at what small groups of humans can do in a weekend, for little tangible reward. It flies in the face of everything you think you know about creativity, productivity, motivation, and collaboration. Not to mention intellectual property.

As the fifteen (!) teams made their final 5-minute pitches, it was clear that every single one of them had created something unique and useful. The judges seemed genuinely blown away by the level of accomplishment. It's hard to capture the variety, but I'll have a go with a non-comprehensive list. First, there was a challenge around learning from geoscience data:

BGC Engineering, one of the few pro teams and First Place winner, produced an impressive set of tools for scraping and analysing public geoscience data. I think it was a suite of desktop tools rather than a web application.
Mango (winners of the Young Innovators award), Smart Miner (second place overall), Crater Crew, Aureka, Notifyer, and others presented map-based browsers for public mining data, with assistance from varying degrees of machine intelligence.
Auger Effect (me, Diego, and Anneya) built a three-component system consisting of a browser plugin, an AI pipeline, and a social web app, for gathering, geolocating, and organizing data sources from people as they research.

The other challenge was around predictive maintenance:

Tyrelyze, recognizing that two people a year are killed by tyre failures, created a concept for laser scanning haul truck tyres during operations. These guys build laser scanners for core, and definitely knew what they were doing.
Decelerator (winners of the People's Choice award) created a concept for monitoring haul truck driving behaviour, to flag potentially expensive driving habits.
Snapfix.io looked at inventory management for mine equipment maintenance shops.
Arcana, Leo & Zhao, and others looked at various other ways of capturing maintenance and performance data from mining equipment, and used various strategies to try to predict

I will try to write some more about the thing we built... and maybe try to get it working again! The event was immensely fun, and I'm so glad we went. We learned a huge amount about mining too, which was eye-opening. Massive thanks to Unearthed and to Barrick on all fronts. We'll be back!

Brad Bechtold of Cisco (left) presenting the Young Innovator award for under-25s to Team Mango.

The winners of the People's Choice Award, Team Decelerate.

The winners of the contest component of the event, BGC Engineering, with Ed Humphries of Barrick (left).

UPDATE View all the results and submissions from the event.

Wish there was a hackathon just for geoscientists and subsurface engineers?
You're in luck! Join us in Paris for the Subsurface Hackathon — sponsored by Dell EMC, Total E&P, NVIDIA, Teradata, and Sandstone. The theme is machine learning, and registration is open. There's even a bootcamp for anyone who'd like to pick up some skills before the hack.

February 02, 2017

No secret codes: announcing the winners

February 02, 2017/ Matt Hall

The SEG / Agile / Enthought Machine Learning Contest ended on Tuesday at midnight UTC. We set readers of The Leading Edge the challenge of beating the lithology prediction in October's tutorial by Brendon Hall. Forty teams, mostly of 1 or 2 people, entered the contest, submitting several hundred entries between them. Deadlines are so interesting: it took a month to get the first entry, and I received 4 in the second month. Then I got 83 in the last twenty-four hours of the contest.

How it ended

Rank	Team	F1	Algorithm	Language	Solution
1	LA_Team (Mosser, de la Fuente)	0.6388	Boosted trees	Python	Notebook
2	PA Team (PetroAnalytix)	0.6250	Boosted trees	Python	Notebook
3	ispl (Bestagini, Tuparo, Lipari)	0.6231	Boosted trees	Python	Notebook
4	esaTeam (Earth Analytics)	0.6225	Boosted trees	Python	Notebook

The winners are a pair of graduate petroleum engineers, Lukas Mosser (Imperial College, London) and Alfredo de la Fuente (Wolfram Research, Peru). Not coincidentally, they were also one of the more, er, energetic teams; it's safe to say that they explored a good deal of the solution space. They were also very much part of the discussion about the contest on GitHub.com and on the Software Underground Slack chat group, aka Swung (you're in there, right?).

I will be sending Raspberry Shakes to the winners, along with some other swag from Enthought and Agile. The second-place team will receive books from SEG (thank you SEG Book Mart!), and the third-place team will have to content themselves with swag. That team, led by Paolo Bestagini of the Politecnico di Milano, deserves special mention — their feature engineering approach was very influential, being used by most of the top-ranking teams.

Coincidentally Gram and I talked to Lukas on Undersampled Radio this week:

Back up a sec, what the heck is a machine learning contest?

To enter, a team had to predict the lithologies in two wells, given wireline logs and other data. They had complete data, including lithologies, in nine other wells — the 'training' data. Teams trained a wide variety of models — from simple nearest neighbour models and support vector machines, to sophisticated deep neural networks and random forests. These met with varying success, with accuracies ranging between about 0.4 and 0.65 (i.e., error rates from 60% to 35%). Here's one of the best realizations from the winning model:

One twist that made the contest especially interesting was that teams could not just submit their predictions — they had to submit the code that made the prediction, in the open, for all their fellow competitors to see. As a result, others were quickly able to adopt successful strategies, and I'm certain the final result was better than it would have been with secret code.

I spent most of yesterday scoring the top entries by generating 100 realizations of the models. This was suggested by the competitors themselves as a way to deal with model variance. This was made a little easier by the fact that all of the top-ranked teams used the same language — Python — and the same type of model: extreme gradient boosted trees. (It's possible that the homogeneity of the top entries was a negative consequence of the open format of the contest... or maybe it just worked better than anything else.)
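To give a flavour of what scoring over many realizations involves, here is a toy sketch. It is not the contest's scoring code: the noisy stand-in model, the nine classes, the plain accuracy metric and the function names are all assumptions made for illustration.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def score_realizations(predict, y_true, n=100):
    """Score n realizations of a stochastic model, one per random seed."""
    return np.array([accuracy(y_true, predict(seed)) for seed in range(n)])

# Synthetic 'true' labels: 500 samples, 9 classes.
y_true = np.random.default_rng(0).integers(0, 9, size=500)

def noisy_model(seed):
    """Toy stand-in for a trained model: replaces about 40% of labels with random guesses."""
    rng = np.random.default_rng(seed)
    y_pred = y_true.copy()
    wrong = rng.random(y_true.size) < 0.4
    y_pred[wrong] = rng.integers(0, 9, size=wrong.sum())
    return y_pred

scores = score_realizations(noisy_model, y_true)
print(scores.mean(), scores.std())
```

Reporting the spread as well as the mean shows how much of the gap between two entries could simply be down to the random seed.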

What now?

There will be more like this. It will have something to do with seismic data. I hope I have something to announce soon.

I (or, preferably, someone else) could write an entire thesis on learnings from this contest. I am busy writing a short article for next month's Leading Edge, so if you're interested in reading more, stay tuned for that. And I'm sure there will be others.

If you took part in the contest, please leave a comment telling us about your experience of it or, better yet, write a blog post somewhere and point us to it.
