The norm: kings, crows and taxicabs

How far away is the supermarket from your house? There are lots of ways of answering this question:

  • As the crow flies. This is the green line from \(\mathbf{a}\) to \(\mathbf{b}\) on the map below.

  • The 'city block' driving distance. If you live on a grid of streets, all possible routes are the same length — represented by the orange lines on the map below.

  • In time, not distance. This is usually a more useful answer... but not one we're going to discuss today.

Don't worry about the mathematical notation on this map just yet. The point is that there's more than one way to think about the distance between two points, or indeed any measure of 'size'.

norms.png

Higher dimensions

The map is obviously two-dimensional, but it's fairly easy to conceive of 'size' in any number of dimensions. This is important, because we often deal with more than the 2 dimensions on a map, or even the 3 dimensions of a seismic stack. For example, we think of raw so-called 3D seismic data as having 5 dimensions (x position, y position, offset, time, and azimuth). We might even formulate a machine learning task with a hundred or more dimensions (or 'features').

Why do we care about measuring distances in high dimensions? When we're dealing with data in these high-dimensional spaces, 'distance' is a useful way to measure the similarity between two points. For example, I might want to select those samples that are close to a particular point of interest. Or, from among the points satisfying some constraint, select the one that's closest to the origin.

Definitions and nomenclature

We'll define norms in the context of linear algebra, which is the study of vector spaces (think of multi-dimensional 'data spaces' like the 5D space of seismic data). A norm is a function that assigns a positive scalar size to a vector \(\mathbf{v}\) , with a size of zero reserved for the zero vector (in the Cartesian plane, the zero vector has coordinates (0, 0) and is usually called the origin). Any norm \(\|\mathbf{v}\|\) of this vector satisfies the following conditions:

  1. Absolutely homogenous. The norm of \(\alpha\mathbf{v}\) is equal to \(|\alpha|\) times the norm of \(\mathbf{v}\).

  2. Subadditive. The norm of \( (\mathbf{u} + \mathbf{v}) \) is less than or equal to the norm of \(\mathbf{u}\) plus the norm of \(\mathbf{v}\). In other words, the norm satisfies the triangle inequality.

  3. Positive. The first two conditions imply that the norm is non-negative.

  4. Definite. Only the zero vector has a norm of 0.

Kings, crows and taxicabs

Let's return to the point about lots of ways to define distance. We'll start with the most familiar definition of distance on a map— the Euclidean distance, aka the \(\ell_2\) or \(L_2\) norm (confusingly, sometimes the two is written as a superscript), the 2-norm, or sometimes just 'the norm' (who says maths has too much jargon?). This is the 'as-the-crow-flies distance' on the map above, and we can calculate it using Pythagoras:

$$ \|\mathbf{v}\|_2 = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2} $$

You can extend this to an arbitrary number of dimensions, just keep adding the squared elementwise differences. We can also calculate the norm of a single vector in n-space, which is really just the distance between the origin and the vector:

$$ \|\mathbf{u}\|_2 = \sqrt{u_1^2 + u_2^2 + \ldots + u_n^2}  = \sqrt{\mathbf{u} \cdot \mathbf{u}} $$

As shown here, the 2-norm of a vector is the square root of its dot product with itself.

So the crow-flies distance is fairly intuitive... what about that awkward city block distance? This is usually referred to as the Manhattan distance, the taxicab distance, the \(\ell_1\) or \(L_1\) norm, or the 1-norm. As you can see on the map, it's just the sum of the absolute distances in each dimension, x and y in our case:

$$ \|\mathbf{v}\|_1 = |a_x - b_x| + |a_y - b_y| $$

What's this magic number 1 all about? It turns out that the distance metric can be generalized as the so-called p-norm, where p can take any positive value up to infinity. The definition of the p-norm is consistent with the two norms we just met:

$$ \| \mathbf{u} \|_p = \left( \sum_{i=1}^n | u_i | ^p \right)^{1/p} $$

[EDIT, May 2021: This generalized version is sometimes called the Minkowski distance, e.g. in the scipy documentation.]

In practice, I've only ever seen p = 1, 2, or infinity (and 0, but we'll get to that). Let's look at the meaning of the \(\infty\)-norm, aka the \(\ell_\infty\) or \(L_\infty\) norm, which is sometimes called the Chebyshev distance or chessboard distance (because it defines the minimum number of moves for a king to any given square):

$$ \|\mathbf{v}\|_\infty = \mathrm{max}(|a_x - b_x|, |a_y - b_y|) $$

In other words, the Chebyshev distance is simply the maximum element in a given vector. In a nutshell, the infinitieth root of the sum of a bunch of numbers raised to the infinitieth power, is the same as the infinitieth root of the largest of those numbers raised to the infinitieth power — because infinity is weird like that.

What about p = 0?

Infinity is weird, but so is zero sometimes. Taking the zeroeth root of a lot of ones doesn't make a lot of sense, so mathematicians often redefine the \(\ell_0\) or \(L_0\) "norm" (not a true norm) as a simple count of the number of non-zero elements in a vector. In other words, we toss out the 0th root, define \(0^0 := 0 \) and do:

$$ \| \mathbf{u} \|_0 = |u_1|^0 + |u_2|^0 + \cdots + |u_n|^0 $$

(Or, if we're thinking about the points \(\mathbf{a}\) and \(\mathbf{b}\) again, just remember that \(\mathbf{v}\) = \(\mathbf{a}\) - \(\mathbf{b}\).)

Computing norms

Let's take a quick look at computing the norm of some vectors in Python:

 
>>> import numpy as np

>>> a = np.array([1, 1]).T
>>> b = np.array([6, 5]).T

>>> L_0 = np.count_nonzero(a - b)
2

>>> L_1 = np.sum(np.abs(a - b))
9

>>> L_2 = np.sqrt((a - b) @ (a - b))
6.4031242374328485

>>> L_inf = np.max(np.abs(a - b))
5

>>> # Using NumPy's `linalg` module:
>>> import numpy.linalg as la
>>> for p in (0, 1, 2, np.inf):
>>>    print("L_{} norm = {}".format(p, la.norm(a - b, p)))
L_0 norm = 2.0
L_1 norm = 9.0
L_2 norm = 6.4031242374328485
L_inf norm = 5.0

What can we do with all this?

So far, so good. But what's the point of these metrics? How can we use them to solve problems? We'll get into that in a future post, so don't go too far!

For now I'll leave you to play with this little interactive demo of the effect of changing p-norms on a Voronoi triangle tiling — it's by Sarah Greer, a geophysics student at UT Austin. 


UPDATE — The next post is The norm and simple solutions, which looks at how these different norms can be used to solve real-world problems.

x lines of Python: Global seismic data

Today we'll look at finding and analysing global seismology data with Python and the wonderful seismology package ObsPy, from Moritz Beyreuther, Lion Krischer, and others originally at the Geophysical Observatory in Munich.

We've used ObsPy before to load SEG-Y files into Python, but that's not its core purpose. These tools are typically used by global seismologists and earthquake scientists, but we're going to download and analyse data from three non-earthquakes:

  1. A curious landslide and tsunami in Greenland.
  2. The recent nuclear bomb test in North Korea.
  3. Hurricane Irma's passage through the Caribbean.

We'll also look at an actual earthquake. This morning there was a very large earthquake off Mexico, killing at least 15 people. It's the first M8+ earthquake anywhere since the Illapel event, Chile, on 16 September 2015.

Only 4 lines?

Once you have ObsPy, only 4 lines of code (not counting imports) are needed to download and plot a seismic trace. Here's how to instantiate the ObsPy client using the IRIS data service, then get 5 minutes of waveform data from the Mudanjiang or MDJ station on the IC network, the New China Digital Seismograph Network, and finally plot it:

from obspy.clients.fdsn import Client
client = Client("IRIS")

from obspy import UTCDateTime
t = UTCDateTime("2017-09-03_03:30:00")
st = client.get_waveforms("IC", "MDJ", "00", "BHZ", t, t + 5*60)
st.plot()  
ObsPy_IC-MDJ.png

Pretty awesome, right? One day getting seismic and well data will be this simple! LOL


Check out the Jupyter Notebook! I cannot get this notebook to run on Azure Notebooks I'm afraid, so the only way to run it is to set up Python and Jupyter (best way: install Canopy or Anaconda) on your machine. I urge you to give it a go, because what could be more fun than playing around with decades of seismic data from all over the world?

90 years of well logs

Today is the 90th anniversary of the first well log. On 5 September 1927, three men from Schlumberger logged the Diefenbach [sic] well 2905 at Dieffenbach-lès-Wœrth in the Pechelbronn heavy oil field in the Alsace region of France.

The site of the Diefenbach 2905 well. © Google, according to terms.

The site of the Diefenbach 2905 well. © Google, according to terms.

 
Pechelbronn_log_plot.png

The geophysical services company Société de Prospection Électrique (Processes Schlumberger), or PROS, had only formed in July 1926 but already had sixteen employees. Headquartered in Paris at 42, rue Saint-Dominique, the company was attempting to turn its resistivity technology to industrial applications, especially mining and petroleum. Having had success with horizontal surface measurements, the Diefenbach well was the first attempt to measure resistivity in a wellbore. PROS went on to become Schlumberger.

The resistivity prospecting system had been designed by the Schlumberger brothers, Conrad (1878–1936, a professor at École des Mines) and Maurice (1884–1953, a mining engineer), over the period from about 1912 until 1923. The task of adapting the technology was given to Henri Doll (1902–1991), Conrad's son-in-law since 1923, and the Alsatian well was to be the first field test of the so-called "electrical coring" method. The client was Deutsche Erdöl Aktiengesellschaft, now DEA of Hamburg, Germany.

As far as I can tell, the well — despite usually being called "the Pechelbronn well" — was located at the site of a monument at the intersection of Route de Wœrth with Rue de Preuschdorf in Dieffenbach-lès-Wœrth, about 3 km west of Merkwiller-Pechelbronn. Henri Doll logged the well with Roger Jost and Charles Scheibli. Using rudimentary equipment, they logged about 145 m of the 488-metre hole, starting at 279 m MD, taking a reading every metre and plotting the log by hand. Yesterday I digitized this log; download it in LAS format here


Pechelbronn_thumbnail.png

The story of what the Schlumberger brothers and Henri Doll achieved is fascinating; I recommend reading Don Hill's brief history (2012) — it's free to read at Wiley. The period of invention that followed the Pechelbronn success was inspiring.

If you're looking at well logs today, take a second to thank Conrad, Maurice, and Henri for their remarkable idea.

PS If you're interested in petroleum history, the AOGHS page This Week is worth a look.


The French television programme Midi en France recorded this segment about the Pechelbronn field in 2014. The narration is in French, "The fields of maize gorge on sunshine, the pumps on petroleum...", but there are some nice pictures to look at.

References and bibliography

Clapp, Frederick G (1932). Oil and gas possibilities of France. AAPG Bulletin 16 (11), 1092–1143. Contains a good history of exploration and production from the Oligocene sands in Pechelbronn, up to about 1931 (the field produced up to 1970). AAPG Datapages.

Delacour, Jacques (2003). Une technique de prospection minière et pétrolière née en Pays d'Auge. SABIX 34, September 2003. Available online.

École des Mines page on Conrad Schlumberger at annales.org.

Hill, DG (2012). Appendix A: Historical Review (Milestone Developments in Petrophysics). In: Buryakovsky, L, Chilingar, GV, Rieke, HH, and Shin, S (2012). Petrophysics: Fundamentals of the Petrophysics of Oil and Gas Reservoirs, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9781118472750.app1. A nice potted history of well logging, including important dates.

Musée Français du Pétrole website, http://www.musee-du-petrole.com/historique/

Pike, B and Duey, R (2002). Logging history rich with innovation. Hart's E&P Magazine. September 2002. Available online. Interesting article, but beware: there are one or two inaccuracies in this article, and I believe the image of the well log is incorrect.

Subsurface Hackathon project round-up, part 2

Following on from Part 1 yesterday, here are the other seven team projects from the hackathon:


Interactive visualization of Water Table heights over many years.

Interactive visualization of Water Table heights over many years.

Water, water everywhere

Water Underground: Martin Bentley (NMMU), Joseph Barraud (Rolls Royce), Rabah Cheknoun (UPPA)

The team built readers for the groundwater data available from dinoloket.nl, both the groundwater levels and the hydrochemistry. They clustered the data by aggregating by month and then looking for similarities in levels in the boreholes and built an open Jupyter notebook.


  

 

 

Seismic from noise

OBSNoise: Fernando Villanueva-Robles (IPGP), Yann Huet (Setec-Lerm), Ngoc Huyen Luu (Ecole Polytechnique), Dorian Bagur (Telecom ParisTech), Jonathan Grandjean (Independent)

The OBSNoise project investigated the application of machine learning to coherently stack ambient noise records collected from ocean bottom seismic (OBS) arrays in order to extract reservoir information. The team's results from synthetic data showed promise. If fully developed, this technology could be a virtually real-time monitoring system of dynamic reservoir properties.


The Killers. Killing It. 

The Killers. Killing It. 

Global geochemical data analytics

The Killers: Alexandre Sache, Violaine Delahaye, Karl Sache (all from Institute Polytechnique UniLaSalle), Côme Arvis, Guillaume Ligner (Ecole Polytechnique)

Two geoscience undergrads and one automotive design student (I know right?) from UniLaSalle hooked up with two data science students from Ecole Polytechnique to interogate the massive GeoRoc database using some clever data analytics tricks and did some novel many-dimensional geochemical classifications.


Team LogFix.

Team LogFix.

Fixing broken well data

LogFix: Guillaume Coffin (Telecom Evolution), Florian Napierala (EISTI), Camille Gimenez (Université Paris-Saclay), Tristan Siméon (Université de Montpellier), Robert Leckenby (Independent)

A truly pristine, calibrated, and corrected petrophysical data is so rare it has a sort of mythical status. Team LogFix used machine learning to identify bad-data zones, repair, QC, and fill-in missing sections. They got an impressive way with the problem, using a dataset from the Athabasca of Canada.


Between the hand-drawn lines

Automagical: Louis Poirier (Independent), Maggie Baber (Independent), Georg Semmler (GiGa infosystems), Björn Wieczoreck (GiGa infosystems), Jonas Kopcsek (GiGa infosystems)

Automagical_Paris_Hackathon.png

You don't need to believe in magic. Team Automagical used machine learning to create 3D geological models from 2D cross-sections sections. They trained a predictive model using a collection of standardized hand-drawn cross-sections from human geoscientists. The model learns how to propagate rocks throughout a 3D scene. Their goal is to be able to generate cross-sections along any direction through the model. The AI learned how to do geologically realistic interpolation on simple structures. What kind of geologic complexity is possible with more input from more cross-sections?


The document on the left contains a log display with a lithology column. It's a 'hit'. The one on the right has no lithlogies and is a 'miss'. 

The document on the left contains a log display with a lithology column. It's a 'hit'. The one on the right has no lithlogies and is a 'miss'.

 

There's rocks in them hills! Hills of paper, that is

Logs on the Rocks: Daniel Stanton (Leeds University), Jack Woolam (Leeds University), Adam Goddard (Leeds University), Henri Blondelle (AgileDD)

If the oil and gas industry is to get more efficient, we better get really good at finding lithology and fluid information in the mountains of paper we've collectively built. Team Logs on the Rocks used CNNs to identify graphical depictions of rock types in a sea of unstructured PDFs and TIFFs. They introduced themselves as a team of non-coders, but these guys were were doing cloud computing on AWS and using NVIDIA's GPUs before the end of the weekend. 


Robot vision for seismic interpretation

It's not our FAULT! Claire Birnie (Leeds University), Carlos Alberto da Costa Filho (Edinburgh University), Matteo Ravasi (Statoil), Filippo Broggini (ETHZ), Gijs Straathof (SGS)

Geologic feature recognition using machine learning. The goal was to assist seismic interpreters in detecting geologic features – faults, folds, traps, etc. – in seismic data . They used Haar cascade classifiers, which are routinely used for identifying faces or kittens or beer bottles in photographs and video streams, specially trained to work on seismic data. They used the awesome OpenCV library to build this technology. At the time of writing, their website appears to be maxed out for the month, so if you're dying to see it, leave them a comment on LinkedIn asking them increase their capacity. And in the meantime, you can check out their project's repo on GitHub.

Kudos for the open source repo, team!


It was thrilling to see such a large range of data and applications. Digital thin-sections, ground water maps, seismic data, well logs, cross-sections, information in unstructured documents, and so on. Thanks to each and every individual that showed up with their expertise and enthusiasm. We're all better off because of it.

A quick reminder that our sponsors are awesome! Please high-five them next time you meet them...

Machine learning meets seismic interpretation

Agile has been reverberating inside the machine learning echo chamber this past week at EAGE. The hackathon's theme was machine learning, Monday's workshop was all about machine learning. And Matt was also supposed to be co-chairing the session on Applications of machine learning for seismic interpretation with Victor Aare of Schlumberger, but thanks to a power-cut and subsequent rescheduling, he found himself double-booked so, lucky me, he invited me to sit in his stead. Here are my highlights, from the best seat in the house.

Before I begin, I must mention the ambivalence I feel towards the fact that 5 of the 7 talks featured the open-access F3 dataset. A round of applause is certainly due to dGB Earth Sciences for their long time stewardship of open data. On the other hand, in the sardonic words of my co-chair Victor Aarre, it would have been quite valid if the session was renamed The F3 machine learning session. Is it really the only quality attribute research dataset our industry can muster? Let's do better.

Using seismic texture attributes for salt classification

Ghassan AlRegib ruled the stage throughout the session with not one, not two, but three great talks on behalf of himself and his grad students at Georgia Institute of Technology (rather than being a show of bravado, this was a result of problems with visas). He showed some exciting developments in shallow learning methods for predicting facies in seismic data. In addition to GLCM attributes, he also introduced a couple of new (to me anyway) attributes for salt classification. Namely, textural gradient and a thing he called seismic saliency, a metric modeled after the human visual system describing the 'reaction' between relative objects in a 3D scene. 

Twelve Seismic attributes used for multi-attribute salt-boundary classification. (a) is RMS Amplitude, (B) to (M) are TEXTURAL attributes. See abstract for details. This figure is copyright of Ghassan AlRegib and licensed CC-BY-SA by virtue of being generated from the F3 dataset of dGB and TNO.

Ghassan also won the speakers' lottery, in a way. Due to the previous day's power outage and subsequent reshuffle, the next speaker in the schedule was a no-show. As a result, Ghassan had an extra 20 minutes to answer questions. Now for most speakers that would be a public-speaking nightmare, but Ghassan hosted the onslaught of inquiring minds beautifully. If we hadn't had to move on to the next next talk, I'm sure he could have entertained questions all afternoon. I find it fascinating how unpredictable events like power outages can actually create the conditions for really effective engagement. 

Salt classification without using attributes (using deep learning)

Matt reported on Anders Waldeland's work a year ago, and it was interesting to see how his research has progressed, as he nears the completion of his thesis. 

Anders successfully demonstrated how convolutional neural networks (CNNs) can classify salt bodies in seismic datasets. So, is this a big deal? I think it is. Indeed, Anders's work seems like a breakthough in seismic interpretation, at least of salt bodies. To be clear, I don't think this means that it is time for seismic interpreters to pack up and go home. But maybe we can start looking forward to spending our time doing less tedious things than picking complex salt bodies.  

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

He trained a CNN on one manually labeled slice of a 3D cube and used the network to automatically classify the full 3D salt body (on the right in the figure). Conventional algorithms for salt picking, such as that used by AlRegib (see above), typically rely on seismic attributes to define a feature space. This requires professional insight and judgment, and is prone to error and bias. Nicolas Audebert mentioned the same shortcoming in his talk in the workshop Matt wrote about last week. In contrast, the CNN algorithm works directly on the seismic data, learning the most discriminative filters on its own, no attributes needed

Intuition training

Machine learning isn't just useful for computing in the inverse direction such as with inversion, seismic interpretation, and so on. Johannes Amtmann showed us how machine learning can be useful for ranking the performance of different clustering methods using forward models. It was exciting to see: we need to get back into the habit of forward modeling, each and every one of us. Interpreters build synthetics to hone their seismic intuition. It's time to get insanely good at building forward models for machines, to help them hone theirs. 

There were so many fascinating problems being worked on in this session. It was one of the best half-day sessions of technical content I've ever witnessed at a subsurface conference. Thanks and well done to everyone who presented.


Machine learning and analytics in geoscience

We're at EAGE in Paris. I'm sitting in a corner of the exhibition because the power is out in the main hall, so all the talks for the afternoon have been postponed. The poor EAGE team must be beside themselves, I feel for them. (Note to future event organizers: white boards!)

Yesterday Diego, Evan, and I — along with lots of hackathon participants — were at the Data Science for Geosciences workshop, an all-day machine learning fest. The session was chaired by Cyril Agut (Total), Marianne Cuif-Sjostrand (Total), Florence Delprat-Jannaud (IFPEN), and Noalwenn Dubos-Sallée (IFPEN), and they had assembled a good programme, with quite a bit of variety.

Michel Lutz, Group Data Officer at Total, and adjunct at École des Mines de Saint-Étienne, gave a talk entitled, Data science & application to geosciences: an introduction. It was high-level but thoughtful, and such glimpses into large companies are always interesting. The company seems to have a mature data science strategy, and a well-developed technology stack. Henri Blondelle (AgileDD) asked about open data at the end, and Michel somewhat sidestepped on specifics, but at least conceded that the company could do more in open source code, if not data.

Infrastructure, big data, and IoT

Next we heard a set of talks about the infrastructure aspect of big (really big) data.

Alan Smith of Luchelan told the group about some negative experiences with Hadoop and seismic data (though it didn't seem to me that his problems were insoluble since I know of several projects that use it), and the realization that sometimes you just need fast infrastructure and custom software.

Hadi Jamali-Rad of Shell followed with an IoT story from the field. He had deployed a large number of wireless seismic sensors around a village in Holland, then tested various aspects of the communication system to answer questions like, what's the packet loss rate when you collect data from the nodes? What about from a balloon stationed over the site?

Duncan Irving of Teradata asked, Why aren't we [in geoscience] doing live analytics on 100PB of live data like eBay? His hypothesis is that IT organizations in oil and gas failed to keep up with key developments in data analytics, so now there's a crisis of sorts and we need to change how we handle our processes and culture around big data. 

Machine learning

We shifted gears a bit after lunch. I started with a characteristically meta talk about how I think our community can help ensure that our research and practice in this domain leads to good places as soon as possible. I'll record it and post it soon.

Nicolas Audebert of ONERA/IRISA presented a nice application of a 3D convolutional neural network (CNN) to the segmentation and classification of hyperspectral aerial photography. His images have between about 100 and 400 channels, and he finds that CNNs reduce error rates by up to about 50% (compared to an SVM) on noisy or complex images. 

Henri Blondelle of Agile Data Decisions talked about his experience of the CDA's unstructured data challenge of 2016. About 80% of the dataset is unstructured (e.g. folders of PDFs and TIFFs), and Henri's vision is to transform 80% of that into structured data, using tools like AgileDD's IQC to do OCR and heuristic labeling. 

Irina Emelyanova of CSIRO provided another case study: unsupervised e-facies prediction using various types of clustering, from K-means to some interesting variants of self-organizing maps. It was refreshing to see someone revealing a lot of the details of their implementation.

Jan Limbeck, a research scientist at Shell wrapped up the session with an overview of Shell's activities around big data and machine learning, as they prepare for exabytes. He mentioned the Mauricio Araya-Polo et al. paper on deep learning in seismic shot gathers in the special March issue of The Leading Edge — clearly it's easiest to talk about things they've already published. He also listed a lot of Shell's machine learning projects (frac optimization, knowledge graphs, reservoir simulation, etc), but there's no way to know what state they are in or what their chances of success are. 

As well as all the 9 talks, there were 13 posters, about a third of which were on infrastructure stuff, with the rest providing more case studies. Unfortunately, I didn't get the chance to look at them in any detail, but I appreciated the organizers making time for discussion around the posters. If they'd also allowed more physical space for the discussion it could have been awesome.

Analytics!

After hearing about Mentimeter from Chris Jackson I took the opportunity to try it out on the audience. Here are the results, I think they are fairly self-explanatory... 

I also threw in the mindmap I drew at the end as a sort of summary. The vertical axis represents something like'abstraction' or 'time' (in a workflow sense) and I think each layer depends somewhat on those beneath it. It probably makes sense to no-one but me.

Breakout!

It seems clear that 2017 is the breakout year for machine learning in petroleum geoscience, and in petroleum in general. If your company or institution has not yet gone beyond "watching" or "thinking about" data science and machine learning, then it is falling behind by a little more every day, and it has been for at least a year. Now's the time to choose if you want to be part of what happens next, or a victim of it.

Looking forward to EAGE

Evan, Diego and I are flying to Paris today for the EAGE Conference and Exhibition. It's exciting. We're excited. 

But the excitement starts before the conference. The Subsurface Hackathon is this weekend!

My diary

Even the hackathon excitement starts before the weekend, because tomorrow, Friday, we're running the hacker's bootcamp — a sort of short course appetizer for the hackathon. We have about 25 geoscientists coming to the Booster TOTAL (an event space at TOTAL's La Défense offices) to get some hands-on practice with Python and the latest in machine learning tools. It's especially exciting because we'll also have engineers from NVIDIA on hand to help with the coaching. The idea is to help people hit the ground running when the hackathon starts on Saturday.

After that, on Saturday and Sunday,  it's the hackathon itself. We have no fewer than 60 geoscientists and engineers registered for this breakout event. They're coming to the Booster to work on a wide array of machine learning ideas for the subsurface. It's going to be epic. You can read all about what happens next week, I promise. 

Then on Monday it's the Data Science for Geoscience workshop, at which I'm giving a keynote. Since I'm far from possessing expertise, I'm using it as a chance to get people jazzed about helping make the coming AI revolution in geoscience a positive experience. I'm really looking forward to it.

The conference itself starts on Tuesday. In the afternoon I'm co-chairing a session on machine learning (have you spotted the theme yet?) in seismic interpretation, along with Victor Aare of Schlumberger. It will be awesome to see what kind of progress our community is making in this field — it's fun to imagine what seismic interpretation might be like in a few years. There are so many fascinating problems to work on! Here are the talks in that session:

On Wednesday we'll be taking in some more talks and posters, then in the afternoon I'm reprising my keynote talk at IFPEN, a subsurface research institute in the Bois de Boulogne. I've never been there before, although I have met a few IFP scientists before. I'm looking forward to it very much. 

It all ends for us on Thursday. Evan and Diego fly home and I'm off to Cambridge (the old one in the fens, not the one in Massachusetts) for a few days with family (and bookshops). Until then, expect much blogging!


Going to EAGE?

If you're reading this and would like to meet up with us at Agile or some of the Software Underground crowd — the friendliest bunch of coding geoscientists you could hope for — let's plan to meet at the end of the workshop, at the workshop location. Look for the Software Underground shirts.

Unweaving the rainbow

Last week at the Canada GeoConvention in Calgary I gave a slightly silly talk on colourmaps with Matteo Niccoli. It was the longest, funnest, and least fruitful piece of research I think I've ever embarked upon. And that's saying something.

Freeing data from figures

It all started at the Unsession we ran at the GeoConvention in 2013. We asked a roomful of geoscientists, 'What are the biggest unsolved problems in petroleum geoscience?'. The list we generated was topped by Free the data, and that one topic alone has inspired several projects, including this one. 

Our goal: recover digital data from any pseudocoloured scientific image, without prior knowledge of the colourmap.

I subsequently proferred this challenge at the 2015 Geophysics Hackathon in New Orleans, and a team from Colorado School of Mines took it on. Their first step was to plot a pseudocoloured image in (red, green blue) space, which reveals the colourmap and brings you tantalizingly close to retrieving the data. Or so it seems...

Here's our talk:

GeoConvention highlights

We were in Calgary last week at the Canada GeoConvention 2017. The quality of the talks seemed more variable than usual but, as usual, there were some gems in there too. Here are our highlights from the technical talks...

Filling in gaps

Mauricio Sacchi (University of Alberta) outlined a new reconstruction method for vector field data. In other words, filling in gaps in multi-compononent seismic records. I've got a soft spot for Mauricio's relaxed speaking style and the simplicity with which he presents linear algebra, but there are two other reasons that make this talk worthy of a shout out:

  1. He didn't just show equations in his talk, he used pseudocode to show the algorithm.
  2. He linked to his lab's seismic processing toolkit, SeismicJulia, on GitHub.

I am sure he'd be the first to admit that it is early days for for this library and it is very much under construction. But what isn't? All the more reason to showcase it openly. We all need a lot more of that.

Update on 2017-06-7 13:45 by Evan Bianco: Mauricio, has posted the slides from his talk

Learning about errors

Anton Birukov (University of Calgary & graduate intern at Nexen) gave a great talk in the induced seismicity session. It was a lovely mashing-together of three of our favourite topics: seismology, machine-learning, and uncertainty. Anton is researching how to improve microseismic and earthquake event detection by framing it as a machine-learning classification problem. He's using Monte Carlo methods to compute myriad synthetic seismic events by making small velocity variations, and then using those synthetic events to teach a model how to be more accurate about locating earthquakes.

Figure 2 from Anton Biryukov's abstract. An illustration of the signal classification concept. The signals originating from the locations on the grid (a) are then transformed into a feature space and labeled by the class containing the event or…

Figure 2 from Anton Biryukov's abstract. An illustration of the signal classification concept. The signals originating from the locations on the grid (a) are then transformed into a feature space and labeled by the class containing the event origin. From Biryukov (2017). Event origin depth uncertainty - estimation and mitigation using waveform similarity. Canada GeoConvention, May 2017.

The bright lights of geothermal energy
Matt Hall

Two interesting sessions clashed on Wednesday afternoon. I started off in the Value of Geophysics panel discussion, but left after James Lamb's report from the mysterious Chief Geophysicists' Forum. I had long wondered what went on in that secretive organization; it turns out they mostly worry about how to make important people like your CEO think geophysics is awesome. But the large room was a little dark, and — in keeping with the conference in general — so was the mood.

Feeling a little down, I went along to the Diversification of the Energy Industry session instead. The contrast was abrupt and profound. The bright room was totally packed with a conspicuously young audience numbering well over 100. The mood was hopeful, exuberant even. People were laughing, but not wistfully or ironically. I think I saw a rainbow over the stage.

If you missed this uplifting session but are interested in contributing to Canada's geothermal energy scene, which will certainly need geoscientists and reservoir engineers if it's going to get anywhere, there are plenty of ways to find out more or get involved. Start at cangea.ca and follow your nose.

We'll be writing more about the geothermal scene — and some of the other themes in this post — so stay tuned. 


DID YOU KNOW?

You can get regular updates right to your email, just drop your address in the box:

The fine print: No spam, we promise! We never share email addresses with 3rd parties. Unsubscribe any time with the link in the emails. The service is provided by MailChimp in accordance with Canada's anti-spam regulations.

The Computer History Museum

Mountain View, California, looking northeast over US 101 and San Francisco Bay. The Computer History Museum sits between the Googleplex and NASA Ames. Hangar 1, the giant airship hangar, is visible on the right of the image. Imagery and map data © Google, Landsat/Copernicus.

A few days ago I was lucky enough to have a client meeting in Santa Clara, California. I had not been to Silicon Valley before, and it was more than a little exciting to drive down US Route 101 past the offices of Google, Oracle and Amazon and basically every other tech company, marvelling at Intel’s factory and the hangars at NASA Ames, and seeing signs to places like Stanford, Mountain View, and Menlo Park.

I had a spare day before I flew home, and decided to visit Stanford’s legendary geophysics department, where there was a lecture that day. With an hour or so to kill, I thought I’d take in the Computer History Museum on the way… I never made it to Stanford.

The museum

The Computer History Museum was founded in 1996, building on an ambition of über-geek Gordon Bell. It sits in the heart of Mountain View, surrounded by the Googleplex, NASA Ames, and Microsoft. It’s a modern, airy building with the museum and a small café downstairs, and meeting facilities on the upper floor. It turns out to be an easy place to burn four hours.

I saw a lot of computers that day. You can see them too because much of the collection is in the online catalog. A few things that stood out for me were:

No seismic

I had been hoping to read more about the early days of Texas Instruments, because it was spun out of a seismic company, Geophysical Service or GSI, and at least some of their early integrated circuit research was driven by the needs of seismic imaging. But I was surprised not to find a single mention of seismic processing in the place. We should help them fix this!