No more rainbows!

"the rainbow color map can significantly reduce a person’s accuracy and efficiency"
Borkin et al. (2011)
File under "Aaarrrrrrgghhhhhhh"

File under "Aaarrrrrrgghhhhhhh"

The world has known for at least 20 years that the rainbow colourmap is A Bad Thing, perhaps even A Very Bad Thing. IBM researchers Bernice Rogowitz and Lloyd Treinish — whose research on the subject goes back to the early 90s — wrote their famous article Why should engineers and scientists worry about color? in 1996. Visualization guru Edward Tufte highlighted the problems with it in his 1997 book Visual Explanations (if you haven't read this book, you must buy it immediately). 

This isn't a matter of taste, or opinion. We know — for sure, with science! — that the rainbow is a bad choice for the visualization of data. And yet people use it every day, even in peer-reviewed literature. And — purely anecdotally — it seems to be especially rife in geoscience <citation needed>.

Why are we talking about this? 

The rainbow colourmap suffers from a number of severe problems:

  • It's been linked to inferior image interpretation by professionals (Borkin et al 2011).
  • It introduces ambiguity into the display: are we looking at the data's distribution, or the colourmap's?
  • It introduces non-existent structure into the display — notice the yellow and cyan stripes, which manifest as contours:
  • Colourblind people cannot read the colours properly — I made this protanopic simulation with Coblis
  • It does not have monotonically increasing lightness, so you can't reproduce it in greyscale.
  • There's no implicit order to hues, so it's hard to interpret meaning intuitively.
  • On a practical note, it uses every available colour, leaving you none for annotation.

For all of these reasons, MATLAB and Matplotlib no longer use rainbow-like colourmaps by default. And neither should you.

But I like rainbows!

People tend like things that are bad for them. Chris Jackson (Imperial, see here and here) and Bert Bril (dGB, in Slack) have both expressed an appreciation for rainbow-like colourmaps, or at least an indifference. Bert went so far as to say he doesn't like 'perceptual' colourmaps — those that monotonically and linearly increase in brightness. 

I don't think indifference is allowed. Research with professional image interpreters has shown us that rainbow colourmaps impair the quality of their work. We know that these colours are hard for colourblind people to use. The practical issues of not being readable in greyscale and leaving no colours for annotation are always present. There's just no way we can ask, "Does it matter?" — at least not without offering some evidence that goes beyond mere anecdote.

I think what people like is the colour variance — it acts like contours, highlighting subtle features in the surface. Some of this extra detail is probably noise, but some is certainly signal, maybe even opportunity. 

See what you think of these renderings of the seafloor pick on the Penobscot dataset, offshore Nova Scotia (licensed CC-BY-SA by dGB Earth Sciences and The Government of Nova Scotia). The top row are some rainbow-like colourmaps, all bad. The others are a selection of (more-or-less) perceptually awesome colourmaps. The names under each map are the names of the colourmaps in Python's matplotlib package.

The solution

We know what kind of colourmaps are good for interpretation: those that increase linearly and monotonically in brightness, with no jumps or stripes of luminance. I've linked to lots of places where you can read about these — see the end of the post. You already know one perceptual colourmap: the humble Greyscale. But there are lots of others, so let's start with one of them.

Next, instead of using something that acts like contours, let's try using contours!

I think that's a big improvement already. Some tips for contouring:

  1. Make them thin and black, with opacity at about 0.2 to 0.5. Transparency is essential. 
  2. Choose a fairly small interval; use index contours if there are more than about 10.
  3. Label the contours directly on a large map. State the contour interval in the caption.

Let's try hillshading instead:

Also really nice.

Given that this is a water-bottom horizon, I like the YlGnBu colourmap, which resembles the thing it is modeling. (I think this is also a good basis for selecting a colourmap, by the way, all else being equal.)

I must admit I do find a lot of these perceptual colormaps get too dark at the 'low' end, which can make annotation (or seeing contours) hard. So we will fix that with a function (see the notebook) that generates perceptually linear colourmaps.

Now tell me the spectrum beats a perceptual colourmap...


Let's check that it is indeed colourblind-safe and grey-safe:


There you have it. If you care about your data and your readers, avoid rainbow-like colourmaps in the lab and in publications. Go perceptual!

The Python code and data to generate these images is available on GitHub.

Binder     Better yet, click here to play with the data right in your browser!

What do you think? Are rainbow colourmaps here to stay? 

References and bibliography

Still not convinced?

For goodness sake, just listen to Kristin Thyng for 20 minutes:

Machine learning meets seismic interpretation

Agile has been reverberating inside the machine learning echo chamber this past week at EAGE. The hackathon's theme was machine learning, Monday's workshop was all about machine learning. And Matt was also supposed to be co-chairing the session on Applications of machine learning for seismic interpretation with Victor Aare of Schlumberger, but thanks to a power-cut and subsequent rescheduling, he found himself double-booked so, lucky me, he invited me to sit in his stead. Here are my highlights, from the best seat in the house.

Before I begin, I must mention the ambivalence I feel towards the fact that 5 of the 7 talks featured the open-access F3 dataset. A round of applause is certainly due to dGB Earth Sciences for their long time stewardship of open data. On the other hand, in the sardonic words of my co-chair Victor Aarre, it would have been quite valid if the session was renamed The F3 machine learning session. Is it really the only quality attribute research dataset our industry can muster? Let's do better.

Using seismic texture attributes for salt classification

Ghassan AlRegib ruled the stage throughout the session with not one, not two, but three great talks on behalf of himself and his grad students at Georgia Institute of Technology (rather than being a show of bravado, this was a result of problems with visas). He showed some exciting developments in shallow learning methods for predicting facies in seismic data. In addition to GLCM attributes, he also introduced a couple of new (to me anyway) attributes for salt classification. Namely, textural gradient and a thing he called seismic saliency, a metric modeled after the human visual system describing the 'reaction' between relative objects in a 3D scene. 

Twelve Seismic attributes used for multi-attribute salt-boundary classification. (a) is RMS Amplitude, (B) to (M) are TEXTURAL attributes. See abstract for details. This figure is copyright of Ghassan AlRegib and licensed CC-BY-SA by virtue of being generated from the F3 dataset of dGB and TNO.

Ghassan also won the speakers' lottery, in a way. Due to the previous day's power outage and subsequent reshuffle, the next speaker in the schedule was a no-show. As a result, Ghassan had an extra 20 minutes to answer questions. Now for most speakers that would be a public-speaking nightmare, but Ghassan hosted the onslaught of inquiring minds beautifully. If we hadn't had to move on to the next next talk, I'm sure he could have entertained questions all afternoon. I find it fascinating how unpredictable events like power outages can actually create the conditions for really effective engagement. 

Salt classification without using attributes (using deep learning)

Matt reported on Anders Waldeland's work a year ago, and it was interesting to see how his research has progressed, as he nears the completion of his thesis. 

Anders successfully demonstrated how convolutional neural networks (CNNs) can classify salt bodies in seismic datasets. So, is this a big deal? I think it is. Indeed, Anders's work seems like a breakthough in seismic interpretation, at least of salt bodies. To be clear, I don't think this means that it is time for seismic interpreters to pack up and go home. But maybe we can start looking forward to spending our time doing less tedious things than picking complex salt bodies.  

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset,&nbsp;coloured by elevation. Copyright of A Waldeland, used with permission.

One slice of a 3d seismic volume with two CLASS LABELS: Salt (red) and Not SALT (GREEN). This is the training data. On the right: Extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

He trained a CNN on one manually labeled slice of a 3D cube and used the network to automatically classify the full 3D salt body (on the right in the figure). Conventional algorithms for salt picking, such as that used by AlRegib (see above), typically rely on seismic attributes to define a feature space. This requires professional insight and judgment, and is prone to error and bias. Nicolas Audebert mentioned the same shortcoming in his talk in the workshop Matt wrote about last week. In contrast, the CNN algorithm works directly on the seismic data, learning the most discriminative filters on its own, no attributes needed

Intuition training

Machine learning isn't just useful for computing in the inverse direction such as with inversion, seismic interpretation, and so on. Johannes Amtmann showed us how machine learning can be useful for ranking the performance of different clustering methods using forward models. It was exciting to see: we need to get back into the habit of forward modeling, each and every one of us. Interpreters build synthetics to hone their seismic intuition. It's time to get insanely good at building forward models for machines, to help them hone theirs. 

There were so many fascinating problems being worked on in this session. It was one of the best half-day sessions of technical content I've ever witnessed at a subsurface conference. Thanks and well done to everyone who presented.

x lines of Python: machine learning

You might have noticed that our web address has changed to, reflecting our continuing journey as a company. Links and emails to will redirect for the foreseeable future, but if you have bookmarks or other links, you might want to change them. If you find anything that's broken, we'd love it if you could let us know.

Artificial intelligence in 10 lines of Python? Is this really the world we live in? Yes. Yes it is.

After reminding you about the SEG machine learning contest just before Christmas, I thought I could show you how you train a model in a supervised learning problem, then use it to make predictions on unseen data. So we'll just break a simple contest entry down into ten easy steps (note that you could do this on anything, doesn't have to be this problem). 

A machine learning primer

Before we start, let's review quickly what a machine learning problem looks like, and introduct a bit of jargon. To begin, we have a dataset (e.g. the 'Old' well in the diagram below). This consists of records, called instances. In this problem, each instance is a depth location. Each instance is a feature vector: a row vector comprising attributes or features, which in our case are wireline log values for GR, ILD, and so on. Each feature vector is a row in a matrix we conventionally call \(X\). Associated with each instance is some target label — the thing we want to predict — which is a continuous quantity in a regression problem, discrete in a classification problem. The vector of labels is usually called \(y\). In the problem below, the labels are integers representing 9 different facies.

You can read much more about the dataset I'm using in Brendon Hall's tutorial (The Leading Edge, October 2016).

The ten steps to glory

Well, maybe not glory, but something. A prediction of facies at two wells, based on measurements made at 10 other wells. You can follow along in the notebook, but all the highlights are included here. We start by loading the data into a 'dataframe', which you can think of like a spreadsheet:

Now we specify the features we want to use, and make the matrix \(X\) and label vector \(y\):

  features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE']
  X = df[features].values
  y = df.Facies.values

Since this dataset is all we have, we'd like to set aside some data to test our model on. The library we're using, scikit-learn, has functions to do this sort of thing; by default, it'll split \(X\) and \(y\) into train and test datasets, with 25% of the data going into the test part:

  X_train, X_test, y_train, y_test = train_test_split(X, y)

Now we're ready to choose a model, instantiate it (with some parameters if we want), and train the model (i.e. 'fit' the data). I am calling the trained model augur, because I like that word.

  from sklearn.ensemble import ExtraTreesClassifier
  model = ExtraTreesClassifier()
  augur =, y_train)

Now we're ready to take the part of the dataset we reserved for validation, X_test, and predict its labels. Then we can compare those with the known labels, y_test, to see how well we did:

  y_pred = augur.predict(X_test)

We can get a quick idea of the quality of prediction with sklearn.metrics.accuracy_score(y_test, y_pred), but it's more interesting to look at the classification report, which shows us the precision and recall for each class, along with their harmonic mean, the F1 score:

  from sklearn.metrics import classification_report
  print(classification_report(y_test, y_pred))

Each row is a facies (facies 1, facies 2, etc.). The support is the number of instances representing that label. The key number here is 0.63 — we can regard this as an expression of the accuracy of our prediction. If that sounds low to you, I encourage you to enter the machine learning contest! If it sounds high, that's because it is — it's much too high. In fact, the instances of our dataset are not independent: they are spatially correlated (in depth). It would be smarter not to remove some random samples for validation, but to reserve entire wells. After all, this is how we typically collect subsurface data: one well at a time.

But now we're getting into the weeds of data science. I'll let you venture in there on your own...

Burning the surface onto the subsurface

Previously, I described a few of the reasons why we don't get a clean ground surface event on land seismic data like we do the water-bottom in marine seismic. In land data, the worst part of the image is right at the surface. But ground level is not just tricky to see, it's impossible to see. Since the vibe truck is on the ground, there's no reflection from that surface. Even if there was some kind of event there, processors apply a magic eraser to the top of the section — the mute — to erase the early arrivals. So it's not possible to see the ground in land data, and you can't pick what isn't there.  

But I still want to know where the ground is. Why can't we slap a ground-level seismic 'reflection' event on the section? 

What you need

We need the ground level, which is in depth of course, in the time domain of the seismic section. To compute this, let's call it \(t_\mathrm{G}\), we need three pieces of information at every trace location: the ground elevation \(G\), the seismic reference datum (SRD) which I'll call \(D\), and the replacement velocity \(V_\mathrm{r}\). 

$$ t_\mathrm{G} = \frac{2 (G - D)}{V_\mathrm{r}} $$

Ground elevation.  If you're lucky, you'll be able to find the ground elevation corresponding to each trace stored in the trace headers. Ground elevation might be located in bytes 41-44 or 45-48 of the trace header, which correspond to the receiver group elevation and the surface elevation of the source, respectively. These should be the same for a stacked trace, but as with any meta-data to do with SEGY, this info could be hiding somewhere else, or missing altogether. And if you're that unlucky, you might have to comb through processing reports for the missing information. If you are even more unlucky (as I was in this example), you won't have any kind of processing report to fall back on and you'll have to concoct something else. In the accompanying Jupyter notebook, I resorted to interpolating a digitized elevation profile from a JPEG plot of the seismic line. So if you're all out of options, you might find refuge in those legacy plots! 

This profile is particularly wonky, because the seismic reference datum (red) is not the same across the profile

This profile is particularly wonky, because the seismic reference datum (red) is not the same across the profile

Seismic reference datum. And to make life yet more complicated, the seismic reference datum is not flat across the profile. It goes downhill and then flattens out (red line below). Don't ask me what the advantages are of processing data to a variable datum, but whatever they are, I hope they offset the disadvantages of all-to-easily mistaking the datum to be flat.

The replacement velocity is given in the sidelabel of the raster image online (shown right). It's 10 000 ft/sec, or 3048 m/s. 

Byte locations 53-56 and 57-60 are the standard trace header placeholders reserved for holding the datum elevation at the receiver group and the datum elevation at source. Again, for a stacked trace, these should be the same value. If these fields are zeros, then check the fields of the Trace Header Extension. If they turn up empty, and if the datum is horizontal, it might be listed in the file's text header. 

Convert elevation to time

By definition, the seismic reference datum is horizontal in the time-domain (red line below). Notice how the ground elevation – in the time domain – plots mostly as negative values (before) time zero. In other words, most of the ground is being cut-off by the top of the section. So, if we want to see it, we need to shift everything down into the field of view. Conceptually, this means adjusting the seismic reference datum so it floats entirely above the ground-level. Computationally, we can achieve this easily enough by padding the top of the data with zeros.

A time-domain representation of the ground-level along the seismic profile. The surface of the earth extends above the start of the seismic data for most of the locations along the profile.&nbsp;

A time-domain representation of the ground-level along the seismic profile. The surface of the earth extends above the start of the seismic data for most of the locations along the profile. 

Make the ground a pickable event

As a final bit of post-processing, we could actually burn the ground-level into the data as a sort of synthetic seismic event. The reason I like this concept is that it alleviates the need to dig up old-processing reports, puzzle over missing header data, or worse, maintain and munge external text files containing elevation information. I say, let's make it self-contained. Let's put it directly into the data so that it can be treated like any other seismic reflection. Why would I do this?

  • You can see where there might be fold, velocity or other issues related to topography.
  • You can immediately see the polarity of the data. 
  • You could use the bandwidth of the data to make the pseudo-reflector, giving a visual hint to the interpreter.
  • Keeping track of amplitude adjustments and phase rotations would be self-documenting and reversible.
  • you could autotrack it to get a topographic map (or just get this from the processor).
  • It looks cool!
Seismic profile with ground level SYNTHETICALLY SLAPPED ON TOP. &nbsp;Bandlimited, of course, so you can Autotrack till your hearts content!

Seismic profile with ground level SYNTHETICALLY SLAPPED ON TOP.  Bandlimited, of course, so you can Autotrack till your hearts content!

I've deliberately constructed a band-limited reflection, opposed to placing a sharp spike at ground-level. The problem with a spike is that it has infinite bandwidth. It contains higher frequencies than the image, so as Carl Reine commented on that last post, that might not play nice with seismic attributes. Also, there's the problem of selecting an amplitude value to assign to the spike: we don't want to introduce amplitudes that are ridiculously out of range of the existing data.  

The whole image

I hereby propose that this synthetic ground level trick adopted as the new standard for any land seismic processing and interpretation. The great thing is, it can be done just as easily by interpreters and seismic data technologists, as by the processing companies that create the rest of the image. I realize we're adding stuff to the data that isn't actually signal. We do non-real things to signals all the time. The question is, do the benefits outweigh the artificiality?

Here's the view of the entire section:

The whole section, ground level included.

The whole section, ground level included.

The details of this exercise can be found in the this Jupyter Notebook.


The seismic is line 36_77_PR from the USGS data repository.

SEG Y rev 2 Data Exchange Format. SEG Technical Standards Committee. Draft 2.0, January, 2015. 

Where is the ground?

This is the upper portion of a land seismic profile in Alaska. Can you pick a horizon where the ground surface is? Have a go at

Pick the Ground surface at the top of the seismic section at .

Pick the Ground surface at the top of the seismic section at

Picking the ground surface on land-based seismic data is not straightforward. Picking the seafloor reflection on marine data, on the other hand, is usually a piece of cake, a warm-up pick. You can often auto-track the whole thing with a few seeds.

Seafloor reflection on Penobscot 3D survey, offshore Nova Scotia. from Matt's tutorial in the April 2016  The Leading Edge ,  The function of interpolation .

Seafloor reflection on Penobscot 3D survey, offshore Nova Scotia. from Matt's tutorial in the April 2016 The Leading Edge, The function of interpolation.

Why aren't interpreters more nervous that we don't know exactly where the surface of the earth is? I'm sure I'm not the only one that would like to have this information while interpreting. Wouldn't it be great if land seismic were more like marine?

Treacherously Jagged TopographY or Near-Surface processing ArtifactS?

Treacherously Jagged TopographY or Near-Surface processing ArtifactS?

If you're new to land-based seismic data, you might notice that there isn't a nice pickable event across the top of the section like we find in marine seismic data. Shot noise at the surface has been muted (deleted) in processing, and the low fold produces an unclean, jagged look at the top of the section. Additionally, the top of the section, time-zero — the seismic reference datum — usually floats somewhere above the land surface — and we can't know where that is unless it can be found in the file header, or looked up in the processing report.

The seismic reference datum, at a two-way time of zero seconds on seismic data,&nbsp;is typically set at mean sea level for offshore data. For land data, it is usually chosen to 'float' above the land surface.

The seismic reference datum, at a two-way time of zero seconds on seismic data, is typically set at mean sea level for offshore data. For land data, it is usually chosen to 'float' above the land surface.

Reframing the question

This challenge is a bit of a trick question. It begs the viewer to recognize that the seemingly simple task of mapping the ground level on a land seismic section is actually a rudimentary velocity modeling or depth conversion exercise in itself. Wouldn't it be nice to have the ground surface expressed as pickable seismic event? Shouldn't we have it always in our images? Baked into our data, so to speak, such that we've always got an unambiguous pick? In the next post, I'll illustrate what I mean and show what's involved in putting it in. 

In the meantime, I challenge you to pick where you think the (currently absent) ground surface is on this profile, so in the next post we can see how well you did.

Pick This again, again

Today we're proud to be launching the latest, all new iteration of Pick This!

Last June I told you about some new features we'd added to our social image interpretation tool. This new release is not really about features, but more about architecture. Late in 2015, we were challenged by BG Group, a UK energy company, to port the app to Amazon's cloud (AWS), so that they could run it in their own environment. Once we'd done that, we brought the data over from Google — where it was hosted — and set up the new public site on AWS. It will be much easier for us to add new features to this version.

One notable feature is that you no longer have to have a Google account to log in! This may have been a show-stopper for some people.

The app has been completely re-written from scratch, so there are a few differences. But fundamentally it's the same as before — you can ask your peers questions about images, and they can draw their answers. For example, Don Herron's "Where's the unconformity?" now has over 450 interpretations!

As we improve the tool over the coming weeks, we'll add ways to filter the results down, to attenuate some of the 'interpretation noise'. It's interesting to think about ways to represent this result — what is the 'true interpretation'? Is it the cloud of all opinions? Is there one answer?

Click here to visit the new site. For now it only plays nicely on a desktop computer (mobile is such a headache, but we will get there!). But you should be able to log in, interpret images, and upload new ones. You can let me know about bugs, or tweet @nowpickthis. If you like it, and I really hope you do, please tell your friends!

A quick reminder about the hackathon in Vienna next month. It will be an intense weekend of learning about programming and building some fun projects. I hope you can come, and if you know any geos in central Europe, please let them know!

Helpful horizons

Ah, the smell of a new seismic interpretation project. All those traces, all that geology — perhaps unseen by humans or indeed any multicellular organism at all since the Triassic. The temptation is to Just Start Interpreting, why, you could have a map by lunchtime tomorrow! But wait. There are some things to do first.

Once I've made sure all is present and correct (see How to QC a seismic volume), I spend a bit of time making some helpful horizons... 

  • The surface. One of the fundamental horizons, the seafloor or ground surface is a must-have. You may have received it from the processor (did you ask for it?) or it may be hidden in the SEG-Y headers — ask whoever received or loaded the data. If not, ground elevation is usually easy enough to get from your friendly GIS guru. If you have to interpret the seafloor, at least it should autotrack quite well.
  • Seafloor multiple model. In marine data, I always make a seafloor multiple model — just multiply the seafloor pick by 2. This will help you make sense of any anomalous reflectors or amplitudes at that two-way time. Maybe make a 3× version too if things look really bad. Remember, the 2× multiple will be reverse polarity.
  • Other multiples. You can model the surface multiple of any strong reflectors with the same arithmetic — but the chances are that any residual multiple energy is quite subtle. You may want to seek help modeling them properly, once you have a 3D velocity model.

A 2D seismic dataset with some of the suggested helpful horizons. Please see the footnote about this dataset. Click the image to enlarge.

  • Water depth markers. I like to make flat horizons* at important water depths, eg shelf edge (usually about 100–200 m), plus 1000 m, 2000 m, etc. This mainly helps to keep track of where you are, and also to think about prospectivity, accessibility, well cost, etc. You only want these to exist in the water, so delete them anywhere they are deeper than the seafloor horizon. Your software should have an easy way to implement a simple model for time t in ms, given depth d in m and velocity** V in m/s, e.g.

$$ t = \frac{2000 d}{V} \approx \frac{2000 d}{1490} \qquad \qquad \mathrm{e.g.}\ \frac{2000 \times 1000}{1490} = 1342\ \mathrm{ms} $$

  • Hydrate stability zone. In marine data and in the Arctic you may want to model the bottom of the gas hydrate stability zone (GHSZ) to help interpret bottom-simulating reflectors, or BSRs. I usually do this by scanning the literature for reports of BSRs in the area, or data on hydrate encounters in wells. In the figure above, I just used the seafloor plus 400 ms. If you want to try for more precision, Bale et al. (2014) provided several models for computing the position of the GHSZ — thank you to Murray Hoggett at Birmingham for that tip.
  • Fold. It's very useful to be able to see seismic fold on a map along with your data, so I like to load fold maps at some strategic depths or, better yet, load the entire fold volume. That way you can check that anomalies (especially semblance) don't have a simple, non-geological explanation. 
  • Gravity and magnetics. These datasets are often readily available. You will have to shift and scale them to some sensible numbers, either at the top or the bottom of your sections. Gravity can be especially useful for interpreting rifted margins. 
  • Important boundaries. Your software may display these for you, but if not, you can fake it. Simply make a horizon that only exists within the polygon — a lease boundary perhaps — by interpolating within a polygon. Make this horizon flat and deep (deeper than the seismic), then merge it with a horizon that is flat and shallow (–1 ms, or anything shallower than the seismic). You should end up with almost-vertical lines at the edges of the feature.
  • Section headings. I like to organize horizons into groups — stratigraphy, attributes, models, markers, etc. I make empty horizons to act only as headings so I can see at a glance what's going on. Whether you need do this, and how you achieve it, depends on your software.

Most of these horizons don't take long to make, and I promise you'll find uses for them throughout the interpretation project. 

If you have other helpful horizon hacks, I'd love to hear about them — put your favourites in the comments. 


* It's not always obvious how to make a flat horizon. A quick way is to take some ubiquitous horizon — the seafloor maybe — and multiply it by zero.

** The velocity of sound in seawater is not a simple subject. If you want to be precise about it, you can try this online calculator, or implement the equations yourself.

The 2D seismic dataset shown is from the Laurentian Basin, offshore Newfoundland. The dataset is copyright of Natural Resources Canada, and subject to the Open Government License – Canada. You can download it from the OpendTect Open Seismic Repository. The cultural boundary and gravity data is fictitious — I made them up for the purposes of illustration.


Bale, Sean, Tiago M. Alves, Gregory F. Moore (2014). Distribution of gas hydrates on continental margins by means of a mathematical envelope: A method applied to the interpretation of 3D seismic data. Geochem. Geophys. Geosyst. 15, 52–68, doi:10.1002/2013GC004938. Note: the equations are in the Supporting Information.

Toolbox wishlist

Earlier this week, the conversation on Software Underground* turned to well-tie software.

Someone was complaining that, despite having several well-tie tools at their disposal, none of them was quite right. I've written about this phenomenon before. We, as a discipline, do not know how to tie wells. I don't mean that you don't know, I know you know, but I bet if you compared the workflows of ten geoscientists, they would all be different. That's why every legacy well in every project has thirty time-depth tables, including at least three endearingly hopeful ones called final, and the one everyone uses, called test.

As a result of all this, the topic of "what tools do people need?" came up. Leo Uieda, a researcher in Brazil, asked:

I just about remembered that I had put up this very question on Tricider some time ago. Tricider is not a website about apple-based beverages, but a site for sharing and voting on ideas. You can start with a few ideas, get votes and comments on them, and even get new ideas. Here's the top idea as of right now: an open-source petrophysics tool.

Do check out the list, and vote or comment if you like. It might help someone find a project to work on, or spark an idea for a new app or even a new company.

Another result of the well-tie software conversation was, "What are the features of the one well-tie app to rule them all?" I'll leave you to stew on that one for a while. Meanwhile, please share your thoughts in the comments.

* Software Underground is an open Slack team. In essence, it's a chat room for geocomputing geeks: software, underground, geddit? It's completely free and open to anyone — pop along to to sign up.

It even has its own radio station!

White magic: calibrating seismic attributes

This post is part of a series on seismic attributes; the previous posts were...

  1. An attribute analysis primer
  2. Attribute analysis and statistics

Last time, I hinted that there might be a often-overlooked step in attribute analysis:

Calibration is a gaping void in many published workflows. How can we move past "that red blob looks like a point bar so I drew a line around it in PowerPoint" to "there's a 70% chance of finding reservoir quality sand at that location"?

Why is this step such a 'gaping void'? A few reasons:

  • It's fun playing with attributes, and you can make hundreds without a second thought. Some of them look pretty interesting, geological even. "That looks geological" is, however, not an attribute calibration technique. You have to prove it.
  • Nobody will be around when we find out the answer. There's a good chance that well will never be drilled, but when it is, you'll be on a different project, in a different company, or have left the industry altogether and be running a kayak rental business in Belize.
  • The bar is rather low. The fact that many published examples of attribute analysis include no proof at all, just a lot of maps with convincing-looking polygons on them, and claims of 'better reservoir quality over here'. 

This is getting discouraging. Let's look at an example. Now, it's hard to present this without seeming over-critical, but I know these gentlemen can handle it, and this was only a magazine article, so we needn't make too much of it. But it illustrates the sort of thing I'm talking about, so here goes.

Quoting from Chopra & Marfurt (AAPG Explorer, April 2014), edited slightly for brevity:

While coherence shows the edges of the channel, it gives little indication of the heterogeneity or uniformity of the channel fill. Notice the clear definition of this channel on the [texture attribute — homogeneity].
We interpret [the] low homogeneity feature [...] to be a point bar in the middle of the incised valley (green arrow). This internal architecture was not delineated by coherence.

A nice story, making two claims:

  1. The attribute incompletely represents the internal architecture of the channel.
  2. The labeled feature on the texture attribute is a point bar.

I know explorers have to be optimists, and geoscience is all about interpretation, but as scientists we must be skeptical optimists. Claims like this are nice hypotheses, but you have to take the cue: go off and prove them. Remember confirmation bias, and Feynman's words:

The first principle is that you must not fool yourself — and you are the easiest person to fool.

The twin powers

Making geological predictions with seismic attribute analysis requires two related workflows:

  1. Forward modeling — the best way to tune your intuition is to make a cartoonish model of the earth (2D, isotropic, homogeneous lithologies) and perform a simplified seismic experiment on it (convolutional, primaries only, noise-free). Then you can compare attribute behaviour to the known model.
  2. Calibration — you are looking for an explicit, quantitative relationship between a physical property you care about (porosity, lithology, fluid type, or whatever) and a seismic attribute. A common way to show this is with a cross-plot of the seismic amplitude against the physical property.

When these foundations are not there, we can be sure that one or more bad things will happen:

  • The relationship produces a lot of type I errors (false positives).
  • It produces a lot of type II error (false negatives).
  • It works at some wells and not at others.
  • You can't reproduce it with a forward model.
  • You can't explain it with physics.

As the industry shrivels and questions — as usual — the need for science and scientists, we have to become more stringent, more skeptical, and more rigorous. Doing anything else feeds the confirmation bias of the non-scientific continent. Because it says, loud and clear: geoscience is black magic.

The image is part of the figure from Chopra, S and K Marfurt (2014). Extracting information from texture attributes. AAPG Explorer, April 2014. It is copyright of the Authors and AAPG.

How to QC a seismic volume

I've had two emails recently about quality checking seismic volumes. And last month, this question popped up on LinkedIn:

We have written before about making a data quality volume for your seismic — a handy way to incorporate uncertainty into risk maps — but these recent questions seem more concerned with checking a new volume for problems.

First things first

Ideally, you'd get to check the volume before delivery (at the processing shop, say), otherwise you might have to actually get it loaded before you can perform your QC. I am assuming you've already been through the processing, so you've seen shot gathers, common-offset gathers, etc. This is all about the stack. Nonetheless, the processor needs to prepare some things:

  • The stack volume, of course, with and without any 'cosmetic' filters (eg fxy, fk).
  • A semblance (coherency, similarity, whatever) volume.
  • A fold volume.
  • Make sure the processor has some software that can rapidly scan the data, plot amplitude histograms, compute a spectrum, pick a horizon, and compute phase. If not, install OpendTect (everyone should have it anyway), or you'll have to load the volume yourself.

There are also some things you can do ahead of time. 

  1. Be part of the processing from the start. You don't want big surprises at this stage. If a few lines got garbled during file creation, no problem. If there's a problem with ground-roll attentuation, you're not going to be very popular.
  2. Make sure you know how the survey was designed — where the corners are, where you would expect live traces to be, and which way the shot and receiver lines went (if it was an orthogonal design). Get maps, take them with you.
  3. Double-check the survey parameters. The initial design was probably changed. The PowerPoint presentation was never updated. The processor probably has the wrong information. General rule with subsurface data: all metadata is probably wrong. Ideally, talk to someone who was involved in the planning of the survey.
  4. You didn't skip (2) did you? I'm serious, double check everything.

Crack open the data

OK, now you are ready for a visit with the processor. Don't fall into the trap of looking at the geology though — it will seduce you (it's always pretty, especially if it's the first time you've seen it). There is work to do first.

  1. Check the cornerpoints of the survey. I like the (0, 0) trace at the SW corner. The inline and crossline numbering should be intuitive and simple. Make sure the survey is the correct way around with respect to north.
  2. Scan through timeslices. All of them. Is the sample interval what you were expecting? Do you reach the maximum time you expected, based on the design? Make sure the traces you expect to be live are live, and the ones you expect to be dead are dead. Check for acquisition footprint. Start with greyscale, then try another colourmap.
  3. Repeat (5) but in a similarity volume (or semblance, coherency, whatever). Look for edges, and geometric shapes. Check again for footprint.
  4. Look through the inlines and crosslines. These usually look OK, because it's what processors tend to focus on.
  5. Repeat (7) but in a similarity volume.

Dive into the details

  1. Check some spectrums. Select some subsets of the data — at least 100 traces and 1000 ms from shallow, deep, north, south, east, west — and check the average spectrums. There should be no conspicuous notches or spikes, which could be signs of all sorts of things from poorly applied filters to reverberation.
  2. Check the amplitude histograms from those same subsets. It should be 32-bit data — accept no less. Check the scaling — the numbers don't mean anything, so you can make them range over whatever you like. Something like ±100 or ±1000 tends to make for convenient scaling of amplitude maps and so on; ±1.0 or less can be fiddly in some software. Check for any departures from an approximately Laplacian (double exponential) distribution: clipping, regular or irregular spikes, or a skewed or off-centre distribution:
  1. Interpret a horizon and check its phase. See Purves (Leading Edge, October 2014) or SubSurfWiki for some advice.
  2. By this time, the fold volume should yield no surprises. If any of the rest of this checklist throws up problems, the fold volume might help troubleshoot.
  3. Check any other products you asked for. If you asked for gathers or angle stacks (you should), check them too.

Last of all, before actual delivery, talk to whoever will be loading the data about what kind of media they prefer, and what kind of file organization. They may also have some preferences for the contents of the SEG-Y file and trace headers. Pass all of this on to the processor. And don't forget to ask for All The Seismic

What about you?

Have I forgotten anything? Are there things you always do to check a new seismic volume? Or if you're really brave, maybe you have some pitfalls or even horror stories to share...