No more rainbows!

"the rainbow color map can significantly reduce a person’s accuracy and efficiency"
Borkin et al. (2011)

File under "Aaarrrrrrgghhhhhhh"

File under "Aaarrrrrrgghhhhhhh"

The world has known for at least 20 years that the rainbow colourmap is A Bad Thing, perhaps even A Very Bad Thing. IBM researchers Bernice Rogowitz and Lloyd Treinish — whose research on the subject goes back to the early 90s — wrote their famous article Why should engineers and scientists be worried about color? in 1996. Visualization guru Edward Tufte highlighted the problems with it in his 1997 book Visual Explanations (if you haven't read this book, you must buy it immediately). 

This isn't a matter of taste, or opinion. We know — for sure, with science! — that the rainbow is a bad choice for the visualization of data. And yet people use it every day, even in peer-reviewed literature. And — purely anecdotally — it seems to be especially rife in geoscience <citation needed>.

Why are we talking about this? 

The rainbow colourmap suffers from a number of severe problems:

  • It's been linked to inferior image interpretation by professionals (Borkin et al 2011).

  • It introduces ambiguity into the display: are we looking at the data's distribution, or the colourmap's?

  • It introduces non-existent structure into the display — notice the yellow and cyan stripes, which manifest as contours:

rainbow.png
  • Colourblind people cannot read the colours properly — I made this protanopic simulation with Coblis:

rainbow_protanope.png
  • It does not have monotonically increasing lightness, so you can't reproduce it in greyscale.

rainbow_grey.png
  • There's no implicit order to hues, so it's hard to interpret meaning intuitively.

  • On a practical note, it uses every available colour, leaving you none for annotation.

For all of these reasons, MATLAB and Matplotlib no longer use rainbow-like colourmaps by default. And neither should you.

But I like rainbows!

People tend to like things that are bad for them. Chris Jackson (Imperial, see here and here) and Bert Bril (dGB, in Slack) have both expressed an appreciation for rainbow-like colourmaps, or at least an indifference. Bert went so far as to say he doesn't like 'perceptual' colourmaps — those that monotonically and linearly increase in brightness.

I don't think indifference is allowed. Research with professional image interpreters has shown us that rainbow colourmaps impair the quality of their work. We know that these colours are hard for colourblind people to use. The practical issues of not being readable in greyscale and leaving no colours for annotation are always present. There's just no way we can ask, "Does it matter?" — at least not without offering some evidence that goes beyond mere anecdote.

I think what people like is the colour variance — it acts like contours, highlighting subtle features in the surface. Some of this extra detail is probably noise, but some is certainly signal, maybe even opportunity. 

See what you think of these renderings of the seafloor pick on the Penobscot dataset, offshore Nova Scotia (licensed CC-BY-SA by dGB Earth Sciences and The Government of Nova Scotia). The top row shows some rainbow-like colourmaps, all bad. The others are a selection of (more-or-less) perceptually awesome colourmaps. The names under each map are the names of the colourmaps in Python's matplotlib package.

The solution

We know what kind of colourmaps are good for interpretation: those that increase linearly and monotonically in brightness, with no jumps or stripes of luminance. I've linked to lots of places where you can read about these — see the end of the post. You already know one perceptual colourmap: the humble Greyscale. But there are lots of others, so let's start with one of them.

Next, instead of using something that acts like contours, let's try using contours!

I think that's a big improvement already. Some tips for contouring (there's a code sketch after this list):

  1. Make them thin and black, with opacity at about 0.2 to 0.5. Transparency is essential. 

  2. Choose a fairly small interval; use index contours if there are more than about 10.

  3. Label the contours directly on a large map. State the contour interval in the caption.
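Here's the kind of thing I mean, as a minimal matplotlib sketch of tips 1 and 2, with a made-up surface standing in for a real horizon (the colourmap and contour settings are just illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Made-up surface standing in for a gridded horizon.
x, y = np.meshgrid(np.linspace(0, 10, 300), np.linspace(0, 10, 300))
z = np.sin(x) * np.cos(y)

fig, ax = plt.subplots()
im = ax.imshow(z, cmap='viridis', origin='lower')                 # perceptual colourmap
ax.contour(z, levels=12, colors='k', linewidths=0.5, alpha=0.3)   # thin, black, transparent
fig.colorbar(im, ax=ax)
plt.show()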

Let's try hillshading instead:

Also really nice.
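If you want to roll your own, matplotlib's LightSource can do the shading. Here's a minimal sketch with a made-up surface in place of the real horizon (the azimuth, elevation, and blend mode are just illustrative choices):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LightSource

# Made-up surface standing in for the seafloor horizon.
x, y = np.meshgrid(np.linspace(0, 10, 300), np.linspace(0, 10, 300))
z = np.sin(x) * np.cos(y)

ls = LightSource(azdeg=315, altdeg=45)   # light from the northwest
rgb = ls.shade(z, cmap=plt.get_cmap('YlGnBu'), vert_exag=1.0, blend_mode='soft')

fig, ax = plt.subplots()
ax.imshow(rgb, origin='lower')
plt.show()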

Given that this is a water-bottom horizon, I like the YlGnBu colourmap, which resembles the thing it is modeling. (I think this is also a good basis for selecting a colourmap, by the way, all else being equal.)

I must admit I do find a lot of these perceptual colourmaps get too dark at the 'low' end, which can make annotation (or seeing contours) hard. So we will fix that with a function (see the notebook) that generates perceptually linear colourmaps.
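The function in the notebook is the real deal; purely as an illustration of the idea, here's a much simpler stand-in that just clips the darkest fraction of an existing perceptual colourmap (the function name, the choice of viridis, and the clip value are all mine):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def lighten(name, clip=0.2, n=256):
    """Return a copy of a named colourmap with its darkest fraction removed."""
    base = plt.get_cmap(name)
    return ListedColormap(base(np.linspace(clip, 1.0, n)), name=name + '_light')

cmap = lighten('viridis', clip=0.25)   # then pass cmap=cmap to imshow, pcolormesh, etc.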

Now tell me the spectrum beats a perceptual colourmap...

Horizons_faceoff.png

Let's check that it is indeed colourblind-safe and grey-safe:

Horizons_faceoff_protanope.png
Horizons_faceoff_grey.png
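If you want to run a similar check on a colourmap yourself, one quick-and-dirty approach is to collapse its RGB samples to an approximate luminance and see whether it increases monotonically. This sketch uses the Rec. 601 luma weights as a rough proxy for lightness, not a proper CIELAB test:

import numpy as np
import matplotlib.pyplot as plt

cmap = plt.get_cmap('viridis')        # swap in 'jet' to see the difference
rgba = cmap(np.linspace(0, 1, 256))   # sample the colourmap as RGBA
luma = 0.299 * rgba[:, 0] + 0.587 * rgba[:, 1] + 0.114 * rgba[:, 2]
print("Monotonically non-decreasing:", np.all(np.diff(luma) >= 0))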

There you have it. If you care about your data and your readers, avoid rainbow-like colourmaps in the lab and in publications. Go perceptual!

The Python code and data to generate these images are available on GitHub.

Better yet, click here to play with the data right in your browser!

What do you think? Are rainbow colourmaps here to stay? 

References and bibliography

Still not convinced?

For goodness' sake, just listen to Kristen Thyng for 20 minutes:

The post of Christmas present

It's nearly the end of another banner year for humanity, which seems determined as ever to destroy the good things it has achieved. Here's hoping certain world 'leaders' have their Scrooge moments sooner rather than later.

One positive thing we can all do is bring a little more science into the world. And I don't just mean for the scientists you already know. Let's infect everyone we can find! Maybe your niece will one day detect a neutron star collision in the Early Cretaceous, or your small child's intuition for randomness will lead to more breakthroughs in quantum computing.

Build a seismic station

There's surely no better way to discover the wonder of waves than to build a seismometer. There are at least a couple of good options. I built a single component 10 Hz Raspberry Shake myself; it was easy to do and, once hooked up to Ethernet, the device puts itself online and starts streaming data immediately.

The Lego seismometer kit (above right) looks like a slightly cheaper option, and you might want to check that they can definitely ship in time for Xmas, but it's backed by the British Geological Survey so I think it's legit. And it looks very cool indeed.

Everyone needs a globe!

As I mentioned last year, I love globes. We have several at home and more at the office. I don't yet have a Moon globe, however, so I've got my eye on this Replogle edition, NASA approved apparently ("Yup, that's the moon alright!"), and not too pricey at about USD 85. 

They seem to be struggling to fill orders, but I can't mention globes without mentioning Little Planet Factory. These beautiful little 3D-printed worlds can be customized in all sorts of ways (clouds or no clouds, relief or smooth, etc), and look awesome in sets. 

The good news is that you can pick up LPF's little planets direct from Shapeways, a big 3D printing service provider. They aren't lacquered, but until LPF get back on track, they're the next best thing.

Geology as a lifestyle

Brenda Houston likes minerals. A lot. She's made various photomicrographs into wallpaper and fabrics (below, left), and they are really quite awesome. Especially if you always wanted to live inside a geode.

OK, some of them might make your house look a bit... Bond-villainy.

If you prefer the more classical imagery of geology, how about this Ancient Dorset duvet cover (USD 120) by De la Beche?

I love this tectonic pewter keychain (below, middle) — featuring articulated fault blocks, and tiny illustrations of various wave modes. And it's under USD 30.

A few months ago, Mark Tingay posted on Twitter about his meteorite-faced watch (below, right). Turns out it's a thing (of course it's a thing) and you can drop substantial sums of money on such space-time trinkets. Like $235,000.

Algorithmic puzzles and stuff

These are spectacular: randomly generated agate-like jigsaw puzzles. Every one is different! Even the shapes of the wooden pieces are generated with maths. They cost about USD 95, and come from Boston-based Nervous System. The same company has lots of other rock- and fossil-inspired stuff, like ammonite jewellery (from about USD 50) and some very cool coasters that look a bit like radiolarians (USD 48 for 4).

orbicular_geode.jpg
orbicular_geode2.jpg
nudibranchNecklaceBlack_medium.jpg
radiolarian_coasters.jpg

There's always books

You can't go wrong with books. These all just came out, and just might appeal to a geoscientist. And if these all sound a bit too much like reading for work, try the Atlas of Beer instead. Click on a book to open its page at Amazon.com.

The posts of Christmas past

If by any chance there aren't enough ideas here, or you are buying for a very large number of geoscientists, you'll have to dredge through the historical listicles of yesteryear — 2011, 2012, 2013, 2014, 2015, or 2016. You'll find everything there, from stocking stuffers to Triceratops skulls.


The images in this post are all someone else's copyright and are used here under fair use guidelines. I'm hoping the owners are cool with people helping them sell stuff!

Not getting hacked

This kind of password is horrible for lots of reasons. The real solution to password madness is a password manager.

The end of the year is a great time to look around at your life and sort stuff out. One of the things you almost certainly need to sort out is your online security. Because if you haven't been hacked already (you probably have), you're just about to be.

Just look at some recent stories from the world of data security:

There are plenty of others; Wired has been keeping track of them — read more here. Or check out Wikipedia's list.

Despite all this, I see hardly anyone using a password manager, and anecdotally I hear that hardly anyone uses two-factor authentication either. This tells me that at least 80% of smart people, including lots of my friends and relatives, are in daily peril. Oh no!

After reading this post, I hope you do two things:

  • Start using a password manager. If you only do one thing, do this.
  • Turn on two-factor authentication for your most vulnerable accounts.

Start using a password manager

Please, right now, download and install LastPass on every device and in every browser you use. It's awesome:

  • It stores all your passwords! This way, they can all be different, and each one can be highly secure.
  • It generates secure, random passwords for new accounts you create. 
  • It scores you on the security level of your passwords, and lets you easily change insecure ones.
  • The free version is awesome, and the premium version is only $2/month.

There are other password managers, of course, but I've used this one for years and it's excellent. Once you're set up, you can start changing passwords that are insecure, or re-used on multiple sites... or which are at Uber, Yahoo, or Equifax.

One surprise from using LastPass is being able to count the number of accounts I have created around the web over the years. I have 473 accounts stored in LastPass! That's 473 places to get hacked... how many places are you exposed?

The one catch: you need a bulletproof master password for your password manager. Best advice: use a long pass-phrase instead of a single word.

The obligatory password cartoon, by xkcd and licensed CC-BY-NC

authenticator.png

Two-factor authentication

Sure, it's belt and braces — but you don't want your security trousers to fall down, right? 

Er, anyway, the point is that even with a secure password, your password can still be stolen and your account compromised. But it's much, much harder if you use two-factor authentication, aka 2FA. This requires you to enter a code — from a hardware key or an app, or received via SMS — as well as your password. If you use an app, it introduces still another layer of security, because your phone should be locked.

I use Google's Authenticator app, and I like it. There's a little bit of hassle the first time you set it up, but after that it's plain sailing. I have 2FA turned on for all my 'high risk' accounts: Google, Twitter, Facebook, Apple, AWS, my credit card processor, my accounting software, my bank, my domain name provider, GitHub, and of course LastPass. Indeed, LastPass even lets me specify that logins must originate in Canada. 

What else can you do?

There are some other easy things you can do to make yourself less hackable:

  • Install updates on your phones, tablets, and other computers. Keep browsers and operating systems up to date.
  • Be on high alert for phishing attempts. Don't follow links to sites like your bank or social media sites — type them into your browser if possible. Be very suspicious of anyone contacting you, especially banks.
  • Don't use USB sticks. The cloud is much safer — I use Dropbox myself, it's awesome.

For more tips, check out this excellent article from Motherboard on not getting hacked.

x lines of Python: Let's play golf!

Normally in the x lines of Python series, I'm trying to do something useful in as few lines of code as possible, but — and this is important — without sacrificing clarity. Code golf, on the other hand, tries solely to minimize the number of characters used, and to heck with clarity. This might, and probably will, result in rather obfuscated code.

So today in x lines, we set x = 1 and see what kind of geophysics we can express. Follow along in the accompanying notebook if you like.

A Ricker wavelet

One of the basic building blocks of signal processing and therefore geophysics, the Ricker wavelet is a compact, pulse-like signal, often employed as a source in simulation of seismic and ground-penetrating radar problems. Here's the equation for the Ricker wavelet:

$$ A = (1-2 \pi^2 f^2 t^2) e^{-\pi^2 f^2 t^2} $$

where \(A\) is the amplitude at time \(t\), and \(f\) is the centre frequency of the wavelet. Here's one way to translate this into Python, more or less as expressed on SubSurfWiki:

import numpy as np 
def ricker(length, dt, f):
    """Ricker wavelet at frequency f Hz, length and dt in seconds.
    """
    t = np.arange(-length/2, length/2, dt)
    y = (1.0 - 2.0*(np.pi**2)*(f**2)*(t**2)) * np.exp(-(np.pi**2)*(f**2)*(t**2))
    return t, y

That is already pretty terse at 261 characters, but there are lots of obvious ways, and some non-obvious ways, to reduce it. We can get rid of the docstring (the long comment explaining what the function does) for a start. And use the shortest possible variable names. Then we can exploit the redundancy in the repeated appearance of \(\pi^2f^2t^2\)... eventually, we get to:

def r(l,d,f):import numpy as n;t=n.arange(-l/2,l/2,d);k=(n.pi*f*t)**2;return t,(1-2*k)/n.exp(k)

This weighs in at just 95 characters. Not a bad reduction from 261, and it's not even too hard to read. In the notebook accompanying this post, I check its output against the version in our geophysics package bruges, and it's legit:

The 95-character Ricker wavelet in green, with the points computed by the function in bruges.
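The notebook does the comparison against bruges; as a minimal stand-in, here's the same kind of sanity check between the two functions defined above (the length, dt and frequency values are just for illustration):

import numpy as np

# A 128 ms wavelet, 1 ms samples, 25 Hz centre frequency.
t_full, y_full = ricker(length=0.128, dt=0.001, f=25)
t_golf, y_golf = r(0.128, 0.001, 25)
print(np.allclose(y_full, y_golf))   # the two implementations should agree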

What else can we do?

In the notebook for this post, I run through some more algorithms for which I have unit-tested examples in bruges:

To give you some idea of why we don't normally code like this, here's what the Aki–Richards solution looks like:

def r(a,c,e,b,d,f,t):import numpy as n;w=f-e;x=f+e;y=d+c;p=n.pi*t/180;s=n.sin(p);return w/x-(y/a)**2*w/x*s**2+(b-a)/(b+a)/n.cos((p+n.arcsin(b/a*s))/2)**2-(y/a)**2*(2*(d-c)/y)*s**2

A bit hard to debug! But there is still some point to all this — I've found I've had to really understand Python's order of mathematical operations, and find different ways of doing familiar things. Playing code golf also makes you think differently about repetition and redundancy. All good food for developing the programming brain.

Do have a play with the notebook, which you can even run in Microsoft Azure, right in your browser! Give it a try. (You'll need an account to do this. Create one for free.)


Many thanks to Jesper Dramsch and Ari Hartikainen for helping get my head into the right frame of mind for this silliness!

A new blog, and a new course

There's a great new geoscience blog on the Internet — I urge you to add it to your blog-reading app or news reader or list of links or whatever it is you use to keep track of these things. It's called Geology and Python, and it contains exactly what you'd expect it to contain!

The author, Bruno Ruas de Pinho, has nine posts up so far, all excellent. The range of topics is quite broad:

In each post, Bruno takes some geoscience challenge — nothing too huge, but the problems aren't trivial either — and then methodically steps through solving the problem in Python. He's clearly got a good quantitative brain, having recently graduated in geological engineering from the Federal University of Pelotas, aka UFPel, Brazil, and he is now available for hire. (He seems to be pretty sharp, so if you're doing anything with computers and geoscience, you should snag him.)


A new course for Calgary

We've run lots of Introduction to Python courses before, usually with the name Creative Geocomputing. Now we're adding a new dimension, combining a crash introduction to Python with a crash introduction to machine learning. It's ambitious, for sure, but the idea is not to turn you into a programmer. We aim to:

  • Help you set up your computer to run Python, virtual environments, and Jupyter Notebooks.
  • Get you started with downloading and running other people's packages and notebooks.
  • Verse you in the basics of Python and machine learning so you can start to explore.
  • Set you off with ideas and things to figure out for that pet project you've always wanted to code up.
  • Introduce you to other Calgarians who love playing with code and rocks.

We do all this wielding geoscientific data — it's all well logs and maps and seismic data. There are no silly examples, and we don't shy away from so-called advanced things — what's the point in computers if you can't do some things that are really, really hard to do in your head?

Tickets are on sale now at Eventbrite, it's $750 for 2 days — including all the lunch and code you can eat.

The Surmont Supermerge

In my recent Abstract horror post, I mentioned an interesting paper in passing, Durkin et al. (2017):

 

Paul R. Durkin, Ron L. Boyd, Stephen M. Hubbard, Albert W. Shultz, Michael D. Blum (2017). Three-Dimensional Reconstruction of Meander-Belt Evolution, Cretaceous McMurray Formation, Alberta Foreland Basin, Canada. Journal of Sedimentary Research 87 (10), p 1075–1099. doi: 10.2110/jsr.2017.59

 

I wanted to write about it, or rather about its dataset, because I spent about 3 years of my life working on the USD 75 million seismic volume featured in the paper. Not just on interpreting it, but also on acquiring and processing the data.

Let's start by feasting our eyes on a horizon slice, plus interpretation, of the Surmont 'Supermerge' 3D seismic volume:

Figure 1 from Durkin et al (2017), showing a stratal slice from 10 ms below the top of the McMurray Formation (left), and its interpretation (right). © 2017, SEPM (Society for Sedimentary Geology) and licensed CC-BY.

A decade ago, I was 'geophysics advisor' on Surmont, which is jointly operated by ConocoPhillips Canada, where I worked, and Total E&P Canada. My line manager was a Total employee; his managers were ex-Gulf Canada. It was a fantastic, high-functioning team, and working on this project had a profound effect on me as a geoscientist. 

The Surmont bitumen field

The dataset covers most of the Surmont lease, in the giant Athabasca Oil Sands play of northern Alberta, Canada. The Surmont field alone contains something like 25 billion barrels of bitumen in place. It's ridiculously massive — you'd be delighted to find 300 million bbl offshore. Given that it's expensive and carbon-intensive to produce bitumen with today's methods — steam-assisted gravity drainage (SAGD, "sag-dee") in Surmont's case — it's understandable that there's a great deal of debate about producing the oil sands. One factoid: you have to burn about 1 Mscf or 30 m³ of natural gas, costing about USD 10–15, to make enough steam to produce 1 bbl of bitumen.

Detail from Figure 12 from Durkin et al (2017), showing a seismic section through the McMurray Formation. Most of the abandoned channels are filled with mudstone (really a siltstone). The dipping heterolithic strata of the point bars, so obvious in horizon slices, are quite subtle in section. © 2017, SEPM (Society for Sedimentary Geology) and licensed CC-BY.

The field is a geoscience wonderland. Apart from the 600 km² of beautiful 3D seismic, there are now about 1500 wells, most of which are on the 3D. In places there are more than 20 wells per section (1 sq mile, 2.6 km², 640 acres). Most of the wells have a full suite of logs, including FMI in about two-thirds of the wells and shear sonic as well in many cases, and about 550 wells now have core through the entire reservoir interval — about 65–75 m across most of Surmont. Let that sink in for a minute.

What's so awesome about the seismic?

OK, I'm a bit biased, because I planned the acquisition of several pieces of this survey. There are some challenges to collecting great data at Surmont. The reservoir is only about 500 m below the surface. Much of the pay sand can barely be called 'rock' because it's unconsolidated sand, and the reservoir 'fluid' is a quasi-solid with a viscosity of 1 million cP. The surface has some decent topography, and the near surface is glacial till, with plenty of boulders and gravel-filled channels. There are surface lakes and the area is covered in dense forest. In short, it's a geophysical challenge.

Nonetheless, we did collect great data; here's how:

  • General information
    • The ca. 600 km² Supermerge consists of a dozen 3Ds recorded over about a decade starting in 2001.
    • The northern 60% or so of the dataset was recombined from field records into a single 3D volume, with pre- and post-stack time imaging.
    • The merge was performed by CGG Veritas, cost nearly $2 million, and took about 18 months.
  • Geometry
    • Most of the surveys had a 20 m shot and receiver spacing, giving the volume a 10 m by 10 m natural bin size.
    • The original survey had parallel and coincident shot and receiver lines (Megabin); later surveys were orthogonal.
    • We varied the line spacing between 80 m and 160 m to get the trace density we needed in different areas.
  • Sources
    • Some surveys used 125 g dynamite at a depth of 6 m; others the IVI EnviroVibe sweeping 8–230 Hz.
    • We used an airgun on some of the lakes, but the data was terrible so we stopped doing it.
  • Receivers
    • Most of the surveys were recorded into single-point 3C digital MEMS receivers planted on the surface.
  • Bandwidth
    • Most of the datasets have data from about 8–10 Hz to about 180–200 Hz (and have a 1 ms sample interval).

The planning of these surveys was quite a process. Because access in the muskeg is limited to 'freeze up' (late December until March), and often curtailed by wildlife concerns (moose and elk rutting), only about 6 weeks of shooting are possible each year. This means you have to plan ahead, then mobilize a fairly large crew with as many channels as possible. After acquisition, each volume spent about 6 months in processing — mostly at Veritas and then CGG Veritas, who did fantastic work on these datasets.

Kudos to ConocoPhillips and Total for letting people work on this dataset. And kudos to Paul Durkin for this fine piece of work, and for making it open access. I'm excited to see it in the open. I hope we see more papers based on Surmont, because it may be the world's finest subsurface dataset. I hope it is released some day, it would have huge impact.


References & bibliography

Paul R. Durkin, Ron L. Boyd, Stephen M. Hubbard, Albert W. Shultz, Michael D. Blum (2017). Three-Dimensional Reconstruction of Meander-Belt Evolution, Cretaceous McMurray Formation, Alberta Foreland Basin, Canada. Journal of Sedimentary Research 87 (10), p 1075–1099. doi: 10.2110/jsr.2017.59 (not live yet).

Hall, M (2007). Cost-effective, fit-for-purpose, lease-wide 3D seismic at Surmont. SEG Development and Production Forum, Edmonton, Canada, July 2007.

Hall, M (2009). Lithofacies prediction from seismic, one step at a time: An example from the McMurray Formation bitumen reservoir at Surmont. Canadian Society of Exploration Geophysicists National Convention, Calgary, Canada, May 2009. Oral paper.

Zhu, X, S Shaw, B Roy, M Hall, M Gurch, D Whitmore and P Anno (2008). Near-surface complexity masquerades as anisotropy. SEG Annual Convention, Las Vegas, USA, November 2008. Oral paper. doi: 10.1190/1.3063976.

Surmont SAGD Performance Review (2016), by ConocoPhillips and Total geoscientists and engineers. Submitted to AER, 258 pp. Available online [PDF] — and well worth looking at.

Trad, D, M Hall, and M Cotra (2008). Reshooting a survey by 5D interpolation. Canadian Society of Exploration Geophysicists National Convention, Calgary, Canada, May 2006. Oral paper. 

The abstract lead-time problem

On Tuesday I wrote about the generally low quality of abstracts submitted to conferences. In particular, their vagueness and consequent uninterestingness. Three academics pointed out to me that there's an obvious reason.

Brian Romans (Virginia Tech) —

One issue, among many, with conference abstracts is the lead time between abstract submission and presentation (if accepted). AAPG is particularly bad at this and it is, frankly, ridiculous. The conference is >6 months from now! A couple years ago, when it was in Calgary in June, abstracts were due ~9 months prior. This is absurd. It can lead to what you are calling vague abstracts because researchers are attempting to anticipate some of what they will do. People want to present their latest and greatest, and not just recycle the same-old, which leads to some of this anticipatory language.

Chris Jackson (Imperial College London) and Zane Jobe (Colorado School of Mines) both responded on Twitter —

What's the problem?

As I explained last time, most abstracts aren't fun to read. And people seem to be saying that this overlong lead time is to blame. I think they're probably right. So much of my advice was useless: you can't be precise about non-existent science.

In this light, another problem occurs to me. Writing abstracts months in advance seems to me to potentially fuel confirmation bias, as we encourage people to set out their hypothetical stalls before they've done the work. (I know people tend to do this anyway, but let's not throw more flammable material at it.)

So now I'm worried that we don't just have boring abstracts, we may be doing bad science too.

Why is it this way?

I think the scholarly societies' official line might be, "Propose talks on completed work." But let's face it, that's not going to happen, and thank goodness because it would lead to even more boring conferences. Like PowerPoint-only presentations, committees powered by Robert's Rules, and terrible coffee, year-old research is no longer good enough.

What can we do about it?

If we can't trust abstracts, how can we select who gets to present at a conference? I can't think of a way that doesn't introduce all sorts of bias or other unfairness, or is horribly prone to gaming.

So maybe the problem isn't abstracts, it's talks.

Maybe we don't need to select anything. We just need to let the research community take over the process of telling people about their work, in whatever way they want.

In this alternate reality, the role of the technical society is not to maintain a bunch of clunky processes to 'manage' (but not manage) the community. Instead, their role is to create the conditions for members of the community to dynamically share and progress their work. Researchers don't need 6 months' lead time, or giant spreadsheets full of abstracts, or broken websites (yes, I'm looking at you, Scholar One). They need an awesome space, whiteboards, Wi-Fi, AV equipment, and good coffee.

In short, maybe this is one of the nudges we need to start talking seriously about unconferences.

Abstract horror

This isn't really a horror story, more of a Grimm fairy tale. Still, I thought it worthy of a Hallowe'eny title.

I've been reviewing abstracts for the 2018 AAPG annual convention. It's fun, because you get to read about new research months ahead of the rest of the world. But it's also not fun because... well, most abstracts aren't that great. I have no idea what proportion of abstracts the conference accepts, but I hope it's not too far above about 50%. (There was some speculation at SEG that there are so many talks now — 18 parallel sessions! — because giving a talk is the only way for many people to get permission to travel to it. I hope this isn't true.)

Some of the abstracts were great; at least 1 in 4 was better than 'good'. So what's wrong with the others? Here are the three main issues I saw:

  1. Lots of abstracts were uninteresting.
  2. Even more of them were vague.
  3. Almost all of them were about unreproducible research.

Let's look at each of these in turn and ask what we can do about it.

Uninteresting

Let's face it, not all research is interesting research. That's OK — it might still be useful or otherwise important. I think you can still write an interesting abstract about it. Here are some tips:

  • Don't be vague! Details are interesting. See the next section.
  • Break things up a bit. Use at least 2 paragraphs, maybe 3 or 4. Maybe a list or two. 
  • Use natural, everyday language. Try reading your abstract aloud. 
  • In the first sentence, tell me why I should come to your talk or visit your poster. 

Vague

I scribbled 'Vague' on nearly every abstract. In almost every case, either the method or the results, and usually both, were described in woolly language. For example (this is not a direct quote, but paraphrased):

Machine learning was used to predict the reservoir quality in most of the wells in the area, using millions of training examples and getting good results. The inputs were wireline log data from nearby wells.

This is useless information — which algorithm? How did you optimize it? How much training data did you have, and how many data instances did you validate against? How many features did you use? What kind of validation did you do, and what scores did you achieve? Which competing methods did you compare with? Use numbers, be specific:

We used a 9-dimensional support vector machine, implemented in scikit-learn, to model the permeability. With over 3 million training examples from logs in 150 nearby wells in the training set, and 1 million in cross-validation, we achieved an F1 score of 0.75 or more in 18 of the 20 wells.

A roughly 50% increase in the number of words, but an ∞% increase in the information content.

Unreproducible

Maybe I'm being unfair on this one, because I can't really tell if something is going to be reproducible or not from an abstract... or can I?

I'd venture to say that, if the formations are called A, B, C, and D, and the wells are called 1, 2, 3, and 4, then I'm pretty sure I'm not going to find out much about your research. (I had a long debate with someone in Houston recently about whether this sort of thing even qualifies as science.)

So what can you do to make a more useful abstract? 

  • Name your methods and algorithms. Where did they come from? Which other work did you build on?
  • Name the dataset and tell me where it came from. Don't obfuscate the details — they're what make you interesting! Share as much of the data as you can.
  • Name the software you're using. If it's open source, it's the least you can do. If it's not open source, it's not reproducible, but I'd still like to know how you're doing what you do.

I realize not everyone is in a position to do 100% reproducible research, but you can aim for something over 50%. If your work really is top secret (<50% reproducible), then you might think twice about sharing your work at conferences, since no-one can really learn anything from you. Ask yourself if your paper is really just an advertisement.

So what does a good abstract look like?

Well, I do like this one-word abstract from Gardner & Knopoff (1974), from the Bulletin of the Seismological Society of America:

Is the sequence of earthquakes in Southern California, with aftershocks removed, Poissonian?

Yes.

A classic, but I'm not sure it would get your paper accepted at a conference. I don't collect awesome abstracts — maybe I should — but here are some papers with great abstracts that caught my interest recently:

  • Dean, T (2017). The seismic signature of rain. Geophysics 82 (5). The title is great too; what curious person could resist this paper? 
  • Durkin, P et al. (2017) on their beautiful McMurray Fm interpretation in JSR 87 (10). It could arguably be improved by a snappier first sentence that gives the punchline of the paper.
  • Doughty-Jones, G, et al (2017) in AAPG Bulletin 101 (11). There's maybe a bit of an assumption that the reader cares about intraslope minibasins, but the abstract has meat.

Becoming a better abstracter

The number one thing to improve as a writer is probably asking other people — friendly but critical ones — for honest feedback. So start there.

As I mentioned in my post More on brevity way back in March 2011, you should probably read Landes (1966) once every couple of years:

Landes, K (1966). A scrutiny of the abstract II. AAPG Bulletin 50 (9). Available online. (An update to his original 1951 piece, A scrutiny of the abstract, AAPG Bulletin 35, no 7.)

There's also this plea from geophysicist Paul Lowman, to stop turning abstracts into introductions:

Lowman, Paul (1988). The abstract rescrutinized. Geology 16 (12). Available online.

Give those a read — they are very short — and maybe pay extra attention to the next dozen or so abstracts you read. Do they tell you what you need to know? Are they either useful or interesting? Do they paint a vivid picture? Or are they too... abstract?

EarthArXiv wants your preprints

eartharxiv.png

If you're into science, and especially physics, you've heard of arXiv, which has revolutionized how research in physics is shared. bioRxiv, SocArXiv and PaleorXiv followed, among others*.

Well, get excited, because today, at last, there is an open preprint server especially for earth science — EarthArXiv has landed!

I could write a long essay about how great this news is, but the best way to get the full story is to listen to two of the founders — Chris Jackson (Imperial College London and fellow University of Manchester alum) and Tom Narock (University of Maryland, Baltimore) — on Undersampled Radio this morning:

Congratulations to Chris and Tom, and everyone involved in EarthArXiv!

  • Friedrich Hawemann, ETH Zurich, Switzerland
  • Daniel Ibarra, Earth System Science, Stanford University, USA
  • Sabine Lengger, University of Plymouth, UK
  • Angelo Pio Rossi, Jacobs University Bremen, Germany
  • Divyesh Varade, Indian Institute of Technology Kanpur, India
  • Chris Waigl, University of Alaska Fairbanks, USA
  • Sara Bosshart, International Water Association, UK
  • Alodie Bubeck, University of Leicester, UK
  • Allison Enright, Rutgers - Newark, USA
  • Jamie Farquharson, Université de Strasbourg, France
  • Alfonso Fernandez, Universidad de Concepcion, Chile
  • Stéphane Girardclos, University of Geneva, Switzerland
  • Surabhi Gupta, UGC, India

Don't underestimate how important this is for earth science. Indeed, there's another new preprint server coming to the earth sciences in 2018, as the AGU — with Wiley! — prepare to launch ESSOAr. Not as a competitor for EarthArXiv (I hope), but as another piece in the rich open-access ecosystem of reproducible geoscience that's developing. (By the way, AAPG, SEG, SPE: you need to support these initiatives. They want to make your content more relevant and accessible!)

It's very, very exciting to see this new piece of infrastructure for open access publishing. I urge you to join in! You can submit all your published work to EarthArXiv — as long as the journal's policy allows it — so you should make sure your research gets into the hands of the people who need it.

I hope every conference from now on has an EarthArXiv Your Papers party. 


* Including snarXiv, don't miss that one!

x lines of Python: load curves from LAS

Welcome to the latest x lines of Python post, in which we have a crack at some fundamental subsurface workflows... in as few lines of code as possible. Ideally, x < 10.

We've met curves once before in the series — in the machine learning edition, in which we cheated by loading the data from a CSV file. Today, we're going to get it from an LAS file — the popular standard for wireline log data.

Just as we previously used the pandas library to load CSVs, we're going to save ourselves a lot of bother by using an existing library — lasio by Kent Inverarity. Indeed, we'll go even further by also using Agile's library welly, which uses lasio behind the scenes.

The actual data loading is only 1 line of Python, so we have plenty of extra lines to try something more ambitious. Here's what I go over in the Jupyter notebook that goes with this post:

  1. Load an LAS file with lasio.
  2. Look at its header.
  3. Look at its curve data.
  4. Inspect the curves as a pandas DataFrame.
  5. Load the LAS file with welly.
  6. Look at welly's Curve objects.
  7. Plot part of a curve.
  8. Smooth a curve.
  9. Export a set of curves as a matrix.
  10. BONUS: fix some broken things in the file header.

Each one of those steps is a single line of Python. Together, I think they cover many of the things we'd like to do with well data once we get our hands on it. Have a play with the notebook and explore what you can do.
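Here's a minimal sketch of the first few steps, assuming a hypothetical file called my_well.las with a GR curve in it; the notebook does all ten properly:

import lasio
from welly import Well

# Steps 1–4: load with lasio and inspect.
las = lasio.read('my_well.las')    # hypothetical filename
print(las.curves)                  # the ~Curve section: mnemonics, units, descriptions
df = las.df()                      # the curve data as a pandas DataFrame

# Steps 5–6: load the same file with welly, which wraps lasio.
w = Well.from_las('my_well.las')
gr = w.data['GR']                  # a welly Curve object, if a GR mnemonic is present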

Next time we'll take things a step further and dive into some seismic petrophysics.