Survival of the fittest or overcrowding?

If you’ve been involved in drilling boreholes in your career, you’ll be familiar with desurvey. Desurvey is the process of converting a directional well survey — which provides measured depth, inclination and azimuth points from the borehole — into a position log. The position log is an arbitrarily finely sampled set of (x, y, z) points along the wellbore. We need this to convert from measured depth (distance along the borehole path) to depth in the earth, which is usually given relative to mean sea-level.
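Most of these tools implement the minimum curvature method, which assumes a circular arc between survey stations. To make the idea concrete, here is a minimal sketch of my own (a toy, not taken from any particular package, and certainly not for drilling with):

import numpy as np

def desurvey(md, inc, azi):
    """Minimum-curvature desurvey: a minimal sketch.
    md is measured depth; inc and azi are in degrees.
    Returns (east, north, tvd) offsets from the first station.
    """
    inc, azi = np.radians(inc), np.radians(azi)
    # Unit tangent vector at each survey station: (east, north, down).
    t = np.stack([np.sin(inc) * np.sin(azi),
                  np.sin(inc) * np.cos(azi),
                  np.cos(inc)])
    # Dogleg: the angle between tangents at consecutive stations.
    dogleg = np.arccos(np.clip((t[:, :-1] * t[:, 1:]).sum(axis=0), -1, 1))
    # The ratio factor smooths the arc; it tends to 1 as the dogleg tends to 0.
    rf = np.ones_like(dogleg)
    nz = dogleg > 1e-9
    rf[nz] = 2 * np.tan(dogleg[nz] / 2) / dogleg[nz]
    # Position increments between stations, integrated along the path.
    step = np.diff(md) * rf / 2 * (t[:, :-1] + t[:, 1:])
    return np.cumsum(step, axis=1)

# A simple build-and-hold trajectory, heading due east.
md  = [0, 500, 1000, 1500, 2000]   # measured depth, m
inc = [0,   0,   30,   60,   90]   # degrees from vertical
azi = [90, 90,   90,   90,   90]   # degrees from north
east, north, tvd = desurvey(md, inc, azi)

Interpolate the survey to a fine sampling before desurveying, then reference tvd to the datum, and you have your position log.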

I hunted around recently and found no fewer than nine Python packages that contain desurvey code. They are of various vintages and licences:

There are other, larger tools that contain desurvey code too, but it is not as re-usable.

This is amazing — look at all the activity in the last 2 years or so! I think this is a strong sign that we've hit a new era in geocomputing. Fifteen years ago a big chunk of the open source geoscience stuff out there was the ten-or-so seismic processing libraries. Then about 7 or 8 years ago there was a proliferation of geophysical modeling and inversion tools. Now perhaps we're seeing the geological and wells communities entering the same stage of evolution. This is encouraging.

What now?

But there’s a problem here too — this is a small community, with limited resources. Can we afford this diversity? I accept that there might be a sort of Cambrian explosion event that has to happen with new technology. But how can we ensure that it results in an advantageous “survival of the fittest” ecology, and avoid it becoming just another way for the community to spread itself too thinly? To put it another way, how can we accelerate the evolutionary process, so that we don’t drain the ecosystem of valuable resources?

There’s also the problem of how to know which of these tools is the fittest. As far as I’m aware, there are no well trajectory benchmarks to compare against. Some of the libraries I listed above do not have rigorous tests, and they all certainly have bugs (it’s software) — so how can we ensure that quality emerges from this amazing pool of technology?

I don’t know, but I think this is a soluble problem. We’ve done the hard part! Nine times :D



There was another sign of subsurface technology maturity at this year’s TRANSFORM conference. Several of the tutorials and hackathon projects were focused on integrating tools like GemPy, Devito, segyio, and the lasio/welly/striplog family. This is exciting to see — 2021 could be an important year in the history of the open subsurface Python stack. Stay tuned!

A (useless) map of geo-mathematics

Most scientific problems involve at least a bit of maths, even if it’s just adding things up or finding averages.

But some problems require quite a bit of maths, like solving an equation, or throwing vectors around, or even a Fourier transform or two. A lot of people switch off at this point.

Yet other problems require a lot of maths. Maybe we need a finite difference model, a volume integral, or a deep neural network. Most of us back away from the problem at this point and look for a collaborator with a lot of equations on their whiteboard.

But it’s pretty hard to find new collaborators right now. And what if you run into these problems a lot? Maybe you need to be the one with the equationy whiteboard!

What then? Where do you start? We need a map!

A roadmap for learning…

…is what I set out to draw. I failed. Possibly there exists a map, with START HERE in one corner and a whiteboard full of equations in the other. But I doubt it.

I ended up drawing this:

math_map_2400px.png

It was fun to draw, but I highly doubt that it’s any practical use. The conclusion I came to is: there is no path. In fact, this artificially flattened projection of the n-dimensional mathiverse — no doubt reflecting my own weak grasp on half of these topics — is probably a unique, personal perspective. It reflects my interests and my nonlinear journey from A-level calculus (which I loved) to undergraduate maths (which I found very hard) to… whatever half-truths I know today.

But I want to learn, where do I start?

So if there is no path, what can you do to improve? Where should you start? How can you learn? Easy: follow your nose. Start with a project — something that interests you, something you’ll stick with. Maybe it’s a spreadsheet you have, or a plot you want to make. When you get to the maths, as you inevitably will, dig in. Read around. Google things. Get a whiteboard.

As an example, when I worked on the tricky (and unsolved!) task of recovering data from pseudocolour images, my maths journey looked something like this:

Images ➡ clustering ➡ RGB vectors ➡ distance metrics ➡ k-d trees ➡ graphs ➡ Hamiltonian paths ➡ TSP solvers

Admittedly, there’s some computer science in there too, but hey, this is applied maths.
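For a flavour of the first couple of steps, here is a sketch (with a hypothetical image file, and glossing over a lot) that clusters an image’s pixels in RGB space to recover the colours of the colourmap, leaving the harder problem of ordering them to the graph and TSP stages:

from matplotlib.image import imread
from sklearn.cluster import KMeans

# Flatten the image into a list of RGB vectors.
img = imread('pseudocolour.png')        # hypothetical file
pixels = img[..., :3].reshape(-1, 3)    # drop any alpha channel

# Cluster in RGB space; the centroids approximate the colourmap's colours.
kmeans = KMeans(n_clusters=64, n_init=10).fit(pixels)
codebook = kmeans.cluster_centers_      # shape (64, 3), as yet unordered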

As I described recently in Illuminated equations, there are several ways to serve, and consume, mathematical ideas: words, pictures, plots, symbols, annotations, and code (and probably some others). Seek out sources that give you three or more of these things. For me, being able to run some code makes a huge difference. Indeed, learning Python has directly led to me reading entire books on graph theory, linear algebra, deep learning, Fourier transforms, and all sorts of other things.

Well, learning Python and watching Numberphile.

I think it’s a myth that you have to be good at maths to learn to code. Instead, I think learning to code can — if you want — help make you good, or at least better, at maths. By giving you a way to try things without fully knowing what you are doing (after all, np.fft.fft(x) is pretty easy to type!), code gives you a way to peek at the answer. If you do it often enough, and follow up with some reading, understanding follows eventually.
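For example, here is the kind of peek I mean: a few lines that recover the frequency of a sine wave, whether or not you could derive the Fourier transform at a whiteboard.

import numpy as np

# A 10 Hz sine wave, sampled every millisecond for one second.
t = np.arange(0, 1, 0.001)
x = np.sin(2 * np.pi * 10 * t)

# Peek at the answer: where is the peak of the amplitude spectrum?
spectrum = np.abs(np.fft.fft(x))
freq = np.fft.fftfreq(x.size, d=0.001)
print(freq[spectrum.argmax()])   # 10.0 (or its negative-frequency alias)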

Which programming language should you learn first?

The question I get asked most often is:

I want to learn to code, where should I start?

To which there’s really no perfect answer. It depends on a lot of things… Why do you want to learn to program? What domain are you in? Have you tried before? Do you like computers? Do your colleagues use anything in particular?

Undeterred by the futility, and inspired by an awesome blog post on freecodecamp.org, which advises you to learn JavaScript (not terrible advice), I thought I’d try to answer the question — for scientists. I adapted the rather old decision tree in that blog post to our specific needs. Here is the latest version:

which.png

Yes, it does have a lot of Python on it. Yes, I am biased.

These sorts of things are necessarily rather one-dimensional. In an effort to give general advice, all the interesting corners are sanded down. For instance, there is definitely some domain specificity to languages. In the subsurface domain, a lot of geophysicists learned their craft in MATLAB, but today are excited about Julia. I think environmental folks are more into R. Geologists mostly still like coloured pencils best. And of course reservoir engineers mastered VBA years ago. Clearly, if you’re learning to code to start a postgraduate degree, you should probably find out what language others in your lab are using before you crack into that old copy of FORTRAN in a Weekend.*

Anyway, this decision tree thing provoked quite a bit of discussion on Software Underground and Twitter. Some people felt challenged, although my purpose was to suggest a starting point for people, not to say “Never touch Java” (although, seriously, never touch Java). It’s natural — learning to master a language takes years and people are sensitive to perceived criticism of how they spent their time. But this misses the point a bit — programming is really just about getting things done, preferably in an open language (<cough> not MATLAB). So what’s the quickest path for a new programmer to start getting things done?

I appreciated this thoughtful comment from Kris Kuhlman:

I think it worked out more or less that way for me. I learned a bit of BASIC as a 12-year-old, and knew enough assembly to crash a BBC Micro. Then I learned awk in 1993, and used it for basically everything — including many things it certainly was not designed for. I tried and failed to learn Java in 2002, instead picking up MATLAB… which led to Python in about 2008. I was a slow learner though; it took years to be convinced that I needed NumPy. (Yes, you can load seismic as a list of lists.)

In the end, you need several tools in your belt. Several people pointed out that SQL (a so-called ‘domain specific language’ rather than a full-blown programming language) is incredibly useful to know. I think you could say the same for HTML and maybe even XML — or perhaps JSON these days. Then again, maybe these stretch the definition of ‘programming language’ a bit too far. Besides, if you write code, you’ll meet them eventually.

In the end, the point is to get things done. Every language on that tree will enable you to get things done. (Admittedly, Scratch, Processing, and ChucK have rather narrow domains.) Fortran has been around for more than 60 years and is still in the top 20 languages. So don’t sweat it — if Kris is right, you’ll need to learn 2 languages before one sticks anyway.


* There is no such book, lol.

A forensic audit of seismic data

The SEG-Y “standard” is famously non-standard. (Those air quotes are actually part of the “standard”.)

For example, the inline and crossline location of a given trace — two things that you must have in order to load the data vaguely properly — are “recommended” (remember, it’s a “standard”) to be given in the trace’s header, at byte locations 189 and 193 respectively. Indeed, they might well be there. Or 1 and 5 (well, 5 or 9). Or somewhere else. Or not there at all.

Don Robinson at Resolve told me recently that he has seen more than 180 byte-location combinations, and he said another service company had seen more than 300.

All this can make loading seismic data really, really annoying.
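Here is the kind of peeking such an audit involves: a sketch that reads a few of the candidate byte locations from the first trace header. The filename is hypothetical, and big-endian 4-byte integers are assumed (which is itself one of the questions below).

import struct

with open('survey.sgy', 'rb') as f:   # hypothetical file
    f.seek(3600)                      # skip the 3200-byte text + 400-byte binary headers
    header = f.read(240)              # the first trace header

# Candidate byte locations (1-indexed, as in the standard) for key fields.
candidates = {'trace number (1)': 1, 'CDP number (21)': 21,
              'inline (189)': 189, 'crossline (193)': 193}

for name, loc in candidates.items():
    value, = struct.unpack('>i', header[loc - 1:loc + 3])  # big-endian int32
    print(f'{name:>18}: {value}')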

I’d like to propose that the community performs a kind of forensic audit of SEG-Y files. I have 5 main questions:

  1. What proportion of files claim to be Rev 0, Rev 1, and Rev 2? And what standard are they actually? (If any!)

  2. What proportion of files in the wild use IBM vs IEEE floats? What about integers?

  3. What proportion of files in the wild use little-endian vs big-endian byte order? (Please tell me there's no middle-endian data out there!)

  4. What proportion of files in the wild use EBCDIC vs ASCII encoded textual file headers? (Again, I would hope there are no other encodings in use, but I bet there are.)

  5. What proportion of files use the ‘strongly recommended’ and ‘recommended’ byte locations for trace numbers, sample counts, sample interval, coordinates, and inline–crossline numbers?

For each of these <things> it would also be interesting to know:

  • How does <thing> vary with the other things? That is, what's the cross-correlation matrix?

  • How does <thing> vary with the age of the file? Is there a temporal trend?

  • How does <thing> vary with the provenance of the file? What's the geographic trend? (For example, Don told me that the prevalence of PC-based interpretation packages in Canada led to widespread early adoption of IEEE floats and little-endian byte order; even so, he says that 90% of the SEG-Y he sees in the wild is still IBM-formatted floats!)

While we’re at it, I'd also like to know some more esoteric things:

  • How many files have cornerpoints in the text header, and/or trace locations in trace headers?

  • How many files have an unambiguous CRS in the text header?

  • How many files have information about the processing sequence in the text header? (E.g. imaging details, filters, etc.)

  • How many files have incorrect information in the headers (e.g. locations, sample interval, byte format, etc.)?

  • How many processors bother putting useful things like elevation, filters, sweeps, fold at target, etc, in the trace headers?

I don’t quite know how such a survey would happen. Most of these things are obviously detectable from the files themselves. Perhaps some of the many seismic data management systems already track these things. Or maybe you’re a data manager and you have some anecdotal data you can share.

What do you think? I’d love to hear your thoughts in the comments. Maybe there’s a good hackathon project here!

Transformation in 2021


Virtual conferences have become — for now — the norm. In many ways they are far better than traditional conferences: accessible to all, inexpensive to organize and attend, asynchronous, recorded, and no-one has to fly 5,000 km to deliver a PowerPoint. In other ways, they fall short, for example as a way to meet new collaborators or socialize with old ones. As face-to-face meetings become a possibility again this summer, smart organizations will figure out ways to get the best of both worlds.

The Software Underground is continuing its exploration of virtual events next month with the latest edition of the TRANSFORM festival of the digital subsurface. In broad strokes, here’s what’s on offer:

  • The Subsurface Hackathon, starting on 16 April — all are welcome, including those new to programming.

  • 20 free & awesome tutorials, covering topics from Python to R, geothermal wells to seismic, and even reservoir simulation! And of course there’s a bit of machine learning and physics-based modeling in there too. Look forward to content from scientists in North & South America, Norway, Nigeria, and New Zealand.

  • Lightning talks from 24 members of the community — would you like to do one?

  • Birds of a Feather community meet-ups, a special Xeek challenge, and other special events.

  • The Annual General Meeting of the Software Underground, where we’ll adopt our by-laws and appoint the board.

gather-town-rosay.png

We’ll even try to get at that tricky “hang out with other scientists” component, because we will have a virtual Gather.town world in which to hang out and hack, chat, or watch the livestreams.

If last year’s event is anything to go by, we can expect fantastic tutorial content, innovative hackathon projects, and great conversation between at least 750 digital geoscientists and engineers. (If you missed TRANSFORM 2020, don’t worry — all the content from last year is online and free forever, so it’s not too late to take part! Check it out.)


Registering for TRANSFORM

Registration is free, or pay-what-you-like. In other words, if you have funding or expenses for conferences and training, there’s an option to pay a small amount. But anyone can attend TRANSFORM free of charge. Thank you to the event sponsors, Studio X, for making this possible. (I will write about Studio X at a later date — they are doing some really cool things in the digital subsurface.)

 
 

To register for any part of TRANSFORM — even if you just want to come to the hackathon or a tutorial — complete the registration on the Software Underground website. It’s a ‘pay what you like’ event, so there are 3 registration options with different prices — these are just different donation amounts. They don’t change anything about your registration.

I hope we see you at TRANSFORM. In the meantime, please jump into the Software Underground Slack and get involved in the conversations there. (You can also catch up on recent Software Underground highlights in the new series of blog posts.)

All the wedges

Wedges are a staple of the seismic interpreter’s diet. These simple little models show at a glance how two seismic reflections interact with each other when a rock layer thins down to — and below — the resolution limit of the data. We can also easily study how the interaction changes as we vary the wavelet’s properties, especially its frequency content.

Here’s how to make and plot a basic wedge model with Python in the latest version of bruges, v0.4.2:

 
import bruges as bg
import matplotlib.pyplot as plt

wedge, *_ = bg.models.wedge()

plt.imshow(wedge)
wedge_basic.png

It really is that simple! This model is rather simplistic though: it contains no stratigraphy, and the numerical content of the 2D array is just a bunch of integers. Let’s instead make a P-wave velocity model, with an internal bed of faster rock inside the wedge:

 
strat = [2.35, (2.40, 2.50, 2.40), 2.65]
wedge, *_ = bg.models.wedge(strat=strat)

plt.imshow(wedge)
plt.colorbar()
wedge_layer.png

We can also choose to make the internal geometry top- or bottom-conformant, mimicking onlap or an unconformity, respectively.

 
strat = [0, 7*[1, 2], 3]
wedge, *_ = bg.models.wedge(strat=strat,
                            conformance='base'
                           )

plt.imshow(wedge)
wedge_unconformity.png

The killer feature of this new function might be using a log to make the stratigraphy, rather than just a few beds. This is straightforward to do with welly, because it makes selecting depth intervals and resampling a bit easier:

 
import welly

gr = welly.Well.from_las('R-39.las').data['GR']
log_above = gr.to_basis(stop=2620, step=1.0)
log_wedge = gr.to_basis(start=2620, stop=2720, step=1.0)
log_below = gr.to_basis(start=2720, step=1.0)

strat = (log_above, log_wedge, log_below)
depth, width = (100, 400, 100), (40, 200, 40)
wedge, top, base, ref = bg.models.wedge(depth=depth,
                                        width=width,
                                        strat=strat,
                                        thickness=(0, 1.5)
                                       )

plt.figure(figsize=(15, 6))
plt.imshow(wedge, aspect='auto')
plt.axvline(ref, color='k', ls='--')
plt.plot(top, 'r', lw=2)
plt.plot(base, 'r', lw=2)
wedge_log.png

Notice that the function returns to you the top and base of the wedgy part, as well as the position of the ‘reference’, in this case the well.

I’m not sure if anyone wanted this feature… but you can make clinoform models too:

wedge_log_clino.png
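I haven’t shown the code for that one. As a sketch of what I mean (note that the mode argument is an assumption on my part, so check the bruges documentation), it is essentially the same call with a different shape:

# A sketch only: assuming wedge() takes a `mode` argument that
# controls the shape of the wedge surface.
wedge, top, base, ref = bg.models.wedge(strat=strat, mode='sigmoid')

plt.imshow(wedge, aspect='auto')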

Lastly, the whole point of all this was to make a synthetic — the forward model of the seismic experiment. We can make a convolutional model with just a few more lines of code:

 
import numpy as np

strat = np.array([2.32 * 2.65,  # Layer 1
                  2.35 * 2.60,  # Layer 2
                  2.35 * 2.62,  # Layer 3
                 ])

# Fancy indexing into the rocks with the model.
wedge, top, base, ref = bg.models.wedge(strat=strat)

# Make reflectivity.
rc = (wedge[1:] - wedge[:-1]) / (wedge[1:] + wedge[:-1])

# Get a wavelet.
ricker = bg.filters.ricker(0.064, 0.001, 40)  # duration [s], dt [s], f [Hz]

# Repeated 1D convolution for a synthetic.
syn = np.apply_along_axis(np.convolve, arr=rc, axis=0, v=ricker, mode='same')
wedge_synthetic.png

That covers most of what the tool can do — for now. I’m working on extending the models to three dimensions, so you will be able to vary layers or rock properties in the 3rd dimension. In the meantime, take these wedges for a spin and see what you can make! Do share your models on Twitter or LinkedIn, and have fun!

The hot rock hack is back


Last year we ran the first ever Geothermal Hackathon. As with all things, we started small, but energetic: fourteen of us worked on six projects. Topics ranged from project management to geological mapping to natural language processing. It was a fun two days not thinking about coronavirus.

This year we’ll be meeting up on Thursday 13 and Friday 14 May, starting right after the Geoscience Virtual Event of the World Geothermal Congress. Everyone is invited — geoscientists, engineers, data nerds, programmers. No experience of geothermal is necessary, just creativity and curiosity.

Projects are already being discussed on the Software Underground; here are some of the ideas:

  • Data-munging project for Utah FORGE, especially well 58-32.

  • Update the Awesome list Thomas Martin started last year.

  • Implementing classic, or newly published, equations and algorithms from the literature.

I expect the preceding WGC event will spark some last-minute projects too. But for the time being, you’re welcome to add or vote on ideas on the event page. What tools or visualizations would you find useful?


Build some digital geo skills

📣 If you’re looking to build up your coding skills before the hackathon — or for a research project or an idea at work — join us for a Python class. We teach the fundamentals of Python, NumPy and matplotlib using geological and geophysical examples and geo-familiar datasets. There are two classes coming up in May (Digital Geology) and June (Digital Geophysics).

Which open licence should I choose?

I’ve written about open data a few times recently. And not-so-recently. And there’s been quite a bit of chat about open subsurface benchmarks in the Software Underground recently. As more people consider openly releasing data — or code, or other content — one question that comes up fairly often is: Which licence should I choose?

I’ll start at the beginning. I am not a lawyer, and this is going to be very high level, so do click on the links to read more.

What is copyright?

You automatically own the copyright to anything original that you create. You don’t have to register it, but the thing you made — and it must be a thing, you can’t copyright ideas — must be original. It could be a photo, a song, or a seismic interpretation. Physical measurements with no creative input, such as well logs, are not copyrightable… but a database consisting of such data is (so-called database rights). Your rights are exclusive, worldwide, and last until some years after you die (it varies).

If someone wants to use your work, even if they just found it on the Internet, they must either claim Fair Use, or seek permission from you. Giving permission means granting a licence; it can be as restrictive and arcane as you want.

If you don’t want people bothering you about licences, or if you want to actively encourage people to use and adapt your work, you can preemptively grant an open licence.

What is openness?

Before you start thinking about licences, there are two more big things to learn about:

  1. What is open? Not all licences, not even all Creative Commons licences, meet the Open Definition. In brief, this states that “Open data and content can be freely used, modified, and shared by anyone for any purpose” — you can’t restrict people based on their use case or location. So licences that forbid commercial application are not open.

  2. What is permissiveness? Once you’ve decided to go open, you need to decide where you stand on permissiveness. Some licences, notably the GNU licences advocated by the free software movement, compel licensees (users) to preserve the openness of the work in any future redistribution. This ‘viral’ condition is sometimes called copyleft.

In some circles, a near-religious war smoulders on the permissiveness issue. You need to make up your own mind where you stand, or at least understand the issues.

By the way, granting a licence does not mean giving up your rights. In fact, you must own the copyright in order to grant the licence. Many scientists don’t realize we’ve been giving away the copyright in our work for decades, as a (completely unnecessary and made up) condition of publication.

Another source of confusion: open licences are also not the same thing as public domain. Public domain means that the work is free from copyright restrictions. In most jurisdictions, though, you cannot simply declare your own copyrighted work to be public domain (CC0 tries to relinquish copyright to the extent the law allows). For example, On The Origin of Species is public domain, as is most work produced by the United States government (for example, by the USGS).

One last thing: an often overlooked aspect of licensing is protection for you, the licensor. All the common licences include language that disclaims warranties and protects you from liability arising from the use or misinterpretation of your work. So be careful about putting your stuff ‘out there’ with anything other than a standard licence: you may be leaving yourself open to liability issues later.

Open licences

Rather than writing a lot of stuff that’s been written by smarter people than me, I thought I’d draw a diagram to try to explain the differences between some common licences (there are certainly a lot more than the ones I mention here).

Just to reiterate: there are a lot more licences than the ones mentioned here; these are just examples.

What do I recommend?

For content, my personal belief is that CC-BY most aptly captures the way science works. Scientists 'build on the shoulders of giants' by re-using the work of others with fastidious attribution, usually by citation. Accordingly, the CC-BY protects the licensor, ensures attribution, and that's it. If you prefer copyleft licences, the equivalent licence is CC-BY-SA.

But Creative Commons recommend against using CC licences for source code, so what should you do then?

For code, the permissive licence closest to CC-BY is the MIT/BSD/Apache family of licences, of which only the Apache 2.0 licence offers some specific protections with respect to patents (in particular, it protects licensees from ‘upstream’ patent infringements). The equivalent copyleft licences are the GPL (for applications) and LGPL (for libraries).

For data, I tend to use CC-BY, but there are some specialist data licences (beware, they are poorly named in my opinion: the seemingly ‘vanilla’ ODbL is copyleft; the permissive equivalent is ODC-By).

What about mixed content, like a Jupyter Notebook? You have to be practical; maybe it depends on whether you consider your notebooks to be 'content' or 'source code'. I sometimes put at the bottom of a notebook something like Open source content. Text is CC-BY, code is Apache 2.0 and I think this makes my intent clear.

Tools

There are some tools around to help you make a choice of licence:

Last thing

Note that open licences are just one piece of the jigsaw puzzle of reproducible science and reusable content. You also need to think about open and accessible data formats (e.g. CSV not XLS), accessible content (DOIs and open indexes), and documentation.

Although insufficient on their own, open licences are a necessary component. And while licences can be changed, they cannot be revoked… so it’s worth putting some thought into your choices before you start pushing your content out into the world.

If it seems hard to navigate, do get in touch — we’d be happy to help if and where we can (notwithstanding IANAL). If your situation is at all complicated, I recommend seeking professional legal advice — but do go out of your way to find a lawyer who understands both the motivation for, and the legal issues around, open licensing.

The procedural generation of geology

Procedural generation is a way of faking stuff with computers — but by writing code, or otherwise defining algorithms, not by manually choosing or composing or sculpting things. It’s used to produce landscapes and other assets in computer games, or just to make beautiful things. Honestly, I know almost nothing about it, and I don’t play computer games, so I’m really just coming at it from the ‘beautiful things’ side. So let’s just stick to looking at some examples…

Robert Hodgin produces jaw-dropping images, and happily one of his favourite subjects is meandering rivers. Better yet, Harold Fisk’s maps are among his inspirations. The results are mindblowing — just check this animation out:

What’s really remarkable is that everything on that map is procedurally generated: the roadways, the vegetation, the wonderful names.

If you love meanders (who doesn’t love meanders?), then you also need to know about Zoltan Sylvester’s work (not to mention his Etsy store). He produces some great animations, and also maintains some open-source Python projects (e.g. meanderpy) for producing them, so you can get stuck in and make your own.

It’s not just about meanders. Artist Tyler Hobbs has produced some striking images that strongly resemble structural cross-sections. In this thread he mentions that this wasn’t his goal, they just came out that way.

Mattias Herder, a space-obsessed viz wizard, did intend to produce crystals though. He’s using Houdini software, which I believe is also what Robert Hodgin uses for his maps. I wonder if any geologists are using it…

Landscapes are one of the big areas of application of this sort of tech, and while not strictly geological, I love these frozen vistas by French artist Guillaume Cottet:

Finally, this example from digital artist Ian Smith hints at a bit of the creative process. This guy really knows how to make rocks…

This is all so much magic to me, but I’m intrigued. Like the black hole in Interstellar, could this kind of work actually shed light on how dynamic, non-linear natural systems work? Or is it just an illusion?

What is an Ormsby wavelet anyway?

If you dabble in reflection seismic analysis, you probably know the Ricker wavelet. We’ve visited it a few times on this blog — Evan once showed how to make and plot one, I looked at some analytic properties of it, and we even played golf with it.

The Ricker is everywhere, but it has an important limitation — bandwidth. Its shape in the frequency domain is roughly Gaussian (below, left), which is the reason it only really has one parameter: the central frequency. This kind of spectrum is sometimes okay for older seismic surveys, especially marine data, because it matches the bandwidth reasonably. But for modern surveys, or even old land data, we often want a broader set of frequencies — more of a trapezoidal spectral shape. We need the Ormsby wavelet:

ricker-vs-ormsby.png
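(For the record, the Ricker’s amplitude spectrum, plotted above left, has a simple analytic form:

\[ A(f) = \frac{2}{\sqrt{\pi}}\,\frac{f^2}{f_\mathrm{M}^3}\,\exp\!\left(-\frac{f^2}{f_\mathrm{M}^2}\right) \]

with a single parameter, the central frequency \(f_\mathrm{M}\). One knob is exactly the problem.)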

How to make an Ormsby wavelet

The earliest reference I can find to the Ormsby wavelet is in an article by Harold Ryan entitled, Ricker, Ormsby, Klauder, Butterworth — a choice of wavelets, in the September 1994 issue of the CSEG Recorder. It’s not clear at all who Ormsby was, other than “an aeronautical engineer”. And I don’t think anyone outside exploration geophysics knows what an Ormsby is, they just call it a ‘bandpass filter’.

Ryan helpfully provided both a time-domain analytic expression — which turns out to use the classical definition of the sinc function (at first I thought it had four typos) — and a plot:

The equation in Ryan, and my modified Figure 3 (right). The result of the equation is in red.


ryan_ormsby-cf-expression-2.png

This equation does not produce the wavelet shown in black in the plot, however; it produces the one I’ve added in red. If you find this surprising, you shouldn’t — in my experience, it’s very common for the words and/or maths in a paper not to match its figures. [Edit: as noted above, in this case it’s because of how NumPy’s sinc function is defined; see the comment from Robert Kern, below.] We established this at the SEG Repro Zoo in 2018. If an author is not required to produce code or data, it’s not very surprising; and even if they do, the peer review system is not set up for referees to do this kind of check — apart from anything else, it’s far too onerous. But I digress.

After some fiddling around, I realized that the expression being passed to NumPy’s sinc function should be \(ft\), not \(\pi ft\). This produces a result that matches the figure almost exactly (and, counting wiggles, has the right frequency). So here’s the result of that new expression, shown in green here with the original figure (black) and the same red wavelet as above:

ryan_ormsby-cf-bruges.png

This green curve is the wavelet implemented in bruges, so it’s easy to produce; the arguments are the duration (0.4 seconds), the sample interval dt (4 ms), and the corner frequencies f (5, 10, 40, and 45 Hz respectively):

bruges.filters.ormsby(duration=0.4, dt=0.004, f=[5, 10, 40, 45])
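For reference, here is a standalone sketch of the corrected expression (my own transcription; the version in bruges is the tested one):

import numpy as np

def ormsby(duration, dt, f):
    """A sketch of the Ormsby wavelet. NumPy's sinc is the normalized
    sinc, sin(pi x)/(pi x), hence passing f*t rather than pi*f*t.
    f is a sequence of the four corner frequencies in Hz.
    """
    f1, f2, f3, f4 = f
    t = np.arange(-duration / 2, duration / 2, dt)

    def g(fi):
        # Equivalent to (pi * fi)**2 * sinc(pi * fi * t)**2 in the classical convention.
        return (np.pi * fi)**2 * np.sinc(fi * t)**2

    w = (g(f4) - g(f3)) / (np.pi * (f4 - f3)) \
      - (g(f2) - g(f1)) / (np.pi * (f2 - f1))
    return w / np.max(np.abs(w))  # normalize to unit amplitude

w = ormsby(duration=0.4, dt=0.004, f=[5, 10, 40, 45])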

What about other examples from the literature?

Good question! Apart from my own Python code in bruges, I did find one or two other implementations:

So it seems from this tiny experiment that only one of the implementations I found matched the figure in the Ryan article perfectly. The other wavelets are variations on the theme. Which is probably fine — after all, they are all only models for real seismic impulses — but in the interests of scientific reproducibility, I think it underscores the importance of transparent methodology and publishing your code.


Update on 9 Feb: A conversation in Software Underground revealed that Petrel’s version of the Ormsby wavelet matches the bruges implementation — but with a triangular window multiplied in (similar to how a Hamming window is multiplied into the seismic.jl version).



I pushed my Python Jupyter Notebook to the new repro-zoo repository on GitHub. Please feel free to fork this project and add your own attempted reproductions of computational geoscience papers.

The original repro-zoo repo from the 2018 event is on SEG’s GitHub.


References

Ryan, H (1994). Ricker, Ormsby, Klauder, Butterworth — a choice of wavelets. CSEG Recorder 19 (7). Available online.

Soo-Kyung Miong, Robert R. Stewart and Joe Wong (2007). Characterizing the near surface with VSP and well logs. CREWES Research Report 19. Available online.