TRANSFORM happened!

transform_sticker.jpg

How do you describe the indescribable?

Last week, Agile hosted the TRANSFORM unconference in Normandy, France. We were there to talk about the open suburface stack — the collection of open-source Python tools for earth scientists. We also spent time on the state of the Software Underground, a global community of practice for digital subsurface scientists and engineers. In effect, this was the first annual Software Underground conference. This was SwungCon 1.

The space

I knew the Château de Rosay was going to be nice. I hoped it was going to be very nice. But it wasn’t either of those things. It exceeded expectations by such a large margin, it seemed a little… indulgent, Excessive even. And yet it was cheaper than a Hilton, and you couldn’t imagine a more perfect place to think and talk about the future of open source geoscience, or a more productive environment in which to write code with new friends and colleagues.

It turns out that a 400-year-old château set in 8 acres of parkland in the heart of Normandy is a great place to create new things. I expect Gustave Flaubert and Guy de Maupassant thought the same when they stayed there 150 years ago. The forty-two bedrooms house exactly the right number of people for a purposeful scientific meeting.

This is frustrating, I’m not doing the place justice at all.

The work

This was most people’s first experience of an unconference. It was undeniably weird walking into a week-long meeting with no schedule of events. But, despite being inexpertly facilitated by me, the 26 participants enthusiastically collaborated to create the agenda on the first morning. With time, we appreciated the possibilities of the open space — it lets the group talk about exactly what it needs to talk about, exactly when it needs to talk about it.

The topics ranged from the governance and future of the Software Underground, to the possibility of a new open access journal, interesting new events in the Software Underground calendar, new libraries for geoscience, a new ‘core’ library for wells and seismic, and — of course — machine learning. I’ll be writing more about all of these topics in the coming weeks, and there’s already lots of chatter about them on the Software Underground Slack (which hit 1500 members yesterday!).

The food

I can’t help it. I have to talk about the food.

…but I’m not sure where to start. The full potential of food — to satisfy, to delight, to start conversations, to impress, to inspire — was realized. The food was central to the experience, but somehow not even the most wonderful thing about the experience of eating at the chateau. Meals were prefaced by a presentation by the professionals in the kitchen. No dish was repeated… indeed, no seating arrangement was repeated. The cheese was — if you are into cheese — off the charts.

There was a professionalism and thoughtfulness to the dining that can perhaps only be found in France.

Sorry everyone. This was one of those occasions when you had to be there. If you weren’t there, you missed out. I wish you’d been there. You would have loved it.

The good news is that it will happen again. Stay tuned.

The order of stratigraphic sequences

Much of stratigraphic interpretation depends on a simple idea:

Depositional environments that are adjacent in a geographic sense (like the shoreface and the beach, or a tidal channel and tidal mudflats) are adjacent in a stratigraphic sense, unless separated by an unconformity.

Usually, geologists are faced with only the stratigraphic picture, and are challenged with reconstructing the geographic picture.

One interpretation strategy might be to look at which rocks tend to occur together in the stratigraphy. The idea is that rock types tend to be associated with geographic environments — maybe fine sand on the shoreface, coarse sand on the beach; massive silt in the tidal channel, rhythmically laminated mud in the mud-flats. Since if two rocks tend to occur together, their environments were probably adjacent, we can start to understand associations between the rock types, and thus piece together the geographic picture.

So which rock types tend to occur together, and which juxtapositions are spurious — perhaps the result of allocyclic mechanisms like changes in relative sea-level, or sediment supply? To get at this question, some stratigraphers turn to Markov chain analysis.

What is a Markov chain?

Markov chains are sequences of events, or states, resulting from a Markov process. Here’s how Wikipedia describes a Markov process:

A stochastic process that satisfies the Markov property (sometimes characterized as “memorylessness”). Roughly speaking, a process satisfies the Markov property if one can make predictions for the future of the process based solely on its present state just as well as one could knowing the process’s full history, hence independently from such history; i.e., conditional on the present state of the system, its future and past states are independent.

So if we believe that a stratigraphic sequence (I’m using ‘sequence’ here in the most general sense) can be modeled by a process like this — i.e. that its next state depends substantially on its present state — then perhaps we can model it as a Markov chain.

For example, we might have a hunch that we can model a shallow marine system as a sequence like:

offshore mudstone > lower shoreface siltstone > upper shoreface sandstone > foreshore sandstone

Then we might expect to see these transitions occur more often than other, non-successive transitions. In other words — if we compare the transition frequencies we observe to the transition frquencies we would expect from a random sequence of the same beds in the same proportions, then autocyclic or genetic transitions might happen unusually frequently.

The Powers & Easterling method

Several workers have gone down this path. The standard approach seems to be that of Powers & Easterling (1982). Here are the steps they describe:

  • Count the upwards transitions for each rock type. This results in a matrix of counts. Here’s the transition frequency matrix for the example used in the Powers & Easterling paper, in turn take from Gingerich (1969):

 
data = [[ 0, 37,  3,  2],
        [21,  0, 41, 14],
        [20, 25,  0,  0],
        [ 1, 14,  1,  0]]
  • Compute the expected counts by an iterative process, which usually converges in a few steps. The expected counts represent what Goodman (1968) called a ‘quasi-independence’ model — a random sequence:

 
array([[ 0. , 31.3,  8.2,  2.6],
       [31.3,  0. , 34.1, 10.7],
       [ 8.2, 34. ,  0. ,  2.8],
       [ 2.6, 10.7,  2.8,  0. ]])
  • Now we can compare our observed frequencies with the expected ones in two ways. First, we can inspect the \(\chi^2\) statistic, and compare it with the \(\chi^2\) distribution, given the degrees of freedom (5 in this case). In this example, it’s 35.7, which is beyond the 99.999th percentile of the chi-squared distribution. This rejects the hypothesis of quasi-independence. In other words: the succession appears to be organized. Phew!

  • Secondly, we can compute a matrix of so-called normalized differences. This lets us compare the observed and expected data. By calculating Z-scores, which are approximately normally distributed; since 95% of the distribution falls between −2 and +2, any value greater in magnitude than 2 is ‘fairly unusual’, in the words of Powers & Easterling. In the example, we can see that the large number of transitions from C (third row) to A (first column) is anomalous:

 
 
array([[ 0. ,  1. , -1.8, -0.3],
       [-1.8,  0. ,  1.2,  1. ],
       [ 4.1, -1.6,  0. , -1.7],
       [-1. ,  1. , -1.1,  0. ]])
powers_easterling_normdiff.png
  • The normalized difference matrix can also be interpreted as a directed graph, indicating the ‘strengths’ of the connections (edges) between rock types (nodes):

powers_easterling_graph.png

It would be all too easy to over-interpret this graph — B and D seem to go together, as do A and C, and C tends to pass into A, which tends to pass into a B/D system before passing back into C — and one could get carried away. But as a complement to sedimentological interpretation, knowledge of processes and the succession in hand, perhaps inspecting Markov chains can help understand the stratigraphic story.

One last thing… there is another use for Markov chains. We can also use the model to produce stochastic realizations of stratigraphy. These will share the same statistics as the original data, but are otherwise quite random. Here are 20 random beds generated from our model:

 
'ABABCBABABCABDABABCA'

The code to build your own Markov chains is all in this notebook. It’s very much a work in progress. Eventually I hope to merge it into the striplog library, but for now it’s a ‘minimum viable product’. Stay tuned for more on striplog.

Open In Colab   ⇐   Launch the notebook right here in your browser!


References

Gingerich, PD (1969). Markov analysis of cyclic alluvial sediments. Journal of Sedimentary Petrology, 39, p. 330-332. https://doi.org/10.1306/74D71C4E-2B21-11D7-8648000102C1865D

Goodman, LA (1968), The analysis of cross-classified data: independence, quasi-independence, and interactions in contingency tables with or without missing entries. Journal of American Statistical Association 63, p. 1091-1131. https://doi.org/10.2307/2285873

Powers, DW and RG Easterling (1982). Improved methodology for using embedded Markov chains to describe cyclical sediments. Journal of Sedimentary Petrology 52 (3), p. 0913-0923. https://doi.org/10.1306/212F808F-2B24-11D7-8648000102C1865D

The next thing

Over the last several years, Agile has been testing some of the new ways of collaborating, centered on digital connections:

2010-2019-timeline.png
  • It all started with this blog, which started in 2010 with my move from Calgary to Nova Scotia. It’s become a central part of my professional life, but we’re all about collaboration and blogs are almost entirely one-way, so…

  • In 2011 we launched SubSurfWiki. It didn’t really catch on, although it was a good basis for some other experiments and I still use it sometimes. Still, we realized we had to do more to connect the community, so…

  • In 2012 we launched our 52 Things collaborative, open access book series. There are well over 5000 of these out in the wild now, but it made us crave a real-life, face-to-face collaboration, so…

  • In 2013 we held the first ‘unsession’, a mini-unconference, at the Canada GeoConvention. Over 50 people came to chat about unsolved problems. We realized we needed a way to actually work on problems, so…

  • Later that year, we followed up with the first geoscience hackathon. Around 15 or so of us gathered in Houston for a weekend of coding and tacos. We realized that the community needed more coding skills, so…

  • In 2014 we started teaching a one-day Python course aimed squarely at geoscientists. We only teach with subsurface data and algorithms, and the course is now 5 days long. We now needed a way to connect all these new hackers and coders, so…

  • In 2014, together with Duncan Child, we also launched Software Underground, a chat room for discussing topics related to the earth and computers. Initially it was a Google Group but in 2015 we relaunched it as an open Slack team. We wanted to double down on scientific computing, so…

  • In 2015 and 2016 we launched a new web app, Pick This (returning soon!), and grew our bruges and welly open source Python projects. We also started building more machine learning projects, and getting really good at it.

Growing and honing

We have spent the recent years growing and honing these projects. The blog gets about 10,000 readers a month. The sixth 52 Things book is on its way. We held two public unsessions this year. The hackathons have now grown to 60 or so hackers, and have had about 400 participants in total, and five of them this year already (plus three to come!). We have also taught Python to 400 geoscientists, including 250 this year alone. And the Software Underground has over 1000 members.

In short, geoscience has gone digital, and we at Agile are grateful and excited to be part of it. At no point in my career have I been more optimistic and energized than I am right now.

So it’s time for the next thing.

The next thing is starting with a new kind of event. The first one is 5 to 11 May 2019, and it’s happening in France. I’ll tell you all about it tomorrow.

Reproducibility Zoo

repro-zoo-main-banner.png

The Repro Zoo was a new kind of event at the SEG Annual Meeting this year. The goal: to reproduce the results from well-known or important papers in GEOPHYSICS or The Leading Edge. By reproduce, we meant that the code and data should be open and accessible. By results, we meant equations, figures, and other scientific outcomes.

And some of the results are scary enough for Hallowe’en :)

What we did

All the work went straight into GitHub, mostly as Jupyter Notebooks. I had a vague goal of hitting 10 papers at the event, and we achieved this (just!). I’ve since added a couple of other papers, since the inspiration for the work came from the Zoo… and I haven’t been able to resist continuing.

The scene at the Repro Zoo. An air of quiet productivity hung over the booth. Yes, that is Sergey Fomel and Jon Claerbout. Thank you to David Holmes of Dell EMC for the picture.

The scene at the Repro Zoo. An air of quiet productivity hung over the booth. Yes, that is Sergey Fomel and Jon Claerbout. Thank you to David Holmes of Dell EMC for the picture.

Here’s what the Repro Zoo team got up to, in alphabetical order:

  • Aldridge (1990). The Berlage wavelet. GEOPHYSICS 55 (11). The wavelet itself, which has also been added to bruges.

  • Batzle & Wang (1992). Seismic properties of pore fluids. GEOPHYSICS 57 (11). The water properties, now added to bruges.

  • Claerbout et al. (2018). Data fitting with nonstationary statistics, Stanford. Translating code from FORTRAN to Python.

  • Claerbout (1975). Kolmogoroff spectral factorization. Thanks to Stewart Levin for this one.

  • Connolly (1999). Elastic impedance. The Leading Edge 18 (4). Using equations from bruges to reproduce figures.

  • Liner (2014). Long-wave elastic attentuation produced by horizontal layering. The Leading Edge 33 (6). This is the stuff about Backus averaging and negative Q.

  • Luo et al. (2002). Edge preserving smoothing and applications. The Leading Edge 21 (2).

  • Yilmaz (1987). Seismic data analysis, SEG. Okay, not the whole thing, but Sergey Fomel coded up a figure in Madagascar.

  • Partyka et al. (1999). Interpretational aspects of spectral decomposition in reservoir characterization.

  • Röth & Tarantola (1994). Neural networks and inversion of seismic data. Kudos to Brendon Hall for this implementation of a shallow neural net.

  • Taner et al. (1979). Complex trace analysis. GEOPHYSICS 44. Sarah Greer worked on this one.

  • Thomsen (1986). Weak elastic anisotropy. GEOPHYSICS 51 (10). Reproducing figures, again using equations from bruges.

As an example of what we got up to, here’s Figure 14 from Batzle & Wang’s landmark 1992 paper on the seismic properties of pore fluids. My version (middle, and in red on the right) is slightly different from that of Batzle and Wang. They don’t give a numerical example in their paper, so it’s hard to know where the error is. Of course, my first assumption is that it’s my error, but this is the problem with research that does not include code or reference numerical examples.

Figure 14 from Batzle & Wang (1992). Left: the original figure. Middle: My attempt to reproduce it. Right: My attempt in red, overlain on the original.

This was certainly not the only discrepancy. Most papers don’t provide the code or data to reproduce their figures, and this is a well-known problem that the SEG is starting to address. But most also don’t provide worked examples, so the reader is left to guess the parameters that were used, or to eyeball results from a figure. Are we really OK with assuming the results from all the thousands of papers in GEOPHYSICS and The Leading Edge are correct? There’s a long conversation to have here.

What next?

One thing we struggled with was capturing all the ideas. Some are on our events portal. The GitHub repo also points to some other sources of ideas. And there was the Big Giant Whiteboard (below). Either way, there’s plenty to do (there are thousands of papers!) and I hope the zoo continues in spirit. I will take pull requests until the end of the year, and I don’t see why we can’t add more papers until then. At that point, we can start a 2019 repo, or move the project to the SEG Wiki, or consider our other options. Ideas welcome!

IMG_20181017_163926.jpg

Thank you!

The following people and organizations deserve accolades for their dedication to the idea and hard work making it a reality. Please give them a hug or a high five when you see them.

  • David Holmes (Dell EMC) and Chance Sanger worked their tails off on the booth over the weekend, as well as having the neighbouring Dell EMC booth to worry about. David also sourced the amazing Dell tech we had at the booth, just in case anyone needed 128GB of RAM and an NVIDIA P5200 graphics card for their Jupyter Notebook. (The lights in the convention centre actually dimmed when we powered up our booths in the morning.)

  • Luke Decker (UT Austin) organized a corps of volunteer Zookeepers to help manage the booth, and provided enthusiasm and coding skills. Karl Schleicher (UT Austin), Sarah Greer (MIT), and several others were part of this effort.

  • Andrew Geary (SEG) for keeping things moving along when I became delinquent over the summer. Lots of others at SEG also helped, mainly with the booth: Trisha DeLozier, Rebecca Hayes, and Beth Donica all contributed.

  • Diego Castañeda got the events site in shape to support the Repro Zoo, with a dashboard showing the latest commits and contributors.

Reproduce this!

logo_simple.png

There’s a saying in programming: untested code is broken code. Is unreproducible science broken science?

I hope not, because geophysical research is — in general — not reproducible. In other words, we have no way of checking the results. Some of it, hopefully not a lot of it, could be broken. We have no way of knowing.

Next week, at the SEG Annual Meeting, we plan to change that. Well, start changing it… it’s going to take a while to get to all of it. For now we’ll be content with starting.

We’re going to make geophysical research reproducible again!

Welcome to the Repro Zoo!

If you’re coming to SEG in Anaheim next week, you are hereby invited to join us in Exposition Hall A, Booth #749.

We’ll be finding papers and figures to reproduce, equations to implement, and data tables to digitize. We’ll be hunting down datasets, recreating plots, and dissecting derivations. All of it will be done in the open, and all the results will be public and free for the community to use.

You can help

There are thousands of unreproducible papers in the geophysical literature, so we are going to need your help. If you’ll be in Anaheim, and even if you’re not, here some things you can do:

That’s all there is to it! Whether you’re a coder or an interpreter, whether you have half an hour or half a day, come along to the Repro Zoo and we’ll get you started.

Figure 1 from Connolly’s classic paper on elastic impedance. This is the kind of thing we’ll be reproducing.

Figure 1 from Connolly’s classic paper on elastic impedance. This is the kind of thing we’ll be reproducing.

Are there benefits to pseudoscience?

No, of course there aren't. 

Balance! The scourge of modern news. CC-BY by SkepticalScience.com

Balance! The scourge of modern news. CC-BY by SkepticalScience.com

Unless... unless you're a journalist, perhaps. Then a bit of pseudoscience can provide some much-needed balance — just to be fair! — to the monotonic barrage of boring old scientific consensus. Now you can write stories about flat-earthers, anti-vaxxers, homeopathy, or the benefits of climate change!*

So far, so good. It's fun to pillory the dimwits who think the moon landings were filmed in a studio in Utah, or that humans have had no impact on Earth's climate. The important thing is for the journalist to have a clear and unequivocal opinion about it. If an article doesn't make it clear that the deluded people at the flat-earth convention ("Hey, everyone thought Copernicus was mad!") have formed their opinions in spite of, not because of, the overwhelming evidence before them, then readers might think the journalist — and the publisher — agree with them.

In other words, if you report on hogwash, then you had better say that it's hogwash, or you end up looking like one of the washers of the hog.


Fake geoscience?

AAPG found this out recently, when the August issue of its Explorer magazine published an article by Ken Milam called Are there benefits to climate change? Ken was reporting on a talk by AAPG member Greg Wrightstone at URTeC in July. Greg wrote a book called Inconvenient Facts: The Science That Al Gore Doesn't Want You To Know. The gist: no need to be concerned about carbon dioxide because, "The U.S. Navy’s submarines often exceed 8,000 ppm (20 times current levels) and there is no danger to our sailors" — surely some of the least watertight reasoning I've ever encountered. Greg's basic idea is that, since the earth has been warmer before, with higher levels of CO2, there's nothing to worry about today (those Cretaceous conurbations and Silurian civilizations had no trouble adapting!) So he thinks, "the correct policy to address climate change is to have the courage to do nothing".

So far, so good. Except that Ken — in reporting 'just the facts' — didn't mention that Greg's talk was full of half-truths and inaccuracies and that few earth scientists agree with him. He forgot to remark upon the real news story: how worrying it is that URTeC 2018 put on a breakfast promoting Greg and his marginal views. He omitted to point out that this industry needs to grow up and face the future with reponsibility, supporting society with sound geoscience.

So it looked a bit like Explorer and AAPG were contributing to the washing of this particular hog.


Discussion

As you might expect, there was some discussion about the article — both on aapg.org and on Twitter (and probably elsewhere). For example, Mark Tingay (University of Adelaide) called AAPG and SPE out:

So did Brian Romans (Virginia Tech):

And there was further discussion (sort of) involving Greg Wrightstone himself. Trawl through Mark Tingay's timeline, especially his systematic dismantling of Greg's 'evidence', if your curiosity gets the better of you.


Response

Of course AAPG noticed the commotion. The September issue of Explorer contains two statements from AAPG staff. David Curtiss, AAPG Executive Director, said this in his column:

Milam was assigned to report on an invited presentation by Greg Wrightstone, a past president of AAPG’s Eastern Section, based on a recently self-published book on climate change, at the Unconventional Resources Technology Conference in July. Here was an AAPG Member and past section officer speaking about climate change – an issue of interest to many of our members, who had been invited by a group of his geoscience and engineering peers to present at a topical breakfast – not a technical session – at a major conference.

This sounds fine, on the face of it, but details matter. A glance at the book in question should have been enough to indicate that the content of the talk could only have been presented in a non-technical session, with a side of hash browns.

Anyway, David does go on to point out the tension between the petroleum industry's activities and society's environmental concerns. The tension is real, and AAPG and its members, are in the middle of it. We can contribute scientifically to the conversations that need to happen to resolve that tension. But pushing junk science and polemical bluster is definitely not going to help. I believe that most of the officers and members of AAPG agree. 

The editor of Explorer, Brian Ervin, had this to say:

For the record, none of our coverage of any issue or any given perspective on an issue should be taken as an endorsement — explicit or implicit — of that perspective. Also, the EXPLORER is — quite emphatically — not a scientific journal. Our content is not peer-reviewed. [...] No, the EXPLORER exists for an entirely different purpose. We provide news about Earth science, the industry and the Association, so our mission is different and unrelated to that of a scientific publication.

He goes on to say that he knew that Wrightstone's views are not popular and that it would provoke some reaction, but wanted to present it impartially and "give [readers] the opportunity to evaluate his position for themselves".

I just hope Explorer doesn't start doing this with too many other marginal opinions.


I'd have preferred to see AAPG back-pedal a bit more energetically. Publishing this article was a mistake. AAPG needs to think about the purpose, and influence, of its reporting, as well as its stance on climate change (which, according to David Curtiss, hasn't been discussed substantially in more than 10 years). This isn't about pushing agendas, any more than talking about the moon landings is about pushing agendas. It's about being a modern scientific association with high aspirations for itself, its members, and society.

Get out of the way

This tweet from the Ecological Society of America conference was interesting:

This kind of thing is not new — many conferences have 'No photos' signs around the posters and the talk sessions. 'No tweeting' seems pretty extreme though. I'm not sure if that's what the ESA was pushing for in this case, but either way the message is: 'No sharing stuff'. They do have a hashtag though, so...

Anyway, I tweeted this in response:

I think this tells you just as much about how broken the conference model is, as about how naïve/afraid our technical societies are.

I think there's a general rule: if you're trying to control the flow of information, you're getting in the way. You're also going to be disappointed because you can't control the flow of information — perhaps because it's not yours to control. I want to say to the organizers: The people you invited into your society are, thankfully, enthusiastic collaborators who can't wait to share the exciting things they heard at your conference. Why on earth would you try to shut that down? Why wouldn't you go out of your way to support them, amplify them, and find more people like them?

But wait, the no-tweeting society asks, what if the author didn't want anyone to share their work? My first question is: why did you give a talk then? My second question is: did the sharer give you proper attribution? If not — you are right to be annoyed and your society should help set this norm in your community. If so — see my first question.

Technical societies need to get over the idea that they own their communities and the knowledge their communities produce. They fret about revenue and membership numbers, but they just need to focus on making their members' technical and professional lives richer and more connected. The rest will take care of itself.


Interested in this topic? Here's a great post about tweeting at conferences, by Jacquelyn Gill. It also links to lots of other opinions, and there are lots of comments.

Image by Rob Salguero-Gómez.

It's Dynamic Range Day!

OK signal processing nerds, which side are you on in the Loudness War?

If you haven't heard of the Loudness War, you have some catching up to do! This little video by Matt Mayfield is kinda low-res but it's the shortest and best explanation I've been able to find. Watch it, then choose sides >>>>

There's a similar-but-slightly-different war going on in photography: high-dynamic-range or HDR photography is, according to some purists, an existential threat to photography. I'm not going to say any more about it today, but these HDR disasters speak volumes.

True amplitudes

The ideology at the heart of the Loudness War is that music production should be 'pure'. It's analogous to the notion that amplitudes in seismic images should be 'true', and just as nuanced. For some, the idea could be to get as close as possible to a live performance, for others it might be to create a completely synthetic auditory experience; for a record company the main point is to be noticed and then purchased (or at least searched for on Spotify). It reminds me a bit of the aesthetically

For a couple of decades, mainstream producers succumbed to the misconception that driving up the loudness — by increasing the mean amplitude, in turn by reducing the peaks and boosting the quiet passages — was the solution. But this seems to be changing. Through his tireless dedication to the cause, engineer Ian Shepherd has been a key figure in unpeeling this idée fixe. As part of his campaigning, he instituted Dynamic Range Day, and tomorrow is the 8th edition. 

If you want to hear examples of well-produced, dynamic music, check out the previous winners and runners up of the Dynamic Range Day Award — including tunes by Daft Punk, The XX, Kendrick Lamar, and at the risk of dating myself, Orbital.

The end is in sight

I'll warn you right now — this Loudness War thing is a bit of a YouTube rabbithole. But if you still haven't had enough, it's worth listening to the legendary Bob Katz talking about the weapons of war.

My takeaway: the war is not over, but battles are being won. For example, Spotify last year reduced its target output levels, encouraging producers to make more dynamic records. Katz ends his video with "2020 will be like 1980" — which is a good thing, in terms of audio engineering — and most people seem to think the Loudness War will be over.

Real and apparent seismic frequency

There's a Jupyter Notebook for you to follow along with this tutorial. You can run it right here in your browser.


We often use Ricker wavelets to model seismic, for example when making a synthetic seismogram with which to help tie a well. One simple way to guesstimate the peak or central frequency of the wavelet that will model a particlar seismic section is to count the peaks per unit time in the seismic. But this tends to overestimate the actual frequency because the maximum frequency of a Ricker wavelet is more than the peak frequency. The question is, how much more?

To investigate, let's make a Ricker wavelet and see what it looks like in the time and frequency domains.

>>> T, dt, f = 0.256, 0.001, 25

>>> import bruges
>>> w, t = bruges.filters.ricker(T, dt, f, return_t=True)

>>> import scipy.signal
>>> f_W, W = scipy.signal.welch(w, fs=1/dt, nperseg=256)
The_frequency_of_a_Ricker_2_0.png

When we count the peaks in a section, the assumption is that this apparent frequency — that is, the reciprocal of apparent period or distance between the extrema — tells us the dominant or peak frequency.

To help see why this assumption is wrong, let's compare the Ricker with a signal whose apparent frequency does match its peak frequency: a pure cosine:

>>> c = np.cos(2 * 25 * np.pi * t)
>>> f_C, C = scipy.signal.welch(c, fs=1/dt, nperseg=256)
The_frequency_of_a_Ricker_4_0.png

Notice that the signal is much narrower in bandwidth. If we allowed more oscillations, it would be even narrower. If it lasted forever, it would be a spike in the frequency domain.

Let's overlay the signals to get a picture of the difference in the relative periods:

The_frequency_of_a_Ricker_6_1.png

The practical consequence of this is that if we estimate the peak frequency to be \(f\ \mathrm{Hz}\), then we need to reduce \(f\) by some factor if we want to design a wavelet to match the data. To get this factor, we need to know the apparent period of the Ricker function, as given by the time difference between the two minima.

Let's look at a couple of different ways to find those minima: numerically and analytically.

Find minima numerically

We'll use scipy.optimize.minimize to find a numerical solution. In order to use it, we'll need a slightly different expression for the Ricker function — casting it in terms of a time basis t. We'll also keep f as a variable, rather than hard-coding it in the expression, to give us the flexibility of computing the minima for different values of f.

Here's the equation we're implementing:

$$ w(t, f) = (1 - 2\pi^2 f^2 t^2)\ e^{-\pi^2 f^2 t^2} $$

In Python:

>>> def ricker(t, f):
>>>     return (1 - 2*(np.pi*f*t)**2) * np.exp(-(np.pi*f*t)**2)

Check that the wavelet looks like it did before, by comparing the output of this function when f is 25 with the wavelet w we were using before:

>>> f = 25
>>> np.allclose(w, ricker(t, f=25))
True

Now we call SciPy's minimize function on our ricker function. It itertively searches for a minimum solution, then gives us the x (which is really t in our case) at that minimum:

>>> import scipy.optimize
>>> f = 25
>>> scipy.optimize.minimize(ricker, x0=0, args=(f))

fun: -0.4462603202963996
 hess_inv: array([[1]])
      jac: array([-2.19792128e-07])
  message: 'Optimization terminated successfully.'
     nfev: 30
      nit: 1
     njev: 10
   status: 0
  success: True
        x: array([0.01559393])

So the minimum amplitude, given by fun, is -0.44626 and it occurs at an x (time) of \(\pm 0.01559\ \mathrm{s}\).

In comparison, the minima of the cosine function occur at a time of \(\pm 0.02\ \mathrm{s}\). In other words, the period appears to be \(0.02 - 0.01559 = 0.00441\ \mathrm{s}\) shorter than the pure waveform, which is...

>>> (0.02 - 0.01559) / 0.02
0.22050000000000003

...about 22% shorter. This means that if we naively estimate frequency by counting peaks or zero crossings, we'll tend to overestimate the peak frequency of the wavelet by about 22% — assuming it is approximately Ricker-like; if it isn't we can use the same method to estimate the error for other functions.

This is good to know, but it would be interesting to know if this parameter depends on frequency, and also to have a more precise way to describe it than a decimal. To get at these questions, we need an analytic solution.

Find minima analytically

Python's SymPy package is a bit like Maple — it understands math symbolically. We'll use sympy.solve to find an analytic solution. It turns out that it needs the Ricker function writing in yet another way, using SymPy symbols and expressions for \(\mathrm{e}\) and \(\pi\).

import sympy as sp
t, f = sp.Symbol('t'), sp.Symbol('f')
r = (1 - 2*(sp.pi*f*t)**2) * sp.exp(-(sp.pi*f*t)**2)

Now we can easily find the solutions to the Ricker equation, that is, the times at which the function is equal to zero:

>>> sp.solvers.solve(r, t)
[-sqrt(2)/(2*pi*f), sqrt(2)/(2*pi*f)]

But this is not quite what we want. We need the minima, not the zero-crossings.

Maybe there's a better way to do this, but here's one way. Note that the gradient (slope or derivative) of the Ricker function is zero at the minima, so let's just solve the first time derivative of the Ricker function. That will give us the three times at which the function has a gradient of zero.

>>> dwdt = sp.diff(r, t)
>>> sp.solvers.solve(dwdt, t)
[0, -sqrt(6)/(2*pi*f), sqrt(6)/(2*pi*f)]

In other words, the non-zero minima of the Ricker function are at:

$$ \pm \frac{\sqrt{6}}{2\pi f} $$

Let's just check that this evaluates to the same answer we got from scipy.optimize, which was 0.01559.

>>> np.sqrt(6) / (2 * np.pi * 25)
0.015593936024673521

The solutions agree.

While we're looking at this, we can also compute the analytic solution to the amplitude of the minima, which SciPy calculated as -0.446. We just plug one of the expressions for the minimum time into the expression for r:

>>> r.subs({t: sp.sqrt(6)/(2*sp.pi*f)})
-2*exp(-3/2)

Apparent frequency

So what's the result of all this? What's the correction we need to make?

The minima of the Ricker wavelet are \(\sqrt{6}\ /\ \pi f_\mathrm{actual}\ \mathrm{s}\) apart — this is the apparent period. If we're assuming a pure tone, this period corresponds to an apparent frequency of \(\pi f_\mathrm{actual}\ /\ \sqrt{6}\ \mathrm{Hz}\). For \(f = 25\ \mathrm{Hz}\), this apparent frequency is:

>>> (np.pi * 25) / np.sqrt(6)
32.06374575404661

If we were to try to model the data with a Ricker of 32 Hz, the frequency will be too high. We need to multiply the frequency by a factor of \(\sqrt{6} / \pi\), like so:

>>> 32.064 * np.sqrt(6) / (np.pi)
25.00019823475659

This gives the correct frequency of 25 Hz.

To sum up, rearranging the expression above:

$$ f_\mathrm{actual} = f_\mathrm{apparent} \frac{\sqrt{6}}{\pi} $$

Expressed as a decimal, the factor we were seeking is therefore \(\sqrt{6}\ /\ \pi\):

>>> np.sqrt(6) / np.pi
0.779696801233676

That is, the reduction factor is 22%.


Curious coincidence: in the recent Pi Day post, I mentioned the Riemann zeta function of 2 as a way to compute \(\pi\). It evaluates to \((\pi / \sqrt{6})^2\). Is there a million-dollar connection between the humble Ricker wavelet and the Riemann hypothesis?

I doubt it.

 
 

Happy π day, Einstein

It's Pi Day today, and also Einstein's 139th birthday. MIT celebrates it at 6:28 pm — in honour of pi's arch enemy, tau — by sending out its admission notices.

And Stephen Hawking died today. He will leave a great, black hole in modern science. I saw him lecture in London not long after A Brief History of Time came out. It was one of the events that inspired me along my path to science. I recall he got more laughs than a lot of stand-ups I've seen.

But I can't really get behind 3/14. The weird American way of writing dates, mixed-endian style, really irks me. As a result, I have previously boycotted Pi Day, instead celebrating it on 31/4, aka 31 April, aka 1 May. Admittedly, this takes the edge off the whole experience a bit, so I've decided to go full big-endian and adopt ISO-8601 from now on, which means Pi Day is on 3141-5-9. Expect an epic blog post that day.

Transcendence

Anyway, I will transcend the bickering over dates (pausing only to reject 22/7 and 6/28 entirely so don't even start) to get back to pi. It so happens that Pi Day is of great interest in our house this year because my middle child, Evie (10), is a bit obsessed with pi at the moment. Obsessed enough to be writing a book about it (she writes a lot of books; some previous topics: zebras, Switzerland, octopuses, and Settlers of Catan fan fiction, if that's even a thing).

I helped her find some ways to generate pi numerically. My favourite one uses Riemann's zeta function, which we'd recently watched a Numberphile video about. It's the sum of the reciprocals of the natural numbers raised to increasing powers:

$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$$

Leonhard Euler solved the Basel problem in 1734, proving that \(\zeta(2) = \pi^2 / 6\), so you can compute pi slowly with a naive implementation of the zeta function:

 
def zeta(s, terms=1000):
    z = 0
    for t in range(1, int(terms)):
        z += 1 / t**s
    return z

(6 * zeta(2, terms=1e7))**0.5

Which returns pi, correct to 6 places:

 
3.141592558095893

Or you can use one of the various optimized versions of the zeta function, for example this one from the floating point math library mpmath (which I got from this awesome list of 100 ways to compute pi):

 
>>> from mpmath import *
>>> mp.dps = 50
>>> mp.pretty = True
>>>
>>> sqrt(6*zeta(2))
3.1415926535897932384626433832795028841971693993751068

...which is correct to 50 decimal places.

Here's the bit of Evie's book where she explains a bit about transcendental numbers.

Evie's book shows the relationships between the sets of natural numbers (N), integers (Z), rationals (Q), algebraic numbers (A), and real numbers (R). Transcendental numbers are real, but not algebraic. (Some definitions also let them be complex.)

Evie's book shows the relationships between the sets of natural numbers (N), integers (Z), rationals (Q), algebraic numbers (A), and real numbers (R). Transcendental numbers are real, but not algebraic. (Some definitions also let them be complex.)

I was interested in this, because while I 'knew' that pi is transcendental, I couldn't really articulate what that really meant, and why (say) √2, which is also irrational, is not also transcendental. Succinctly, transcendental means 'non-algebraic' (equivalent to being non-constructible). Since √2 is obviously the solution to \(x^2 - 2 = 0\), it is algebraic and therefore not transcendental. 

Weirdly, although hardly any numbers are known to be transcendental, almost all real numbers are. Isn't maths awesome?

Have a transcendental pi day!


The xkcd comic is by Randall Munroe and licensed CC-BY-NC.