Real and apparent seismic frequency

There's a Jupyter Notebook for you to follow along with this tutorial. You can run it right here in your browser.

We often use Ricker wavelets to model seismic, for example when making a synthetic seismogram with which to help tie a well. One simple way to guesstimate the peak or central frequency of the wavelet that will model a particlar seismic section is to count the peaks per unit time in the seismic. But this tends to overestimate the actual frequency because the maximum frequency of a Ricker wavelet is more than the peak frequency. The question is, how much more?

To investigate, let's make a Ricker wavelet and see what it looks like in the time and frequency domains.

>>> T, dt, f = 0.256, 0.001, 25

>>> import bruges
>>> w, t = bruges.filters.ricker(T, dt, f, return_t=True)

>>> import scipy.signal
>>> f_W, W = scipy.signal.welch(w, fs=1/dt, nperseg=256)

When we count the peaks in a section, the assumption is that this apparent frequency — that is, the reciprocal of apparent period or distance between the extrema — tells us the dominant or peak frequency.

To help see why this assumption is wrong, let's compare the Ricker with a signal whose apparent frequency does match its peak frequency: a pure cosine:

>>> c = np.cos(2 * 25 * np.pi * t)
>>> f_C, C = scipy.signal.welch(c, fs=1/dt, nperseg=256)

Notice that the signal is much narrower in bandwidth. If we allowed more oscillations, it would be even narrower. If it lasted forever, it would be a spike in the frequency domain.

Let's overlay the signals to get a picture of the difference in the relative periods:


The practical consequence of this is that if we estimate the peak frequency to be \(f\ \mathrm{Hz}\), then we need to reduce \(f\) by some factor if we want to design a wavelet to match the data. To get this factor, we need to know the apparent period of the Ricker function, as given by the time difference between the two minima.

Let's look at a couple of different ways to find those minima: numerically and analytically.

Find minima numerically

We'll use scipy.optimize.minimize to find a numerical solution. In order to use it, we'll need a slightly different expression for the Ricker function — casting it in terms of a time basis t. We'll also keep f as a variable, rather than hard-coding it in the expression, to give us the flexibility of computing the minima for different values of f.

Here's the equation we're implementing:

$$ w(t, f) = (1 - 2\pi^2 f^2 t^2)\ e^{-\pi^2 f^2 t^2} $$

In Python:

>>> def ricker(t, f):
>>>     return (1 - 2*(np.pi*f*t)**2) * np.exp(-(np.pi*f*t)**2)

Check that the wavelet looks like it did before, by comparing the output of this function when f is 25 with the wavelet w we were using before:

>>> f = 25
>>> np.allclose(w, ricker(t, f=25))

Now we call SciPy's minimize function on our ricker function. It itertively searches for a minimum solution, then gives us the x (which is really t in our case) at that minimum:

>>> import scipy.optimize
>>> f = 25
>>> scipy.optimize.minimize(ricker, x0=0, args=(f))

fun: -0.4462603202963996
 hess_inv: array([[1]])
      jac: array([-2.19792128e-07])
  message: 'Optimization terminated successfully.'
     nfev: 30
      nit: 1
     njev: 10
   status: 0
  success: True
        x: array([0.01559393])

So the minimum amplitude, given by fun, is -0.44626 and it occurs at an x (time) of \(\pm 0.01559\ \mathrm{s}\).

In comparison, the minima of the cosine function occur at a time of \(\pm 0.02\ \mathrm{s}\). In other words, the period appears to be \(0.02 - 0.01559 = 0.00441\ \mathrm{s}\) shorter than the pure waveform, which is...

>>> (0.02 - 0.01559) / 0.02

...about 22% shorter. This means that if we naively estimate frequency by counting peaks or zero crossings, we'll tend to overestimate the peak frequency of the wavelet by about 22% — assuming it is approximately Ricker-like; if it isn't we can use the same method to estimate the error for other functions.

This is good to know, but it would be interesting to know if this parameter depends on frequency, and also to have a more precise way to describe it than a decimal. To get at these questions, we need an analytic solution.

Find minima analytically

Python's SymPy package is a bit like Maple — it understands math symbolically. We'll use sympy.solve to find an analytic solution. It turns out that it needs the Ricker function writing in yet another way, using SymPy symbols and expressions for \(\mathrm{e}\) and \(\pi\).

import sympy as sp
t, f = sp.Symbol('t'), sp.Symbol('f')
r = (1 - 2*(sp.pi*f*t)**2) * sp.exp(-(sp.pi*f*t)**2)

Now we can easily find the solutions to the Ricker equation, that is, the times at which the function is equal to zero:

>>> sp.solvers.solve(r, t)
[-sqrt(2)/(2*pi*f), sqrt(2)/(2*pi*f)]

But this is not quite what we want. We need the minima, not the zero-crossings.

Maybe there's a better way to do this, but here's one way. Note that the gradient (slope or derivative) of the Ricker function is zero at the minima, so let's just solve the first time derivative of the Ricker function. That will give us the three times at which the function has a gradient of zero.

>>> dwdt = sp.diff(r, t)
>>> sp.solvers.solve(dwdt, t)
[0, -sqrt(6)/(2*pi*f), sqrt(6)/(2*pi*f)]

In other words, the non-zero minima of the Ricker function are at:

$$ \pm \frac{\sqrt{6}}{2\pi f} $$

Let's just check that this evaluates to the same answer we got from scipy.optimize, which was 0.01559.

>>> np.sqrt(6) / (2 * np.pi * 25)

The solutions agree.

While we're looking at this, we can also compute the analytic solution to the amplitude of the minima, which SciPy calculated as -0.446. We just plug one of the expressions for the minimum time into the expression for r:

>>> r.subs({t: sp.sqrt(6)/(2*sp.pi*f)})

Apparent frequency

So what's the result of all this? What's the correction we need to make?

The minima of the Ricker wavelet are \(\sqrt{6}\ /\ \pi f_\mathrm{actual}\ \mathrm{s}\) apart — this is the apparent period. If we're assuming a pure tone, this period corresponds to an apparent frequency of \(\pi f_\mathrm{actual}\ /\ \sqrt{6}\ \mathrm{Hz}\). For \(f = 25\ \mathrm{Hz}\), this apparent frequency is:

>>> (np.pi * 25) / np.sqrt(6)

If we were to try to model the data with a Ricker of 32 Hz, the frequency will be too high. We need to multiply the frequency by a factor of \(\sqrt{6} / \pi\), like so:

>>> 32.064 * np.sqrt(6) / (np.pi)

This gives the correct frequency of 25 Hz.

To sum up, rearranging the expression above:

$$ f_\mathrm{actual} = f_\mathrm{apparent} \frac{\sqrt{6}}{\pi} $$

Expressed as a decimal, the factor we were seeking is therefore \(\sqrt{6}\ /\ \pi\):

>>> np.sqrt(6) / np.pi

That is, the reduction factor is 22%.

Curious coincidence: in the recent Pi Day post, I mentioned the Riemann zeta function of 2 as a way to compute \(\pi\). It evaluates to \((\pi / \sqrt{6})^2\). Is there a million-dollar connection between the humble Ricker wavelet and the Riemann hypothesis?

I doubt it.


Happy π day, Einstein

It's Pi Day today, and also Einstein's 139th birthday. MIT celebrates it at 6:28 pm — in honour of pi's arch enemy, tau — by sending out its admission notices.

And Stephen Hawking died today. He will leave a great, black hole in modern science. I saw him lecture in London not long after A Brief History of Time came out. It was one of the events that inspired me along my path to science. I recall he got more laughs than a lot of stand-ups I've seen.

But I can't really get behind 3/14. The weird American way of writing dates, mixed-endian style, really irks me. As a result, I have previously boycotted Pi Day, instead celebrating it on 31/4, aka 31 April, aka 1 May. Admittedly, this takes the edge off the whole experience a bit, so I've decided to go full big-endian and adopt ISO-8601 from now on, which means Pi Day is on 3141-5-9. Expect an epic blog post that day.


Anyway, I will transcend the bickering over dates (pausing only to reject 22/7 and 6/28 entirely so don't even start) to get back to pi. It so happens that Pi Day is of great interest in our house this year because my middle child, Evie (10), is a bit obsessed with pi at the moment. Obsessed enough to be writing a book about it (she writes a lot of books; some previous topics: zebras, Switzerland, octopuses, and Settlers of Catan fan fiction, if that's even a thing).

I helped her find some ways to generate pi numerically. My favourite one uses Riemann's zeta function, which we'd recently watched a Numberphile video about. It's the sum of the reciprocals of the natural numbers raised to increasing powers:

$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$$

Leonhard Euler solved the Basel problem in 1734, proving that \(\zeta(2) = \pi^2 / 6\), so you can compute pi slowly with a naive implementation of the zeta function:

def zeta(s, terms=1000):
    z = 0
    for t in range(1, int(terms)):
        z += 1 / t**s
    return z

(6 * zeta(2, terms=1e7))**0.5

Which returns pi, correct to 6 places:


Or you can use one of the various optimized versions of the zeta function, for example this one from the floating point math library mpmath (which I got from this awesome list of 100 ways to compute pi):

>>> from mpmath import *
>>> mp.dps = 50
>>> mp.pretty = True
>>> sqrt(6*zeta(2))

...which is correct to 50 decimal places.

Here's the bit of Evie's book where she explains a bit about transcendental numbers.

Evie's book shows the relationships between the sets of natural numbers (N), integers (Z), rationals (Q), algebraic numbers (A), and real numbers (R). Transcendental numbers are real, but not algebraic. (Some definitions also let them be complex.)

Evie's book shows the relationships between the sets of natural numbers (N), integers (Z), rationals (Q), algebraic numbers (A), and real numbers (R). Transcendental numbers are real, but not algebraic. (Some definitions also let them be complex.)

I was interested in this, because while I 'knew' that pi is transcendental, I couldn't really articulate what that really meant, and why (say) √2, which is also irrational, is not also transcendental. Succinctly, transcendental means 'non-algebraic' (equivalent to being non-constructible). Since √2 is obviously the solution to \(x^2 - 2 = 0\), it is algebraic and therefore not transcendental. 

Weirdly, although hardly any numbers are known to be transcendental, almost all real numbers are. Isn't maths awesome?

Have a transcendental pi day!

The xkcd comic is by Randall Munroe and licensed CC-BY-NC.

Jounce, Crackle and Pop


I saw this T-shirt recently, and didn't get it. (The joke or the T-shirt.)

It turns out that the third derivative of displacement \(x\) with respect to time \(t\) — that is, the derivative of acceleration \(\mathbf{a}\) — is called 'jerk' (or sometimes, boringly, jolt, surge, or lurch) and is measured in units of m/s³. 

So far, so hilarious, but is it useful? It turns out that it is. Since the force \(\mathbf{F}\) on a mass \(m\) is given by \(\mathbf{F} = m\mathbf{a}\), you can think of jerk as being equivalent to a change in force. The lurch you feel at the onset of a car's acceleration — that's jerk. The designers of transport systems and rollercoasters manage it daily.

$$ \mathrm{jerk,}\ \mathbf{j} = \frac{\mathrm{d}^3 x}{\mathrm{d}t^3}$$

Here's a visualization of velocity (green line) of a Tesla Model S driving in a parking lot. The coloured stripes show the acceleration (upper plot) and the jerk (lower plot). Notice that the peaks in jerk correspond to changes in acceleration.


The snap you feel at the start of the lurch? That's jounce  — the fourth derivative of displacement and the derivative of jerk. Eager et al (2016) wrote up a nice analysis of these quantities for the examples of a trampolinist and roller coaster passenger. Jounce is sometimes called snap... and the next two derivatives are called crackle and pop. 

What about momentum?

If the momentum \(\mathrm{p}\) of a mass \(m\) moving at a velocity \(v\) is \(m\mathbf{v}\) and \(\mathbf{F} = m\mathbf{a}\), what is mass times jerk? According to the physicist Philip Gibbs, who investigated the matter in 1996, it's called yank:

Momentum equals mass times velocity.
Force equals mass times acceleration.
Yank equals mass times jerk.
Tug equals mass times snap.
Snatch equals mass times crackle.
Shake equals mass times pop.

There are jokes in there, help yourself.

What about integrating?

Clearly the integral of jerk is acceleration, and that of acceleration is velocity, the integral of which is displacement. But what is the integral of displacement with respect to time? It's called absement, and it's a pretty peculiar quantity to think about. In the same way that an object with linearly increasing displacement has constant velocity and zero acceleration, an object with linearly increasing absement has constant displacement and zero velocity. (Constant absement at zero displacement gives rise to the name 'absement': an absence of displacement.)

Integrating displacement over time might be useful: the area under the displacement curve for a throttle lever could conceivably be proportional to fuel consumption for example. So absement seems to be a potentially useful quantity, measured in metre-seconds.

Integrate absement and you get absity (a play on 'velocity'). Keep going and you get abseleration, abserk, and absounce. Are these useful quantities? I don't think so. A quick look at them all — for the same Tesla S dataset I used before — shows that the loss of detail from multiple cumulative summations makes for rather uninformative transformations: 


You can reproduce the figures in this article with the Jupyter Notebook Jerk_jounce_etc.ipynb. Or you can launch a Binder right here in your browser and play with it there, without installing a thing!


David Eager et al (2016). Beyond velocity and acceleration: jerk, snap and higher derivatives. Eur. J. Phys. 37 065008. DOI: 10.1088/0143-0807/37/6/065008

Amarashiki (2012). Derivatives of position. The Spectrum of Riemannium blog, retrieved on 4 Mar 2018.

The dataset is from Jerry Jongerius's blog post, The Tesla (Elon Musk) and
New York Times (John Broder) Feud
. I have no interest in the 'feud', I just wanted a dataset.

The T-shirt is from Chummy Tees; the image is their copyright and used here under Fair Use terms.

The vintage Snap, Crackle and Pop logo is copyright of Kellogg's and used here under Fair Use terms.

The norm and simple solutions

Last time I wrote about different ways of calculating distance in a vector space — say, a two-dimensional Euclidean plane like the streets of Portland, Oregon. I showed three ways to reckon the distance, or norm, between two points (i.e. vectors). As a reminder, using the distance between points u and v on the map below this time:

$$ \|\mathbf{u} - \mathbf{v}\|_1 = |u_x - v_x| + |u_y - v_y| $$

$$ \|\mathbf{u} - \mathbf{v}\|_2 = \sqrt{(u_x - v_x)^2 + (u_y - v_y)^2} $$

$$ \|\mathbf{u} - \mathbf{v}\|_\infty = \mathrm{max}(|u_x - v_x|, |u_y - v_y|) $$

Let's think about all the other points on Portland's streets that are the same distance away from u as v is. Again, we have to think about what we mean by distance. If we're walking, or taking a cab, we'll need to think about \(\ell_1\) — the sum of the distances in x and y. This is shown on the left-most map, below.

For simplicity, imagine u is the origin, or (0, 0) in Cartesian coordinates. Then v is (0, 4). The sum of the distances is 4. Looking for points with the same sum, we find the pink points on the map.

If we're thinking about how the crow flies, or \(\ell_2\) norm, then the middle map sums up the situation: the pink points are all equidistant from u. All good: this is what we usually think of as 'distance'.


The \(\ell_\infty\) norm, on the other hand, only cares about the maximum distance in any direction, or the maximum element in the vector. So all points whose maximum coordinate is 4 meet the criterion: (1, 4), (2, 4), (4, 3) and (4, 0) all work.

You might remember there was also a weird definition for the \(\ell_0\) norm, which basically just counts the non-zero elements of the vector. So, again treating u as the origin for simplicity, we're looking for all the points that, like v, have only one non-zero Cartesian coordinate. These points form an upright cross, like a + sign (right).

So there you have it: four ways to draw a circle.

Wait, what?

A circle is just a set of points that are equidistant from the centre. So, depending on how you define distance, the shapes above are all 'circles'. In particular, if we normalize the (u, v) distance as 1, we have the following unit circles:

It turns out we can define any number of norms (if you like the sound of \(\ell_{2.4}\) or \(\ell_{240}\) or \(\ell_{0.024}\)... but most of the time, these will suffice. You can probably imagine the shapes of the unit circles defined by these other norms.

What can we do with this stuff?

Let's think about solving equations. Think about solving this:

$$ x + 2y = 8 $$


I'm sure you can come up with a soluiton in your head, x = 6 and y = 1 maybe. But one equation and two unknowns means that this problem is underdetermined, and consequently has an infinite number of solutions. The solutions can be visualized geometrically as a line in the Euclidean plane (right).

But let's say I don't want solutions like (3.141590, 2.429205) or (2742, –1367). Let's say I want the simplest solution. What's the simplest solution?


This is a reasonable question, but how we answer it depends how we define 'simple'. One way is to ask for the nearest solution to the origin. Also reasonable... but remember that we have a few different ways to define 'nearest'. Let's start with the everyday definition: the shortest crow-flies distance from the origin. The crow-flies, \(\ell_2\) distances all lie on a circle, so you can imagine starting with a tiny circle at the origin, and 'inflating' it until it touches the line \(x + 2y - 8 = 0\). This is usually called the minimum norm solution, minimized on \(\ell_2\). We can find it in Python like so:

    import numpy.linalg as la
    A = [[1, 2]]
    b = [8]
    la.lstsq(A, b)

The result is the vector (1.6, 3.2). You could almost have worked that out in your head, but imagine having 1000 equations to solve and you start to appreciate numpy.linalg. Admittedly, it's even easier in Octave (or MATLAB if you must) and Julia:

    A = [1 2]
    b = [8]
    A \ b

But remember we have lots of norms. It turns out that minimizing other norms can be really useful. For example, minimizing the \(\ell_1\) norm — growing a diamond out from the origin — results in (0, 4). The \(\ell_0\) norm gives the same sparse* result. Minimizing the \(\ell_\infty\) norm leads to \( x = y = 8/3 \approx 2.67\).

This was the diagram I wanted to get to when I started with the 'how far away is the supermarket' business. So I think I'll stop now... have fun with Norm!

* I won't get into sparsity now, but it's a big deal. People doing big computations are always looking for sparse representations of things. They use less memory, are less expensive to compute with, and are conceptually 'neater'. Sparsity is really important in compressed sensing, which has been a bit of a buzzword in geophysics lately.

The norm: kings, crows and taxicabs

How far away is the supermarket from your house? There are lots of ways of answering this question:

  • As the crow flies. This is the green line from \(\mathbf{a}\) to \(\mathbf{b}\) on the map below.
  • The 'city block' driving distance. If you live on a grid of streets, all possible routes are the same length — represented by the orange lines on the map below.
  • In time, not distance. This is usually a more useful answer... but not one we're going to discuss today.

Don't worry about the mathematical notation on this map just yet. The point is that there's more than one way to think about the distance between two points, or indeed any measure of 'size'.


Higher dimensions

The map is obviously two-dimensional, but it's fairly easy to conceive of 'size' in any number of dimensions. This is important, because we often deal with more than the 2 dimensions on a map, or even the 3 dimensions of a seismic stack. For example, we think of raw so-called 3D seismic data as having 5 dimensions (x position, y position, offset, time, and azimuth). We might even formulate a machine learning task with a hundred or more dimensions (or 'features').

Why do we care about measuring distances in high dimensions? When we're dealing with data in these high-dimensional spaces, 'distance' is a useful way to measure the similarity between two points. For example, I might want to select those samples that are close to a particular point of interest. Or, from among the points satisfying some constraint, select the one that's closest to the origin.

Definitions and nomenclature

We'll define norms in the context of linear algebra, which is the study of vector spaces (think of multi-dimensional 'data spaces' like the 5D space of seismic data). A norm is a function that assigns a positive scalar size to a vector \(\mathbf{v}\) , with a size of zero reserved for the zero vector (in the Cartesian plane, the zero vector has coordinates (0, 0) and is usually called the origin). Any norm \(\|\mathbf{v}\|\) of this vector satisfies the following conditions:

  1. Absolutely homogenous. The norm of \(\alpha\mathbf{v}\) is equal to \(|\alpha|\) times the norm of \(\mathbf{v}\).
  2. Subadditive. The norm of \( (\mathbf{u} + \mathbf{v}) \) is less than or equal to the norm of \(\mathbf{u}\) plus the norm of \(\mathbf{v}\). In other words, the norm satisfies the triangle inequality.
  3. Positive. The first two conditions imply that the norm is non-negative.
  4. Definite. Only the zero vector has a norm of 0.

Kings, crows and taxicabs

Let's return to the point about lots of ways to define distance. We'll start with the most familiar definition of distance on a map— the Euclidean distance, aka the \(\ell_2\) or \(L_2\) norm (confusingly, sometimes the two is written as a superscript), the 2-norm, or sometimes just 'the norm' (who says maths has too much jargon?). This is the 'as-the-crow-flies distance' on the map above, and we can calculate it using Pythagoras:

$$ \|\mathbf{v}\|_2 = \sqrt{(a_x - b_x)^2 + (a_y - b_y)^2} $$

You can extend this to an arbitrary number of dimensions, just keep adding the squared elementwise differences. We can also calculate the norm of a single vector in n-space, which is really just the distance between the origin and the vector:

$$ \|\mathbf{u}\|_2 = \sqrt{u_1^2 + u_2^2 + \ldots + u_n^2}  = \sqrt{\mathbf{u} \cdot \mathbf{u}} $$

As shown here, the 2-norm of a vector is the square root of its dot product with itself.

So the crow-flies distance is fairly intuitive... what about that awkward city block distance? This is usually referred to as the Manhattan distance, the taxicab distance, the \(\ell_1\) or \(L_1\) norm, or the 1-norm. As you can see on the map, it's just the sum of the absolute distances in each dimension, x and y in our case:

$$ \|\mathbf{v}\|_1 = |a_x - b_x| + |a_y - b_y| $$

What's this magic number 1 all about? It turns out that the distance metric can be generalized as the so-called p-norm, where p can take any positive value up to infinity. The definition of the p-norm is consistent with the two norms we just met:

$$ \| \mathbf{u} \|_p = \left( \sum_{i=1}^n | u_i | ^p \right)^{1/p} $$

In practice, I've only ever seen p = 1, 2, or infinity (and 0, but we'll get to that). Let's look at the meaning of the \(\infty\)-norm, aka the \(\ell_\infty\) or \(L_\infty\) norm, which is sometimes called the Chebyshev distance or chessboard distance (because it defines the minimum number of moves for a king to any given square):

$$ \|\mathbf{v}\|_\infty = \mathrm{max}(|a_x - b_x|, |a_y - b_y|) $$

In other words, the Chebyshev distance is simply the maximum element in a given vector. In a nutshell, the infinitieth root of the sum of a bunch of numbers raised to the infinitieth power, is the same as the infinitieth root of the largest of those numbers raised to the infinitieth power — because infinity is weird like that.

What about p = 0?

Infinity is weird, but so is zero sometimes. Taking the zeroeth root of a lot of ones doesn't make a lot of sense, so mathematicians often redefine the \(\ell_0\) or \(L_0\) "norm" (not a true norm) as a simple count of the number of non-zero elements in a vector. In other words, we toss out the 0th root, define \(0^0 := 0 \) and do:

$$ \| \mathbf{u} \|_0 = |u_1|^0 + |u_2|^0 + \cdots + |u_n|^0 $$

(Or, if we're thinking about the points \(\mathbf{a}\) and \(\mathbf{b}\) again, just remember that \(\mathbf{v}\) = \(\mathbf{a}\) - \(\mathbf{b}\).)

Computing norms

Let's take a quick look at computing the norm of some vectors in Python:

>>> import numpy as np

>>> a = np.array([1, 1]).T
>>> b = np.array([6, 5]).T

>>> L_0 = np.count_nonzero(a - b)

>>> L_1 = np.sum(np.abs(a - b))

>>> L_2 = np.sqrt((a - b) @ (a - b))

>>> L_inf = np.max(np.abs(a - b))

>>> # Using NumPy's `linalg` module:
>>> import numpy.linalg as la
>>> for p in (0, 1, 2, np.inf):
>>>    print("L_{} norm = {}".format(p, la.norm(a - b, p)))
L_0 norm = 2.0
L_1 norm = 9.0
L_2 norm = 6.4031242374328485
L_inf norm = 5.0

What can we do with all this?

So far, so good. But what's the point of these metrics? How can we use them to solve problems? We'll get into that in a future post, so don't go too far!

For now I'll leave you to play with this little interactive demo of the effect of changing p-norms on a Voronoi triangle tiling — it's by Sarah Greer, a geophysics student at UT Austin. 

The quick green forsterite jumped over the lazy dolomite

The best-known pangram — a sentence containing every letter of the alphabet —  is probably

The quick brown fox jumped over the lazy dog.

There are lots of others of course. If you write like James Joyce, there are probably an infinite number of others. The point is to be short, and one of the shortest, with only 29 letters (!), even has a geological flavour:

Sphinx of black quartz, judge my vow.

I know what you're thinking: Cool, but what's the shortest set of mineral names that uses all the letters of the alphabet? What logophiliac geologist would not wonder the same thing?

Well, we posed this question in the most recent "Riddle me this" segment on the Undersampled Radio podcast. This blog post is my solution.

The set cover problem

Finding pangrams in a list of words amounts to solving the classical set cover problem:

Given a set of elements \(\{U_1, U_2,\ldots , U_n\}\) (called the ‘universe’) and a collection \(S\) of \(m\) sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of \(S\) whose union equals (or ‘covers’) the universe.

Our universe is the alphabet, and our \(S\) is the list of \(m\) mineral names. There is a slight twist in our case: the set cover problem wants the smallest subset of \(S\) — the fewest members. But in this problem, I suspect there are several 4-word solutions (judging from my experiments), so I want the smallest total size of the members of the subset. That is, I want the fewest total letters in the solution.

The solution

The set cover problem was shown to be NP-complete in 1972. What does this mean? It means that it's easy to tell if you have an answer (do you have all the letters of the alphabet?), but the only way to arrive at a solution is — to oversimplify massively — by brute force. (If you're interested in this stuff, this edition of the BBC's In Our Time is one of the best intros to P vs NP and complexity theory that I know of.)

Anyway, the point is that if we find a better way than brute force to solve this problem, then we need to write a paper about it immediately, claim our prize, collect our turkey, then move to a sunny tax haven with good water and double-digit elevation.

So, this could take a while: there are over 95 billion ways to draw 3 words from my list of 4600 mineral names. If we need 4 minerals, there are 400 trillion combinations... and a quick calculation suggests that my laptop will take a little over 50 years to check all the combinations. 

Can't we speed it up a bit?

Brute force is one thing, but we don't need to be brutish about it. Maybe we can think of some strategies to give ourselves a decent chance:

  • The list is alphabetically sorted, so randomize the list before searching. (I did this.)
  • Guess some 'useful' minerals and ensure that you get to them. (I did this too, with quartz.)
  • Check there are at least 26 letters in the candidate words, and (if it's only records we care about) no more than 44, because I have a solution with 45 letters (see below).
  • We could sort the list into word length order. That way we search shorter things first, so we should get shorter lists (which we want) earlier.
  • My solution does not depend much on Python's set type. Maybe we could do more with set theory.
  • Before inspecting the last word in each list, we could make sure it contains at least one letter that's so far missing.

So far, the best solution I've come up with so far has 45 letters, so there's plenty of room for improvement:

'quartz', 'kvanefjeldite', 'abswurmbachite', 'pyroxmangite'

My solution is in this Jupyter Notebook. Please put me out of my misery by improving on it.

Great geophysicists #13: Poisson

Siméon Denis Poisson was born in Pithiviers, France, on 21 June 1781. While still a teenager, Poisson entered the prestigious École Polytechnique in Paris, and published his first papers in 1800. He was immediately befriended — or adopted, really — by Lagrange and Laplace. So it's safe to say that he got off to a pretty good start as a mathematician. The meteoric trajectory continued throughout his career, as Poisson received more or less every honour a French scientist could accumulate. Along with Laplace and Lagrange — as well as Fresnel, Coulomb, Lamé, and Fourier — his is one of the 72 names on the Eiffel Tower.

Wrong Poisson

In the first few decades following the French Revolution, which ended in 1799, France enjoyed a golden age of science. The Société d’Acrueil was a regular meeting of savants, hosted by Laplace and the chemist Claude Louis Berthollet, and dedicated to the exposition of physical phenomena. The group worked on problems like the behaviour of gases, the physics of sound and light, and the mechanics of deformable materials. Using Newton's then 120-year-old law of gravitation as an analogy, the prevailing school of thought accounted for all physical phenomena in terms of forces acting between particles. 

Poisson was not flawless. As one of the members of this intellectual inner circle, Poisson was devoted to the corpuscular theory of light. Indeed, he dismissed the wave theory of light completely, until proven wrong by Thomas Young and, most conspicuously, Augustin-Jean Fresnel. Even Poisson's ratio, the eponymous elastic modulus, wasn't the result of his dogged search for truth, but instead represents a controversy that drove the development of the three-dimensional theory of elasticity. More on this next time.

The workaholic

Although he did make time for his wife and four children — but only after 6 pm — Poisson apparently had little time for much besides mathematics. His catchphrase was

Life is only good for two things: doing mathematics and teaching it.

In the summer of 1838, he learned he had a form of tuberculosis. According to James (2002), he was unable to take time away from work for long enough to recuperate. Eventually, insisting on conducting the final exams at the Polytechnique for the 23rd year in a row, he took on more than he could handle. He died on 20 April 1840. 


Grattan-Guinness, I. (1990). Convolutions in French Mathematics, 1800-1840: From the Calculus and Mechanics to Mathematical Analysis and Mathematical Physics. Vol.1: The Setting. Springer Science & Business Media. 549 pages.

Ioan James, I (2002). Remarkable Mathematicians: From Euler to Von Neumann. Cambridge University Press, 433 pages.

The University of St Andrews MacTutor archive article on Poisson.

Deriving equations in Python

Last week I wrote about the elastic moduli, and showed the latest version of my table of equations. Here it is; click on it for a large version:

Making this grid was a bit of an exercise in itself. One could spend some happy hours rearranging things by hand; instead, I spent some (mostly) happy hours learning to use SymPy, a symbolic maths library for Python. For what it's worth, you can see my flailing in this Jupyter Notebook. Warning: it's pretty untidy.

Wrangling equations

Fortunately, SymPy is easy to get started with. Let's look at getting an expression for \(V_\mathrm{P}\) in terms of \(E\) and \(K\), given that I already have an expression in terms of \(E\) and \(\mu\), plus an expression for \(\mu\) in terms of \(E\) and \(K\).

First we import the SymPy library, set it up for nice math display in the Notebook, and initialize some parameter names:

>>> import sympy
>>> sympy.init_printing(use_latex='mathjax')
>>> lamda, mu, nu, E, K, rho = sympy.symbols("lamda, mu, nu, E, K, rho")

lamda is not a typo: lambda means something else in Python — it's a sort of unnamed function.

Now we're ready to define an expression. First, I'll import SymPy's own square root function for convenience. Then I define an expression for \(V_\mathrm{P}\) in terms of \(E\) and \(\mu\):

>>> vp_expr = sympy.sqrt((mu * (E - 4*mu)) / (rho * (E - 3*mu)))
>>> vp_expr

$$ \sqrt{\frac{\mu \left(E - 4 \mu\right)}{\rho \left(E - 3 \mu\right)}} $$

Now we can give SymPy the expression for \(\mu\) in terms of \(E\) and \(K\) and substitute:

>>> mu_expr = (3 * K * E) / (9 * K - E)
>>> vp_new = vp_expr.subs(mu, mu_expr)
>>> vp_new

$$\sqrt{3} \sqrt{\frac{E K \left(- \frac{12 E K}{- E + 9 K} + E\right)}{\rho \left(- E + 9 K\right) \left(- \frac{9 E K}{- E + 9 K} + E\right)}}$$

Argh, what is that?? Luckily, it's easy to simplify:

>>> sympy.simplify(vp_new)

$$\sqrt{3} \sqrt{\frac{K \left(E + 3 K\right)}{\rho \left(- E + 9 K\right)}}$$

That's more like it! What's really cool is that SymPy can even generate the \(\LaTeX\) code for your favourite math renderer:

>>> print(sympy.latex(sympy.simplify(vp_new)))
\sqrt{3} \sqrt{\frac{K \left(E + 3 K\right)}{\rho \left(- E + 9 K\right)}}

That's all there is to it!

What is the mystery X?

Have a look at the expression for  \(V_\mathrm{P}\) in terms of \(E\) and \(\lambda\):


$$\frac{\sqrt{2}}{2} \sqrt{\frac{1}{\rho} \left(E - \lambda + \sqrt{E^{2} + 2 E \lambda + 9 \lambda^{2}}\right)}$$

I find this quantity — I call it \(X\) in the big table of equations — really curious:


$$ X = \sqrt{9\lambda^2 + 2E\lambda + E^2} $$

As you can see from the similar table on Wikipedia, a similar quantity appears in expressions in terms of \(E\) and \(M\). These quantities look like elastic moduli, and even have the right units and order of magnitude as the others. If anyone has thoughts on what significance it might have, if any, or on why expressions in terms of \(E\) and \(\lambda\) or \(M\) should be so uncommonly clunky, I'm all ears. 

One last thing... I've mentioned Melvyn Bragg's wonderful BBC radio programme In Our Time before. If you like listening to the radio, try this recent episode on the life and work of Robert Hooke. Not only did he invent the study of elasticity with his eponymous law, he was also big in microscopy, describing things like the cellular structure of cork in detail (right).

All the elastic moduli

An elastic modulus is the ratio of stress (pressure) to strain (deformation) in an isotropic, homogeneous elastic material:

$$ \mathrm{modulus} = \frac{\mathrm{stress}}{\mathrm{strain}} $$

OK, what does that mean?

Elastic means what you think it means: you can deform it, and it springs back when you let go. Imagine stretching a block of rubber, like the picture here. If you measure the stress \(F/W^2\) (i.e. the pressure is force per unit of cross-sectional area) and strain \(\Delta L/L\) (the stretch as a proportion) along the direction of stretch ('longitudinally'), then the stress/strain ratio gives you Young's modulus, \(E\).

Since strain is unitless, all the elastic moduli have units of pressure (pascals, Pa), and is usually on the order of tens of GPa (billions of pascals) for rocks. 

The other elastic moduli are: 

There's another quantity that doesn't fit our definition of a modulus, and doesn't have units of pressure — in fact it's unitless —  but is always lumped in with the others: 

What does this have to do with my data?

Interestingly, and usefully, the elastic properties of isotropic materials are described completely by any two moduli. This means that, given any two, we can compute all of the others. More usefully still, we can also relate them to \(V_\mathrm{P}\), \(V_\mathrm{S}\), and \(\rho\). This is great because we can get at those properties easily via well logs and less easily via seismic data. So we have a direct path from routine data to the full suite of elastic properties.

The only way to measure the elastic moduli themselves is on a mechanical press in the laboratory. The rock sample can be subjected to confining pressures, then squeezed or stretched along one or more axes. There are two ways to get at the moduli:

  1. Directly, via measurements of stress and strain, so called static conditions.
  2. Indirectly, via sonic measurements and the density of the sample. Because of the oscillatory and transient nature of the sonic pulses, we call these dynamic measurements. In principle, these should be the most comparable to the measurements we make from well logs or seismic data.

Let's see the equations then

The elegance of the relationships varies quite a bit. Shear modulus \(\mu\) is just \(\rho V_\mathrm{S}^2\), but Young's modulus is not so pretty:

$$ E = \frac{\rho V_\mathrm{S}^2 (3 V_\mathrm{P}^2 - 4 V_\mathrm{S}^2 }{V_\mathrm{P}^2 - V_\mathrm{S}^2} $$

You can see most of the other relationships in this big giant grid I've been slowly chipping away at for ages. Some of it is shown below. It doesn't have most of the P-wave modulus expressions, because no-one seems too bothered about P-wave modulus, despite its obvious resemblance to acoustic impedance. They are in the version on Wikipedia, however (but it lacks the \(V_\mathrm{P}\) and \(V_\mathrm{S}\) expressions).

Some of the expressions for the elastic moduli and velocities — click the image to see them all in SubSurfWiki.

Some of the expressions for the elastic moduli and velocities — click the image to see them all in SubSurfWiki.

In this table, the mysterious quantity \(X\) is given by:

$$ X = \sqrt{9\lambda^2 + 2E\lambda + E^2} $$

In the next post, I'll come back to this grid and tell you how I've been deriving all these equations using Python.

Top tip... To find more posts on rock physics, click the Rock Physics tag below!

All the time freaks

SEG 2014Thursday was our last day at the SEG Annual Meeting. Evan and I took in the Recent developments in time-frequency analysis workshop, organized by Mirko van der Baan, Sergey Fomel, and Jean-Baptiste Tary (Vienna). The workshop came out of an excellent paper I reviewed this summer, which was published online a couple of weeks ago:

Tary, JB, RH Herrera, J Han, and M van der Baan (2014), Spectral estimation—What is new? What is next?, Rev. Geophys. 52. doi:10.1002/2014RG000461.

The paper compares the results of several time–frequency transforms on a suite of 'benchmark' signals. The idea of the workshop was to invite further investigation or other transforms. The organizers did a nice job of inviting contributors with diverse interests and backgrounds. The following people gave talks, several of them sharing their code (*):

  • John Castagna (Lumina) with a review of the applications of spectral decomposition for seismic analysis.
  • Steven Lin (NCU, Taiwan) on empirical methods and the Hilbert–Huang transform.
  • Hau-Tieng Wu (Toronto) on the application of transforms to monitoring respiratory patterns in animals.*
  • Marcílio Matos (SISMO) gave an entertaining, talk about various aspects of the problem.
  • Haizhou Yang (Standford) on synchrosqueezing transforms applied to problems in anatomy.*
  • Sergey Fomel (UT Austin) on Prony's method... and how things don't always work out.*
  • Me, talking about the fidelity of time–frequency transforms, and some 'unsolved problems' (for me).*
  • Mirko van der Baan (Alberta) on the results from the Tary et al. paper.

Some interesting discussion came up in the two or three unstructured parts of the session, organized as mini-panel discussions with groups of authors. Indeed, it felt like the session could have lasted longer, because I don't think we got very close to resolving anything. Some of the points I took away from the discussion:

  • My observation: there is no existing survey of the performance of spectral decomposition (or AVO) — these would be great risking tools.
  • Castagna's assertion: there is no model that predicts the low-frequency 'shadow' effect (confusingly it's a bright thing, not a shadow).
  • There is no agreement on whether the so-called 'Gabor limit' of time–frequency localization is a lower-bound on spectral decomposition. I will write more about this in the coming weeks.
  • Should we even be attempting to use reassignment, or other 'sharpening' tools, on broadband signals? To put it another way: does instantaneous frequency mean anything in seismic signals?
  • What statistical measures might help us understand the amount of reassignment, or the precision of time–frequency decompositions in general?

The fidelity of time–frequency transforms

My own talk was one of the hardest I've ever done, mainly because I don't think about these problems very often. I'm not much of a mathematician, so when I do think about them, I tend to have more questions than insights, so I made my talk into a series of questions for the audience. I'm not sure I got much closer to any answers, but I have a better idea of my questions now... which is a kind of progress I suppose.

Here's my talk (latest slidesGitHub repo). Comments and feedback are, as always, welcome.