The right writing tools

Scientists write; it's part of the job. If writing feels laborious, it might be because you haven't found the right tools yet.

The wrong tools <cough>Word</cough> feel like a lot of work. You spend a lot of time fiddling with font sizes and not being sure whether to use italic or bold. You're constantly renumbering sections after edits. Everything moves around when you resize a figure. Tables are a headache. Table of contents? LOL.

If this sounds familiar, check out the following tools — arranged more or less in order of complexity.

Markdown

If you've never experienced writing with a markup language, you're in for a treat. At first it might feel clunky, but it quickly gets out of the way, leaving you to focus on the writing. Markdown was invented by John Gruber in about 2004; it is now almost ubiquitous in tools for developers. It's very lightweight, but compatible with HTML and LaTeX math, so it has plenty of features. Styling is absent from the document itself, being applied entirely in post-production, as it were. With help from pandoc, you can compile Markdown documents to almost any format (e.g. PDF or Word). As a result, Markdown is sufficient for at least 70% of my writing projects. Here's a sampling of Markdown markup, rendered on the right with no styling:

Markdown_raw.png
Markdown_render.png

Jupyter Notebook

If you've been following along with our X Lines of Python series, or any of our other code-centric content, you'll have come across Jupyter Notebooks. These documents combine Markdown with code (in more or less any language you can think of) and the outputs of that code — data, charts, images, etc. More than just containing code, Notebooks are fully computable documents: a so-called kernel can also run the code. Not only could you write a paper or book in a Notebook, but many people also use them to give presentations with fully interactive, live code blocks and widgets.

Notebook_example.png
latex_folder___by_missyobo-d3azzbh.png

LaTeX

I discovered LaTeX in about 1993 and it was love at first sight. I've always been a bit of a typography nerd, and LaTeX — like TeX, around which LaTeX is wrapped — really cares about typography. So you get ligatures, hyphenation, sentence spacing, and kerning for free. It also cares about mathematics, cross-references, bibliographies, page numbering, tables of contents, and everything else you need for publication-ready documents.

You can install LaTeX locally, but there are several ways to use LaTeX online, without installing anything — and you get the best of both worlds: markup with WYSIWYG editing. Overleaf, ShareLaTeX (which is merging with Overleaf), Authorea, and Papeeria are all worth a look, especially if you write scientific papers.

When WYSIWYG works

Sometimes you just want a couple of headings and some text, or you need to share a document with others. I often go for WYSIWYG in those situations too — Google Docs is the best WYSIWYG editor I've used. When it supports Markdown too, which is surely only a matter of time, it will be perfect.

What about you, do you have a favourite writing tool? Share it in the comments.

Is geolocation the new lat-long?

location_pick-up.png

If I want to hail a ride-sharing service I can give an address as my location. Unless I am at the rear of the building or in the parking lot across the street, in which case I need to fiddle about with a pin on a slippy map. No problem, I can usually eyeball it and then just look out for the driver.

But there are other situations where an address won't do, and fiddling with pins isn't an option either. Telling the pizza company exactly where the delivery drone should land. Calling an ambulance to a remote area. Specifying a well location in the desert. Doing almost anything in the desert.

Wait, isn't this what latitude and longitude are for?

Sort of. I mean, it is what lat and long are for, but they aren't all that good at it. For one thing, there's the annoying problem of datums, which is even more annoying because hardly anyone realizes it's a problem.

Then there's the fact that to get better than 10 metre accuracy, you need 5 decimal places (or 1 decimal place in the seconds if we're talking DMS). So, even without the all-important datum, your average lat-long pair in N America would need at least 18 characters in decimal notation: 44.44845N64.37565W. This is not terribly user-friendly.
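A quick check on those numbers (one degree of latitude is about 111 km, so the fifth decimal place is worth roughly a metre):

>>> 111_000 / 10**5     # metres per unit in the 5th decimal place of latitude
1.11
>>> len("44.44845N64.37565W")
18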

What are the alternatives?

In the last ten years or so, several alternatives to addresses and lat-longs have emerged. For example, one interesting geocoding system — geohash.org — encodes locations as strings of letters and digits. The location of my fictional 'pick up point' would be dxfhz5e4fxs. A bit of a mouthful perhaps but the system helpfully omits some easily confused characters, like lower-case L, and is case-insensitive. Another really nice feature: the string is big-endian so you can remove characters from the right to get a bit less precision.
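By the way, the encoding is simple enough to sketch from scratch. Here's a rough Python illustration of the algorithm (not geohash.org's own code; the function name is mine):

# A from-scratch sketch of geohash encoding: interleave bits from longitude and
# latitude (longitude first), then map each group of 5 bits to a base-32
# alphabet that omits easily confused characters like lower-case L.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode_geohash(lat, lon, length=11):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits = []
    for i in range(5 * length):
        rng, x = (lon_range, lon) if i % 2 == 0 else (lat_range, lat)
        mid = sum(rng) / 2
        if x >= mid:
            bits.append(1)
            rng[0] = mid      # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid      # keep the lower half
    return ''.join(BASE32[int(''.join(map(str, bits[i:i+5])), 2)]
                   for i in range(0, 5 * length, 5))

encode_geohash(44.44845, -64.37565)  # starts 'dxf...' (drop characters from the right for less precision)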

In 2013, what3words burst onto the geolocation scene with an ingenious proprietary algorithm that uniquely transforms the location of every 3 x 3 m square on Earth into three pronounceable words. My pick-up location becomes a cryptic crossword clue: dreadlocks.boarded.pageant. One feature of the scheme is that similar-sounding locations are not neighbours, avoiding near-miss confusion. For example, the square to the west of mine is called lawmakers.sieves.breezes, and the similar-sounding deadlock.boarded.pageant is in the middle of a field in New South Wales. Mercedes and Domino's Pizza are experimenting with what3words.

More recently still, Open Location Codes have got some traction. Also called plus codes — a clue that they were invented by Google — they are a really nice example of a well-executed open standard, with fully open code, reference implementations in lots of languages, a public-facing website, and great documentation. My location's plus code is 87PQCJXF+9Q. Like the geohash, it can be shortened — but only in particular ways. For example, I could give the code as CJXF+9Q, Mahone Bay. Unlike what3words, Open Location Codes are free to use. My guess is that we'll be seeing them all over the place as self-driving cars and drones become more widespread.

Mapcodes, developed in about 2001, are yet another implementation of geocodes. Their main feature is the use of very short codes for densely populated places. However, there are some problems. For example, codes can be specified with or without country and region codes — but the different versions do not resemble each other.

For comparison, here's how I might describe my pick up location on the map at the top of this post:

 
geocodes_again.png
 

Useful for geoscience?

They certainly seem easier to wield than lat-long, and you don't need to worry about datums anymore, but perhaps they feel too new or ephemeral to catch on for some geospatial practitioners. I also wonder why no-one seems to have thought about the 3D problem yet... Which floor of the apartment building is this pizza going to? How far down this mine is the heart attack victim? At exactly what depth in this lake was the wreckage of the self-driving car found?

What do you think? Will any of these schemes gain traction? Might any of them be useful in science or engineering applications? Will you be experimenting with them?


I can't leave the subject of geolocation codes without mentioning geohashing — a sort of cross between geocaching and professional nerdism. Invented in 2008 by xkcd creator, Randall Munroe, geohashing involves generating random locations via an MD5 hash, then visiting that location without getting lost or beaten up.

Update: You can access this algorithm right from Python: from antigravity import geohash

On principles and creativity

I recently heard a quote that resonated with me:

 
Bernbach_principles.png
 

I grapple with this sentiment whenever I feel the selfish twinge of hesitation to donate money to Wikipedia or QGIS, or pay page fees for open access to an article, or otherwise cough up for my convictions.

Curious about who had uttered this wisdom, I looked it up. Turns out it was Bill Bernbach, celebrated advertiser, and supposedly an inspiration for the Don Draper character in Mad Men.

Bernbach_beetle_small.png

One of the founders of Doyle Dane Bernbach, now known as DDB, he brought bare-faced truth to the forefront in advertising, calling VW Beetles 'small', and proudly declaring Avis 'number 2'. He basically invented Apple's entire aesthetic in the late 50s, about four decades before Apple started to 'Think Different'.

He said some other true things. This could be about scientific communication:

The truth isn't the truth until people believe you, and they can't believe you if they don't know what you're saying, and they can't know what you're saying if they don't listen to you, and they won't listen to you if you're not interesting, and you won't be interesting unless you say things imaginatively, originally, freshly.

Now 'science' and 'truth' are not the same thing, so I don't want to try to claim that this sums things up perfectly, but I think the general point is important and we'll be better scientists if we live by it.

And I like this one too:

 
Bernbach_creativity.png
 

Too many organizations, and individuals, think their advantage must come from money, or secrecy, or patents, or other obvious, easily copied, things. But thinking about your creative edge first makes you take care of important things, and stop worrying about unimportant things.

I think this is an important idea, because creativity is, almost by definition, uncopyable. It feels like a slippery thing to build a company on or strategize about because while there's a limitless supply of the stuff, it's hard to maintain — and exploit. Creativity for its own sake is almost useless, but combined with a "Just ship it!" mentality, it's an unstoppable force.


The image of Bill Bernbach and the VW Beetle ad are both copyright of DDB Worldwide Communications Group and low-res images are used here in accordance with fair use rules. 

Real and apparent seismic frequency

There's a Jupyter Notebook for you to follow along with this tutorial. You can run it right here in your browser.


We often use Ricker wavelets to model seismic, for example when making a synthetic seismogram with which to help tie a well. One simple way to guesstimate the peak or central frequency of the wavelet that will model a particular seismic section is to count the peaks per unit time in the seismic. But this tends to overestimate the frequency, because the apparent frequency of a Ricker wavelet (the one you get by counting peaks) is higher than its peak frequency. The question is, how much higher?

To investigate, let's make a Ricker wavelet and see what it looks like in the time and frequency domains.

>>> T, dt, f = 0.256, 0.001, 25

>>> import bruges
>>> w, t = bruges.filters.ricker(T, dt, f, return_t=True)

>>> import scipy.signal
>>> f_W, W = scipy.signal.welch(w, fs=1/dt, nperseg=256)
The_frequency_of_a_Ricker_2_0.png
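If you'd like to reproduce a plot like this, here's a minimal matplotlib sketch (the original figure may have been styled differently):

>>> import matplotlib.pyplot as plt
>>> fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(10, 4))
>>> ax0.plot(t, w)               # the wavelet in the time domain
>>> ax0.set_xlabel('Time [s]')
>>> ax1.plot(f_W, W)             # its Welch power spectrum
>>> ax1.set_xlabel('Frequency [Hz]')
>>> plt.show()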

When we count the peaks in a section, the assumption is that this apparent frequency — that is, the reciprocal of apparent period or distance between the extrema — tells us the dominant or peak frequency.

To help see why this assumption is wrong, let's compare the Ricker with a signal whose apparent frequency does match its peak frequency: a pure cosine:

>>> import numpy as np
>>> c = np.cos(2 * 25 * np.pi * t)
>>> f_C, C = scipy.signal.welch(c, fs=1/dt, nperseg=256)
The_frequency_of_a_Ricker_4_0.png

Notice that the signal is much narrower in bandwidth. If we allowed more oscillations, it would be even narrower. If it lasted forever, it would be a spike in the frequency domain.

Let's overlay the signals to get a picture of the difference in the relative periods:

The_frequency_of_a_Ricker_6_1.png

The practical consequence of this is that if we estimate the peak frequency to be \(f\ \mathrm{Hz}\), then we need to reduce \(f\) by some factor if we want to design a wavelet to match the data. To get this factor, we need to know the apparent period of the Ricker function, as given by the time difference between the two minima.

Let's look at a couple of different ways to find those minima: numerically and analytically.

Find minima numerically

We'll use scipy.optimize.minimize to find a numerical solution. In order to use it, we'll need a slightly different expression for the Ricker function — casting it in terms of a time basis t. We'll also keep f as a variable, rather than hard-coding it in the expression, to give us the flexibility of computing the minima for different values of f.

Here's the equation we're implementing:

$$ w(t, f) = (1 - 2\pi^2 f^2 t^2)\ e^{-\pi^2 f^2 t^2} $$

In Python:

>>> def ricker(t, f):
...     return (1 - 2*(np.pi*f*t)**2) * np.exp(-(np.pi*f*t)**2)

Check that the wavelet looks like it did before, by comparing the output of this function when f is 25 with the wavelet w we were using before:

>>> f = 25
>>> np.allclose(w, ricker(t, f=25))
True

Now we call SciPy's minimize function on our ricker function. It iteratively searches for a minimum solution, then gives us the x (which is really t in our case) at that minimum:

>>> import scipy.optimize
>>> f = 25
>>> scipy.optimize.minimize(ricker, x0=0, args=(f,))

fun: -0.4462603202963996
 hess_inv: array([[1]])
      jac: array([-2.19792128e-07])
  message: 'Optimization terminated successfully.'
     nfev: 30
      nit: 1
     njev: 10
   status: 0
  success: True
        x: array([0.01559393])

So the minimum amplitude, given by fun, is -0.44626 and it occurs at an x (time) of \(\pm 0.01559\ \mathrm{s}\).

In comparison, the minima of the cosine function occur at times of \(\pm 0.02\ \mathrm{s}\). In other words, each minimum of the Ricker is \(0.02 - 0.01559 = 0.00441\ \mathrm{s}\) closer to zero time, so the apparent period is shorter than that of the pure waveform by...

>>> (0.02 - 0.01559) / 0.02
0.22050000000000003

...about 22%. This means that if we naively estimate frequency by counting peaks or zero crossings, we'll tend to overestimate the peak frequency of the wavelet; we'd need to reduce our estimate by about 22% to recover it. This assumes the wavelet is approximately Ricker-like; if it isn't, we can use the same method to estimate the correction for other functions.

This is good to know, but it would be interesting to know if this parameter depends on frequency, and also to have a more precise way to describe it than a decimal. To get at these questions, we need an analytic solution.

Find minima analytically

Python's SymPy package is a bit like Maple — it understands math symbolically. We'll use sympy.solve to find an analytic solution. It turns out that it needs the Ricker function written in yet another way, using SymPy symbols and expressions for \(\mathrm{e}\) and \(\pi\).

>>> import sympy as sp
>>> t, f = sp.Symbol('t'), sp.Symbol('f')
>>> r = (1 - 2*(sp.pi*f*t)**2) * sp.exp(-(sp.pi*f*t)**2)

Now we can easily find the solutions to the Ricker equation, that is, the times at which the function is equal to zero:

>>> sp.solvers.solve(r, t)
[-sqrt(2)/(2*pi*f), sqrt(2)/(2*pi*f)]

But this is not quite what we want. We need the minima, not the zero-crossings.

Maybe there's a better way to do this, but here's one way. Note that the gradient (slope or derivative) of the Ricker function is zero at the minima, so let's just solve the first time derivative of the Ricker function. That will give us the three times at which the function has a gradient of zero.

>>> dwdt = sp.diff(r, t)
>>> sp.solvers.solve(dwdt, t)
[0, -sqrt(6)/(2*pi*f), sqrt(6)/(2*pi*f)]

In other words, the non-zero minima of the Ricker function are at:

$$ \pm \frac{\sqrt{6}}{2\pi f} $$

Let's just check that this evaluates to the same answer we got from scipy.optimize, which was 0.01559.

>>> np.sqrt(6) / (2 * np.pi * 25)
0.015593936024673521

The solutions agree.

While we're looking at this, we can also compute the analytic solution to the amplitude of the minima, which SciPy calculated as -0.446. We just plug one of the expressions for the minimum time into the expression for r:

>>> r.subs({t: sp.sqrt(6)/(2*sp.pi*f)})
-2*exp(-3/2)
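Evaluating that numerically (rounded for display) confirms SciPy's answer:

>>> round(-2 * np.exp(-3/2), 9)
-0.44626032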

Apparent frequency

So what's the result of all this? What's the correction we need to make?

The minima of the Ricker wavelet are \(\sqrt{6}\ /\ \pi f_\mathrm{actual}\ \mathrm{s}\) apart — this is the apparent period. If we're assuming a pure tone, this period corresponds to an apparent frequency of \(\pi f_\mathrm{actual}\ /\ \sqrt{6}\ \mathrm{Hz}\). For \(f = 25\ \mathrm{Hz}\), this apparent frequency is:

>>> (np.pi * 25) / np.sqrt(6)
32.06374575404661

If we were to try to model the data with a Ricker of 32 Hz, the frequency would be too high. We need to multiply the frequency by a factor of \(\sqrt{6} / \pi\), like so:

>>> 32.064 * np.sqrt(6) / (np.pi)
25.00019823475659

This gives the correct frequency of 25 Hz.

To sum up, rearranging the expression above:

$$ f_\mathrm{actual} = f_\mathrm{apparent} \frac{\sqrt{6}}{\pi} $$

Expressed as a decimal, the factor we were seeking is therefore \(\sqrt{6}\ /\ \pi\):

>>> np.sqrt(6) / np.pi
0.779696801233676

That is, the reduction we need is about 22%.
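If you find yourself doing this a lot, it's easy to wrap up in a tiny convenience function (the name is mine, it's not part of any library):

>>> def ricker_peak_frequency(f_apparent):
...     """Reduce an apparent (peak-counted) frequency to a Ricker peak frequency."""
...     return f_apparent * np.sqrt(6) / np.pi
...
>>> ricker_peak_frequency(32.064)
25.00019823475659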


Curious coincidence: in the recent Pi Day post, I mentioned the Riemann zeta function of 2 as a way to compute \(\pi\). It evaluates to \((\pi / \sqrt{6})^2\). Is there a million-dollar connection between the humble Ricker wavelet and the Riemann hypothesis?

I doubt it.

 
 

Happy π day, Einstein

It's Pi Day today, and also Einstein's 139th birthday. MIT celebrates it at 6:28 pm — in honour of pi's arch enemy, tau — by sending out its admission notices.

And Stephen Hawking died today. He will leave a great, black hole in modern science. I saw him lecture in London not long after A Brief History of Time came out. It was one of the events that inspired me along my path to science. I recall he got more laughs than a lot of stand-ups I've seen.

But I can't really get behind 3/14. The weird American way of writing dates, mixed-endian style, really irks me. As a result, I have previously boycotted Pi Day, instead celebrating it on 31/4, aka 31 April, aka 1 May. Admittedly, this takes the edge off the whole experience a bit, so I've decided to go full big-endian and adopt ISO-8601 from now on, which means Pi Day is on 3141-5-9. Expect an epic blog post that day.

Transcendence

Anyway, I will transcend the bickering over dates (pausing only to reject 22/7 and 6/28 entirely so don't even start) to get back to pi. It so happens that Pi Day is of great interest in our house this year because my middle child, Evie (10), is a bit obsessed with pi at the moment. Obsessed enough to be writing a book about it (she writes a lot of books; some previous topics: zebras, Switzerland, octopuses, and Settlers of Catan fan fiction, if that's even a thing).

I helped her find some ways to generate pi numerically. My favourite one uses Riemann's zeta function, which we'd recently watched a Numberphile video about. It's the sum of the reciprocals of the natural numbers, each raised to the power \(s\):

$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$$

Leonhard Euler solved the Basel problem in 1734, proving that \(\zeta(2) = \pi^2 / 6\), so you can compute pi slowly with a naive implementation of the zeta function:

 
def zeta(s, terms=1000):
    z = 0
    for t in range(1, int(terms)):
        z += 1 / t**s
    return z

(6 * zeta(2, terms=1e7))**0.5

Which returns pi, correct to 6 places:

 
3.141592558095893

Or you can use one of the various optimized versions of the zeta function, for example this one from the floating point math library mpmath (which I got from this awesome list of 100 ways to compute pi):

 
>>> from mpmath import *
>>> mp.dps = 50
>>> mp.pretty = True
>>>
>>> sqrt(6*zeta(2))
3.1415926535897932384626433832795028841971693993751068

...which is correct to 50 decimal places.

Here's the part of Evie's book where she explains a bit about transcendental numbers.

Evie's book shows the relationships between the sets of natural numbers (N), integers (Z), rationals (Q), algebraic numbers (A), and real numbers (R). Transcendental numbers are real, but not algebraic. (Some definitions also let them be complex.)


I was interested in this, because while I 'knew' that pi is transcendental, I couldn't really articulate what that really meant, and why (say) √2, which is also irrational, is not also transcendental. Succinctly, transcendental means 'non-algebraic': not the root of any polynomial with integer coefficients. Since √2 is obviously the solution to \(x^2 - 2 = 0\), it is algebraic and therefore not transcendental.
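If you like, SymPy can illustrate the distinction (this relies on its assumptions system, which knows that pi is not algebraic):

>>> import sympy as sp
>>> x = sp.Symbol('x')
>>> sp.solve(x**2 - 2, x)     # sqrt(2) is the root of an integer polynomial, so it's algebraic
[-sqrt(2), sqrt(2)]
>>> sp.pi.is_algebraic        # pi is not
False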

Weirdly, although hardly any numbers are known to be transcendental, almost all real numbers are. Isn't maths awesome?

Have a transcendental pi day!


The xkcd comic is by Randall Munroe and licensed CC-BY-NC.

Jounce, Crackle and Pop

jerk_shirt.png

I saw this T-shirt recently, and didn't get it. (The joke or the T-shirt.)

It turns out that the third derivative of displacement \(x\) with respect to time \(t\) — that is, the derivative of acceleration \(\mathbf{a}\) — is called 'jerk' (or sometimes, boringly, jolt, surge, or lurch) and is measured in units of m/s³. 

So far, so hilarious, but is it useful? It turns out that it is. Since the force \(\mathbf{F}\) on a mass \(m\) is given by \(\mathbf{F} = m\mathbf{a}\), you can think of jerk as being proportional to the rate of change of force. The lurch you feel at the onset of a car's acceleration — that's jerk. The designers of transport systems and rollercoasters manage it daily.

$$ \mathrm{jerk,}\ \mathbf{j} = \frac{\mathrm{d}^3 x}{\mathrm{d}t^3}$$

Here's a visualization of velocity (green line) of a Tesla Model S driving in a parking lot. The coloured stripes show the acceleration (upper plot) and the jerk (lower plot). Notice that the peaks in jerk correspond to changes in acceleration.

jerk_velocity_acceleration.png
jerk_velocity_jerk.png
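The acceleration and jerk traces in a figure like this can be approximated by repeated numerical differentiation of the sampled velocity. Here's a minimal NumPy sketch, with a made-up velocity profile standing in for the Tesla data:

>>> import numpy as np
>>> t = np.arange(0, 30, 0.1)                          # time [s]
>>> v = np.interp(t, [0, 5, 20, 30], [0, 10, 10, 0])   # made-up velocity [m/s]
>>> a = np.gradient(v, t)                              # acceleration [m/s²]
>>> j = np.gradient(a, t)                              # jerk [m/s³]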

The snap you feel at the start of the lurch? That's jounce  — the fourth derivative of displacement and the derivative of jerk. Eager et al (2016) wrote up a nice analysis of these quantities for the examples of a trampolinist and roller coaster passenger. Jounce is sometimes called snap... and the next two derivatives are called crackle and pop. 

What about momentum?

If the momentum \(\mathbf{p}\) of a mass \(m\) moving at a velocity \(\mathbf{v}\) is \(m\mathbf{v}\) and \(\mathbf{F} = m\mathbf{a}\), what is mass times jerk? According to the physicist Philip Gibbs, who investigated the matter in 1996, it's called yank:

Momentum equals mass times velocity.
Force equals mass times acceleration.
Yank equals mass times jerk.
Tug equals mass times snap.
Snatch equals mass times crackle.
Shake equals mass times pop.

There are jokes in there, help yourself.

What about integrating?

Clearly the integral of jerk is acceleration, and that of acceleration is velocity, the integral of which is displacement. But what is the integral of displacement with respect to time? It's called absement, and it's a pretty peculiar quantity to think about. In the same way that an object with linearly increasing displacement has constant velocity and zero acceleration, an object with linearly increasing absement has constant displacement and zero velocity. (Constant absement at zero displacement gives rise to the name 'absement': an absence of displacement.)

Integrating displacement over time might be useful: the area under the displacement curve for a throttle lever could conceivably be proportional to fuel consumption for example. So absement seems to be a potentially useful quantity, measured in metre-seconds.

Integrate absement and you get absity (a play on 'velocity'). Keep going and you get abseleration, abserk, and absounce. Are these useful quantities? I don't think so. A quick look at them all — for the same Tesla S dataset I used before — shows that the loss of detail from multiple cumulative summations makes for rather uninformative transformations: 

jerk_jounce_etc.png
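Going the other way is just repeated cumulative integration, which you can approximate with running sums scaled by the sample interval. A rough sketch, using a hypothetical displacement series:

>>> import numpy as np
>>> dt = 0.1                              # sample interval [s]
>>> x = np.ones(100)                      # hypothetical displacement [m]
>>> absement = np.cumsum(x) * dt          # integral of displacement [m·s]
>>> absity = np.cumsum(absement) * dt     # integral of absement [m·s²]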

You can reproduce the figures in this article with the Jupyter Notebook Jerk_jounce_etc.ipynb. Or you can launch a Binder right here in your browser and play with it there, without installing a thing!


References

David Eager et al (2016). Beyond velocity and acceleration: jerk, snap and higher derivatives. Eur. J. Phys. 37 065008. DOI: 10.1088/0143-0807/37/6/065008

Amarashiki (2012). Derivatives of position. The Spectrum of Riemannium blog, retrieved on 4 Mar 2018.

The dataset is from Jerry Jongerius's blog post, The Tesla (Elon Musk) and New York Times (John Broder) Feud. I have no interest in the 'feud', I just wanted a dataset.

The T-shirt is from Chummy Tees; the image is their copyright and used here under Fair Use terms.

The vintage Snap, Crackle and Pop logo is copyright of Kellogg's and used here under Fair Use terms.

Digitalization... of what?

I've been hearing a lot about 'digitalization', or 'digital transformation', recently. What is this buzzword?

The general idea seems to be: exploit lots and lots of data (which we already have), with analytics and machine learning probably, to do a better job finding and producing fuel and energy safely and responsibly.

At the centre of it all is usually data. Lots of data, usually in a lake. And this is where it all goes wrong. Digitalization is not about data. And it's not about technology either. Or cloud. Or IoT.


Interest in the terms "digital transformation" and "digitalization" since 2004, according to Google Trends. The data reveal a slight preference for the term "digitalization" in central and northern Europe. Google Ngram Viewer indicates that the term "digitalization" has been around for over 100 years, but it is also a medical term connected with the therapeutic use of digitalis. Just to be clear, that's not what we're talking about.

It's about people

Oh no, here I go with the hand-wavy, apple-pie "people not process" nonsense... well, yes. I'm convinced that it's humans we're transforming, not data or technology. Or clouds. Or Things.

I think it's worth spelling out, because I think most corporations have not grasped the human aspect yet. And I don't think it's unreasonable to say that petroleum has a track record of not putting people at the centre of its activities, so I worry that this will happen again. This would be bad, because it might mean that digitalization not only fails to get traction — which would be bad enough because this revolution is long overdue — but also that it causes unintended problems.

Without people, digital transformation is just another top-down 'push' effort, with too much emphasis on supply. I think it's smarter to create demand, or 'pull', so that professionals are asking for support, and tools, and databases, and are engaged in how those things are created.

Put technical professionals at the heart of the revolution, and the rest will follow. The inverse is not true.

Strategies

This is far from an exhaustive list, but here are some ideas for ways to get ahead in digital transformation:

  • Make it easy for digitally curious people to dip a toe in. Build a beginner-friendly computing environment, and encourage people to use it. Challenge your IT people to support a culture of experimentation and creativity. 
  • Give those curious professionals access to professional development channels, whether it's our courses, other courses, online channels like Lynda.com or Coursera, or whatever. 
  • Build a community of practice for 'scientific computing'. Whether it's a Yammer group or something more formal, be sure to encourage frequent face-to-face meetups, and perhaps an intranet portal.
  • Start to connect subsurface professionals with software engineers, especially web programmers and data scientists, elsewhere in the organization. I think the best way is to embed programmers into technical teams. 
  • Encourage participation in external channels like conferences and publications, data science contests, hackathons, open source projects, and so on. I guarantee you'll see a step change in skills and enthusiasm.

The bottom line is that we're looking for a profound cultural change. It will be slow. More than that, it needs to be slow. It might only take a year or two to get traction for an idea like "digital first". But deeper concepts, like "machine readable microservices" or "data-driven decisions" or "reproducible workflows", must take longer because you can't build that high without a solid foundation. Successfully applying specific technologies like deep learning, augmented reality, or blockchain, will certainly require a high level of technology literacy, and will take years to get right.

What's going on with scientific computing in your organization? Are you 'digitally curious'? Do you feel well-supported? Do you know others in your organization like you?


The circuit board image in the thumbnail for this post is by Carl Drougge, licensed CC-BY-SA.

Finding Big Bertha with a hot wire

mcnaughton-canada-war-museum.jpg

Sunday will be the 131st birthday of General Andrew McNaughton, a Canadian electrical engineer who served in the Canadian Expeditionary Force in the First World War. He was eventually appointed commander of the Canadian Corps Heavy Artillery and went on to serve in the Second World War as well.

So what is a professional soldier doing on a blog about geoscience? Well, McNaughton was part of the revolution of applied acoustics and geophysics that emerged right before and after the First World War.

Along with eminent British physicist Lawrence Bragg, engineer William Sansome Tucker, and physicist Charles Galton Darwin (the other Charles Darwin's grandson), among others, McNaughton applied physics to the big problem of finding the big noisy things that were trying to blow everyone up. They were involved in an arms race of their own — German surveyor Ludger Mintrop was trying to achieve the same goal from the other side of the trenches.

Big_Bertha.jpg

After gaining experience as a gunner, McNaughton became one of a handful of scientists and engineers involved in counter-battery operations. Using novel ranging techniques, these scientists gave the allied forces a substantial advantage over the enemy. Counter-battery fire became an important weapon at pivotal battles like Vimy Ridge, and certainly helped expedite the end of the war.

If all this sounds like a marginal way to win a battle, stop and think for a second about this artillery. The German howitzer, known as 'Big Bertha' (left), could toss an 820 kg (1800 lb) shell about 12.5 km (7.8 miles). In other words, it was incredibly annoying.


Combining technologies

Localization accuracy on the order of 5–10 m on the large majority of gun positions was eventually achieved by the coordinated use of several technologies, including espionage, cartography, aerial reconnaissance photography, and the new counter-measures of flash spotting and sound ranging.

Flash spotting was more or less just what it sounds like: teams of spotters recording the azimuth of artillery flashes, then triangulating artillery positions from multiple observations. The only real trick was in reporting the timing of flashes to help establish that the flashes came from the same gun.

Sound ranging, on the other hand, is a tad more complicated. It seems that Lawrence Bragg was the first to realize that the low frequency sound of artillery fire — which he said lifted him off the privy seat in the outhouse at his lodgings — might be a useful signal. However, microphones were not up to the task of detecting such low frequencies. Furthermore, the signal was masked by the (audible) sonic boom of the shell, as well as the shockwaves of passing shells.

Elsewhere in Belgium, William Tucker had another revelation. Lying inside a shack with holes in its walls, he realized that the 20 Hz pressure wave from the gun created tiny puffs of air through the holes. So he looked for a way to detect this pulse, and came up with a heated platinum wire in a rum jar. The filament's resistance dropped when cooled by the wavefront's arrival through an aperture. The wire was unaffected by the high-frequency shell wave. Later, moving-coil 'microphones' (geophones, essentially) were also used, as well as calibration for wind and temperature. The receivers were coupled with a 5-channel string galvanometer, invented by French engineers, to record traces onto 35-mm film bearing timing marks:

sound-ranging-traces.png

McNaughton continued to develop these technologies through the war, and by the end was successfully locating the large majority of enemy batteries, and was even able to specify the calibre of the guns and their probable intended targets. Erster Generalquartiermeister Erich Ludendorff commented at one point in the war:

According to a captured English document the English have a well-developed system of sound-ranging which in theory corresponds to our own. Precautions are accordingly to be taken to camouflage the sound: e.g. registration when the wind is contrary, and when there is considerable artillery activity, many batteries firing at the same time, simultaneous firing from false positions, etc.

An acoustic arsenal

Denge_acoustic_mirrors_March-2005_Paul-Russon.jpg

The hot-wire artillery detector was not Tucker's only acoustic innovation. He also pioneered the use of acoustic mirrors for aircraft detection. Several of these were built around the UK's east coast, starting around 1915 — the three shown here are at Denge in Kent. They were rendered obsolete by the invention of radar around the beginning of World War Two.

Acoustic and seismic devices are still used today in military and security applications, though they are rarely mentioned in applied geophysics textbooks. If you know about any interesting contemporary uses, tell us about them in the comments.


According to Crown Copyright terms, the image of McNaughton is out of copyright. The acoustic mirror image is by Paul Russon, licensed CC-BY-SA. The uncredited/unlicensed galvanometer trace is from the excellent Stop, hey, what's that sound article on the geographical imaginations blog; I assume it is out of copyright. The howitzer image is out of copyright.

This post on Target acquisition and counter battery is very informative and has lots of technical details, though most of it pertains to later technology. The Boom! Sounding out the enemy article on ScienceNews for Students is also very nice, with lots of images. 

Unsolved problems in applied geoscience

I like unsolved problems. I first wrote about them way back in late 2010 — Unsolved problems was the eleventh post on this blog. I touched on the theme again in 2013, before and after the first 'unsession' at the GeoConvention, which itself was dedicated to finding the most pressing questions in exploration geoscience. As we turn towards the unsession at AAPG in Salt Lake City in May, I find myself thinking again about unsolved problems. Specifically, what are they? How can we find them? And what can we do to make them easier to solve?

It turns out lots of people have asked these questions before.

unsolved_problems.png

I've compiled a list of various attempts by geoscientists to list the big questions in the field. The only one I was previously aware of was Milo Backus's challenges in applied seismic geophysics, laid out in his president's column in GEOPHYSICS in 1980 and highlighted later by Larry Lines as part of the SEG's 75th anniversary. Here are some notable attempts:

  • John William Dawson, 1883 — Nova Scotia's most famous geologist listed unsolved problems in geology in his presidential address to the American Association for the Advancement of Science. They included the Cambrian Explosion, and the origin of the Antarctic icecap. 
  • Leason Heberling Adams, 1947 — One of the first experimental rock physicists, Adams made the first list I can find in geophysics, which was less than 30 years old at the time. He included the origin of the geomagnetic field, and the temperature of the earth's interior.
  • Milo Backus, 1980 — The list included direct hydrocarbon detection, seismic imaging, attenuation, and anisotropy.  
  • Mary Lou Zoback, 2000 — In her presidential address to the GSA, Zoback kept things quite high-level, asking questions about finding signal in dynamic systems, defining mass flux and energy balance, identifying feedback loops, and communicating uncertainty and risk. This last one pops up in almost every list since.
  • Calgary's geoscience community, 2013 — The 2013 unsession unearthed a list of questions from about 50 geoscientists. They included: open data, improving seismic resolution, dealing with error and uncertainty, and global water management.
  • Daniel Garcia-Castellanos, 2014 — The Retos Terrícolas blog listed 49 problems in 7 categories, ranging from the early solar system to the earth's interior, plate tectonics, oceans, and climate. The list is still maintained by Daniel and pops up occasionally on other blogs and on Wikipedia.

The list continues — you can see them all in this presentation I made for a talk (online) at the Bureau of Economic Geology last week (thank you to Sergey Fomel for hosting me!). During the talk, I took the opportunity to ask those present what their unsolved problems are, especially the ones in their own fields. Here are a few of the responses (the rest are in the preso):

1-what-are-the-biggest-unsolved-problems-in-your-field-1.jpg

What are your unsolved problems in applied geoscience? Share them in the comments!


If you have about 50 minutes to spare, you can watch the talk here, courtesy of BEG's streaming service.

Click here to watch the talk >>>

Easier, better, faster, stronger

bruges_preview_1.png

Yesterday I pushed a new release of bruges to Python's main package repository, PyPI. Version 0.3.3 might not sound especially auspicious, but I'm excited about the new things we've added recently. It has come a long way since we announced it back in 2015, so if you haven't checked it out lately, now's a good time to take another look.

What is bruges again?

Bruges is a Bag of Really Useful Geophysical Equations and Stuff.

In other words, nothing fancy — just equations. It is free, open source software. It's aimed at geophysicists who use Python.

How do you install it? The short answer is pip:

    pip install bruges
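Then, for example, making a 25 Hz Ricker wavelet and its time basis takes a couple of lines:

>>> import bruges
>>> w, t = bruges.filters.ricker(0.256, 0.001, 25, return_t=True)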

So what's new?

Here are the highlights of what's been improved and added in the last few months:

  • The reflectivity equations in the reflection module now work on arrays for the Vp, Vs, and rho values, as well as the theta values. This is about 10 times faster than running a loop over elements; the Zoeppritz solution is 100× faster.
  • The various Zoeppritz solutions and the Aki–Richards approximations now return the complex reflectivity and therefore show post-critical amplitudes correctly.
  • A new reflection coefficient series function, reflection.reflectivity(), makes it easier to compute offset reflectivities from logs.
  • Several new linear and non-linear filters are in bruges.filters, including median (good for seismic horizons), mode (good for waveform classification), symmetric nearest-neighbours or snn, and kuwahara.
  • The ricker(), sweep() (aka Klauder), and ormsby() wavelets now all work for a sequence of frequencies, returning a wavelet bank. Also added a sinc() wavelet, with a taper option to attenuate the sidelobes.
  • Added inverse_gardner, and other density and velocity transforms, to petrophysics.
  • Added transform.v_rms() (RMS velocity), transform.v_avg() (average velocity) and transform.v_bac() (naïve Backus average). These all operate in a 'cumulative' average-down-to sense.
  • Added a coordinate transformation to translate between arbitrarily oriented (x,y) and (inline, xline) coordinates.

Want to try using it right now, with no installation? Give it a spin in My Binder! See how easy it is to compute elastic moduli, or offset reflection coefficients, or convert a log to time.  

bruges_preview_2.png

Want to support the development of open source geophysics software? Here's how:

  • Use it! This is the main thing we care about.
  • Report problems on the project's Issue page.
  • Fork the project and make your own changes, then share them back.
  • Pay us for the development of functionality you need.