Ten things I loved about ScienceOnline2012

I spent Thursday and Friday at the annual Science Online unconference at North Carolina State University in Raleigh, NC. I had been looking forward to it since peeking in on—and even participating in—sessions last January at ScienceOnline2011. As soon as I had emerged from the swanky airport and navigated my way to the charmingly peculiar Velvet Cloak Inn, I knew the first thing I loved was...

Raleigh, and NC State University. What a peaceful, unpretentious, human-scale place. And the university campus and facilities were beyond first class. I was born in Durham, England, and met my wife at university there, so I was irrationally prepared to have a soft spot for Durham, North Carolina, and by extension Raleigh too. And now I do. It's one of those rare places I've visited and known at once: I could live here. I was still basking in this glow of fondness when I opened my laptop at the hotel and found that the hard drive was doornail dead. So within 12 hours of arriving, I had...


The filtered earth

Ground-based image (top left) vs Hubble's image.

One of the reasons for launching the Hubble Space Telescope in 1990 was to eliminate the filter of the atmosphere that affects earth-bound observations of the night sky. The results speak for themselves: more than 10,000 peer-reviewed papers using Hubble data, around 98% of which have citations (only 70% of all astronomy papers are cited). There are plenty of other filters at work on Hubble's data: the optical system, the electronics of image capture and communication, space weather, and even the experience and perceptive power of the human observer. But it's clear: eliminating one filter changed the way we see the cosmos.

What is a filter? Mathematically, it's a subset of a larger set. In optics, it's a wavelength-selection device. In general, it's a thing or process which removes part of the input, leaving some output which may or may not be useful. For example, in seismic processing we apply filters which we hope remove noise, leaving signal for the interpreter. But if the filters are not under our control, if we don't even know what they are, then the relationship between output and input is not clear.
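To make the seismic-processing example concrete, here's a minimal sketch of one familiar filter: a zero-phase bandpass applied to a noisy synthetic trace. The sample rate, corner frequencies, and the trace itself are invented for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

dt = 0.002                                   # sample interval: 2 ms
t = np.arange(0, 2, dt)                      # a 2 s trace
signal = np.sin(2 * np.pi * 30 * t)          # a stand-in for 30 Hz 'signal'
noise = np.random.normal(0, 1, t.size)       # broadband noise
trace = signal + noise

# 10-60 Hz Butterworth bandpass, run forward and backward so it is zero phase.
nyquist = 0.5 / dt
b, a = butter(4, [10 / nyquist, 60 / nyquist], btype='band')
filtered = filtfilt(b, a, trace)             # the 'output' of this filter
```

The point is the one above: the output depends entirely on choices — filter order, corner frequencies, phase — that the person downstream may never see.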

Imagine you fit a green filter to your petrographic microscope. You can't tell the difference between the scene on the left and the one on the right—they have the same amount and distribution of green. Indeed, without the benefit of geological knowledge, the range of possible inputs is infinite. If you could only see a monochrome view, and you didn't know what the filter was, or even if there was one, it's easy to see that the situation would be even worse. 

Like astronomy, the goal of geoscience is to glimpse the objective reality via our subjective observations. All we can do is collect, analyse and interpret filtered data, the sifted ghost of the reality we tried to observe. This is the best we can do. 

What do our filters look like? In the case of seismic reflection data, the filters are mostly familiar: 

  • the survey design determines the spatial and temporal resolution you can achieve
  • the source system and near-surface conditions determine the wavelet
  • the boundaries and interval properties of the earth filter the wavelet
  • the recording system and conditions affect the image resolution and fidelity
  • the processing flow can destroy or enhance every aspect of the data
  • the data loading process can be a filter, though it should not be
  • the display and interpretation methods control what the interpreter sees
  • the experience and insight of the interpreter decides what comes out of the entire process

Every other piece of data you touch, from wireline logs to point-count analyses, and from pressure plots to production volumes, is a filtered expression of the earth. Do you know your filters? Try making a list—it might surprise you how long it is. Then ask yourself if you can do anything about any of them, and imagine what you might see if you could. 

Hubble image is public domain. Photomicrograph from Flickr user Nagem R., licensed CC-BY-NC-SA. 

What do you mean by average?

I may need some help here. The truth is, while I can tell you what averages are, I can't rigorously explain when to use a particular one. I'll give it a shot, but if you disagree I am happy to be edificated. 

When we compute an average we are measuring the central tendency: a single quantity to represent the dataset. The trouble is, our data can have different distributions, different dimensionality, or different type (to use a computer science term): we may be dealing with lognormal distributions, or rates, or classes. To cope with this, we have different averages. 

Arithmetic mean

Everyone's friend, the plain old mean. The trouble is that it is, statistically speaking, not robust. This means that it's an estimator that is unduly affected by outliers, especially large ones. What are outliers? Data points that depart from some assumption of predictability in your data, from whatever model you have of what your data 'should' look like. Notwithstanding that your model might be wrong! Lots of distributions have important outliers. In exploration, the largest realizations in a gas prospect are critical to know about, even though they're unlikely.

Geometric mean

Like the arithmetic mean, this is one of the classical Pythagorean means. It is always equal to or smaller than the arithmetic mean. It has a simple geometric visualization: the geometric mean of a and b is the side of a square having the same area as the rectangle with sides a and b. Clearly, it is only meaningfully defined for positive numbers. When might you use it? For quantities with lognormal distributions — permeability, say. And this is the only mean to use for data that have been normalized to some reference value.

Harmonic mean

The third and final Pythagorean mean, always equal to or smaller than the geometric mean. It's sometimes (by 'sometimes' I mean 'never') called the subcontrary mean. It tends towards the smaller values in a dataset; if those small numbers are outliers, this is a bug not a feature. Use it for rates: if you drive 10 km at 60 km/hr (10 minutes), then 10 km at 120 km/hr (5 minutes), then your average speed over the 20 km is 80 km/hr, not the 90 km/hr the arithmetic mean might have led you to believe. 
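The driving example is easy to check with Python's built-in statistics module (a quick sketch, not the only way to do it):

```python
from statistics import mean, geometric_mean, harmonic_mean

speeds = [60, 120]                 # km/hr over two equal 10 km legs
print(mean(speeds))                # 90.0 -- the misleading arithmetic mean
print(geometric_mean(speeds))      # about 84.85
print(harmonic_mean(speeds))       # 80.0 -- 20 km in 15 minutes
```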

Median average

The median is the central value in the sorted data. In some ways, it's the archetypal average: the middle, with 50% of values being greater and 50% being smaller. If there is an even number of data points, then it's the arithmetic mean of the middle two. In a probability distribution, the median is often called the P50. In a positively skewed distribution (the most common one in petroleum geoscience), it is larger than the mode and smaller than the mean.

Mode average

The mode, or most likely, is the most frequent result in the data. We often use it for what are called nominal data: classes or names, rather than the cardinal numbers we've been discussing up to now. For example, the name Smith is not the 'average' name in the US, as such, since most people are called something else. But you might say it's the central tendency of names. One of the commonest applications of the mode is in a simple voting system: the person with the most votes wins. If you are averaging data like facies or waveform classes, say, then the mode is the only average that makes sense. 
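For class data, the same statistics module will happily report the most frequent value; the facies list here is made up for illustration:

```python
from statistics import mode

facies = ['sand', 'shale', 'sand', 'silt', 'sand', 'shale']
print(mode(facies))                # 'sand' -- the most frequent class
```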

Honourable mentions

Most geophysicists know about the root mean square, or quadratic mean, because it's a measure of magnitude independent of sign, so works on sinusoids varying around zero, for example. 

$$x_\text{rms} = \sqrt{\frac{1}{n}\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)}$$

Finally, the weighted mean is worth a mention. Sometimes this one seems intuitive: if you want to average two datasets, but they have different populations, for example. If you have a mean porosity of 21% from a set of 90 samples, and another mean of 14% from a set of 10 similar samples, then it's clear you can't simply take their arithmetic average — you have to weight them first: (0.9 × 0.21) + (0.1 × 0.14) = 0.20. But other times, it's not so obvious that you need the weighted sum, like when you care about the precision of the data points.
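Both honourable mentions are one-liners in NumPy; the numbers below just replay the porosity example above:

```python
import numpy as np

# Root mean square of a zero-mean sinusoid: magnitude regardless of sign.
x = np.sin(np.linspace(0, 4 * np.pi, 1000))
rms = np.sqrt(np.mean(x**2))                  # about 0.707, i.e. 1/sqrt(2)

# The weighted mean of the two porosity estimates, weighted by sample count.
means = np.array([0.21, 0.14])
counts = np.array([90, 10])
weighted = np.average(means, weights=counts)  # 0.203, about 20%
```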

Are there other averages you use? Do you see misuse and abuse of averages? Have you ever been caught out? I'm almost certain I have, but it's too late now...

There is an even longer version of this article in the wiki. I just couldn't bring myself to post it all here. 

Petroleum cheatsheet

I have just finished teaching one semester of Petroleum Geoscience at Dalhousie University. It's not quite over: I am still marking, marking, marking. The experience was all of the following, mostly simultaneously:

  • scarily exposing
  • surprisingly eye-opening
  • deeply exhausting
  • personally motivating
  • professionally educational
  • ultimately satisfying
  • predominantly fun

Lucrative? No, but I did get paid. Regrettable? No, I'm very happy that I did it. I'm not certain I'd do it again... perhaps if it was the very same course, now that I have some material to build on. 

One of the things I made for my students was a cheatsheet. I'd meant to release it into the wild long ago, but I'm pleased to say that today I have tweaked and polished and extended it and it's ready. There will doubtless be updates as our cheatsheet faithful expose my schoolboy errors (please do!), but version 1.0 is here, still warm from the Inkscape oven.

This is the fifth cheatsheet in our collection. If you find a broken link, do let us know, as I have moved them into a new folder today. Enjoy!

Things not to think

  1. Some humans are scientists.
  2. No non-humans are scientists.
  3. Therefore, scientists are human.

That's how scientists think, right? Logical, deductive, objective, algorithmic. Put in such stark terms, this may seem over the top, but I think scientists do secretly think of themselves this way. Our skepticism makes us immune to the fanciful, emotional naïvetés that normal people believe. You can't fool a scientist!

Except of course you can. Just like everyone else's, scientists' intuition is flawed, infested with biases like subjectivity and the irresistible need to seek confirmation of hypotheses. I say 'everyone', but perhaps scientists are biased in obscure, profound ways that non-specialists are not. A scary thought.

But sometimes I hear scientists say things that are especially subtle in their wrongness. Don't get me wrong: I wholeheartedly believe these things too, until I stop for a moment and reflect. Here are some examples:

The scientific method

...as if there is but one method. To see how wrong this notion is, stop and try to write down how your own investigations proceed. The usual recipe is something like: question, hypothesis, experiment, adjust hypothesis, iterate, and conclude with a new theory. Now look at your list and ask yourself if that's really how it goes. Isn't it really full of false leads, failed experiments, random shots in the dark, and a brain fart or two? Or maybe that's just me.

If not thesis then antithesis

...as if there is no nuance or uncertainty in the world. We treat bipolar disorder in people, but seem to tolerate it and even promote it in society. Arguments quickly move to the extremes, becoming ludicrously over-simplified in the process. Example: we need to have an even-tempered, fact-based discussion about our exploitation of oil and gas, especially in places like the oil sands. This discussion is difficult to have because if you're not with 'em, you're against 'em. 

Nature follows laws

...as if nature is just a good citizen of science. Without wanting to fall into the abyss of epistemology here, I think it's important to know at all times that scientists are trying to describe and represent nature. Thinking that nature is following the laws that we derive on this quest seems to me to encourage an unrealistically deterministic view of the world, and smacks of hubris.

How vivid is the claret, pressing its existence into the consciousness that watches it! If our small minds, for some convenience, divide this glass of wine, this universe, into parts — physics, biology, geology, astronomy, psychology, and so on — remember that Nature does not know it!
Richard Feynman

Science is true

...as if knowledge consists of static and fundamental facts. It's that hubris again: our diamond-hard logic and 1024-node clusters are exposing true reality. A good argument with a pseudoscientist always convinces me of this. But it's rubbish—science isn't true. It's probably about right. It works most of the time. It's directionally true, and that's the way you want to be going. Just don't think there's a True Pole at the end of your journey.

There are probably more, but I read or hear an example of at least one of these every week. I think these fallacies are a class of cognitive bias peculiar to scientists. A kind of over-endowment of truth. Or perhaps they are examples of a rich medley of biases, each of us with our own recipe. Once you know your recipe and have learned its smell, be on your guard!

The simultaneity funnel

Is your brilliant idea really that valuable?

At Agile*, we don't really place a lot of emphasis on ideas. Ideas are abundant, ideas are cheap. Ideas mean nothing without actions. And it's impossible to act on every one. Funny though, I seem to get enthralled whenever I come up with a new idea. It's conflicting because, it seems to me at least, a person with ideas is more valuable, and more interesting, than one without. Perhaps it takes a person who is rich with ideas to be able to execute. Execution and delivery is rare, and valuable. 

Kevin Kelly describes the evolution of technology as a progression of the inevitable, quoting examples such as the lightbulb and calculus. Throughout history, parallel invention has been the norm.

We can say, the likelihood that the lightbulb will stick is 100 percent. The likelihood Edison's was the adopted bulb is, well, one in 10,000. Furthermore, each stage of the incarnation can recruit new people. Those toiling at the later stages may not have been among the early pioneers. Given the magnitude of the deduction, it is improbable that the first person to make an invention stick was also the first person to think of the idea.

Danny Hillis, founder of Applied Minds, describes this as an inverted pyramid of invention. It tells us that your brilliant idea will have coparents. Even though the final design of the first marketable lightbulb could not have been anticipated by anyone, the concept itself was inevitable. All ideas start out abstract and become more specific toward their eventual execution.

Does this mean that it takes 10,000 independent tinkerers to bring about an innovation? We aren't all working on the same problems at the same time, and some ideas arrive too early. One example is how microseismic monitoring of reservoir stimulation has exploded recently with the commercialization of shale gas projects in North America. The technology came from earthquake detection methods that have been around for decades. Only recently has it been taken up by the petroleum industry, thanks to an alignment of compelling market forces.

So is innovation merely a numbers game? Is 10,000 a critical mass that must be exceeded to bring about a single change? If so, the image of the lonely hero-inventor-genius is misguided. And if it is a numbers game, then subsurface oil and gas technology could be seriously challenged. The SPE has nearly 100,000 members worldwide, compared to our beloved SEG, which has a mere 33,000. Membership in a club or professional society does not equate to contribution, but if this figure is correct, I doubt our industry has the sustained manpower to feed this funnel.

This system has been observed since the start of recorded science. The pace of invention is accelerating with population and knowledge growth. At the same time, so are specialization and diversification, which means we have fewer people working on more problems. Are knowledge sharing and crowd wisdom a natural supplement to this historical phenomenon? Are we augmenting this funnel, or connecting disparate funnels, when we embrace openness?

A crowded funnel might be compulsory for advancement and progression, even if it causes cutthroat competitiveness, hoarding, or dropping out altogether. But if those outcomes are no longer palatable for the future of our industry, we will have to modify our approach.

Wave-particle duality

Geoblogger Brian Romans has declared it Dune Week (here's part of his tweet), so I thought I'd jump on the bandwagon with one of my favourite dynamic dune examples illustrating the manifold controls on dune shape. 

Barchan dunes and parabolic dunes both form where there is limited sand supply and unimodally-directed wind (that is, the wind always blows from the same direction). Barchans, like these in Qatar, migrate downwind as sand is blown around the tips of the crescent. Consequently, the slip face is concave.

Location: 24.98°N, 51.37°E

In contrast, parabolic dunes have a convex slip face. They form in vegetated areas: vegetation causes drag on the arms of the crescent, resulting in the elongated shape. These low-amplitude dunes in NE Brazil have left obvious trails.

Location: 3.41°S, 39.00°W

 


The eastern edge of White Sands dunefield in New Mexico shows an interesting transition from barchan to parabolic, as the marginal vegetation is encroached upon by these weird gypsum dunes. The mode transition runs more or less north–south. Can you tell which side is which? Which way does the wind blow?


Herrmann and Durán modelled this type of transition, among others, in a series of fascinating papers, including this presentation and Durán et al. (2007), Parabolic dunes in north-eastern Brazil, arXiv Soft Condensed Matter. Their figures show how their numerical models represent nature quite well as barchans transition to parabolic dunes:

[Figure: modelled barchan-to-parabolic dune transition, after Durán & Herrmann, 2006]

Please, sir, may I have some seismic petrophysics?

Petrophysics is an indispensable but odd corner of subsurface geoscience. I find it a bit of a paradox. On the one hand, well logs fill a critical gap between core and seismic. On the other hand, most organizations I've worked in are short of petrophysicists, sometimes—no, usually—without even recognizing it.

When a petrophysicist is involved in a project, they usually identify with the geologists, perhaps even calling themselves one. There’s a lot of concern for a good porosity curve, and the interpretation of the volume of clay and other mineralogical constituents. There’s also a lot of time for the reservoir engineer, who wants a reliable estimate of the reservoir pressure, temperature and water saturation (about 20–40% of the pore space is filled with water in an oil or gas field; it’s important to know how much). This is all good; these are important reservoir properties.

Incomplete and spiky logs in the uphole section of the Tunalik 1 well, from the western edge of the National Petroleum Reserve in Alaska. Image: USGS.

But where is the geophysicist? Often, she is in her office, editing her own sonic log (called DT; the sonic measures P-wave slowness), or QCing her own bulk density curve. Why? Because bulk density ρ and P-wave velocity VP together make the best estimate of acoustic impedance:

$$Z = \rho \, V_\text{P}$$

Acoustic impedance is the simplest way to compute a model seismic trace. We can compare this model trace to the real seismic data, recorded from the surface, to make the all-important connection between rocks and wiggles. The acoustic impedance curve determines what this model trace looks like, but we also need to know where it goes in the vertical travel-time domain. The sonic log comes into play again: it gives the best first estimate of seismic travel time. Since each sample is a measure of the time taken for a sound wave to travel the unit distance, it can be integrated for the total travel time. Yeah, that’s mathematics. It works.
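Here's a minimal sketch of that workflow, assuming a sonic log dt_log (slowness in µs/m) and a density log rhob (g/cm³) already loaded as NumPy arrays sampled every 0.1524 m. The array names, the sample interval, and the 25 Hz Ricker wavelet are all illustrative assumptions, not a recipe.

```python
import numpy as np

step = 0.1524                                # depth sample interval, m (assumed)
vp = 1e6 / dt_log                            # P-wave velocity, m/s, from slowness
z = rhob * vp                                # acoustic impedance (relative units)

# Integrate the slowness for two-way travel time at each depth sample.
twt = 2 * np.cumsum(dt_log * 1e-6 * step)    # seconds

# Reflection coefficients at each interface.
rc = (z[1:] - z[:-1]) / (z[1:] + z[:-1])

# In practice you would resample rc onto a regular time grid using twt
# before convolving with a wavelet (here a simple Ricker) for a model trace.
def ricker(f=25.0, length=0.128, dt=0.002):
    t = np.arange(-length / 2, length / 2, dt)
    return (1 - 2 * (np.pi * f * t)**2) * np.exp(-(np.pi * f * t)**2)

synthetic = np.convolve(rc, ricker(), mode='same')
```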

In short, the logs are critical for doing any geophysics at all.

But they always need attention. Before we can use these logs, they must be quality checked and often edited. There is often a need to splice data from various logging runs together. The uphole sections are usually bad (there may be measurements in cased intervals, for example). Both logs are sensitive to hole condition.

So the logs are critical, and always need fine-tuning. But I have yet to work on a project where clean, top-to-tail DT and RHOB logs are seen as a priority. Usually, they are not even on the List Of Things To Do.

Result: the geophysicist gets on with it, and edits the logs. Now there's a DT_EDIT curve in the project. Oh, that name's been taken. And DT_Final and DT_edit2. I wonder who made those? DT_Matt then... but will anyone know what that is? No, and no-one will care either, because the madness will never end. 

There is even the risk of a greater tragedy: no geophysical logs at all. A missing or incomplete sonic because the tool was never run, or it failed and was not repeated, or it was just forgotten. No shear-wave sonic when you really just need a shear-wave sonic. No checkshots anywhere in the entire field, or the unedited data have been loaded in some horrible way. No VSPs anywhere, or no-one knows where the data are. Probably rotting on a 9-track tape somewhere in a salt cavern in Louisiana. 

Here are some things to ask your friendly petrophysicist for:

  • A single, lightly edited, RHOB, DT, and DTS (if available) curve, from the top of the reliable data to the bottom.
  • If they're available, a set of checkshots with time and depth measured from the seismic datum (they are almost never recorded this way so have to be corrected).
  • Help understanding the controls on sonic and density with depth; for example, can we ascribe some portion of the trends to compaction, and some to diagenesis?
  • Help understanding the relationship between lithology and acoustic impedance. Filter the data to see how the impedance of sands and shales vary with depth.
  • If there are several wells with complete sets of logs and there's to be an attempt to model missing or incomplete logs, then the petrophysicist should be involved.

What have I missed? Is there more? Or maybe you think this is too much?

Last thing: when the petrophysicist is making his beautiful composite displays of the well data, ask him to include acoustic impedance, the reflection coefficients, the synthetic seismogram, and even the seismic traces from the well location. This will surprise people. In a good way.

Thin-bed vowels and heterolithic consonants

Seismologists see the world differently. Or, rather, they hear the world differently. Sounds become time series, musical notes become Fourier components. The notes we make with our vocal cords come from the so-called sonorants, especially the vowel sounds, and they look like this:

Consonants aren't as pretty, consisting of various obstruents like plosives and fricatives—these depend on turbulence, and don't involve the vocal cords. They look very different:

Geophysicists will recognize these two time series as being signal-dominated and noise-dominated, respectively. The signal in the vowel sound is highly periodic: a small segment about 12 ms long is repeated four times in this plot. There is no repeating signal in the consonant sound: it is more or less white noise.

When quantitative people hear the word periodic, their first thought is usually Fourier transform. Like a prism, the Fourier transform unpacks mixed signals into monotones, making them easier to examine and explain. For instance, the Fourier transform of a set of limestone beds might reveal the Milankovitch cycles of which I am so fond. What about S and E?

The spectrum of the consonant S is not very organized and close to being random. But the E sound has an interesting shape. It's quite smooth and has obvious repetitive notches. Any geophysicist who has worked with spectral decomposition—a technique for investigating thin beds—will recognize these. For example, compare the spectra for a random set of reflection coefficients (what we might call background geology) and a single thin bed, 10 ms thick:

Notches! The beauty of this, from an interpreter's point of view, is that one can deduce the thickness of the thin-bed giving rise to this notchy spectrum. The thickness is simply 1/n, where n is the notch spacing, 100 Hz in this case. So the thickness is 1/100 = 0.01 s = 10 ms. We can easily compute the spectrum of seismic data, so this is potentially powerful.
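A quick way to convince yourself: build a 10 ms thin bed as a pair of opposite-polarity reflection coefficients and look at its amplitude spectrum. Everything here (sample rate, trace length, polarity) is an assumption for illustration.

```python
import numpy as np

dt = 0.001                                   # 1 ms sampling
rc = np.zeros(512)
rc[100] = 1.0                                # top of the bed
rc[110] = -1.0                               # base, 10 ms later

spectrum = np.abs(np.fft.rfft(rc))
freqs = np.fft.rfftfreq(rc.size, d=dt)
# The spectrum of this pair is proportional to |sin(pi * f * 0.010)|,
# so it has notches every 1/0.010 = 100 Hz: the bed thickness is 1/100 s.
```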

While obvious here, in a complicated spectrum the notches might be hard to detect and thus measure. But the notches are periodic. And what do we use to find periodic signals? The Fourier transform! So what happens if we take the spectrum of the spectrum of my voice signal—where we saw a 12 ms repeating pattern?

There's the 12 ms periodic signal from the time series! 

The spectrum of the spectrum is called the cepstrum (pronounced, and sometimes spelled, kepstrum). We have been transported from the frequency domain to a new universe: the quefrency domain. We are back with units of time, but there are other features of the cepstral world that make it quite different from the time domain. I'll discuss those in a future post. 
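In code, the hop to the quefrency domain is just another Fourier transform, this time of the log amplitude spectrum (the usual definition of the real cepstrum). The periodic signal below is a crude synthetic stand-in for the vowel, not my voice.

```python
import numpy as np

dt = 0.001
t = np.arange(0, 0.5, dt)
period = 0.012                               # a 12 ms repeating pattern
signal = np.sign(np.sin(2 * np.pi * t / period))  # crude periodic waveform

log_spectrum = np.log(np.abs(np.fft.rfft(signal)) + 1e-12)
cepstrum = np.abs(np.fft.irfft(log_spectrum))
quefrency = np.arange(cepstrum.size) * dt    # back in units of time
# The cepstrum peaks near quefrency = 0.012 s, the period we put in.
```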

Based on a poster paper I presented at the 2005 EAGE Conference & Exhibition in Madrid, Spain, and on a follow-up article Hall, M (2006), Predicting bed thickness with cepstral decomposition, The Leading Edge, February 2006, doi:10.1190/1.2172313

McKelvey's reserves and resources

Vincent McKelvey (right) was chief geologist at the US Geological Survey, and then its director from 1971 until 1977. Rather like Sherman Kent at the CIA, who I wrote about last week, one of his battles was against ambiguity in communication. But rather than worrying about the threat posed by the Soviet Union or North Korea, his concern was the reporting of natural resources in the subsurface of the earth. Today McKelvey's name is associated with a simple device for visualizing levels of uncertainty and risk associated with mineral resources: the McKelvey box.

Here (left) is a modernized version. It helps unravel some oft-heard jargon. The basic idea is that only discovered, commercially-viable deposits get to be called Reserves. Discovered but sub-commercial (with today's technology and pricing) are contingent resources. Potentially producible and viable deposits that we've not yet found are called prospective resources. These are important distinctions, especially if you are a public company or a government.

Over time, this device has been reorganized and subdivided with ever more subtle distinctions and definitions. I was uninspired by the slightly fuzzy graphics in the ongoing multi-part review of reserve reporting in CSPG Reservoir magazine (Yeo and Derochie, 2011, Reserves and resources series, CSPG Reservoir, starting August 2011). So I decided to draw my own version. To reflect the possibility that there may yet be undreamt-of plays out there, I added a category for Unimagined resources. One for the dreamers.

You can find the Scalable Vector Graphics file for this figure in SubSurfWiki. If you have ideas about other jargon to add, or ways to represent the uncertainty, please have a go at editing the wiki page, the figure, or drop us a line!