### Difficulty rating: Beginner

We'd often like to load images into Python. Once loaded, we might want to treat them as images, for example cropping them, saving in another format, or adjusting brightness and contrast. Or we might want to treat a greyscale image as a two-dimensional NumPy array, perhaps so that we can apply a custom filter, or because the image is actually seismic data.

This image-or-array duality is entirely semantic — there is really no difference between images and arrays. An image is a regular array of numbers, or, in the case of multi-channel rasters like full-colour images, a regular array of several numbers: one for each channel. So each pixel location in an RGB image contains 3 numbers: one each for red, green, and blue.

In general, you can go one of two ways with images:

1. Load the image using a library that 'knows about' (i.e. uses language related to) images. The preeminent tool here is pillow (which is a fork of the grandparent of all Python imaging solutions, PIL).
2. Load the image using a library that knows about arrays, like matplotlib or scipy. These wrap PIL, making it a bit easier to use, but potentially losing some options on the way.

The Jupyter Notebook accompanying this post shows you how to do both of these things. I recommend learning to use some of PIL's power, but knowing about the easier options too.

Here's the way I generally load an image:

from PIL import Image
im = Image.open("my_image.png")

(One strange thing about pillow is that, while you install it with pip install pillow, you still actually import and use PIL in your code.) This im is an instance of PIL's Image class, which is a data structure especially for images. It has some handy methods, like im.crop(), im.rotate(), im.resize(), im.filter(), im.quantize(), and lots more. Doing some of these operations with NumPy arrays is fiddly — hence PIL's popularity.

But if you just want your image as a NumPy array:

import numpy as np
arr = np.array(im)

Note that arr is a 3-dimensional array, the dimensions being row, column, channel. You can go off with arr and do whatever you need, then cast back to an Image with Image.fromarray(arr).
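For instance, here's a minimal round trip (the filename and crop box are made up):

```python
from PIL import Image
import numpy as np

im = Image.open("my_image.png")        # made-up filename
crop = im.crop((0, 0, 100, 100))       # left, upper, right, lower

arr = np.array(crop.convert("L"))      # greyscale, so a 2D array
arr = np.clip(arr * 1.2, 0, 255)       # e.g. brighten by 20%

out = Image.fromarray(arr.astype(np.uint8))
out.save("my_image_bright.png")
```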

All this stuff is demonstrated in the Notebook accompanying this post, or you can use the link below to run it right now in your browser:

Run the accompanying notebook in MyBinder


# Impact structures in seismic

I saw this lovely tweet from PGS yesterday:

Kudos to them for sharing this. It’s always great to see seismic data and interpretations on Twitter — especially of weird things. And impact structures are just cool. I’ve interpreted them in seismic myself. Then uninterpreted them.

I wish PGS were able to post a little more here, like a vertical profile, maybe a timeslice. I’m sure there would be tons of debate if we could see more. But not all things are possible when it comes to commercial seismic data.

It’s crazy to say more about it without more data (one-line interpretation, yada yada). So here’s what I think.

### Impact craters are rare

There are at least two important things to think about when considering an interpretation:

1. How well does this match the model? (In this case, how much does it look like an impact structure?)

2. How likely are we to see an instance of this model in this dataset? (What’s the base rate of impact structures here?)

Interpreters often forget about the second part. (There’s another part too: How reliable are my interpretations? Let’s leave that for another day, but you can read Bond et al. 2007 as homework if you like.)

The problem is that impact structures, or astroblemes, are pretty rare on Earth. The atmosphere takes care of most would-be meteorites, and then there are the oceans, weather, tectonics and so on. The result is that Earth’s record of surface events is quite irregular compared to, say, the Moon’s. But they certainly exist, and occasionally pop up in seismic data.

In my 2011 post Reliable predictions of unlikely geology, I described how skeptical we have to be when predicting rare things (‘wotsits’). Bayes’ theorem tells us that we must modify our assigned probability (let’s say I’m 80% sure it’s a wotsit) with the prior probability (let’s pretend a 1% a priori chance of there being a wotsit in my dataset). Here’s the maths:

$$P = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.2 \times 0.99} = 0.0388$$

In other words, the conditional probability of the feature being a rare wotsit, given my 80%-sure interpretation, is 0.0388 or just under 4%.
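This is trivial to compute for any confidence and base rate; here's a minimal function (mine, not from the 2011 post):

```python
def posterior(confidence, base_rate):
    """P(wotsit | interpretation), via Bayes' theorem."""
    p = confidence * base_rate
    return p / (p + (1 - confidence) * (1 - base_rate))

print(posterior(0.8, 0.01))   # 0.0388...
```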

As cool as it would be to find a rare wotsit, I probably need a back-up hypothesis. Now, what’s that base rate for astroblemes? (Spoiler: it’s much less than 1%.)

### Just how rare are astroblemes?

First things first. If you’re interpreting circular structures in seismic, you need to read Simon Stewart’s paper on the subject (Stewart 1999), and his follow-up impact crater paper (Stewart 2003), which expands on the topic. Notwithstanding Stewart’s disputed interpretation of the Silverpit not-a-crater structure in the North Sea, these two papers are two of my favourites.

According to Stewart, the probability P of encountering r craters of diameter d or more in an area A over a time period t years is given by:

$$P(r) = \mathrm{e}^{-\lambda A}\frac{(\lambda A)^r}{r!}$$

where

$$\lambda = t n$$

and

$$\log n = -(11.67 \pm 0.21) - (2.01 \pm 0.13) \log d$$

We can use these equations to compute the probability plot on the right. It shows the probability of encountering an astrobleme of a given diameter on a 2400 km² seismic survey spanning the Phanerozoic. (This doesn’t take into account anything to do with preservation or detection.) I’ve estimated that survey size from PGS’s tweet, and I’ve highlighted the 7.5 km diameter they mentioned. The probability is very small: about 0.00025. So Bayes tells us that an 80%-confident interpretation has a conditional probability of about 0.001. One in a thousand.
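Here's a sketch of those equations in Python. Take care with units: I've assumed n is in craters per km² per year (after Hughes 1998), and the absolute numbers are very sensitive to that assumption, so check Hughes before trusting them.

```python
import math

def crater_prob(r, d_km, area_km2, t_yr):
    """P(r) craters of diameter >= d_km, after Stewart (1999)."""
    n = 10 ** (-11.67 - 2.01 * math.log10(d_km))   # craters per km² per year
    lam_A = t_yr * n * area_km2                    # expected crater count
    return math.exp(-lam_A) * lam_A**r / math.factorial(r)

# e.g. the chance of at least one crater of 7.5 km or more on a
# 2400 km² survey spanning the Phanerozoic (~541 Myr):
print(1 - crater_prob(0, 7.5, 2400, 541e6))
```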

Here’s the Jupyter notebook I used to make that chart using Python.

### So what?

My point here isn’t to claim that this structure is not an astrobleme. I haven’t seen the data, I’ve no idea. The PGS team mentioned that they considered the possibility of influence by salt or shale, and fluid escape, and rejected these based on the evidence.

My point is to remind interpreters that when your conclusion is that something is rare, you need commensurately more and better evidence to support the claim. And it’s even more important than usual to have multiple working hypotheses.

Last thing: if I were PGS and this was my data (i.e. not a client’s), I’d release a little cube (anonymized, time-shifted, bit-reduced, whatever) to the community and enjoy the engagement and publicity. With a proper license, obviously.

### References

Hughes, D (1998). The mass distribution of the crater-producing bodies. In: Meteorites: Flux with time and impact effects. Geological Society of London Special Publication 140, 31–42.

Davis, J (1986). Statistics and Data Analysis in Geology. John Wiley & Sons, New York.

Stewart, SA (1999). Seismic interpretation of circular geological structures. Petroleum Geoscience 5, 273–285.

Stewart, SA (2003). How will we recognize buried impact craters in terrestrial sedimentary basins? Geology 31 (11), 929–932.


# 90 years of seismic exploration

Today is an important day for applied geoscience. For one thing, it’s St Barbara’s Day. For another, 4 December is the anniversary of the first oil discovery drilled on seismic reflection data.

During World War I — thanks to the likes of Reginald Fessenden, Lawrence Bragg, Andrew McNaughton, William Sansome and Ludger Mintrop — acoustics emerged as a method of remote sensing. After the war, enterprising scientists looked for commercial applications of the technology. The earliest geophysical patent application I can find is Fessenden’s 1917 award for the detection of orebodies in mines, and Mintrop applied for a surface-based method in 1920, but the early patents pertained to refraction and diffraction experiments. The first reflection patent, US Patent no. 1,843,725, was filed on 1 May 1929 by John Clarence Karcher… almost 6 months after the discovery well was completed.

It’s fun to read the patent.

Figures 4 and 5 show what must be the first ever depiction of shot gathers:

Figure 5 from Karcher’s patent, ‘Determination of subsurface formations’. It illustrates the arrivals of different wave modes at the receivers.

Karcher was born in Dale, Indiana, but moved to Oklahoma when he was five. He later studied electrical engineering and physics at the University of Oklahoma. Along with William Haseman, David Ohearn, and Irving Perrine, Karcher formed the Geological Engineering Company. Early tests of the technology took place in the summer of 1921 near Oklahoma City, and the men spent the next several years shooting commercial refraction surveys around Texas and Oklahoma — helping discover dozens of saltdome-related fields — and meanwhile trying to perfect the reflection experiment. During this period, they were competing with Mintrop’s company, Seismos.

### The first well

In 1925, Karcher formed a new company — Geophysical Research Corporation, GRC, now part of Sercel — with Everette Lee DeGolyer of Amerada Petroleum Corporation and money from the Viscount Cowdray (owner of Pearson, now a publishing company, but originally a construction firm). Through this venture, Karcher eventually prevailed in the race to prove the seismic reflection method. From what I can tell, HB Peacock and/or JE Duncan successfully mapped the structure of the Ordovician Viola limestone, which overlies the prolific Simpson Group. On 4 December 1928, Amerada completed the No. 1 Hallum well near Maud, Oklahoma.

The locations (as best I can tell) of the first test of reflection seismology, the first seismic section, and the first seismic survey that led to a discovery. The map also shows where Karcher grew up; he went to university in Norman, south of Oklahoma City.

### Serial entrepreneur

Karcher was a geophysical legend. After Geophysical Research Corporation, he co-founded Geophysical Service Incorporated (GSI), which was the origin of Texas Instruments and the integrated circuit. And he founded several exploration companies after that. Today, his name lives on in the J. Clarence Karcher Award that the SEG gives each year to one or more stellar young geophysicists.

It seems appropriate that the oil discovery fell on the feast of St Barbara, the patron saint of miners and armorers and all who deal in explosives, but also of mathematicians and geologists. If you have a bottle near you this evening, raise a glass to St Barbara and the legion of geophysicists that have made seismic reflection such a powerful tool today.


# The Surmont Supermerge

In my recent Abstract horror post, I mentioned an interesting paper in passing, Durkin et al. (2017):

Paul R. Durkin, Ron L. Boyd, Stephen M. Hubbard, Albert W. Shultz, Michael D. Blum (2017). Three-Dimensional Reconstruction of Meander-Belt Evolution, Cretaceous McMurray Formation, Alberta Foreland Basin, Canada. Journal of Sedimentary Research 87 (10), p 1075–1099. doi: 10.2110/jsr.2017.59

I wanted to write about it, or rather about its dataset, because I spent about 3 years of my life working on the USD 75 million seismic volume featured in the paper. Not just on interpreting it, but also on acquiring and processing the data.

Let's start by feasting our eyes on a horizon slice, plus interpretation, of the Surmont 'Supermerge' 3D seismic volume:

Figure 1 from Durkin et al (2017), showing a stratal slice from 10 ms below the top of the McMurray Formation (left), and its interpretation (right). © 2017, SEPM (Society for Sedimentary Geology) and licensed CC-BY.

A decade ago, I was 'geophysics advisor' on Surmont, which is jointly operated by ConocoPhillips Canada, where I worked, and Total E&P Canada. My line manager was a Total employee; his managers were ex-Gulf Canada. It was a fantastic, high-functioning team, and working on this project had a profound effect on me as a geoscientist.

### The Surmont bitumen field

The dataset covers most of the Surmont lease, in the giant Athabasca Oil Sands play of northern Alberta, Canada. The Surmont field alone contains something like 25 billion barrels of bitumen in place. It's ridiculously massive — you'd be delighted to find 300 million bbl offshore. Given that it's expensive and carbon-intensive to produce bitumen with today's methods — steam-assisted gravity drainage (SAGD, "sag-dee") in Surmont's case — it's understandable that there's a great deal of debate about producing the oil sands. One factoid: you have to burn about 1 Mscf or 30 m³ of natural gas, costing about USD 10–15, to make enough steam to produce 1 bbl of bitumen.

Detail from Figure 12 from Durkin et al (2017), showing a seismic section through the McMurray Formation. Most of the abandoned channels are filled with mudstone (really a siltstone). The dipping heterolithic strata of the point bars, so obvious in horizon slices, are quite subtle in section. © 2017, SEPM (Society for Sedimentary Geology) and licensed CC-BY.

The field is a geoscience wonderland. Apart from the 600 km² of beautiful 3D seismic, there are now about 1500 wells, most of which are on the 3D. In places there are more than 20 wells per section (1 sq mile, 2.6 km², 640 acres). Most of the wells have a full suite of logs, including FMI in two out of three wells, and shear sonic in many cases, and about 550 wells now have core through the entire reservoir interval — about 65–75 m across most of Surmont. Let that sink in for a minute.

### What's so awesome about the seismic?

OK, I'm a bit biased, because I planned the acquisition of several pieces of this survey. There are some challenges to collecting great data at Surmont. The reservoir is only about 500 m below the surface. Much of the pay sand can barely be called 'rock' because it's unconsolidated sand, and the reservoir 'fluid' is a quasi-solid with a viscosity of 1 million cP. The surface has some decent topography, and the near surface is glacial till, with plenty of boulders and gravel-filled channels. There are surface lakes and the area is covered in dense forest. In short, it's a geophysical challenge.

Nonetheless, we did collect great data; here's how:

• General information
• The ca. 600 km² Supermerge consists of a dozen 3Ds recorded over about a decade starting in 2001.
• The northern 60% or so of the dataset was recombined from field records into a single 3D volume, with pre- and post-stack time imaging.
• The merge was performed by CGG Veritas, cost nearly $2 million, and took about 18 months.
• Geometry
• Most of the surveys had a 20 m shot and receiver spacing, giving the volume a natural bin size of 10 m × 10 m.
• The original survey had parallel and coincident shot and receiver lines (Megabin); later surveys were orthogonal.
• We varied the line spacing between 80 m and 160 m to get the trace density we needed in different areas.
• Sources
• Some surveys used 125 g dynamite at a depth of 6 m; others the IVI EnviroVibe sweeping 8–230 Hz.
• We used an airgun on some of the lakes, but the data was terrible so we stopped doing it.
• Most of the surveys were recorded into single-point 3C digital MEMS receivers planted on the surface.
• Bandwidth
• Most of the datasets have data from about 8–10 Hz to about 180–200 Hz (and have a 1 ms sample interval).

The planning of these surveys was quite a process. Because access in the muskeg is limited to 'freeze up' (late December until March), and often curtailed by wildlife concerns (moose and elk rutting), only about 6 weeks of shooting are possible each year. This means you have to plan ahead, then mobilize a fairly large crew with as many channels as possible. After acquisition, each volume spent about 6 months in processing — mostly at Veritas and then CGG Veritas, who did fantastic work on these datasets.

Kudos to ConocoPhillips and Total for letting people work on this dataset. And kudos to Paul Durkin for this fine piece of work, and for making it open access. I'm excited to see it in the open. I hope we see more papers based on Surmont, because it may be the world's finest subsurface dataset. I hope it is released some day; it would have a huge impact.

### References & bibliography

Paul R. Durkin, Ron L. Boyd, Stephen M. Hubbard, Albert W. Shultz, Michael D. Blum (2017). Three-Dimensional Reconstruction of Meander-Belt Evolution, Cretaceous McMurray Formation, Alberta Foreland Basin, Canada. Journal of Sedimentary Research 87 (10), p 1075–1099. doi: 10.2110/jsr.2017.59 (not live yet).

Hall, M (2007). Cost-effective, fit-for-purpose, lease-wide 3D seismic at Surmont. SEG Development and Production Forum, Edmonton, Canada, July 2007.

Hall, M (2009). Lithofacies prediction from seismic, one step at a time: An example from the McMurray Formation bitumen reservoir at Surmont. Canadian Society of Exploration Geophysicists National Convention, Calgary, Canada, May 2009. Oral paper.

Zhu, X, S Shaw, B Roy, M Hall, M Gurch, D Whitmore and P Anno (2008). Near-surface complexity masquerades as anisotropy. SEG Annual Convention, Las Vegas, USA, November 2008. Oral paper. doi: 10.1190/1.3063976.

Surmont SAGD Performance Review (2016), by ConocoPhillips and Total geoscientists and engineers. Submitted to AER, 258 pp. Available online [PDF] — and well worth looking at.

Trad, D, M Hall, and M Cotra (2008). Reshooting a survey by 5D interpolation. Canadian Society of Exploration Geophysicists National Convention, Calgary, Canada, May 2008. Oral paper.


# Machine learning meets seismic interpretation

Agile has been reverberating inside the machine learning echo chamber this past week at EAGE. The hackathon's theme was machine learning, and Monday's workshop was all about machine learning. Matt was also supposed to be co-chairing the session on Applications of machine learning for seismic interpretation with Victor Aarre of Schlumberger, but thanks to a power cut and subsequent rescheduling, he found himself double-booked. So, lucky me, he invited me to sit in his stead. Here are my highlights, from the best seat in the house.

Before I begin, I must mention the ambivalence I feel towards the fact that 5 of the 7 talks featured the open-access F3 dataset. A round of applause is certainly due to dGB Earth Sciences for their long-time stewardship of open data. On the other hand, in the sardonic words of my co-chair Victor Aarre, it would have been quite valid to rename the session The F3 machine learning session. Is it really the only quality attribute-research dataset our industry can muster? Let's do better.

### Using seismic texture attributes for salt classification

Ghassan AlRegib ruled the stage throughout the session with not one, not two, but three great talks on behalf of himself and his grad students at Georgia Institute of Technology (rather than being a show of bravado, this was a result of problems with visas). He showed some exciting developments in shallow learning methods for predicting facies in seismic data. In addition to GLCM attributes, he also introduced a couple of new (to me anyway) attributes for salt classification. Namely, textural gradient and a thing he called seismic saliency, a metric modeled after the human visual system describing the 'reaction' between relative objects in a 3D scene.

Twelve seismic attributes used for multi-attribute salt-boundary classification. (a) is RMS amplitude; (b) to (m) are textural attributes. See the abstract for details. This figure is copyright of Ghassan AlRegib and licensed CC-BY-SA by virtue of being generated from the F3 dataset of dGB and TNO.

Ghassan also won the speakers' lottery, in a way. Due to the previous day's power outage and subsequent reshuffle, the next speaker in the schedule was a no-show. As a result, Ghassan had an extra 20 minutes to answer questions. Now for most speakers that would be a public-speaking nightmare, but Ghassan hosted the onslaught of inquiring minds beautifully. If we hadn't had to move on to the next talk, I'm sure he could have entertained questions all afternoon. I find it fascinating how unpredictable events like power outages can actually create the conditions for really effective engagement.

### Salt classification without using attributes (using deep learning)

Matt reported on Anders Waldeland's work a year ago, and it was interesting to see how his research has progressed, as he nears the completion of his thesis.

Anders successfully demonstrated how convolutional neural networks (CNNs) can classify salt bodies in seismic datasets. So, is this a big deal? I think it is. Indeed, Anders's work seems like a breakthrough in seismic interpretation, at least of salt bodies. To be clear, I don't think this means that it is time for seismic interpreters to pack up and go home. But maybe we can start looking forward to spending our time doing less tedious things than picking complex salt bodies.

One slice of a 3D seismic volume with two class labels: salt (red) and not salt (green). This is the training data. On the right: the extracted 3D salt body in the same dataset, coloured by elevation. Copyright of A Waldeland, used with permission.

He trained a CNN on one manually labeled slice of a 3D cube and used the network to automatically classify the full 3D salt body (on the right in the figure). Conventional algorithms for salt picking, such as that used by AlRegib (see above), typically rely on seismic attributes to define a feature space. This requires professional insight and judgment, and is prone to error and bias. Nicolas Audebert mentioned the same shortcoming in his talk in the workshop Matt wrote about last week. In contrast, the CNN algorithm works directly on the seismic data, learning the most discriminative filters on its own, no attributes needed.
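For a flavour of the approach, here's a toy patch classifier in Keras. This is my sketch, not Anders's actual architecture; the patch size and training setup are invented.

```python
# A toy CNN that labels a small seismic amplitude patch as salt / not salt.
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(65, 65, 1)),         # one small amplitude patch
    layers.Conv2D(16, 5, activation='relu'), # learned filters, no attributes
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),   # P(salt)
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Stand-in training data: patches centred on labelled pixels from the
# single interpreted slice, with binary labels.
X = np.random.rand(256, 65, 65, 1).astype('float32')
y = np.random.randint(0, 2, size=256)
model.fit(X, y, epochs=2, batch_size=32)
```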

### Intuition training

Machine learning isn't just useful for computing in the inverse direction, as in inversion, seismic interpretation, and so on. Johannes Amtmann showed us how machine learning can be useful for ranking the performance of different clustering methods using forward models. It was exciting to see: we need to get back into the habit of forward modeling, each and every one of us. Interpreters build synthetics to hone their seismic intuition. It's time to get insanely good at building forward models for machines, to help them hone theirs.

There were so many fascinating problems being worked on in this session. It was one of the best half-day sessions of technical content I've ever witnessed at a subsurface conference. Thanks and well done to everyone who presented.

# SEG-Y Rev 2 again: little-endian is legal!

Big news! Little-endian byte order is finally legal in SEG-Y files.

That's not all. I already spilled the beans on 64-bit floats. You can now have up to 18 quintillion traces (18 exatraces?) in a seismic line. And, finally, the hyphen confusion is cleared up: it's 'SEG-Y', with a hyphen. All this is spelled out in the new SEG-Y specification, Revision 2.0, which was officially released yesterday after at least five years in the making. Congratulations to Jill Lewis, Rune Hagelund, Stewart Levin, and the rest of the SEG Technical Standards Committee.

### Back up a sec: what's an endian?

Whenever you have to think about the order of bytes (the 8-bit chunks in a 'word' of 32 bits, for example) — for instance when you send data down a wire, or store bytes in memory, or in a file on disk — you have to decide if you're Roman Catholic or Church of England.

### What?

In one of the more obscure satirical analogies in English literature, Jonathan Swift wrote about the ideological tussle between two factions of Lilliputians in Gulliver's Travels (1726). The Big-Endians liked to break their eggs at the big end, while the Little-Endians preferred the pointier option. Chaos ensued.

Two hundred and fifty years later, Danny Cohen borrowed the terminology in his 1 April 1980 paper, On Holy Wars and a Plea for Peace — in which he positioned the Big-Endians, preferring to store the big bytes first in memory, against the Little-Endians, who naturally prefer to store the little ones first. Big bytes first is how the Internet shuttles data around, so big-endian is sometimes called network byte order. The drawing (right) shows how the 4 bytes in a 32-bit 'word' (the hexadecimal codes 0A, 0B, 0C and 0D) sit in memory.

Because we write ordinary numbers big-endian style — 2017 has the thousands first, the units last — big-endian might seem intuitive. Then again, lots of people write dates as, say, 31-03-2017, which is analogous to little-endian order. Cohen reviews the computational considerations in his paper, but really these are just conventions. Some architectures pick one, some pick the other. It just happens that the x86 architecture that powers most desktop and laptop computers is little-endian, so people have been illegally (and often accidentally) writing little-endian SEG-Y files for ages. Now it's actually allowed.
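You can see the two orders for yourself with Python's struct module:

```python
import struct

word = 0x0A0B0C0D

print(struct.pack('>I', word).hex())   # '0a0b0c0d': big end first
print(struct.pack('<I', word).hex())   # '0d0c0b0a': little end first
```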

Still other byte orders are possible. Some processors, notably ARM and other RISC architectures, are bi-endian: they can be configured for either byte order. And some older machines, most famously the PDP-11, used a middle-endian (aka mixed-endian) order, in which the two halves of a 32-bit word are swapped relative to their 'pure' endian order. You can think of this as analogous to the month-first American date format: 03-31-2017. I guess this is like breaking your boiled egg in the middle. Swift did not tell us which religious denomination these hapless folks subscribe to.

### OK, that's enough about byte order

I agree. So I'll end with this handy SEG-Y cheatsheet. Click here for the PDF.

### References and acknowledgments

Cohen, Danny (1 April 1980). On Holy Wars and a Plea for Peace. IETF, IEN 137. "...which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians." Also published in the October 1981 issue of IEEE Computer.


# More precise SEG-Y?

The impending SEG-Y Revision 2 release allows the use of double-precision floating point numbers. This news might leave some people thinking: "What?".

### Integers and floats

In most computing environments, there are various kinds of number. The main two are integers and floating point numbers. Let's take a quick look at integers, or ints, first.

Integers can only represent whole numbers: 0, 1, 2, 3, etc. They come in two main flavours, signed and unsigned, and in various bit-depths, e.g. 8-bit, 16-bit, and so on. An 8-bit unsigned integer can have values between 0 and 255; signed ints go from -128 to +127, using a representation called two's complement.
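A quick sketch of those ranges, using NumPy's fixed-width integer types:

```python
import numpy as np

print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 255
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)     # -128 127

# Two's complement in action: the byte 0xFF means 255 as an
# unsigned int, but -1 as a signed one.
b = np.uint8(0xFF)
print(int(b), int(b.astype(np.int8)))                   # 255 -1
```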

As you might guess, floating point numbers, or floats, are used to represent all the other numbers — you know, numbers like 4.1 and –7.2346312 × 10¹³ — we need lots of those.

### Floats in binary

OK, so we need to know about floats. To understand what double-precision means, we need to know how floats are represented in computers. In other words, how on earth can a binary number like 01000010011011001010110100010101 represent a floating point number?

It's fairly easy to understand how integers are stored in binary: the 8-bit binary number 01001101 is the integer 77 in decimal, or 4D in hexadecimal; 11111111 is 255 (base 10) or FF (base 16) if we're dealing with unsigned ints, or -1 decimal if we're in the two's complement realm of signed ints.

Clearly we can only represent a certain number of values with, say, 16 bits. This would give us 65 536 integers... but that's not enough dynamic range to represent tiny or gigantic floats, not if we want any precision at all. So we have a bit of a paradox: we'd like to represent a huge range of numbers (down around the mass of an electron, say, and up to Avogadro's number), but with reasonably high precision, at least a few significant figures. This is where floating point representations come in.

### Scientific notation, sort of

If you're thinking about scientific notation, you're thinking on the right lines. We raise some base (say, 10) to some integer exponent, and multiply by another integer (called the mantissa, or significand). That way, we can write a huge range of numbers with plenty of precision, using only two integers. So:

$$3.14159 = 314159 \times 10^{-5} \ \ \mathrm{and} \ \ 6.02214 \times 10^{23} = 602214 \times 10^{18}$$

If I have two bytes at my disposal (so-called 'half precision'), I could have an 8-bit int for the integer part, called the significand, and another 8-bit int for the exponent. Then we could have floats from $$0$$ up to $$255 \times 10^{255}$$. The range is pretty good, but clearly I need a way to get negative significands — maybe I could use one bit for the sign, and leave 7 bits for the exponent. I also need a way to get negative exponents — I could assign a bias of –64 to the exponent, so that 127 becomes 63 and an exponent of 0 becomes –64. More bits would help, and there are other ways to apportion the bits, and we can use tricks like assuming that the significand starts with a 1, storing only the fractional part and thereby saving a bit. Every bit counts!
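Here's a toy version of that two-byte scheme in Python. It's entirely hypothetical, just to show the mechanics: an 8-bit signed significand, and an 8-bit exponent with a bias of -64.

```python
def toy_decode(significand, exponent):
    # significand: signed 8-bit int (-128..127)
    # exponent: unsigned 8-bit int (0..255), biased by -64
    return significand * 10.0 ** (exponent - 64)

print(toy_decode(127, 64))    # 127 * 10**0  ->  127.0
print(toy_decode(31, 63))     # 31 * 10**-1  ->  3.1
print(toy_decode(-42, 70))    # -42 * 10**6  ->  -42000000.0
```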

### IBM vs IEEE floats

The IBM float and IEEE 754-2008 specifications are just different ways of splitting up the bits in a floating point representation. Single-precision (32-bit) IBM floats differ from single-precision IEEE floats in two ways: they use 7 bits and a base of 16 for the exponent. In contrast, IEEE floats — which are used by essentially all modern computers — use 8 bits and base 2 (usually) for the exponent. The IEEE standard also defines not-a-numbers (NaNs), and positive and negative infinities, among other conveniences for computing.

In double-precision, IBM floats get 56 bits for the fraction of the significand, allowing greater precision. There are still only 7 bits for the exponent, so there's no change in the dynamic range. 64-bit IEEE floats, however, use 11 bits for the exponent, leaving 52 bits for the fraction of the significand. This scheme results in 15–17 significant figures in decimal numbers.

The diagram below shows how four bytes (0x42, 0x6C, 0xAD, 0x15) are interpreted under the two schemes. The results are quite different. Notice the extra bit for the exponent in the IEEE representation, and the different bases and biases.
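We can also decode those four bytes ourselves. A sketch: the IEEE interpretation comes from the standard library, and the IBM one is decoded by hand from its sign / 7-bit base-16 exponent / 24-bit fraction layout.

```python
import struct

raw = bytes([0x42, 0x6C, 0xAD, 0x15])

# IEEE 754 single precision, big-endian.
ieee = struct.unpack('>f', raw)[0]

# IBM single precision: sign bit, 7-bit base-16 exponent biased
# by 64, then a 24-bit fraction.
word = int.from_bytes(raw, 'big')
sign = -1 if word >> 31 else 1
exponent = (word >> 24) & 0x7F
fraction = (word & 0xFFFFFF) / 2**24
ibm = sign * fraction * 16 ** (exponent - 64)

print(ieee)  # approx 59.17
print(ibm)   # approx 108.68
```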

### IBM, IEEE, and seismic

When SEG-Y was defined in 1975, there were only IBM floats — IEEE floats were not defined until 1985. The SEG allowed the use of IEEE floating-point numbers in Revision 1 (2002), and they are still allowed in the impending Revision 2 specification. This is important because most computers these days use IEEE float representations, so if you want to read or write IBM floats, you're going to need to do some work.

The floating-point format in a particular SEG-Y file should be indicated by a flag in bytes 3225–3226. A value of 0x01 indicates IBM floats, while 0x05 indicates IEEE floats. Then again, you can't believe everything you read in headers. And, unfortunately, you can't tell an IBM float just by looking at it. Meisinger (2004) wrote a nice article in CSEG Recorder about the perils of loading IBM as IEEE and vice versa — illustrated below. You should read it.

From Meisinger, D (2004). SEGY floating point confusion. CSEG Recorder 29(7). Available online.
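If you want to check the flag yourself, it's a two-byte big-endian integer near the end of the binary header (the filename here is hypothetical):

```python
# Peek at the sample-format code in a SEG-Y binary header.
with open('data.segy', 'rb') as f:
    f.seek(3224)                             # bytes 3225-3226, zero-indexed
    code = int.from_bytes(f.read(2), 'big')  # the header is big-endian

print({1: 'IBM float', 5: 'IEEE float'}.get(code, f'other ({code})'))
```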

I wrote this post by accident while writing about endianness, the main big change in the new SEG-Y revision. Stay tuned for that post! [Update: here it is!]


# Burning the surface onto the subsurface

Previously, I described a few of the reasons why we don't get a clean ground surface event on land seismic data like we do the water-bottom in marine seismic. In land data, the worst part of the image is right at the surface. But ground level is not just tricky to see, it's impossible to see. Since the vibe truck is on the ground, there's no reflection from that surface. Even if there was some kind of event there, processors apply a magic eraser to the top of the section — the mute — to erase the early arrivals. So it's not possible to see the ground in land data, and you can't pick what isn't there.

But I still want to know where the ground is. Why can't we slap a ground-level seismic 'reflection' event on the section?

### What you need

We need the ground level, which is in depth of course, in the time domain of the seismic section. To compute this, let's call it $$t_\mathrm{G}$$, we need three pieces of information at every trace location: the ground elevation $$G$$, the seismic reference datum (SRD) which I'll call $$D$$, and the replacement velocity $$V_\mathrm{r}$$.

$$t_\mathrm{G} = \frac{2 (G - D)}{V_\mathrm{r}}$$
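In code, that's a one-liner per trace. A minimal sketch, with made-up elevations and the 3048 m/s replacement velocity mentioned below:

```python
import numpy as np

G = np.array([260.0, 265.0, 270.0])   # ground elevation per trace [m], made up
D = 280.0                              # seismic reference datum elevation [m], made up
V_r = 3048.0                           # replacement velocity [m/s]

t_G = 2 * (G - D) / V_r                # two-way time to ground level [s]
print(t_G)                             # negative values plot above time zero
```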

Ground elevation.  If you're lucky, you'll be able to find the ground elevation corresponding to each trace stored in the trace headers. Ground elevation might be located in bytes 41-44 or 45-48 of the trace header, which correspond to the receiver group elevation and the surface elevation of the source, respectively. These should be the same for a stacked trace, but as with any metadata to do with SEG-Y, this info could be hiding somewhere else, or missing altogether. And if you're that unlucky, you might have to comb through processing reports for the missing information. If you are even more unlucky (as I was in this example), you won't have any kind of processing report to fall back on and you'll have to concoct something else. In the accompanying Jupyter notebook, I resorted to interpolating a digitized elevation profile from a JPEG plot of the seismic line. So if you're all out of options, you might find refuge in those legacy plots!

This profile is particularly wonky, because the seismic reference datum (red) is not the same across the profile.

Seismic reference datum. And to make life yet more complicated, the seismic reference datum is not flat across the profile. It goes downhill and then flattens out (red line below). Don't ask me what the advantages are of processing data to a variable datum, but whatever they are, I hope they offset the disadvantages of all too easily mistaking the datum to be flat.

The replacement velocity is given in the sidelabel of the raster image online (shown right). It's 10 000 ft/sec, or 3048 m/s.

Byte locations 53-56 and 57-60 are the standard trace header placeholders reserved for holding the datum elevation at the receiver group and the datum elevation at source. Again, for a stacked trace, these should be the same value. If these fields are zeros, then check the fields of the Trace Header Extension. If they turn up empty, and if the datum is horizontal, it might be listed in the file's text header.

### Convert elevation to time

By definition, the seismic reference datum is horizontal in the time domain (red line below). Notice how the ground elevation – in the time domain – plots mostly as negative values, before time zero. In other words, most of the ground is being cut off by the top of the section. So, if we want to see it, we need to shift everything down into the field of view. Conceptually, this means adjusting the seismic reference datum so it floats entirely above the ground level. Computationally, we can achieve this easily enough by padding the top of the data with zeros, as in the sketch below.

A time-domain representation of the ground-level along the seismic profile. The surface of the earth extends above the start of the seismic data for most of the locations along the profile.
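The padding might look something like this (the array shapes are made up):

```python
import numpy as np

data = np.random.rand(1000, 300)   # stand-in for (samples, traces)
dt = 0.002                         # sample interval [s]
t_min = -0.15                      # most negative ground time [s]

n_pad = int(np.ceil(-t_min / dt))  # samples needed above time zero
shifted = np.pad(data, ((n_pad, 0), (0, 0)), mode='constant')
```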

### Make the ground a pickable event

As a final bit of post-processing, we could actually burn the ground level into the data as a sort of synthetic seismic event. The reason I like this concept is that it alleviates the need to dig up old processing reports, puzzle over missing header data, or worse, maintain and munge external text files containing elevation information. I say, let's make it self-contained. Let's put it directly into the data so that it can be treated like any other seismic reflection. Why would I do this?

• You can see where there might be fold, velocity or other issues related to topography.
• You can immediately see the polarity of the data.
• You could use the bandwidth of the data to make the pseudo-reflector, giving a visual hint to the interpreter.
• Keeping track of amplitude adjustments and phase rotations would be self-documenting and reversible.
• You could autotrack it to get a topographic map (or just get this from the processor).
• It looks cool!

Seismic profile with ground level synthetically slapped on top. Bandlimited, of course, so you can autotrack to your heart's content!

I've deliberately constructed a band-limited reflection, as opposed to placing a sharp spike at ground level. The problem with a spike is that it has infinite bandwidth. It contains higher frequencies than the image, so as Carl Reine commented on that last post, it might not play nicely with seismic attributes. Also, there's the problem of selecting an amplitude value to assign to the spike: we don't want to introduce amplitudes that are ridiculously out of range of the existing data.
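One way to build such an event is to put a spike at the ground time on each trace, then convolve with a wavelet. A sketch, with a hand-rolled Ricker wavelet rather than whatever the notebook actually does:

```python
import numpy as np

def ricker(f, duration, dt):
    """Ricker wavelet with centre frequency f [Hz]."""
    t = np.arange(-duration / 2, duration / 2, dt)
    a = (np.pi * f * t) ** 2
    return (1 - 2 * a) * np.exp(-a)

dt = 0.002
w = ricker(f=25, duration=0.128, dt=dt)      # bandwidth to match the data

trace = np.zeros(1000)                        # one padded trace
trace[150] = 0.5                              # spike at the ground-time sample,
                                              # amplitude in range of the data
event = np.convolve(trace, w, mode='same')    # band-limited 'reflection'
```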

### The whole image

I hereby propose that this synthetic ground level trick be adopted as the new standard for any land seismic processing and interpretation. The great thing is, it can be done just as easily by interpreters and seismic data technologists as by the processing companies that create the rest of the image. I realize we're adding stuff to the data that isn't actually signal. We do non-real things to signals all the time. The question is, do the benefits outweigh the artificiality?

Here's the view of the entire section:

The whole section, ground level included.

The details of this exercise can be found in this Jupyter Notebook.

### References

The seismic is line 36_77_PR from the USGS data repository.

SEG Y rev 2 Data Exchange Format. SEG Technical Standards Committee. Draft 2.0, January 2015.

# Where is the ground?

This is the upper portion of a land seismic profile in Alaska. Can you pick a horizon where the ground surface is? Have a go at pickthis.io.

Pick the Ground surface at the top of the seismic section at pickthis.io.

Picking the ground surface on land-based seismic data is not straightforward. Picking the seafloor reflection on marine data, on the other hand, is usually a piece of cake, a warm-up pick. You can often auto-track the whole thing with a few seeds.

Seafloor reflection on the Penobscot 3D survey, offshore Nova Scotia. From Matt's tutorial in the April 2016 issue of The Leading Edge, The function of interpolation.

Why aren't interpreters more nervous that we don't know exactly where the surface of the earth is? I'm sure I'm not the only one that would like to have this information while interpreting. Wouldn't it be great if land seismic were more like marine?

Treacherously jagged topography, or near-surface processing artifacts?

If you're new to land-based seismic data, you might notice that there isn't a nice pickable event across the top of the section like we find in marine seismic data. Shot noise at the surface has been muted (deleted) in processing, and the low fold produces an unclean, jagged look at the top of the section. Additionally, the top of the section — time zero, the seismic reference datum — usually floats somewhere above the land surface, and we can't know where that is unless it can be found in the file header, or looked up in the processing report.

The seismic reference datum, at a two-way time of zero seconds on seismic data, is typically set at mean sea level for offshore data. For land data, it is usually chosen to 'float' above the land surface.

### Reframing the question

This challenge is a bit of a trick question. It asks the viewer to recognize that the seemingly simple task of mapping the ground level on a land seismic section is actually a rudimentary velocity modeling or depth conversion exercise in itself. Wouldn't it be nice to have the ground surface expressed as a pickable seismic event? Shouldn't we have it always in our images? Baked into our data, so to speak, such that we've always got an unambiguous pick? In the next post, I'll illustrate what I mean and show what's involved in putting it in.

In the meantime, I challenge you to pick where you think the (currently absent) ground surface is on this profile, so in the next post we can see how well you did.

# x lines of Python: read and write SEG-Y

Reading SEG-Y files comes up a lot in the geophysicist's workflow. Writing, less often, but it does come up occasionally. As long as we're mostly concerned with trace data and not location, both of these tasks can be fairly easily accomplished with ObsPy.

Today we'll load some seismic, compute an attribute on it, and save a new SEG-Y, in 10 lines of Python.

ObsPy is a rare thing. It demonstrates what a research group can accomplish with a little planning and a lot of perseverance (cf. my whinging earlier this year about certain consortiums in our field). It's an open source Python package from the geophysicists at the University of Munich — Karl Bernhard Zoeppritz studied there for a while, so you know it's legit. The tool serves their research needs in earthquake and global seismology, and it also happens to handle SEG-Y files quite nicely.

Aside: I think SixtyNorth's segpy is actually the way to go for reading and writing SEG-Y; ObsPy is probably overkill for most applications — it's about 80 times the size for one thing. I just happen to be familiar with it and it's super easy to install: conda install obspy. So, since minimalism is kind of the point here, look out for a future x lines of Python using that library.

### The sentences

As before, we'd like to express the process in just a few sentences of plain English. Assuming we just want to read the data into a NumPy array, look at it, do something to it, and write a new file, here's what we're doing:

1. Read (or really index) the file as an ObsPy Stream object.
2. Stack (in the NumPy sense) the Trace objects into a single NumPy array. We have data!
3. Get the 99th percentile of the amplitudes to make plotting easier.
4. Plot the data so we can see it.
5. Get the sample interval of the data from a trace header.
6. Compute the similarity attribute using our library bruges.
7. Make a new Stream object to hold the outbound data.