Great geophysicists #12: Gauss

Carl Friedrich Gauss was born on 30 April 1777 in Braunschweig (Brunswick), and died at the age of 77 on 23 February 1855 in Göttingen. He was a mathematician, you've probably heard of him; he even has his own Linnean handle: Princeps mathematicorum, or Prince of mathematicians (I assume it's the royal kind, not the Purple Rain kind — ba dum tss).

Gauss's parents were poor, working class folk. I wonder what they made of their child prodigy, who allegedly once stunned his teachers by summing the integers up to 100 in seconds? At about 16, he was quite a clever-clogs, rediscovering Bode's law, the binomial theorem, and the prime number theorem. Ridiculous.

His only imperfection was that he was too much of a perfectionist. His motto was pauca sed matura, meaning "few, but ripe". It's understandable how someone so bright might not feel much need to share his work, but historian Eric Temple Bell reckoned that if Gauss had published his work regularly, he would have advanced mathematics by fifty years.

He was only 6 when Euler died, but surely knew his work. Euler is the only other person who made comparably broad contributions to what we now call the exploration geophysics toolbox, and applied physics in general. Here are a few: 

  • He proved the fundamental theorems of algebra and arithmetic. No big deal.
  • He formulated the Gaussian function — which of course crops up everywhere, especially in geostatistics. The Ricker wavelet is a pulse with frequencies distributed in a Gaussian.
  • The gauss is the cgs unit of magnetic flux density, thanks to his work on the flux theorem, one of Maxwell's equations.
  • He discovered the Cauchy integral theorem for contour integrals but did not publish it.
  • The 'second' or 'total' curvature — a coordinate-system-independent measure of spatial curvedness — is named after him.
  • He made discoveries in non-Euclidean geometry, but did not publish them.

Excitingly, Gauss is the first great geophysicist we've covered in this series to have been photographed (right). Unfortunately, he was already dead. But what an amazing thing, to peer back through time almost 160 years.

Next time: Augustin-Jean Fresnel, a pioneer of wave theory.

The hackathon is coming

The Geophysics Hackathon is one month away! Signing up is not mandatory — you can show up on the day if you like — but it does help with the planning. It's 100% free, and I guarantee you'll enjoy yourself. You'll also learn tons about geophysics and about building software. Deets: Thrive, Denver, 8 am, 25–26 October. Bring a laptop.

Need more? Here's all the info you could ask for. Even more? Ask by email or in the comments.

Send your project ideas

The theme this year is RESOLUTION. Participants are encouraged to post projects to hackathon.io ahead of time — especially if you want to recruit others to help. And even if you're not coming to the event, we'd love to hear your project ideas. Here are some of the proto-ideas we have so far: 

  • Compute likely spatial and temporal resolution from some basic acquisition info: source, design, etc.
  • Do the same but from information from the stack: trace spacing, apparent bandwidth, etc.
  • Find and connect literature about seismic and log resolution using online bibliographic data.
  • What does the seismic spectrum look like, given STFT limitations, or Gabor uncertainty?

If you have a bright idea, get in touch by email or in the comments. We'd love to hear from you.

Thank you to our sponsors

Three forward-thinking companies have joined us in making the hackathon as much a geophysics party as a scientific workshop (a real workshop). I think this industry may have trained us to take event sponsorship for granted: it's easy to throw $5000 at the Marriott for Yet Another Coffee Break. Handing over money to a random little company in Nova Scotia to buy coffee, tacos, and cool swag for hungry geophysicists and programmers takes real guts! 

Please take a minute to check out our sponsors and reward them for supporting innovation in our community. 

dGB GeoTeric OGS

Students: we are offering $250 bursaries to anyone looking for help with travel or accommodation. Just drop me a line with a project idea. If you know a student who might enjoy the event, please forward this to them.

The road to Modelr: my EuroSciPy poster

At EuroSciPy recently, I gave a poster-ized version of the talk I did at SciPy. Unlike most of the other presentations at EuroSciPy, my poster didn't cover a lot of the science (which is well understood), or the code (which is esoteric).

Instead it focused on the advantages of spreading software via web applications, rather than only via source code, and on the challenges that we overcame — well, that we're still overcoming — to get our Modelr tool out there. I wanted other programmer-scientists to think about running some of their code as a web app for others to enjoy, but to be aware of the effort involved in doing this.

I've written before about my dislike of posters, though I'm told they are an important component at, say, the AGU Fall Meeting. I admit I do quite like the process of making them, and — on advice from Colin Purrington's useful page — I left a space on the poster for people to write comments or leave sticky notes. As a result, I heard about Docker, a lead I'll certainly follow up.

What's new in modelr

This wasn't part of the poster, but I might as well take the chance to let you know what we've updated recently:

  • You can now add noise to models by specifying the signal-to-noise ratio.
  • Instead of automatic scaling, you can choose your own gain.
  • The app now returns the elastic moduli of the rocks in the model.
  • You can choose a spatial cross-section view or a space–offset–frequency view.

All of these features are now available to subscribers for only $9/month. Amazing value :)

Figshare

I've stored my poster on Figshare, a data storage site and part of Macmillan's Digital Science effort. What I love about Figshare, apart from the convenience of cloud-based storage and easy access for others, is that every item gets a digital object identifier or DOI. You've probably seen these on journal articles. They're a bit like other persistent and unique IDs for publications, such as ISBNs for books, but the idea is to provide more interactivity by making it easily linkable: you can get to any object with a DOI by prepending it with "http://dx.doi.org/".
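
For example, forming the link is just a matter of sticking the resolver in front of the DOI — here's a trivial Python one-liner using the DOI of the poster from the reference below:

    doi = "10.6084/m9.figshare.1151653"    # DOI from the reference below
    url = "http://dx.doi.org/" + doi       # prepend the resolver
    print(url)                             # http://dx.doi.org/10.6084/m9.figshare.1151653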

Reference

Hall, M (2014). The road to modelr: building a commercial web app on an open source foundation. EuroSciPy, Cambridge, UK, August 29–30, 2014. Poster presentation. DOI:10.6084/m9.figshare.1151653

Julia in a nutshell

Julia is the most talked-about language in the scientific Python community. Well, OK, maybe second to Python... but only just. I noticed this at SciPy in July, and again at EuroSciPy last weekend.

As promised, here's my attempt to explain why scientists are so excited about it.

Why is everyone so interested in Julia?

At some high level, Julia seems to solve what Steven Johnson (MIT) described at EuroSciPy on Friday as 'the two-language problem'. It's also known as Ousterhout's dichotomy. Basically, there are system languages (hard to use, fast), and scripting languages (easy to use, slow). Attempts to get the best of both worlds have tended to result in a bit of a mess. Until Julia.

Really though, why?

Cheap speed. Computer scientists adore C because it's rigorous and fast. Scientists and web developers worship Python because it's forgiving and usually fast enough. But the trade-off has led to various cunning ploys to get the best of both worlds, e.g. PyPy and Cython. Julia is perhaps the cunningest ploy of all, achieving speeds that compare with C, but with readable code, dynamic typing, garbage collection, multiple dispatch, and some really cool tricks like Unicode variable names that you enter in pure LaTeX, plus a neat one-line shorthand for defining functions.

Why is Julia so fast?

Machines don't understand programming languages — the code written by humans has to be translated into machine language in a process called 'compiling'. There are three broad approaches: compile everything ahead of time (as C and Fortran do), interpret the code as it runs (roughly what standard Python does), or compile 'just in time', at runtime.

Compiling makes languages fast, because the executed code is tuned to the task (e.g. in terms of the types of variables it handles), and to the hardware it's running on. Indeed, it's only by building special code for, say, integers, that compiled languages achieve the speeds they do.

Julia is compiled, like C or Fortran, so it's fast. However, unlike C and Fortran, which are compiled before execution, Julia is compiled at runtime ('just in time' for execution). So it looks a little like an interpreted language: you can write a script, hit 'run' and it just works, just like you can with Python.

You can even see what the generated machine code looks like:

Don't worry, I can't read it either.

But how is it still dynamically typed?

Because the compiler can only build machine code for specific types — integers, floats, and so on — most compiled languages have static typing. The upshot of this is that the programmer has to declare the type of each variable, making the code rather brittle. Compared to dynamically typed languages like Python, in which any variable can be any type at any time, this makes coding a bit... tricky. (A computer scientist might say it's supposed to be tricky — you should know the type of everything — but we're just trying to get some science done.)

So how does Julia cope with dynamic typing and still compile everything before execution? This is the clever bit: Julia scans the instructions and compiles for the types it finds — a process called type inference — then generates the machine code and caches it. If you then call the same instructions with a different type, Julia recompiles for that type, and caches the new code in another location. Subsequent runs use the appropriate compiled code, without recompilation.
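
Here's a toy Python sketch of that caching idea — not how Julia actually works under the hood, just the flavour: a wrapper 'compiles' (here, merely registers) one specialization per argument-type signature and reuses it on later calls.

    _cache = {}

    def specialize(func):
        def wrapper(*args):
            sig = tuple(type(a) for a in args)
            if (func, sig) not in _cache:
                print(f"compiling {func.__name__} for {sig}")  # stand-in for real code generation
                _cache[(func, sig)] = func                      # stand-in for cached machine code
            return _cache[(func, sig)](*args)
        return wrapper

    @specialize
    def add(x, y):
        return x + y

    add(1, 2)      # 'compiles' for (int, int), then runs
    add(1.0, 2.0)  # new signature, so it 'compiles' again for (float, float)
    add(3, 4)      # cached: runs without recompilation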

Metaprogramming

It gets better. By employing metaprogramming — on-the-fly code generation for special tasks — it's possible for Julia to be even faster than highly optimized Fortran code (right), in which metaprogramming is unpleasantly difficult. So, for example, in Fortran one might tolerate a relatively slow loop that can only be avoided with code generation tricks; in Julia the faster route is much easier. Here's Steven's example.

Interoperability and parallelism

It gets even better. Julia has been built with interoperability in mind, so calling C — or Python — from inside Julia is easy. Projects like Jupyter will only push this further, and I expect Julia to soon be the friendliest way to speed up that stubborn inner NumPy loop. And I'm told a lot of thoughtful design has gone into Julia's parallel processing libraries... I have never found an easy way into that world, so I hope it's true.


I'm not even close to being able to describe all the other cool things Julia, which is still a young language, can do. Much of it will only be of interest to 'real' programmers. In many ways, Julia seems to be 'Python for C programmers'.

If you're really interested, read Steven's slides and especially his notebooks. Better yet, just install Julia and IJulia, and play around. Here's another tutorial and a cheatsheet to get you started.

Highlights from EuroSciPy

In July, Agile reported from SciPy in Austin, Texas, one of several annual conferences for people writing scientific software in the Python programming language. I liked it so much I was hungry for more, so at the end of my recent trip to Europe I traveled to the city of Cambridge, UK, to participate in EuroSciPy.

The conference was quite a bit smaller than its US parent, but still offered 2 days of tutorials, 2 days of tech talks, and a day of sprints. It all took place in the impressive William Gates Building, just west of the beautiful late Medieval city centre, and just east of Schlumberger's cool-looking research centre. What follows are my highlights...

Okay you win, Julia

Steven Johnson, an applied mathematician at MIT, gave the keynote on the first morning. His focus was Julia, the current darling of the scientific computing community, and part of a new ecosystem of languages that seek to cooperate, not compete. I'd been sort of ignoring Julia, in the hope that it might go away and let me focus on Python, the world's most useful language, and JavaScript, the world's most useful pidgin... but I don't think scientists can ignore Julia much longer.

I started writing about what makes Julia so interesting, but it turned into another post — up next. Spoiler: it's speed. [Edit: Here is that post! Julia in a nutshell.]

Learning from astrophysics

The Astropy project is a truly inspiring community — in just 2 years it has synthesized a dozen or so disparate astronomy libraries into an increasingly coherent and robust toolbox for astronomers and astrophysicists. What does this mean?

  • The software is well-tested and reliable.
  • Datatypes and coordinate systems are rich and consistent.
  • Documentation is useful and evenly distributed.
  • There is a tangible project to rally developers and coordinate funding.

Geophysicists might even be interested in some of the components of Astropy and the related SunPy project, for example:

  • astropy.units, just one part of the ever-growing astropy library, is a unit conversion and quantity handler comparable to pint (there's a tiny sketch after this list).
  • SunPy's map and spectra datatypes, for kinds of data that need special methods.
  • asv is a code-agnostic benchmarking library, a bit like freebench.
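
For a taste of astropy.units — a generic illustration of its quantity handling, not anything from the talks:

    from astropy import units as u

    depth = 1500 * u.m                           # quantities carry their units around
    twt = 1.2 * u.s                              # two-way traveltime
    velocity = (2 * depth / twt).to(u.km / u.s)  # unit conversion is explicit
    print(velocity)                              # 2.5 km / s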

Speed dating for scientists

Much of my work is about connecting geoscientists in meaningful collaboration. There are several ways to achieve this, other than through project work: unsessions, wikis, hackathons, and so on. Now there's another way: speed dating.

Okay, it doesn't quite get to the collaboration stage, but Vaggi and his co-authors shared an ingenious way to connect people and give their professional relationship the best chance of success (an amazing insight, a new algorithm, or some software). They asked everyone at a small 40-person conference to complete a questionnaire that asked, among other things, what they knew about, who they knew, and, crucially, what they wanted to know about. Then they applied graph theory to find the most desired new connections (the matrix shows the degree of similarity of interests, red is high), and gave the scientists five 10-minute 'dates' with scientists whose interests overlapped with theirs, and five more with scientists who knew about fields that were new to them. Brilliant! We have to try this at SEG.
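
Just to give the flavour of the matching step — this is a toy, not the authors' algorithm, and the names and interests are invented — you can score how much of what one person wants to learn another person knows about, then use the resulting matrix to pick the most promising pairings:

    import numpy as np

    # Binary interest vectors over four invented topics.
    knows = {'Ana': [1, 0, 1, 0], 'Ben': [0, 1, 0, 1], 'Caro': [1, 1, 0, 0]}
    wants = {'Ana': [0, 1, 0, 1], 'Ben': [1, 0, 1, 0], 'Caro': [0, 0, 1, 1]}

    people = list(knows)
    score = np.zeros((len(people), len(people)))
    for i, a in enumerate(people):
        for j, b in enumerate(people):
            if a != b:
                # How much of what a wants to know does b know about?
                score[i, j] = np.dot(wants[a], knows[b])

    print(people)
    print(score)  # high scores suggest a promising 'date'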

Vaggi, F, T Schiavinotto, A Csikasz-Nagy, and R Carazo-Salas (2014). Mixing scientists at conferences using speed dating. Poster presentation at EuroSciPy, Cambridge, UK, August 2014. Code on GitHub.

Vaggi, F, T Schiavinotto, J Lawson, A Chessel, J Dodgson, M Geymonat, M Sato, R Carazo Salas, A Csikasz Nagy (2014). A network approach to mixing delegates at meetings. eLife, 3. DOI: 10.7554/eLife.02273

Other highlights

  • sumatra to generate and keep track of simulations.
  • vispy, an OpenGL-based visualization library, now has higher-level, more Pythonic components.
  • Ian Ozsvald's IPython add-on for RAM usage.
  • imageio for lightweight I/O of image files.
  • nbagg backend for matplotlib version 1.4, bringing native (non-JS) interactivity.
  • An on-the-fly kernel chooser in upcoming IPython 3 (currently in dev).

All in all, the technical program was a great couple of days, filled with the usual note-taking and hand-shaking. I had some good conversations around my poster on modelr. I got a quick tour of the University of Cambridge geophysics department (thanks to @lizzieday), which made me a little nostalgic for British academic institutions. A fun week!

Burrowing by burning

Most kinds of mining are low-yield games. For example, the world's annual gold production would fit in a 55 m² room. But few mining operations I'm aware of are as low-yield as the one that ran in Melle, France, from about 500 till 950 CE, producing silver for the Carolingian empire and Charlemagne's coins. I visited the site on Saturday.

The tour made it clear just how hard humans had to work to bring about commerce and industry in the Middle Ages. For a start, of course they had no machines, just picks and shovels. But the Middle Jurassic limestone is silicic and very hard, so to weaken the rock they set fires against the face and thermally shocked the rock to bits. The technique, called fire-setting, was common in the Middle Ages, and was described in detail by Georgius Agricola in his book De Re Metallica (right; aside: the best translation of this book is by Herbert Hoover!). Apart from being stupefyingly dangerous, the method is slow: each fire got the miners about 4 cm further into the earth. Incredibly, they excavated about 20 km of galleries this way, all within a few metres of the surface.

The fires were set against the walls and fuelled with wood, mostly beech. Recent experiments have found that one tonne of wood yielded about one tonne of rock. Since a tonne of rock yields 5 kg of galena, and this in turn yields 10 g of silver, we see that producing 1.1 tonnes of silver per year — enough for 640,000 deniers — was quite a feat!
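
A quick back-of-the-envelope check on those figures — all the numbers come from the paragraph above:

    silver_g = 1.1e6                       # 1.1 tonnes of silver per year, in grams
    silver_per_t_rock = 10                 # 10 g of silver per tonne of rock
    galena_per_t_rock = 5                  # 5 kg of galena per tonne of rock

    rock_t = silver_g / silver_per_t_rock          # ~110,000 tonnes of rock per year
    wood_t = rock_t                                # ...and as many tonnes of wood
    galena_t = rock_t * galena_per_t_rock / 1000   # ~550 tonnes of galena
    denier_g = silver_g / 640_000                  # ~1.7 g of silver per coin

    print(rock_t, wood_t, galena_t, round(denier_g, 2))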

There are several limits to such a resource-intensive operation: wood, distance from face to works, maintenance, and willing human labour, not to mention the usual geological constraints. It is thought that, in the end, operations ceased because of a shortage of wood.

Several archaeologists visit the site regularly (here's one geospatial paper I found mentioning the site: Arles et al. 2013), and the evidence of their attempts to reproduce the primitive metallurgical methods was on display. Here's my attempt to label everything, based on what I could glean from the tour guide's rapid French:

The image of the denier coin is licensed CC-BY-SA by Wikipedia user Lequenne Gwendoline

At home with Leonardo

Well, OK, Leonardo da Vinci wasn't actually there, having been dead 495 years, but on Tuesday morning I visited the house at which he spent the last three years of his life. I say house, it's more of a mansion — the Château du Clos Lucé is a large 15th century manoir near the centre of the small market town of Amboise in the Loire valley of northern France. The town was once the royal seat of France, and the medieval grandeur still shows. 

Leonardo was invited to France by King Francis I in 1516. Da Vinci had already served the French governor of Milan, and was feeling squeezed out of Rome by upstarts Raphael and Michelangelo. It's nice to imagine that Frank appreciated Leo's intellect and creativity — he sort of collected artists and writers — but let's face it, it was probably the Italian's remarkable capacity for dreaming up war machines, a skill he had honed in the service of the mercenary and cardinal Cesare Borgia. Leonardo especially seemed to like guns; here are models of a machine gun and a tank, alongside more peaceful concoctions:

Inspired by José Carcione's assertion that Leonardo was a geophysicist, and plenty of references to fossils (even Palaeodictyon) in his notebooks, I scoured the place for signs of Leonardo dabbling in geology or geophysics, but to no avail. The partly restored Renaissance floor tiles did have some inspiring textures and lots of crinoid fossils... I wonder if he noticed them as he shuffled around?

If you are ever in the area, I strongly recommend a visit. Even my kids (10, 6, and 4) enjoyed it, and it's close to some other worthy spots, specifically Chenonceau (for anyone) and Cheverny (for Tintin fans like me). The house, the numerous models, and the garden (below — complete with tasteful reproductions from Leonardo's works) were all terrific.

Check out José Carcione's two chapters about Leonardo and
his work in 52 Things You Should Know About Geophysics.
Download the chapter for free! [PDF, 3.8MB]

The Blangy equation

After reading Chris Liner's recent writings on attenuation and negative Q — both in The Leading Edge and on his blog — I've been reading up a bit on anisotropy. The idea was to stumble a little closer to writing the long-awaited Q is for Q post in our A to Z series. As usual, I got distracted...

In his 1994 paper AVO in transversely isotropic media—An overview, Blangy (now the chief geophysicist at Hess) answered a simple question: How does anisotropy affect AVO? Stigler's law notwithstanding, I'm calling his solution the Blangy equation. The answer turns out to be: quite a bit, especially if impedance contrasts are low. In particular, Thomsen's parameter δ affects the AVO response at all offsets (except zero of course), while ε is relatively negligible up to about 30°.

The key figure is Figure 2. Part (a) shows isotropic vs anisotropic Type I, Type II, and Type III responses:

Unpeeling the equation

Converting the published equation to Python was straightforward (well, once Evan pointed out a typo — yay editors!). Here's a snippet, with the output (here's all of it):
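
For flavour, here is a minimal sketch of the weak-anisotropy (VTI) form: the three Aki–Richards terms plus terms in the Thomsen parameter contrasts Δδ and Δε. The variable names, structure, and example values are mine, not Blangy's notation or his model data.

    import numpy as np

    def vti_reflectivity(vp1, vs1, rho1, d1, e1, vp2, vs2, rho2, d2, e2, theta):
        """Approximate PP reflectivity at a weak-contrast VTI interface; theta in degrees."""
        theta = np.radians(np.asarray(theta, dtype=float))
        vp, vs, rho = (vp1 + vp2) / 2, (vs1 + vs2) / 2, (rho1 + rho2) / 2
        dvp, dvs, drho = vp2 - vp1, vs2 - vs1, rho2 - rho1
        ddelta, deps = d2 - d1, e2 - e1
        k = (vs / vp)**2
        sin2, tan2 = np.sin(theta)**2, np.tan(theta)**2

        # Three isotropic terms (together equivalent to Aki–Richards)...
        iso = (0.5 * (dvp / vp + drho / rho)
               + (0.5 * dvp / vp - 4 * k * dvs / vs - 2 * k * drho / rho) * sin2
               + 0.5 * (dvp / vp) * sin2 * tan2)

        # ...plus two anisotropic terms in the Thomsen contrasts.
        return iso + 0.5 * ddelta * sin2 + 0.5 * deps * sin2 * tan2

    angles = np.arange(0, 41)
    r = vti_reflectivity(3000, 1500, 2400, 0.0, 0.0,
                         3100, 1600, 2420, 0.12, 0.20, angles)
    print(np.round(r[::10], 4))  # reflectivity at 0, 10, 20, 30, 40 degrees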

For the plot below, I computed the terms of the equation separately for the Type II case. This way we can see the relative contributions of the terms. Note that the 3-term solution is equivalent to the Aki–Richards equation.

Interestingly, the 5-term result is almost the same as the 2-term approximation.

Reproducible results

One of the other nice features of this paper — and the thing that makes it reproducible — is the unambiguous display of the data used in the models. Often, this sort of thing is buried in the text, or not available at all. A table makes it clear:

Last thought: is it just me, or is it mind-blowing that this paper is now over 20 years old?

Reference

Blangy, JP (1994). AVO in transversely isotropic media—An overview. Geophysics 59 (5), 775–781.

Don't miss the IPython Notebook that goes with this post.

July linkfest

It's linkfest time again. All the links, in one handy post.

First up — I've seen some remarkable scientific visualizations recently. For example, giant ocean vortices spiralling across the globe (shame about the rainbow colourbar though). Or the trillion-particle Dark Sky Simulation images we saw at SciPy. Or this wonderful (real, not simulated) video by the Perron Group at MIT:

Staying with visuals, I highly recommend reading anything by Mike Bostock, especially if you're into web technology. He's the inventor of D3.js, a popular data viz library, and here's his exploration of algorithms, from sampling to sorting. It's more conceptual than straight-up visualization of data, but no less insightful. 

And I recently read about some visual goodness combined with one of my favourite subjects, openness. Peter Falkingham, a palaeontologist at the Royal Veterinary College and Brown University, has made a collection of 3D photographs of modern tracks and traces available to the world. He knows his data is more impactful when others can use it too.

Derald Smith and sedimentology

From Smith et al. (2009) in SEPM Special Publication No. 97.

The geological world was darkened by the death of Derald Smith on 18 June. I met Derald a few times in connection with working on the McMurray Formation of Alberta, Canada during my time at ConocoPhillips. We spent an afternoon examining core and seismic data, and speculating about counter-point-bars, a specialty of his. He was an intuitive sedimentologist whose contributions will be remembered for many years.

Another geological Smith is being celebrated in September at the Geological Society of London's annual William Smith Meeting. The topic this year is The Future of Sequence Stratigraphy: Evolution or Revolution? Honestly, my first thought was "hasn't that conversation been going on since 1994?", but on closer inspection, it promises to be an interesting two days on 'source-to-sink', 'landscape into rock', and some other recent ideas.

The issue of patents reared up in June when Elon Musk of Tesla Motors announced the relaxation of their patents — essentially a promise not to sue anyone using their patented technology. He realizes that a world where lots of companies make electric vehicles is better for Tesla. I wrote a piece about patents in our industry.

Technology roundup

A few things that caught our eye online:

Last thing: did you know that the unit of acoustic impedance is the Rayl? Me neither. 


Previous linkfests: April, January, October.

The figure is from Smith et al. (2009), Stratigraphy of counter-point-bar and eddy accretion deposits in low-energy meander belts of the Peace–Athabasca delta, northeast Alberta, Canada. In: SEPM Special Publication No. 97, ISBN 978-1-56576-305-0, p. 143–152. It is copyright of SEPM, and used here in accordance with their terms.

Graphics that repay careful study

The Visual Display of Quantitative Information by Edward Tufte (2nd ed., Graphics Press, 2001) celebrates communication through data graphics. The book provides a vocabulary and practical theory for data graphics, and Tufte pulls no punches — he suggests why some graphics are better than others, and even condemns failed ones as lost opportunities. The book outlines empirical measures of graphical performance, and describes the pursuit of graphic-making as one of sequential improvement through revision and editing. I see this book as a sort of moral authority on visualization, and as the reference book for developing graphical taste.

Through design, the graphic artist allows the viewer to enter into a transaction with the data. High performance graphics, according to Tufte, 'repay careful study'. They support discovery, probing questions, and a deeper narrative. These kinds of graphics take a lot of work, but they do a lot of work in return. In later books Tufte writes, 'To clarify, add detail.'

A stochastic AVO crossplot

Consider this graphic from the stochastic AVO modeling section of modelr. Its elements are constructed with code, and since it is a program, it is completely reproducible.

Let's dissect some of the conceptual high points. This graphic shows all the data simultaneously across 3 domains, one in each panel. The data points are sampled from probability density estimates of the physical model. It is a large dataset from many calculations of angle-dependent reflectivity at an interface. The data is revealed with a semi-transparent overlay, so that areas of certainty are visually opaque, and areas of uncertainty are harder to see.

At the same time, you can still see every data point that makes up the graphic, giving a broad overview (the range and additive intensity of the lines and points) as well as the finer structure. We place the two modeled dimensions with templates in the background, alongside the physical model histograms. We can see, for instance, how likely we are to see a phase reversal, or a Class 3 response, given the physical probability estimates. The statistical and site-specific nature of subsurface modeling is represented in spirit. All the data has context, and all the data has uncertainty.
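
Here is a minimal matplotlib sketch of the overlay idea — not the actual modelr code, and the distributions are invented: many stochastic AVO curves drawn with a low alpha, so dense (certain) regions read as opaque and sparse (uncertain) regions stay faint.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    theta = np.radians(np.arange(0, 41))

    # Hypothetical distributions for intercept and gradient.
    intercepts = rng.normal(0.05, 0.02, 500)
    gradients = rng.normal(-0.20, 0.05, 500)

    fig, ax = plt.subplots()
    for a, b in zip(intercepts, gradients):
        # Two-term Shuey-style curve for each random draw, nearly transparent.
        ax.plot(np.degrees(theta), a + b * np.sin(theta)**2,
                color='steelblue', alpha=0.02, lw=1)
    ax.set_xlabel('angle of incidence (°)')
    ax.set_ylabel('reflectivity')
    plt.show()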

Rules for graphics that work

Tufte summarizes that excellent data graphics should:

  • Show all the data.
  • Provoke the viewer into thinking about meaning.
  • Avoid distorting what the data have to say.
  • Present many numbers in a small space.
  • Make large data sets coherent.
  • Encourage the eye to compare different pieces of the data.
  • Reveal the data at several levels of detail, from a broad overview to the fine structure.
  • Serve a reasonably clear purpose: description, exploration, tabulation, or decoration.
  • Be closely integrated with the statistical and verbal descriptions of a data set.

The data density, or data-to-ink ratio, looks reasonably high in my crossplot, but it could likely still be optimized. What would you remove? What would you add? What elements need revision?