Journalists are scientists

Tim Radford. Image: Stevyn Colgan. On Thursday I visited The Guardian’s beautiful offices in King’s Cross for one of their Masterclass sessions. Many of them have sold out, but Tim Radford’s science writing evening did so in hours, and the hundred-or-so budding writers present were palpably excited to be there. The newspaper is one of the most progressive news outlets in the world, and boasts many venerable alumni (John Maddox and John Durant among them). It was a pleasure just to wander around the building, glass of wine in hand, among some of London’s most eloquent nerds.

Radford is not a trained scientist, but a pure journalist. He left school at 16, idolized Dylan Thomas, joined a paper, wrote like hell, and sat on almost every desk before mostly retiring from The Guardian in 2005. He has won four awards from the Association of British Science Writers. More people read any one of his science articles on a random Tuesday morning over breakfast than will ever read anything I ever write. Tim Radford is, according to Ed Yong, the Yoda of science writers.

Within about 30 minutes it became clear what it means to be a skilled writer: Radford’s real craft is story-telling. He is completely at home addressing a crowd of scientists — he knows how to hold a mirror up to the geeks and reflect the fun, fascinating, world-changing awesomeness back at them. “It’s a terrible mistake to think that because you know about a subject you are equipped to write about it,” he told us, getting at how hard it is to see something from within. It might be easier to write creatively, and with due wonder, about fields outside our own.

Some in the audience weren’t content with being entertained by Radford, watching him in action as it were, preferring instead to dwell on controversy. He mostly swatted them aside, perfectly pleasantly, but one thing he was having none of was the supposed divide between scientists and journalists. Indeed, Radford asserted that journalists and scientists do basically the same thing: imagine a story (hypothesis), ask questions (do experiments), form a coherent story (theory) from the results, and publish. Journalists are scientists. Kind of.

I loved Radford's committed and unapologetic pragmatism, presumably the result of several decades of deadlines. “You don’t have to be ever so clever, you just have to be ever so quick,” and as a sort of corollary: “You can’t be perfectly right, but you must be mostly right.” One questioner accused journalists of sensationalising science (yawn). “Of course we do!” he said — because he wants his story in the paper, and he wants people to read it. Specifically, he wants people who don’t read science stories to read it. After all, writing for other people is all about giving them a sensation of one kind or another.

I got so much out of the 3 hours I could write at least another 2000 words, but I won’t. The evening was so popular that the paper decided to record the event and experiment with a pay-per-view video, so you can get all the goodness yourself. If you want more Radford wisdom, his Manifesto for the simple scribe is a must-read for anyone who writes.

Tim Radford's most recent book, The Address Book: Our Place in the Scheme of Things, came out in spring 2011.

The photograph of Tim Radford, at The World's Most Improbable Event on 30 September, is copyright of Stevyn Colgan, and used with his gracious permission. You should read his blog, Colganology. The photograph of King's Place, the Guardian's office building, is by flickr user Davide Simonetti, licensed CC-BY-NC.

My StrataConf highlights

Lots went on at the geologically named, but not geologically inclined, Strata Conference in London. Here are my highlights:

George Dyson was one of the keynote speakers on the first morning. The son of the British–American mathematician Freeman Dyson, George is an author and historian of science and computing. He talked about the history of storage, starting with tally sticks, through the 53kB of global digital storage in 1953, to today. His talk was fascinating. 

Simon Rogers was one of several speakers from the Guardian newspaper, one of the most progressive and online-friendly news outlets in the world. The paper has a host of strategies for putting data first:

  • Their data and viz geeks sit in the middle of the newsroom
  • They built their own software library for data viz, Miso
  • They share the data behind every story on their Datablog

Duncan Irving from Teradata gave the audience a glimpse of the big data geoscientists wield, as I alluded to yesterday. Teradata does data warehousing, but with high-technology extras like distributed storage and level-of-detail layers. I was intrigued by one of the technologies he talked about — SQL on Hadoop. This sounds like gobbledygook, but here's the (possibly horribly misunderstood) gist: store statistical attributes of a massive seismic volume in a database, then you can query them. "Show me all the traces with such-and-such seismic facies."
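To make that gist concrete, here's a toy version of the idea using Python's built-in sqlite3 in place of Hadoop. The table, attributes, and values are all invented for illustration:

```python
import sqlite3

# Toy stand-in for the idea: store per-trace attributes of a seismic
# volume in a database, then query them with SQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE trace_attrs (
                   inline INTEGER, xline INTEGER,
                   rms_amplitude REAL, dominant_freq REAL)""")

# A few made-up trace attributes.
rows = [(100, 200, 0.82, 35.0),
        (100, 201, 1.95, 22.5),
        (101, 200, 2.10, 24.0)]
cur.executemany("INSERT INTO trace_attrs VALUES (?, ?, ?, ?)", rows)

# "Show me all the traces with such-and-such seismic facies";
# here, high amplitude and low dominant frequency.
cur.execute("""SELECT inline, xline FROM trace_attrs
               WHERE rms_amplitude > 1.5 AND dominant_freq < 30""")
print(cur.fetchall())  # [(100, 201), (101, 200)]
```

Swap the in-memory database for a distributed store and the table for billions of traces, and you have the shape of the thing.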

Hjalmar Gislason from Datamarket, whose recent products include Energy Portal, gave us his best practices for publishing data:

  • Use simple formats, like CSV
  • Aim for at least 3 stars in Tim Berners-Lee's system
  • Be consistent across the datasets you publish
  • Put unique IDs everywhere, especially on tables and columns
  • Provide FAQs and clear feedback channels for users
  • Be clear about the license terms of the data

Ben Goldacre, author and bad science crimefighter, gave a keynote on the second day. Almost vibrating with energy, he described how the most basic bias-fighting tool in medicine — randomized controlled trials — might be applied to improving government services (Haynes et al., 2012, Test, learn, adapt). 

At the end of the two days, I had the usual feeling of fullness, fatigue, and anticlimax... but also the inspired, impatient, creative energy that I hope for from events. The consistency of the themes was encouraging — data wants to be free, visualization is necessary but insufficient, reproducibility is core, stories drive us — these are ideas we embrace. They're at the heart of the quiet revolution going on in the world, but perhaps not yet at the heart of our subsurface professional communities. 

Photo by flickr user bjelkeman.

Big data in geoscience

Big data is what we got when the decision cost of deleting data became greater than the cost of storing it.
George Dyson, at Strata London

I was looking for something to do in London this week. Tempted by the Deep-water continental margins meeting in Piccadilly, I instead took the opportunity to attend a different kind of conference. The media group O'Reilly, led by the inspired Tim O'Reilly, organizes conferences. They're known for being energetic, quirky, and small-company-friendly. I wanted to see one, so I came to Strata.

Strata is the conference for big data, one of the woolliest buzzwords in computer science today. Some people are skeptical that it's anything other than a new way to provoke fear and uncertainty in IT executives, the only known way to make them spend money. Indeed, Google "big data" and the top 5 hits are: Wikipedia (obvsly), IBM, McKinsey, Oracle, and EMC. It might be hype, but all this attention might lead somewhere good. 

We're all big data scientists

Geoscientists, especially geophysicists, are unfazed by the concept of big data. The acquisition data from a 3D survey can easily require 10TB (10,240GB) or even 100TB of storage. The data must be written, read, processed, and re-written dozens of times during processing, then delivered, loaded, and interpreted. In geoscience, big data is normal data. 
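A back-of-envelope calculation shows how quickly the terabytes pile up. Every parameter here is an assumption for illustration, not a description of any particular survey:

```python
# Back-of-envelope size of a raw 3D marine survey (all numbers are
# illustrative assumptions, not from any particular survey).
shots = 400_000            # number of shot points
channels = 3200            # receiver channels recording each shot
samples = 4000             # samples per trace (e.g. 8 s at 2 ms)
bytes_per_sample = 4       # 32-bit samples, as in SEG-Y rev 1

trace_bytes = 240 + samples * bytes_per_sample   # 240-byte SEG-Y trace header
total_bytes = shots * channels * trace_bytes
print(f"{total_bytes / 1e12:.1f} TB")  # about 21 TB, before any processing copies
```

Multiply by the dozens of reads and re-writes during processing and the numbers get serious fast.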

So it's great that big data problems are being hacked on by thousands of developers, researchers, and companies that, until about a year ago, were only interested in games and the web. About 99% of them are not working on problems in geophysics or petroleum, but there will be insight and technology that will benefit our industry.

It's not just about data management. Some of the most creative data scientists in the world are at this conference. People are showing dense, and sometimes beautiful, visualizations of giant datasets, like the transport displays by James Cheshire's research group at UCL (right). I can't wait to show some of these people a SEG-Y or LAS file and, unencumbered by our curmudgeonly tradition of analog display metaphors, see how they would display it.

Would the wiggle display pass muster?

News of the month

Our more-or-less regular news round-up is here again. News tips?

Geophysics giant

On Monday the French geophysics company CGGVeritas announced a deal to buy most of Fugro's Geoscience division for €1.2 billion (a little over $1.5 billion). What's more, the two companies will enter into a joint venture in seabed acquisition. Fugro, based in the Netherlands, will pay CGGVeritas €225 million for the privilege. CGGVeritas also pick up commercial rights to Fugro's data library, though Fugro retain ownership of it. Over 2500 people are involved in the deal — and CGGVeritas are now officially Really Big. 

Big open data?

As Evan mentioned in his reports from the SEG IQ Earth Forum, Statoil is releasing some of their Gullfaks dataset through the SEG. This dataset is already 'out there' as the Petrel demo data, though there has not yet been an announcement of exactly what's in the package. We hope it includes gathers, production data, core photos, and so on. The industry needs more open data! What legacy dataset could your company release to kickstart innovation?

Journal innovation

Again, as Evan reported recently, SEG is launching a new peer-reviewed, quarterly journal — Interpretation. The first articles will appear in early 2013. The journal will be open access... but only until the end of 2013. Perhaps they will reconsider if they get hundreds of emails asking for it to remain open access! Imagine the impact that would have on the reach and relevance of the SEG. Why not email the editorial team?

In another dabble with openness, The Leading Edge has opened up its latest issue on reserves estimation, so you don't need to be an SEG member to read it. Why not forward it to your local geologist and reservoir engineer?

Updating a standard

It's all about SEG this month! The SEG is appealing for help with revision 2 of the SEG-Y standard. If you've ever whined about the lack of standardness in the existing standard, now's your chance to help fix it. If you haven't whined about SEG-Y, then I envy you, because you've obviously never had to load seismic data. This is a welcome step, though I wonder if the real problems lie not in the standard itself, but in education and adoption.

The SEG-Y meeting is at the Annual Meeting, which is coming up in November. The technical program is now online, a fact which made me wonder why on earth I paid $15 for a flash drive with the abstracts on it.

Log analysis in OpendTect

We've written before about CLAS, a new OpendTect plug-in for well logs and petrophysics. It's now called CLAS Lite, and is advertised as being 'by Sitfal', though it was previously 'by Geoinfo'. We haven't tried it yet, but the screenshots look very promising.

This regular news feature is for information only. We aren't connected with any of these organizations, and don't necessarily endorse their products or services. Except OpendTect, which we definitely do endorse.

L is for Lambda

Hooke's law says that the force F exerted by a spring depends only on its displacement x from equilibrium, and the spring constant k of the spring:

F = kx

We can think of k—and experience it—as stiffness. The spring constant is a property of the spring. In a sense, it is the spring. Rocks are like springs, in that they have some elasticity. We'd like to know the spring constant of our rocks, because it can help us predict useful things like porosity. 

Hooke's law is the basis for elasticity theory, in which we express the law as

stress [force per unit area] is equal to strain [deformation] times a constant

This time the constant of proportionality is called the elastic modulus. And there isn't just one of them. Why more complicated? Well, rocks are like springs, but they are three-dimensional.

In three dimensions, assuming isotropy, the shear modulus μ plays the role of the spring constant for shear waves. But for compressional waves we need λ+2μ, a quantity called the P-wave modulus. So λ is one part of the term that tells us how rocks get squished by P-waves.

These mysterious quantities λ and µ are Lamé's first and second parameters. They are intrinsic properties of all materials, including rocks. Like all elastic moduli, they have units of force per unit area, or pascals [Pa].
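For what it's worth, λ and μ are easy to compute from the quantities we usually measure. Here's a minimal sketch using the standard isotropic relations μ = ρVs² and λ = ρVp² − 2μ; the rock values are assumed, not measured:

```python
def lame_parameters(vp, vs, rho):
    """Lamé's parameters from P and S velocities [m/s] and density [kg/m³].

    Standard isotropic relations: mu = rho * vs**2 and
    lam = rho * vp**2 - 2 * mu. Returns (lam, mu) in Pa.
    """
    mu = rho * vs**2
    lam = rho * vp**2 - 2 * mu
    return lam, mu

# Illustrative values for a sandstone (assumed, not measured).
lam, mu = lame_parameters(vp=3000.0, vs=1500.0, rho=2400.0)
print(lam / 1e9, mu / 1e9)  # about 10.8 GPa and 5.4 GPa

M = lam + 2 * mu  # the P-wave modulus, lambda + 2*mu
```

Notice that λ never appears alone in these wave equations; it always travels with μ, which is part of why it is so hard to get an intuition for.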

So what is λ?

Matt and I have spent several hours discussing how to describe lambda. Unlike Young's modulus E, or Poisson's ratio ν, our friend λ does not have a simple physical description. Young's modulus just determines how much longer something gets when I stretch it. Poisson's ratio tells how much fatter something gets if I squeeze it. But lambda... what is lambda?

  • λ is sometimes called incompressibility, a name best avoided because it's sometimes also used for the bulk modulus, K.  
  • If we apply stress σ1 along the 1 direction to this linearly elastic isotropic cube (right), then λ represents the 'spring constant' that scales the strain ε along the directions perpendicular to the applied stress.
  • The derivation of Hooke's law in 3D requires tensors, which we're not getting into here. The point is that λ and μ help give the simplest form of the equations (right, shown for one dimension).

The significance of elastic properties is that they determine how a material is temporarily deformed by a passing seismic wave. Shear waves propagate by orthogonal displacements relative to the propagation direction—this deformation is determined by μ. In contrast, P-waves propagate by displacements parallel to the propagation direction, and this deformation is inversely proportional to M, the P-wave modulus λ + 2μ.

Lambda rears its head in seismic petrophysics, AVO inversion, and is the first letter in the acronym of Bill Goodway's popular LMR inversion method (Goodway, 2001). Even though it is fundamental to seismic, there's no doubt that λ is not intuitively understood by most geoscientists. Have you ever tried to explain lambda to someone? What description of λ do you find useful? I'm open to suggestions. 

Goodway, B (2001). AVO and Lamé constants for rock parameterization and fluid detection. CSEG Recorder 26 (6), 39–60.

On being the world's smallest technical publishing company

Four months ago we launched our first book, 52 Things You Should Know About Geophysics. This little book contains 52 short essays by 37 amazing geoscientists. And me and Evan. 

Since it launched, we've been having fun hearing from people who have enjoyed it:

Yesterday's mail brought me an envelope from Stavanger — Matteo Niccoli sent me a copy of 52 Things. In doing so he beat me to the punch as I've been meaning to purchase a copy for some time. It's a good thing I didn't buy one — I'd like to buy a dozen. [a Calgary geoscientist]

A really valuable collection of advice from the elite in Geophysics to help you on your way to becoming a better more competent Geophysicist. [a review on Amazon.co.uk]

We are interested in ordering 50 to 100 copies of the book 52 Things You Should Know About Geophysics [from an E&P company. They later ordered 100.]

The economics

We thought some people might be interested in the economics of self-publishing. If you want to know more, please ask in the comments — we're happy to share our experiences. 

We didn't approach a publisher with our book. We knew we wanted to bootstrap and learn — the Agile way. Before going with Amazon's CreateSpace platform, we considered Lightning Source (another print-on-demand provider), and an ordinary 'web press' printer in China. The advantages of CreateSpace are Amazon's obvious global reach, and not having to carry any inventory. The advantages of a web press are the low printing cost per book and the range of options — recycled paper, matte finish, gatefold cover, and so on.

So, what does a book cost?

  • You could publish a book this way for $0. But, unless you're an editor and designer, you might be a bit disappointed with your results. We spent about $4000 making the book: interior design about $2000, cover design was about $650, indexing about $450. We lease the publishing software (Adobe InDesign) for about $35 per month.
  • Each book costs $2.43 to manufacture. Books are printed just in time — Amazon's machines must be truly amazing. I'd love to see them in action. 
  • The cover price is $19 at Amazon.com, about €15 at Amazon's European stores, and £12 at Amazon.co.uk. Amazon are free to offer whatever discounts they like, at their expense (currently 10% at Amazon.com). And of course you can get free shipping. Amazon charges a 40% fee, so after we pay for the manufacturing, we're left with about $8 per book. 
  • We also sell through our own estore, at $19. This is just a slightly customizable Amazon page. This channel is good for us because Amazon only charges 20% of the sale price as their fee. So we make about $12 per book this way. We can give discounts here too — for large orders, and for the authors.
  • Amazon also sells the book through a so-called expanded distribution channel, which puts the book on other websites and even into bookstores (highly unlikely in our case). Unfortunately, it doesn't give us access to universities and libraries. Amazon's take is 60% through this channel.
  • We sell a Kindle edition for $9. This is a bargain, by the way—making an attractive and functional ebook was not easy. The images and equations look terrible, ebook typography is poor, and it just feels less like a book, so we felt $9 was about right. The physical book is much nicer. Kindle royalties are complicated, but we make about $5 per sale. 

By the numbers

It doesn't pay to fixate on metrics—most of the things we care about are impossible to measure. But we're quantitative people, and numbers are hard to resist. To recoup our costs, not counting the time we lovingly invested, we need to sell 632 books. (Coincidentally, this is about how many people visit agilegeoscience.com every week.) As of right now, there are 476 books out there 'in the wild', 271 of which were sold for actual money. That's a good audience of people — picture them, sitting there, reading about geophysics, just for the love of it.

The bottom line

My wife Kara is an experienced non-fiction publisher. She's worked all over the world in editorial and production, so we knew what we were getting into, more or less. The print-on-demand stuff was new to her, though, as was the retail side of things. We already knew we suck at marketing. But the point is, we knew we weren't in this for the money; for us, it's about relevant and interesting books, not marketing.

And now we know what we're doing. Sorta. We're in the process of collecting 52 Things about geology, and are planning others. So we're in this for at least one or two more, whatever happens, and we hope we get to do many more.

We can't say this often enough: Thank You to our wonderful authors. And Thank You to everyone who has put down some hard-earned cash for a copy. You are awesome. 

Cross plots: a non-answer

On Monday I asked whether we should make crossplots according to statistical rules or natural rules. There was some fun discussion, and some awesome computation from Henry Herrera, and a couple of gems:

Physics likes math, but math doesn't care about physics — @jeffersonite

But... when I consider the intercept point I cannot possibly imagine a rock that has high porosity and zero impedance — Matteo Niccoli, aka @My_Carta

I tried asking on Stack Overflow once, but didn’t really get to the bottom of it, or perhaps I just wasn't convinced. The consensus seems to be that the statistical answer is to put porosity on the y-axis, because that way you minimize the prediction error on porosity. But I feel—and this is just my flaky intuition talking—like this fails to represent nature (whatever that means) and so maybe that error reduction is spurious somehow.

Reversing the plot to what I think of as the natural, causation-respecting plot may not be that unreasonable. It's effectively the same as reducing the error on what was x (that is, impedance), instead of y. Since impedance is our measured data, we could say this regression respects the measured data more than the statistical, non-causation-respecting plot.

So must we choose? Minimize the error on the prediction, or minimize the error on the predictor. Let's see. In the plot on the right, I used the two methods to predict porosity at the red points from the blue. That is, I did the regression on the blue points; the red points are my blind data (new wells, perhaps). Surprisingly, the statistical method gives an RMS error of 0.034, the natural method 0.023. So my intuition is vindicated! 

Unfortunately if I reverse the datasets and instead model the red points, then predict the blue, the effect is also reversed: the statistical method does better with 0.029 instead of 0.034. So my intuition is wounded once more, and limps off for an early bath.
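You can reproduce this kind of experiment yourself. Here's a sketch with synthetic data, not the digitized dataset from the original post, so the error values will differ:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic impedance-porosity data with imperfect correlation
# (a stand-in for the digitized dataset, not the real thing).
imp = rng.uniform(5e6, 9e6, 60)
phi = 0.4 - 3.5e-8 * imp + rng.normal(0, 0.02, 60)

# First half for regression, second half as 'blind' wells.
train, blind = (imp[:30], phi[:30]), (imp[30:], phi[30:])

def rms(a, b):
    return np.sqrt(np.mean((a - b)**2))

# 'Statistical' method: regress porosity on impedance.
a, b = np.polyfit(train[0], train[1], 1)
err_stat = rms(a * blind[0] + b, blind[1])

# 'Natural' method: regress impedance on porosity, then rearrange.
c, d = np.polyfit(train[1], train[0], 1)
err_nat = rms((blind[0] - d) / c, blind[1])

print(err_stat, err_nat)
```

Swap which half is blind, or change the seed, and you'll see the winner flip back and forth, just as it did for me.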

Irreducible error?

Here's what I think: there's an irreducible error of prediction. We can beg, borrow or steal error from one variable, but then it goes on the other. It's reminiscent of Heisenberg's uncertainty principle, but in this case, we can't have arbitrarily precise forecasts from imperfectly correlated data. So what can we do? Pick a method, justify it to yourself, test your assumptions, and then be consistent. And report your errors at every step. 

I'm reminded of the adage 'Correlation does not equal causation.' Indeed. And, to borrow @jeffersonite's phrase, it seems correlation also does not care about causation.

Cross plot or plot cross?

I am stumped. About once a year, for the last nine years or so, I have failed to figure this out.

What could be simpler than predicting porosity from acoustic impedance? Well, lots of things, but let’s pretend for a minute that it’s easy. Here’s what you do:

1.   Measure impedance at a bunch of wells
2.   Measure the porosity — at seismic scale of course — at those wells
3.   Make a crossplot with porosity on the y-axis and impedance on the x-axis
4.   Plot the data points and plot the regression line (let’s keep it linear)
5.   Find the equation of the line, which is of the form y = ax + b, or porosity = gradient × impedance + constant
6.   Apply the equation to a map (or volume, if you like) of impedance, and Bob's your uncle.

Easy!
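Numerically, steps 3 to 6 boil down to a couple of lines of numpy. The well values here are made up, just to show the mechanics:

```python
import numpy as np

# Made-up well data: acoustic impedance and porosity [fraction].
impedance = np.array([6.2, 6.8, 7.5, 8.1, 8.8])   # x 10^6 kg/m^2/s
porosity = np.array([0.30, 0.26, 0.21, 0.17, 0.12])

# Steps 3-5: porosity on y, impedance on x, linear fit.
# polyfit returns [gradient, constant] for a degree-1 fit.
gradient, constant = np.polyfit(impedance, porosity, 1)

# Step 6: apply the equation to new impedance values (a 'map').
new_impedance = np.array([7.0, 8.0])
predicted_porosity = gradient * new_impedance + constant
```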

But, wait a minute. Is Bob your uncle after all? The parameter on the y-axis is also called the dependent variable, and that on the x-axis the independent. In other words, the crossplot represents a relationship of dependency, or causation. Well, porosity certainly does not depend on impedance — it’s the other way around. To put it another way, impedance is not the cause of porosity. So the natural relationship should put impedance, not porosity, on the y-axis. Right?

Therefore we should change some steps:

3.   Make a crossplot with impedance on the y-axis and porosity on the x-axis
4.   Plot the data points and plot the regression line
5a. Find the equation of the line, which is of the form y = ax + b, or impedance = gradient × porosity + constant
5b. Rearrange the equation for what we really want:
porosity = (impedance – constant)/gradient

Not quite as easy! But still easy.
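Here's the reversed recipe as a numpy sketch, again with made-up well values for illustration:

```python
import numpy as np

# Made-up well data, as before: impedance and porosity [fraction].
impedance = np.array([6.2, 6.8, 7.5, 8.1, 8.8])   # x 10^6 kg/m^2/s
porosity = np.array([0.30, 0.26, 0.21, 0.17, 0.12])

# Steps 3-5a: impedance on y, porosity on x.
gradient, constant = np.polyfit(porosity, impedance, 1)

# Step 5b: rearrange for what we really want, porosity.
new_impedance = np.array([7.0, 8.0])
predicted_porosity = (new_impedance - constant) / gradient
```

Unless the correlation is perfect, this rearranged line is not the same line as the direct porosity-on-impedance fit, which is the whole puzzle.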

More importantly, this gives a different answer. Bob is not your uncle after all. Bob is your aunt. To be clear: you will compute different porosities with these two approaches. So then we have to ask: which is correct? Or rather, since neither is going to give us the ‘correct’ porosity, which is better? Which is more physical? Do we care about physicality?

I genuinely do not know the answer to this question. Do you?

If you're interested in playing with this problem, the data I used are from Imaging reservoir quality seismic signatures of geologic effects, report number DE-FC26-04NT15506 for the US Department of Energy by Gary Mavko et al. at Stanford University. I digitized their figure D-8; you can download the data as a CSV here. I have only plotted half of the data points, so I can use the rest as a blind test. 

The intentional professional

I'm involved in a local effort to launch a coworking and business incubation space in Mahone Bay, Nova Scotia, where I live. Like most things worth doing, it's taking some time, but I think we'll get there eventually. Along this journey, I heard a lovely phrase recently — intentional community. What a great way to describe a group of coworkers and entrepreneurs, implying a group formed not just on purpose, but also with purpose.

But it made me think too — it made me wonder if some of the communities I'm involved in might be unintentional — accidental, inadvertent, perhaps even a mistake? Would you describe your workplace as intentional? If you're a student, are your classes intentional? That committee you're on — is that intentional?

Another phrase that keeps popping into my head lately is

Don't be a looky-loo. — Clay Shirky, Cognitive Surplus

Even if you don't know what a looky-loo is, you'll recognize the behaviour immediately. A looky-loo is someone who, taking Woody Allen's advice a little too seriously, thinks 80% of success is showing up. If you've ever organized a meeting, with an idea that you might get something done in it, you know the sort: they arrive, they eat the cookies, they do the small talk, then they sit there and stare at you for an hour, then they leave. No input given. No notes taken. No point being there. 

Next time you hear yourself described in passive terms — attendee, reader, employee, student, user, consumer — react to it. You're being described as a person that things happen to. A victim.

Instead of being an unintentional victim, think of yourself as an essential part of whatever it is. You are a participant, a partner, a stakeholder, a contributor, a collaborator. If you're not an essential part of it then, for everyone's sake, don't go.

This is what professionalism is. 

Great geophysicists #5: Huygens

Christiaan Huygens was a Dutch physicist. He was born in The Hague on 14 April 1629, and died there on 8 July 1695. It's fun to imagine these times: he was a little older than Newton (born 1643), a little younger than Fermat (1601), and about the same age as Hooke (1635). He lived in England and France and must have met these men.

It's also fun to imagine the intellectual wonder life must have held for a wealthy, educated person in these protolithic Enlightenment years. Everyone, it seems, was a polymath: Huygens made substantial contributions to probability, mechanics, astronomy, optics, and horology. He was the first to describe Saturn's rings. He invented the pendulum clock. 

Then again, he also tried to build a combustion engine that ran on gunpowder. 

Geophysicists (and most other physicists) know him for his work on wave theory, which prevailed over Newton's corpuscles—at least until quantum theory. In his Treatise on Light, Huygens described a model for light waves that predicted the effects of reflection and refraction. Interference would have to wait 38 years for Fresnel. He even explained birefringence, the anisotropy that gives rise to double refraction in calcite.

The model that we call the Huygens–Fresnel principle consists of spherical waves emanating from every point in a light source, such as a candle's flame. The sum of these manifold wavefronts predicts the distribution of the wave everywhere and at all times in the future. It's a sort of infinitesimal calculus for waves. I bet Newton secretly wished he'd thought of it.
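The principle is easy to sketch numerically: place point sources along a wavefront, sum their wavelets at points ahead of it, and watch a plane wave re-emerge. This is a toy 2D version with arbitrary units, nothing more:

```python
import numpy as np

# Sketch of the Huygens-Fresnel idea: every point on a wavefront acts
# as a source of spherical (here, circular) wavelets, and the field
# ahead is their sum. Units and geometry are arbitrary.
wavelength = 1.0
k = 2 * np.pi / wavelength

# Point sources spread along a line: a flat wavefront at y = 0.
sources = np.linspace(-50, 50, 501)

# Sum the wavelets at observation points on a line ahead of the front.
obs_x = np.linspace(-5, 5, 101)
obs_y = 20.0
field = np.zeros(obs_x.size, dtype=complex)
for sx in sources:
    r = np.hypot(obs_x - sx, obs_y)
    field += np.exp(1j * k * r) / np.sqrt(r)  # 2D wavelet, amplitude ~ 1/sqrt(r)

# For a wide, dense line of sources, the summed amplitude is nearly
# uniform across the observation line: the wavelets rebuild a plane wave.
amplitude = np.abs(field)
print(amplitude.std() / amplitude.mean())  # small relative variation
```

Truncate the line of sources near the observation points and the ripples of diffraction appear, which is exactly what the principle predicts at an aperture.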