February 25, 2020

Visual explanations of mathematics

February 25, 2020/ Matt Hall

It is thought that Euclid wrote Elements in about 300 BC, but Oliver Byrne turned it into one of the true gems of visualization — and made it about 100 times more readable. By seamlessly combining typeset text (Caslon, if you’re interested) with minimalist geometric drawings in primary colours, he didn’t just reproduce the text; he explained it in a new way.

If you like the look of it, it’s even cooler in Nicholas Rougeur’s beautiful interactive version.

This is a classic example of what Edward Tufte, the modern saint of visualization, calls a visual explanation (he wrote a whole book about the subject). We’ve written about the subject before (for example, see Evan’s 2014 post, Graphics that repay careful study). Figures and charts should do more than merely illustrate, they should elucidate.

Too often, equations — for example the myriad equations in any volume of GEOPHYSICS — do not elucidate. Indeed, they barely even illustrate. In some cases, it’s worse: they obfuscate. You might think mathematics is too dry, or too steeped in convention, for it to be any other way. Equations just are. But Byrne showed us that we can do better.

A few years ago, in an attempt to broaden my geophysical knowledge, I bought a copy of Daniel Fleisch’s book on Maxwell’s equations. It’s excellent, and the others in the series are good too. I especially liked the annotated equations; I’ve lightened the annotations in this version, to put them on a separate visual ‘layer’:

In 2010, Randall Munroe of xkcd applied a similar strategy to label The Flake Equation, his parody of the Drake equation:

There are still other examples out there.

Later, I came across some lovely colourized equations by Stuart Riffle, a game developer. There was a bit of buzz about them on social media. Most people loved them, but a few pointed out that they suffer from the ‘legend lookup’ problem, and the colours he chose might not be great for colourblind people. Still, I like the concept — here’s the Fourier transform:

Direct annotation, something Tufte always advocates, avoids the legend lookup problem. In his 2016 Geophysics Tutorial on finite volume methods, Rowan Cockett showed that colour and labels can work together:

And in his Observable post on the predator–prey interaction, modern visualization legend Mike Bostock avoids the problem entirely with the use of pictograms: direct representation of what the symbols represent:

Observable is interesting because the documents are runnable code. And this reminds us that mathematics — equations, data structures, and so on — has another expression: code. While symbolic representation speaks directly to some people, code speaks to others, probably more. Look at Randall Munroe’s annotation of a Wolfram Alpha equation (similar to an Excel formula) from his (wonderful) book, What If:

What I love about this is the direct path to exploring the function yourself. It would take me an hour to implement Fleisch’s electric field integral in code, even with the annotations. Typing in this — admittedly less useful — rocket golf equation will take me two minutes. Expressing mathematics in code is the ultimate explicit and practical expression of an idea.

We have lots of tools to write better mathematics: LaTeX, markdown, Jupyter Notebooks, and so on. But it feels like nothing has really converged yet. Technology that seamlessly mixes symbolic equations, illustrative-and-explicative annotation, and runnable code is, I am sure, not far off. Until then, we do the best we can with the tools we have.

Have you seen nice examples of annotated equations? I’d love to hear about them; let me know in the comments!

Don’t miss the follow-up post from 2021: Illuminated equations.

The work by Byrne is out of copyright. Those by Munroe and Cockett are openly licensed under Creative Commons. The work of Fleisch and Bostock are used in accordance with Fair Use doctrine.

April 03, 2019

What makes a good benchmark dataset?

April 03, 2019/ Matt Hall

Last week I mentioned that we need more open benchmark datasets in geoscience. I think benchmarks are important for researchers to work on, as a teaching aid, and as a way for us to objectively measure how well we’re doing on a particular problem. How else can we know how we’re doing, or compare Company X’s claim with Company Y’s?

What makes a good benchmark?

I haven’t unearthed any guides from other domains to help answer this question, and we don’t yet have enought experience to know for ourselves. But here’s what I’m thinking:

It must address at least one clear machine learning task. The more obviously useful the task, the more useful (and important) the benchmark. The benchmark dataset should be well suited to the task (but does not have to be comprehensive or definitive).
It must be open. That means explicitly licensed with an open, and preferably permissive, license. I think we need to avoid non-permissive (so-called ‘copyleft’) licenses, because it’s not clear how the ‘sharealike’ clause would affect works that depended on the dataset. And we definitely need to avoid restrictive non-commercial clauses.
It must be discoverable and accessible. In other words, it needs to be easy to find, and anyone should be able to get it, without registering on a website or waiting for an email or doing anything else that slows down the pace of their research. A properly open dataset can be replicated anywhere, so openness should take care of this.
It must have enough features to be interesting. This might mean different things for different tasks, but in general we’d like to see a few physical measurements (e.g. seismic, well logs, RockEval, core photos, field observations, flow rates, and so on). The features should be independent — we can always generate derivatives.
It must have labels. As well as some interesting features, the dataset must have some interpretive information with high information value (e.g. seismic facies, lithologies, deposotional environment, sequence boundaries, EURs, and so on). Usually, these are expensive to acquire (which is partly why we’d like to be able to predct them).
It should name suitable prediction error evaluation methods, with reference implementations, for the intended task. If people are to use it as a score benchmark, they need to know how to score their own implementations of the task.
It can be de-localized, but not completely. We don’t need to know the exact whereabouts of the dataset, but if we remove the relative spatial relationships between wells, say, or don’t know which basin we’re in, then the questions we can ask about the data get a lot less interesting, and the whole situation gets much less realistic.
It should not be too big. More than about 1GB means unwieldy. It means difficult to download. It means too much room for nuance. And it means it’s probably impossible to explore in the space of a tutorial. It’s also much harder to get a big dataset into shape than a smaller one. A few thousand records, maybe 100,000 in some cases, is probably plenty.
It should be clean, but not too clean. No-one wants to spend hours processing a dataset before it can be used, or — worse — be bitten by some esoteric data problem only a domain expert would spot. But, on the other hand, a dataset with no issues at all might be a bit boring. And, in subsurface at least, completely unrepresentative!
It should be well documented. The dataset needs to be described to non-technical people, who know little or nothing about the subsurface. Remember that many users will not be proficient programmers either, so…
It should have an accompanying demonstration. For example, a script or notebook, preferably in at least a couple of languages, that shows how to load and inspect the data. Ideally this would include a demonstration of how to pose, and answer, a straightforward question as a machine learning task.

I’m not sure we can call this last one a criterion, but maybe in an ideal world…

It should be launched with a data science contest. If you’re felling really brave, what better way to attract attention to the new open dataset than with a Kaggle-style contest?

It’s certainly true that there are several datasets around. Unfortunately, the openness criterion eliminates most of them, so they fall at the first hurdle. For example, the very nice dataset that Brendon Hall used in the SEG machine learning contest is not open.

If you know of a dataset that could be coerced into meeting most of these criteria, we’d like to hear about it. I know a small army of people that would love to help get it into the open, and into the hands of machine learning researchers all over the world.

The thumbnail image for this post was adapted from an image by user arg_flickr on Flickr, licensed CC-BY.

Thanks to several people on Software Underground, for the discussion on this topic. In particular, Justin Gosses and Lukas Mosser pointed out the need for transparent error evaluation.

March 27, 2019

Closing the gap: what next?

March 27, 2019/ Matt Hall

I wrote recently about closing the gap between data science and the subsurface domain, naming some strategies that I think will speed up this process of digitalization.

But even after the gap has closed in your organization, you’re really just getting started. It’s not enough to have contact between the two worlds, you need most of your actvity to be there. This means moving it from wherever it is now. This means time, and effort, and mistakes.

Strategies for 2020+

Hardly any organizations have got to this point yet. And I certainly don’t know what it looks like when we get there as a discipline. But nonetheless I think I’m starting to see what’s going to be required to continue to build into the gap. Soon, we’re going to need to think about these things.

We’re bad at hiring; we need to be awesome at it*. We need to stop listening to the pop psychology peddled by HR departments (Myers-Briggs, lol) and get serious about hiring brilliant scientific and technical talent. It’s going to take some work because a lot of brilliant scientists and technical talent aren’t that interested in subsurface.
You need to get used to the idea that digital scientists can do amazing things quickly. These are not your corporate timelines. There are no weekly meetings. Protoyping a digital technology, or proving a concept, takes days. Give me a team of 3 people and I can give you a prototype this time next week.
You don’t have to do everything yourself. In fact, you don’t want to, because that would be horribly slow. For example, if you have a hard problem at hand, Kaggle can get 3000 teams of fantastically bright people to look at it for you. Next week.
We need benchmark datasets. If anyone is going to be able to test anything, or believe any claims about machine learning results, then we need benchmark data. Otherwise, what are we to make of claims like “98% accuracy”? (Nothing, it’s nonsense.)
We need faster research. We need to stop asking people for static, finished work — that they can only present with PowerPoint — months ahead of a conference… then present it as if it’s bleeding edge. And do you know how long it takes to get a paper into GEOPHYSICS?
You need Slack and Stack Overflow in your business. These two tools have revolutionized how technical people communicate and help each other. If you have a large organization, then your digital scientists need to talk to each other. A lot. Skype and Yammer won’t do. Check out the Software Underground if you want to see how great Slack is.

Even if your organization is not quite ready for these ideas yet, you can start laying the groundwork. Maybe your team is ready. You only need a couple of allies to get started; there’s always something you can do right now to bring change a little sooner. For example, I bet you can:

List 3 new places you could look for amazing, hireable scientists to start conversations with.
Find out who’s responsible for technical communities of practice and ask them about Slack and SO.
Find 3 people who’d like to help organize a hackathon for your department before the summer holidays.
Do some research about what it takes to organize a Kaggle-style contest.
Get with a colleague and list 3 datasets you could potentially de-locate and release publically.
Challenge the committe to let you present in a new way at your next technical conference.

I guarantee that if you pick up one of these ideas and run with it for a bit, it’ll lead somewhere fun and interesting. And if you need help at some point, or just want to talk about it, you know where to find us!

* I’m not being flippant here. Next time you’re at a conference, go and talk to the grad students, all sweaty in their suits, getting fake interviews from recruiters. Look at their CVs and resumes. Visit the recruitment room. Go and look at LinkedIn. The whole thing is totally depressing. We’ve trained them to present the wrong versions of themselves.

November 30, 2017

The post of Christmas present

November 30, 2017/ Matt Hall

It's nearly the end of another banner year for humanity, which seems determined as ever to destroy the good things it has achieved. Here's hoping certain world 'leaders' have their Scrooge moments sooner rather than later.

One positive thing we can all do is bring a little more science into the world. And I don't just mean for the scientists you already know. Let's infect everyone we can find! Maybe your niece will one day detect a neutron star collision in the Early Cretaceous, or your small child's intuition for randomness will lead to more breakthroughs in quantum computing.

Build a seismic station

There's surely no better way to discover the wonder of waves than to build a seismometer. There are at least a couple of good options. I built a single component 10 Hz Raspberry Shake myself; it was easy to do and, once hooked up to Ethernet, the device puts itself online and starts streaming data immediately.

The Lego seismometer kit (above right) looks like a slightly cheaper option, and you might want to check that they can definitely ship in time for Xmas, but it's backed by the British Geological Survey so I think it's legit. And it looks very cool indeed.

Everyone needs a globe!

As I mentioned last year, I love globes. We have several at home and more at the office. I don't yet have a Moon globe, however, so I've got my eye on this Replogle edition, NASA approved apparently ("Yup, that's the moon alright!"), and not too pricey at about USD 85.

They seem to be struggling to fill orders, but I can't mention globes without mentioning Little Planet Factory. These beautiful little 3D-printed worlds can be customized in all sorts of ways (clouds or no clouds, relief or smooth, etc), and look awesome in sets.

The good news is that you can pick up LPF's little planets direct from Shapeways, a big 3D printing service provider. They aren't lacquered, but until LPF get back on track, they're the next best thing.

Geology as a lifestyle

Brenda Houston like minerals. A lot. She's made various photomicrographs into wallpaper and fabrics (below, left), and they are really quite awesome. Especially if you always wanted to live inside a geode

OK, some of them might make your house look a bit... Bond-villainy.

If you prefer the more classical imagery of geology, how about this Ancient Dorset duvet cover (USD 120) by De la Beche?

I love this tectonic pewter keychain (below, middle) — featuring articulated fault blocks, and tiny illustrations of various wave modes. And it's under USD 30.

A few months ago, Mark Tingay posted on Twitter about his meteorite-faced watch (below, right). Turns out it's a thing (of course it's a thing) and you can drop substantial sums of money on such space-time trinkets. Like $235,000.

Algorithmic puzzles and stuff

These are spectacular: randomly generated agate-like jigsaw puzzles. Every one is different! Even the shapes of the wooden pieces are generated with maths. They cost about USD 95, and come from Boston-based Nervous System. The same company has lots of other rock- and fossil-inspired stuff, like ammonity jewellery (from about USD 50) and some very cool coasters that look a bit like radiolarians (USD 48 for 4).

There's always books

You can't go wrong with books. These all just came out, and just might appeal to a geoscientist. And if these all sound a bit too much like reading for work, try the Atlas of Beer instead. Click on a book to open its page at Amazon.com.

The posts of Christmas past

If by any chance there aren't enough ideas here, or you are buying for a very large number of geoscientists, you'll have to dredge through the historical listicles of yesteryear — 2011, 2012, 2013, 2014, 2015, or 2016. You'll find everything there, from stocking stuffers to Triceratops skulls.

The images in this post are all someone else's copyright and are used here under fair use guidelines. I'm hoping the owners are cool with people helping them sell stuff!

July 18, 2017

Fear and loathing in oil & gas

July 18, 2017/ Evan Bianco

Sometimes you have to swallow your fear. This is one of those times.

The proliferation of 3D seismic in the 1980s was a major step forward for the petroleum industry. However, it took more than a decade for the 3D seismic method to become popular. During that decade, seismic equipment continued to evolve, particularly with the advent of telemetry recording systems that needed for doing 3D surveys offshore.

Things were never the same again. New businesses sprouted up to support it, and established service companies and tech companies exploded size and in order to keep up with the demand and all the new work.

Not so coincidently, another major shift happened in the late 1980s and early 1990s with the industry-wide shift to Sun workstations in order to cope with the crunching and rendering the overwhelming influx of all these digits. UNIX workstations with hilariously large cathode-ray tube monitors became commonplace. This industry helped make Sun and many other IT companies very wealthy, and once again everything was good. At least until Sun's picnic was trampled on by Linux workstations in the early 2000s, but that's another story...

I think the advent of 3D seismic is one of many examples of the upstream oil and gas industry thriving on technological change. 3D seismic changed everything, facilitating progress in the full sense of the word and we never looked back. As an early career geoscientist, I don't know what the world was like before 3D seismic, but I have interpreted 2D data and I know it's an awful experience — even on a computer.

Debilitating skepticism?

Today, in 2017, we find ourselves in the middle of the next major transformation. Like 3D seismic before it, machine learning will alter yesterday's landscape beyond all recognition. We've been through all of this before, but this time, for some reason it feels different. Many people are cautious, unconvinced about whether this next thing will live up to the hype. Other people are vibrating with excitement viewing the whole thing with rose-coloured glasses. Still others truly believe that it will fail — assertively rejecting hopes and over-excited claims that yes, artificial intelligence will catapult us into a better world, a world beyond our wildest dreams.

A little skepticism is healthy, but I meet a lot of people who are so skeptical about this next period of change that they are ignoring it. It feels to me like an unfair level of dismissal, a too-rigid stance. And it has left me rather perplexed: Why is there so much resistance and denial this time around? Why the apprehension?

I'll wager the reason it is different this time because this change is happening to us, in spite of us, whether we like it or not. We're not in the driving seat. Most of us aren't even in the passenger seat. Unlike seismic technology and UNIX|Linux workstations, our sector has had little to do with this revolution. We haven't been pushing for it, instead, it is dragging us along with it. Worse, it's happening fast; even the people who are trying to keep up with it can barely hold on.

We need you

This is the opportunity of a lifetime. It's happening. High time to crank up the excitement, get involved, be a part of it. I for one want you to be part of it. Come along with us. We need you, whether you like it or not.

This post was provoked by a conversation on LinkedIn.

November 29, 2016

St Nick's list for the geoscientist

November 29, 2016/ Matt Hall

It's that time again. Perhaps you know a geoscientist that needs a tiny gift, carefully wrapped, under a tiny tree. Perhaps that geoscientist has subtly emailed you this blog post, or non-subtly printed it out and left copies of it around your house and/or office and/or person. Perhaps you will finally take the hint and get them something awesomely geological.

Or perhaps 2016 really is the rubbish year everyone says it is, and it's gonna be boring non-geological things for everyone again. You decide.

Science!

I have a feeling science is going to stick around for a while. Get used to it. Better still, do some! You can get started on a fun science project for well under USD 100 — how about these spectrometers from Public Lab? Or these amazing aerial photography kits?

All scientists must have a globe. It's compulsory. Nice ones are expensive, and they don't get much nicer than this one (right) from Real World Globes (USD 175 to USD 3000, depending on size). You can even draw on it. Check out their extra-terrestrial globes too: you can have Ganymede for only USD 125!

If you can't decide what kind of science gear to get, you could inspire someone to make their own with a bunch of Arduino accessories from SparkFun. When you need something to power your gadget in the field, get a fuel cell — just add water! Or if it's all just too much, play with some toy science like this UNBELIEVABLE Lego volcano, drone, crystal egg scenario.

Stuff for your house

Just because you're at home doesn't mean you have to stop loving rocks. Relive those idyllic field lunches with this crazy rock sofa that looks exactly like a rock but is not actually a rock (below left). Complete the fieldwork effect with a rainhead shower and some mosquitoes.

No? OK, check out these very cool Livingstone bouldery cushions and seats (below right, EUR 72 to EUR 4750).

If you already have enough rocks and/or sofas to sit on, there are some earth sciencey ceramics out there, like this contour-based coffee cup by Polish designer Kina Gorska, who's based in Oxford, UK. You'll need something to put it on; how about a nice absorbent sandstone coaster?

Wearables

T-shirts can make powerful statements, so don't waste it on tired old tropes like "schist happens" or "it's not my fault". Go for bold design before nerdy puns... check out these beauties: one pretty bold one containing the text to Lyell's Principles of Geology (below left), one celebrating Bob Moog with waveforms (perfect for a geophysicist!), and one featuring the lonely Chrome T-Rex (about her). Or if you don't like those, you can scour Etsy for volcano shirts.

Books

You're probably expecting me to lamely plug our own books, like the new 52 Things you Should Know About Rock Physics, which came out a few weeks ago. Well, you'd be wrong. There are lots of other great books about geoscience out there!

For example, Brian Frehner (a historian at Oklahoma State) has Finding Oil (2016, U Nebraska Press) coming out on Thursday this week. It covers the early history of petroleum geology, and I'm sure it'll be a great read. Or how about a slightly 'deeper history' book the new one from Walter Alvarez (the Alvarez), A Most Improbable Journey: A Big History of Our Planet and Ourselves (2016, WW Norton), which is getting good reviews. Or for something a little lighter, check out my post on scientific comic books — all of which are fantastic — or this book, which I don't think I can describe.

Dry your eyes

If you're still at a loss, you could try poking around in the prehistoric giftological posts from 2011, 2012, 2013, 2014, or 2015. They contain over a hundred ideas between them, I mean, come on.

Still nothing? Nevermind, dry your eyes in style with one of these tissue box holders. Paaarp!

The images in this post are all someone else's copyright and are used here under fair use guidelines. I'm hoping the owners are cool with people helping them sell stuff!

October 12, 2016

Working without a job

October 12, 2016/ Matt Hall

I have drafted variants of this post lots of times. I've never published them because advice always feels... presumptuous. So let me say: I don't have any answers. But I do know that the usual way of 'finding work' doesn't work any more, so maybe the need for ideas, or just hope, has grown.

Lots of people are out of work right now. I just read that 120,000 jobs have been lost in the oil industry in the UK alone. It's about the same order of magnitude in Canada, maybe as much as 200,000. Indeed, several of my friends — smart, uber-capable professionals — are newly out of jobs. There's no fat left to trim in operator or service companies... but the cuts continue. It's awful.

The good news is that I think we can leave this downturn with a new, and much better, template for employment. The idea is to be more resilient for 'next time' (the coming mergers, the next downturn, the death throes of the industry, that sort of thing).

The tragedy of the corporate professional

At least 15 years ago, probably during a downturn, our corporate employers started telling us that we are responsible for our own careers. This might sound like a cop-out, maybe it was even meant as one, but really it's not. Taken at face value, it's a clear empowerment.

My perception is that most professionals did not rise to the challenge, however. I still hear, literally all the time, that people can't submit a paper to a conference, or give a talk, or write a blog, or that they can't take a course, or travel to a workshop. Most of the time this comes from people who have not even asked, they just assume the answer will be No. I worry that they have completely given in; their professional growth curtailed by the real or imagined conditions of their employment.

More than just their missed opportunity, I think this is a tragedy for all of us. Their expertise effectively gone from the profession, these lost scientists are unknown outside their organizations.

Many organizations are happy for things to work out that way, but when they make the situation crystal clear by letting people go, the inequity is obvious. The professional realizes, too late, that the career they were supposed to be managing (and perhaps thought they were managing by completing their annual review forms on time) was just that — a career, not a job. A career spanning multiple jobs and, it turns out, multiple organizations.

I read on LinkedIn recently someone wishing recently let-go people good luck, hoping that they could soon 'resume their careers'. I understand the sentiment, but I don't see it the same way. You don't stop being a professional, it's not a job. Your career continues, it's just going in a different direction. It's definitely not 'on hold'. If you treat it that way, you're missing an opportunity, perhaps the best one of your career so far.

What you can do

Oh great, unsolicited advice from someone who has no idea what you're going through. I know. But hey, you're reading a blog, what did you expect?

Do you want out? If you think you might want to leave the industry and change your career in a profound way, do it. Start doing it right now and don't look back. If your heart's not in this work, the next months and maybe years are really not going to be fun. You're never going to have a better run at something completely different.
You never stop being a professional, just like a doctor never stops being a doctor. If you're committed to this profession, grasp and believe this idea. Your status as such is unrelated to the job you happen to have or the work you happen to be doing. Regaining ownership of our brains would be the silveriest of linings to this downturn.
Your purpose as a professional is to offer help and advice, informed by your experience, in and around your field of expertise. This has not changed. There are many, many channels for this purpose. A job is only one. I firmly believe that if you create value for people, you will be valued and — eventually — rewarded.
Establish a professional identity that exists outside and above your work identity. Get your own business cards. Go to meetings and conferences on your own time. Write papers and articles. Get on social media. Participate in the global community of professional geoscientists.
Build self-sufficiency. Invest in a powerful computer and fast Internet. Learn to use QGIS and OpendTect. Embrace open source software and open data. If and when you get some contracting work, use Tick to count hours, Wave for accounting and invoicing, and Todoist to keep track of your tasks.
Find a place to work — I highly recommend coworking spaces. There is one near you, I can practically guarantee it. Trust me, it's a much better place to work than home. I can barely begin to describe the uplift, courage, and inspiration you will get from the other entrepreneurs and freelancers in the space.
Find others like you, even if you can't get to a coworking space, your new peers are out there somewhere. Create the conditions for collaboration. Find people on meetup.com, go along to tech and start-up events at your local university, or if you really can't find anything, organize an event yourself!
Note that there are many ways to make a living. Money in exchange for time is one, but it's not a very efficient one. It's just another hokey self-help business book, but reading The 4-Hour Workweek honestly changed the way I look at money, time, and work forever.
Remember entrepreneurship. If you have an idea for a new product or service, now's your chance. There's a world of making sh*t happen out there — you genuinely do not need to wait for a job. Seek out your local startup scene and get inspired. If you've only ever worked in a corporation, people's audacity will blow you away.

If you are out of a job right now, I'm sorry for your loss. And I'm excited to see what you do next.

June 01, 2016

Open source geoscience is _________________

June 01, 2016/ Matt Hall

As I wrote yesterday, I was at the Open Source Geoscience workshop at EAGE Vienna 2016 on Monday. Happily, the organizers made time for discussion. However, what passes for discussion in the traditional conference setting is, as I've written before, stilted.

What follows is not an objective account of the proceedings. It's more of a poorly organized collection of soundbites and opinions with no real conclusion... so it's a bit like the actual discussion itself.

TL;DR The main take home of the discussion was that our community does not really know what to do with open source software. We find it difficult to see how we can give stuff away and still make loads of money.

I'm not giving away my stuff

Paraphrasing a Schlumberger scientist:

Schlumberger sponsors a lot of consortiums, but the consortiums that will deliver closed source software are our favourites.

I suppose this is a way to grab competitive advantage, but of course there are always the other consortium members so it's hardly an exclusive. A cynic might see this position as a sort of reverse advantage — soak up the brightest academics you can find for 3 years, and make sure their work never sees the light of day. If you patent it, you can even make sure no-one else gets to use the ideas for 20 years. You don't even have to use the work! I really hope this is not what is going on.

I loved the quote Sergey Fomel shared; paraphrasing Matthias Schwab, his former advisor at Stanford:

Never build things you can't take with you.

My feeling is that if a consortium only churns out closed source code, then it's not too far from being a consulting shop. Apart from the cheap labour, cheap resources, and no corporation tax.

Yesterday, in the talks in the main stream, I asked most of the presenters how people in the audience could go and reproduce, or simply use, their work. The only thing that was available was a commerical OpendTect plugin of dGB's, and one free-as-in-beer MATLAB utility. Everything else was unavailble for any kind of inspection, and in one case the individual would not even reveal the technology framework they were using.

Support and maintenance

Paraphrasing a Saudi Aramco scientist:

There are too many bugs in open source, and no support.

The first point is, I think, a fallacy. It's like saying that Wikipedia contains inaccuracies. I'd suggest that open source code has about the same number of bugs as proprietary software. Software has bugs. Some people think open source is less buggy; as Linus Torvalds said: "Given enough eyeballs, all bugs are shallow." Kristofer Tingdahl (dGB) pointed out that the perceived lack of support is a business opportunity for open source community. Another participant mentioned the importance of having really good documentation. That costs money of course, which means finding ways for industry to support open source software development.

The same person also said something like:

[Open source software] changes too quickly, with new versions all the time.

...which says a lot about the state of application management in many corporations and, again, may represent opportunity rather than a threat to open source movement.

Only in this industry (OK, maybe a couple of others) will you hear the impassioned cry, "Less change!"

The fog of torpor

When a community is falling over itself to invent new ways to do things, create new value for people, and find new ways to get paid, few question the sharing and re-use of information. And by 'information' I mean code and data, not a few PowerPoint slides. Certainly not all information, but lots. I don't know which is the cause and which is the effect, but the correlation is there.

In a community where invention is slow, on the other hand, people are forced to be more circumspect, and what follows is a cynical suspicion of the motives of others. Here's my impression of the dynamic in the room during the discussion on Monday, and of course I'm generalizing horribly:

Operators won't say what they think in front of their competitors
Vendors won't say what they think in front of their customers and competitors
Academics won't say what they think in front of their consortium ~~customers~~ sponsors
Students won't say what they think in front of their advisors and potential employers

This all makes discussion a bit stilted. But it's not impossible to have group discussions in spite of these problems. I think we achieved a real, honest conversation in the two Unsessions we're done in Calgary, and I think the model we used would work perfectly in all manner of non-technical and in technical settings. We just have to start doing it. Why our convention organizers feel unable to try new things at conferences is beyond me.

I can't resist finishing on something a person at Chevron said at the workshop:

I'm from Chevron. I was going to say something earlier, but I thought maybe I shouldn't.

This just sums our industry up.

January 27, 2015

On breaking rules

January 27, 2015/ Matt Hall

Humans have a complicated relationship with rules.

One of the mantras of the 21st century economy is 'first, break all the rules'. If the rules are merely stale conventions, then yes: break away. But it's tempting to go too far and scoff at all rules, and even laws, as the petty creations of boring bureaucrats, declaring, "Rules? Pah! We won't be tied down by your rules!"

But it's not that simple. We like some rules, like the rule about not smoking in aeroplanes, or parking in your reserved parking place. When others break those rules, it's annoying. And rules that define boundaries can heighten, not hinder, creativity and impact — look at code golf, Yves Klein, haiku (though the 5–7–5 thing is a myth), and Twitter.

So what to do about a rule we don't like? There are usually a few options:

Obey it. The rule worked! But maybe not for you.
Change it. This might work, but it might take a while. Good luck!
Break it. Easy! Just pretend it's not there. There's no need to feel bad: everyone else is doing it.

Is that it? Be boring, be brave, or stick it to the man? No, it's a false trichotomy. There is a fourth option:

Make the rule irrelevant. Build or contribute to a new version of reality where the rule no longer applies. 

In other words, don't break stupid rules — that doesn't change anything. Better to make your point by subverting the entire foundation of stupid rules. For example:

When lawyer Larry Lessig decided he'd had enough of copyright restrictions, he didn't say 'screw you guys' and start downloading movies on BitTorrent. He started Creative Commons and transformed the way the sharing economy functions. Result: not just reduced revenue, but reduced impact of traditional media — far more important.
The local government will partly fund training for small businesses from a marketing consultant. Apparently, it's common to game this system by hiring a consultant under this program, then simply having them do work for hire — website, branding, and so on. But these are normal business expenses; instead of coercing a broken system to channel public money into private enterprise, we'd all be better off beating a new path to small-scale investment and collaboration.
There's a young would-be Robin Hood in the geoscience publishing world, hosting copyrighted textbook PDFs for free download. He believes he's helping to rid the world of the tyranny of over-priced technical literature, but he's going about it the wrong way. Better to promote open-access literature, and be a champion of legal re-use. This denies 'the establishment' their impact, instead of lauding it, and helps spread truly shareable content.

Next time you come across a rigid rule you don't like, don't break it. Ask instead how you can make the rule not matter.

No Trespassing image CC-BY-SA by Michael Dorausch on Flickr.

January 21, 2015

Test driven development geoscience

January 21, 2015/ Matt Hall

Sometimes I wonder how much of what we do in applied geoscience is really science. Is it really about objective enquiry? Do we form hypotheses, then test them? The scientific method is largely a caricature — science is more accidental and more fun than a step-by-step recipe — but I think our field sometimes falls short of even basic rigour. Go and sit through a conference session on seismic attribute analysis some time and you'll see what I mean. Let's just say there's a lot of arm-waving and shape-ology.

Learning from geeks

We've written before about the virtues of the software engineering community. Innovation has been so rapid recently, that I think it's a great place to find interpretation hacks like pair picking. Learning about and experiencing the amazing productivity of programmers is one of the reasons I think all scientists should learn to program (but not learn to be a programmer). You'll find out about concepts like version control, user-centered design, and test-driven development. Programmers embrace these ideas to a greater or lesser degree, depending on their goals and those of the project they're working on. But all programmers know them.

I'm especially into test-driven development at the moment. The idea is that before implementing a new module or feature, you write a test — a short program that gives the new thing some input, inspects the output, and compares it to a known answer. The first version of the code will likely fail the test. The idea is to refactor the code until it passes the test. Then you add that test to a suite that runs every time you build anything in the same project, so you know your new thing doesn't get broken by something else later. And you aren't tempted to implement features that weren't part of the test.

Fail — Refactor — Pass

Imagine test-driven development geology (or any other kind of geoscience). What would that look like?

When planning wells, we often do write tests — they're called prognoses. But the comparison with the result is rarely formalized or quantified, especially outside the target zone. Once the well is drilled, it becomes data and we move on. No-one likes to dwell on the poorly understood or error-prone, but naturally that's where the greatest room for improvement is.
When designing a new seismic attribute, or embarking on a seismic processing project, we often have a vague idea of success in our heads, and that's about it. What if we explicitly defined an input test dataset, some wells or bits of wells, and set 'passing' performance criteria on those? "I won't interpret the reprocessed seismic until it improves those synthetic correlation coefficients by 40%."
When designing a seismic survey, we could establish acceptable criteria for trace density, minimum offset, azimuth distribution, and recording time, then use these as a cost function to find the best possible survey for our dollars. Wait, perhaps we actually do this one. Is seismic acquisition unusually scientific? Or is it an inherently more linear problem?

What do you think? Can you see ways to define 'success' before you begin, then somewhat quantitatively compare your results with that? Ideas wanted!

Blog