Closing the gap: what next?

I wrote recently about closing the gap between data science and the subsurface domain, naming some strategies that I think will speed up this process of digitalization.

But even after the gap has closed in your organization, you're really just getting started. It's not enough to have contact between the two worlds; you need most of your activity to be there. This means moving it from wherever it is now. This means time, and effort, and mistakes.

Strategies for 2020+

Hardly any organizations have got to this point yet. And I certainly don’t know what it looks like when we get there as a discipline. But nonetheless I think I’m starting to see what’s going to be required to continue to build into the gap. Soon, we’re going to need to think about these things.

• We’re bad at hiring; we need to be awesome at it*. We need to stop listening to the pop psychology peddled by HR departments (Myers-Briggs, lol) and get serious about hiring brilliant scientific and technical talent. It’s going to take some work because a lot of brilliant scientists and technical talent aren’t that interested in subsurface.

• You need to get used to the idea that digital scientists can do amazing things quickly. These are not your corporate timelines. There are no weekly meetings. Prototyping a digital technology, or proving a concept, takes days. Give me a team of 3 people and I can give you a prototype this time next week.

• You don’t have to do everything yourself. In fact, you don’t want to, because that would be horribly slow. For example, if you have a hard problem at hand, Kaggle can get 3000 teams of fantastically bright people to look at it for you. Next week.

• We need benchmark datasets. If anyone is going to be able to test anything, or believe any claims about machine learning results, then we need benchmark data. Otherwise, what are we to make of claims like “98% accuracy”? (Nothing, it’s nonsense.)

• We need faster research. We need to stop asking people for static, finished work — that they can only present with PowerPoint — months ahead of a conference… then present it as if it’s bleeding edge. And do you know how long it takes to get a paper into GEOPHYSICS?

• You need Slack and Stack Overflow in your business. These two tools have revolutionized how technical people communicate and help each other. If you have a large organization, then your digital scientists need to talk to each other. A lot. Skype and Yammer won’t do. Check out the Software Underground if you want to see how great Slack is.
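To see why a bare "98% accuracy" says so little, consider a deliberately trivial case. The sketch below assumes a hypothetical, imbalanced set of 100 labelled samples (98 shale, 2 sand, made up purely for illustration) and scores a "model" that ignores its input and always predicts the most common label. It still scores 98%, which is one reason a shared benchmark, and a baseline to compare against, matter.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must be the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def majority_baseline(y_train, n):
    """A 'model' that ignores the data and always predicts the most common label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return [most_common_label] * n

# Hypothetical, made-up labels: 98 shale samples and 2 sand samples.
labels = ["shale"] * 98 + ["sand"] * 2

predictions = majority_baseline(labels, len(labels))
print(f"Majority-class accuracy: {accuracy(labels, predictions):.0%}")
```

A result is only worth reporting if it beats a baseline like this on the same, openly available test data.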

Even if your organization is not quite ready for these ideas yet, you can start laying the groundwork. Maybe your team is ready. You only need a couple of allies to get started; there’s always something you can do right now to bring change a little sooner. For example, I bet you can:

• List 3 new places you could look for amazing, hireable scientists to start conversations with.

• Find out who’s responsible for technical communities of practice and ask them about Slack and SO.

• Find 3 people who’d like to help organize a hackathon for your department before the summer holidays.

• Do some research about what it takes to organize a Kaggle-style contest.

• Get with a colleague and list 3 datasets you could potentially de-locate and release publicly.

• Challenge the committee to let you present in a new way at your next technical conference.

I guarantee that if you pick up one of these ideas and run with it for a bit, it’ll lead somewhere fun and interesting. And if you need help at some point, or just want to talk about it, you know where to find us!

* I’m not being flippant here. Next time you’re at a conference, go and talk to the grad students, all sweaty in their suits, getting fake interviews from recruiters. Look at their CVs and resumes. Visit the recruitment room. Go and look at LinkedIn. The whole thing is totally depressing. We’ve trained them to present the wrong versions of themselves.

Matt Hall

Matt is a geoscientist in Nova Scotia, Canada. Founder of Agile Scientific, co-founder of The HUB South Shore. Matt is into geology, geophysics, and machine learning.

Pricing professional services, again

I have written about this before, but in my other life as an owner of a coworking space. It's come up in Software Underground a couple of times recently, so I thought it might be time to revisit the crucial question for anyone offering services: what do I charge?

Unfortunately, it's not a simple answer. And before you read any further, you also need to understand that I am no business mastermind. So you should probably ignore everything I write. (And please feel free to set me straight!)

Here's a bit of the note I got recently from a blog reader:

I'm planning to start doing consulting and projects of seismic interpretation and prospect generations but I don't know what's a fair price to charge for services. I sure there're many of factors. I was wondering if you can share some tips on how to calculate/determine the cost of a seismic interpreter project? Is it by sq mi of data interpreted, maps of different formations, presentations, etc.?

Let's break the reply down into a few aspects:

Know the price you're aiming for and don't go below it. I've let myself get beaten down once or twice, and it's not a recipe for success: you may end up resenting the entire job. One opinion on Software Underground was to start with a high price, then concede to the client during negotiations. I tend to keep a fair price fixed from the start, and negotiate on other things (scope and deliverables). Do try not to get sucked into too much itemization though: it will squeeze your margins.

But what is the price you're aiming for? It depends on your fixed costs (how much do you need to get the work done and pay yourself what you need to live on?), time, complexity, your experience, how simple you want your pricing to be, and so on. All these things are difficult. I tend to go for simplicity, because I don't want the administrative overhead of many line items, keeping track of time, etc. Sometimes this bites me, sometimes (maybe) I come out ahead.

Come on, be specific. If you've recently had a 'normal' job, then a good starting point is to know your "fully loaded cost" (i.e. what you really cost, with benefits, bonuses, cubicle, coffee, computer, and so on). This is typically about 2 to 2.5 times your salary(!). That's what you would like to make in about 200 days of work. You will quickly realize why consultants are apparently so expensive: people are expensive, especially people who are good at things.
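As a rough illustration of that arithmetic, here is a minimal sketch. The salary is a made-up placeholder, and the 2 to 2.5 loading factor and the 200 working days are the rules of thumb from the paragraph above, not fixed constants; plug in your own numbers.

```python
def day_rate_range(salary, loading=(2.0, 2.5), billable_days=200):
    """Return (low, high) day rates implied by a fully loaded cost.

    salary        -- annual salary in your currency (hypothetical input)
    loading       -- multipliers turning salary into fully loaded cost
    billable_days -- days of paid work you expect in a year
    """
    low, high = (salary * factor / billable_days for factor in loading)
    return low, high

# Hypothetical example: an annual salary of 100 000.
low, high = day_rate_range(100_000)
print(f"Day rate: {low:,.0f} to {high:,.0f}")  # Day rate: 1,000 to 1,250
```

Dropping `billable_days` below 200 (holidays, business development, admin) pushes the rate up quickly, which is part of why consultants look expensive.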

If I ever feel embarrassed to ask for my fee, I remind myself that when I worked at Halliburton, my list price as a young consultant was USD 2400 per day. Clients would sign year-long contracts for me at that rate.

It's definitely a good idea to know what you're competing with. However, it can be very hard to find others' pricing information. If you have a good relationship with the client, they may even tell you what they are used to paying. Maybe you give them a better price, or maybe you're more expensive, because you're more awesome.

Remember your other bottom lines. Money is not everything. If we get paid for work on an open source project (open code or open content), we always discount the price, often by 50%. If we care deeply about the work, we ask for less than usual. Conversely, if the work comes with added stress or administration, we charge a bit more.

One thing's for sure: sometimes (often) you're leaving money on the table. Someone out there is charging (way) more for (much) lower quality. Conversely, someone is probably charging less and doing a better job. The lack of transparency around pricing and salaries in the industry doubtless contributes to this. In the end, I tend to be as open as possible with the client. Often, prices change for the next piece of work for the same client, because I have more information the second time.

Opinions wanted

There's no doubt, it's a difficult subject. The range of plausible prices is huge: $50 to $500 per hour, as someone on Software Underground put it. Nearer $50 to $100 for a routine programming job, $200 for professional input, $400 for more awesomeness than you can handle. But if there's a formula, I've yet to discover it. And maybe a fair formula is impossible, because providing critical insight isn't really something you can pay for on a 'per hour' kind of basis (or shouldn't be).

I'm very open to more opinions on this topic. I don't think I've heard the same advice yet from any two people. When I asked one friend about it he said: "Keep increasing your prices until someone says No."

Then again, he doesn't drive a Porsche either.

If you found this post useful, you might like the follow-up post too: Beyond pricing: the fine print.

Matt Hall

The big data eye-roll

First, let's agree on one thing: 'big data' is a half-empty buzzword. It's shorthand for 'more data than you can look at', but really it's more than that: it branches off into other hazy territory like 'data science', 'analytics', 'deep learning', and 'machine intelligence'. In other words, it's not just 'large data'.

Anyway, the buzzword doesn't bother me too much. What bothers me is when I talk to people at geoscience conferences about 'big data', about half of them roll their eyes and proclaim something like this: "Big data? We've been doing big data since before these punks were born. Don't talk to me about big data."

This is pretty demonstrably a load of rubbish.

What the 'big data' movement is trying to do is not acquire loads of data then throw 99% of it away. They are not processing it in a purely serial pipeline, making arbitrary decisions about parameters on the way. They are not losing most of it in farcical enterprise data management train-wrecks. They are not locking most of their data up in such closed systems that even they don't know they have it.

They are doing the opposite of all of these things.

If you think 'big data', 'data science' and 'machine learning' are old hat in geophysics, then you have some catching up to do. Sure, we've been toying with simple neural networks for years, e.g. probabilistic neural nets with 1 hidden layer (though this approach is very, very far from being mainstream in subsurface), but today this is child's play. Over and over, and increasingly so in the last 3 years, people are showing how new technology, built specifically to handle the special challenge that terabytes bring, can transform any quantitative endeavour: social media and online shopping, sure, but also astronomy, robotics, weather prediction, and transportation. These technologies will show up in petroleum geoscience and engineering. They will eat geostatistics for breakfast. They will change interpretation.

So when you read that Google has open sourced its TensorFlow deep learning library (9 November), or that Microsoft has too (yesterday), or that Facebook has too (months ago), or that Airbnb has too (in August), or that there are a bazillion other super easy-to-use packages out there for sophisticated statistical learning, you should pay a whole heap of attention! Because machine learning is coming to subsurface.

Matt Hall

Touring vs tunnel vision

My experience with software started, and still largely sits, at the user end. More often than not, I am interacting with someone else's design. One thing I have learned from the user experience is that truly great interfaces are engineered to stay out of the way. The interface is only a skin atop the real work that software does underneath: taking inputs, applying operations, producing outputs. I'd say most computer users don't know how to compute without an interface. I'm trying to break free from that camp.

In The dangers of default disdain, I wrote about the power and control that the technology designer has over his users. A kind of tunnel is imposed that restricts the choices for interacting with data. And for me, maybe for you as well, the tunnel has been a welcome structure, directing my focus towards that distant point; the narrow aperture invokes at least some forward motion. I've unknowingly embraced the tunnel vision as a means of interacting without substantial choices, without risk, without wavering digressions. I think it's fair to say that without this tunnel, most travellers would find themselves stuck, incapacitated by the hard graft of touring over or around the mountain.

But there is nothing to do inside the tunnel, no scenery to observe, just a black void between input and output. For some tasks, taking the tunnel is the only obvious and economic choice: all you want is to get stuff done. But choosing the tunnel means you will be missing things along the way. It's a trade-off.

For getting from A to B, there are engineers to build tunnels, there are travellers to travel the tunnels, and there is a third kind of person altogether: tour guides take the scenic route. Building your own tunnel is a grand task, only worthwhile if you can find enough passengers to use it. The scenic route isn't just a casual, lackadaisical approach. It's necessary for understanding the landscape; by taking it, the traveller becomes connected with the territory. The challenge for software and technology companies is to expose people to the richness of their environment while moving them through at an acceptable pace. Is it possible to have a tunnel with windows?

Oil and gas operating companies are good at purchasing the tunnel access pass, but are not very good at building a robust set of tools to navigate the landscape of their data environment. After all, that is the thing that we travellers need to be in constant contact with. Touring or tunneling? The two approaches may or may not arrive at the same destination, and they have different costs along the way, so they make for different businesses.

On being the world's smallest technical publishing company

Four months ago we launched our first book, 52 Things You Should Know About Geophysics. This little book contains 52 short essays by 37 amazing geoscientists. And me and Evan.

Since it launched, we've been having fun hearing from people who have enjoyed it:

Yesterday's mail brought me an envelope from Stavanger — Matteo Niccoli sent me a copy of 52 Things. In doing so he beat me to the punch as I've been meaning to purchase a copy for some time. It's a good thing I didn't buy one — I'd like to buy a dozen. [a Calgary geoscientist]

A really valuable collection of advice from the elite in Geophysics to help you on your way to becoming a better more competent Geophysicist. [a review on Amazon.co.uk]

We are interested in ordering 50 to 100 copies of the book 52 Things You Should Know About Geophysics [from an E&P company. They later ordered 100.]

The economics

We thought some people might be interested in the economics of self-publishing. If you want to know more, please ask in the comments — we're happy to share our experiences.

We didn't approach a publisher with our book. We knew we wanted to bootstrap and learn — the Agile way. Before going with Amazon's CreateSpace platform, we considered Lightning Source (another print-on-demand provider), and an ordinary 'web press' printer in China. The advantages of CreateSpace are Amazon's obvious global reach, and not having to carry any inventory. The advantages of a web press are the low printing cost per book and the range of options — recycled paper, matte finish, gatefold cover, and so on.

So, what does a book cost?

• You could publish a book this way for $0. But, unless you're an editor and designer, you might be a bit disappointed with your results. We spent about $4000 making the book: interior design about $2000, cover design was about $650, indexing about $450. We lease the publishing software (Adobe InDesign) for about $35 per month.

• Each book costs $2.43 to manufacture. Books are printed just in time; Amazon's machines must be truly amazing. I'd love to see them in action.

• The cover price is $19 at Amazon.com, about €15 at Amazon's European stores, and £12 at Amazon.co.uk. Amazon are free to offer whatever discounts they like, at their expense (currently 10% at Amazon.com). And of course you can get free shipping. Amazon charges a 40% fee, so after we pay for the manufacturing, we're left with about $8 per book.

• We also sell through our own estore, at $19. This is just a slightly customizable Amazon page. This channel is good for us because Amazon only charges 20% of the sale price as their fee. So we make about $12 per book this way. We can give discounts here too, for large orders and for the authors.

• Amazon also sells the book through a so-called expanded distribution channel, which puts the book on other websites and even into bookstores (highly unlikely in our case). Unfortunately, it doesn't give us access to universities and libraries. Amazon's take is 60% through this channel.

• We sell a Kindle edition for $9. This is a bargain, by the way: making an attractive and functional ebook was not easy. The images and equations look terrible, ebook typography is poor, and it just feels less like a book, so we felt $9 was about right. The physical book is much nicer. Kindle royalties are complicated, but we make about $5 per sale.
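To make the per-channel arithmetic explicit, here is a small sketch. It assumes each channel's fee is a flat percentage of the $19 cover price, with the $2.43 manufacturing cost then subtracted, and ignores everything else (discounts, currency, tax, any other charges). Under those simplifying assumptions it gives slightly more than the rounded figures quoted above (roughly $8.97 on Amazon.com and $12.77 through the estore), so the real take is probably a little lower than the sketch suggests. The Kindle edition is left out because its royalties work differently.

```python
COVER_PRICE = 19.00  # USD, cover price at Amazon.com and the estore
PRINT_COST = 2.43    # USD, manufacturing cost per book

# Fee each channel keeps, as a fraction of the cover price
# (assumed here to be a flat percentage with no other charges).
CHANNEL_FEES = {
    "Amazon.com": 0.40,
    "estore": 0.20,
    "expanded distribution": 0.60,
}

def margin_per_book(price, fee, print_cost):
    """Money left per book after the channel fee and the printing cost."""
    return price * (1 - fee) - print_cost

for channel, fee in CHANNEL_FEES.items():
    margin = margin_per_book(COVER_PRICE, fee, PRINT_COST)
    print(f"{channel:<22} ${margin:5.2f}")
```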

By the numbers

It doesn't pay to fixate on metrics—most of the things we care about are impossible to measure. But we're quantitative people, and numbers are hard to resist. To recoup our costs, not counting the time we lovingly invested, we need to sell 632 books. (Coincidentally, this is about how many people visit agilegeoscience.com every week.) As of right now, there are 476 books out there 'in the wild', 271 of which were sold for actual money. That's a good audience of people — picture them, sitting there, reading about geophysics, just for the love of it.

The bottom line

My wife Kara is an experienced non-fiction publisher. She's worked all over the world in editorial and production. So we knew what we were getting into, more or less. The print-on-demand model was new to her, as was the retail side of things. We already knew we suck at marketing. But the point is, we knew we weren't in this for the money: it's about relevant and interesting books, not marketing.

And now we know what we're doing. Sorta. We're in the process of collecting 52 Things about geology, and are planning others. So we're committed to one or two more books whatever happens, and we hope we get to do many more.

We can't say this often enough: Thank You to our wonderful authors. And Thank You to everyone who has put down some hard-earned cash for a copy. You are awesome.

Matt Hall

The Agile toolbox

Some new businesses go out and raise millions in capital before they do anything else. Not us — we only do what we can afford. Money makes you lazy. It's technical consulting on a shoestring!

If you're on a budget, open source is your best friend. More than this, an open toolbox is less dependent on hardware and less tied to particular workflows. Better yet, avoiding large technology investments helps us avoid vendor lock-in, and the resulting data lock-in, keeping us more agile. And there are two more important things about open source:

• You know exactly what the software does, because you can read the source code
• You can change what the software does, because you can change the source code

Anyone who has waited 18 months for a software vendor to fix a bug or add a feature, then 18 more months for their organization to upgrade the software, knows why these are good things.

So what do we use?

In the light of all this, people often ask us what software we use to get our work done.

Hardware  Matt is usually on a dual-screen Apple iMac running OS X 10.6, while Evan is on a Samsung Q laptop (with a sweet solid-state drive) running Windows. Our plan, insofar as we have a plan, is to move to Mac Pro as soon as the new ones come out in the next month or two. Pure Linux is tempting, but Macs are just so... nice.

Geoscience interpretation  dGB OpendTect, GeoCraft, Quantum GIS. The main thing we lack is a log visualization and interpretation tool. Beyond this, Madagascar and GMT are plugged right into OpendTect, though we don't use them much yet. For getting started on stratigraphic charts, we use TimeScale Creator.

A quick aside, for context: when I sold Landmark's GeoProbe seismic interpretation tool, back in 2003 or so, the list price was USD 140 000 per user (choke), plus USD 25k per year in maintenance. GeoProbe is very powerful now (and I have no idea what it costs), but OpendTect is a much better tool than that early edition was. And it's free (as in speech, and as in beer).

Geekery, data mining, analysis  Our core tools for data mining are Excel, Spotfire Silver (an amazing but proprietary tool), MATLAB and/or GNU Octave, and random Python. We use Gephi for network analysis, FIJI for image analysis, and we have recently discovered VISAT for remote sensing images. All our mobile app development has been in MIT AppInventor so far, but we're playing with the PhoneGap framework in Eclipse too.

Writing and drawing  Google Docs for words, Inkscape for vector art and composites, GIMP for rasters, iMovie for video, Adobe InDesign for page layout. And yeah, we use Microsoft Office and OpenOffice.org too — sometimes it's just easier that way. For managing references, Mendeley is another recent discovery — it is 100% awesome. If you only look at one tool in this post, look at this.

Collaboration  We collaborate with each other and with clients via Skype, Dropbox, Google+ Hangouts, and various other Google tools (for calendars, etc). We also use wikis (especially SubSurfWiki) for asynchronous collaboration and documentation. As for social media, we try to maintain some presence in Google+, Facebook, and LinkedIn, but our main channel is Twitter.

Web  This website is hosted by Squarespace for reliability and reduced maintenance. The MediaWiki instances we maintain (both public and private) run MediaWiki's open source software on Amazon's Elastic Compute Cloud (EC2) servers for flexibility. An EC2 instance is basically an online Linux box, running Ubuntu and Bitnami's software stack, plus some custom bits and pieces. We are launching another website soon, running WordPress on Amazon EC2. Hover, an awesome Canadian company, provides our domain names.

Administrative tools  Every business has some business tools. We use Tick to track our time; it's very useful when working on multiple projects, subcontractors, etc. For accounting we recently found Wave, and it is the best thing ever. If you have a small business, please check it out: after headaches with several other products, it's the best bean-counting tool I've ever used.

If you have a geeky geo-toolbox of your own, we'd love to hear about it. What tools, open or proprietary, couldn't you live without?

Matt Hall

News of the week

Newsworthy items of the last fortnight or so. We look for stories in the space between geoscience and technology, but if you come across anything you think we should cover, do tell.

Tibbr at work

Tibbr is a social media engine for the enterprise, a sort of in-house Facebook. Launched in January by TIBCO, it's noteworthy because of TIBCO's experience; they're the company behind Spotfire among other things. It has some interesting features, like videocalling, voicemail integration and analytics (of course), that should differentiate it from competitors like Yammer. What these tools do for teamwork and integration is yet to be seen.

The 3D world in 3D

Occasionally you see software you can't wait to get your hands on. When Ron Schott posted this video of some mud-cracks, we immediately started thinking of the possibilities for outcrops, hand specimens, SEM photography... However, the new 123D Catch software from Autodesk only runs on Windows, so Matt hasn't been able to test it yet. On the plus side, it's free, for now at least.

To continue the social media thread, Ron is very interested in its role in geoscience. He's an early adopter of Google+, so if you're interested in how these tools might help you, add him to one of your circles or hang out with him. As for us, we're still finding our way in G+.

This regular news feature is for information only. We aren't connected with any of these people or organizations, and don't necessarily endorse their products or services. Unless we say we think they're great.

McKelvey's reserves and resources

Vincent McKelvey was chief geologist at the US Geological Survey, and then its director from 1971 until 1977. Rather like Sherman Kent at the CIA, who I wrote about last week, one of his battles was against ambiguity in communication. But rather than worrying about the threat posed by the Soviet Union or North Korea, his concern was the reporting of natural resources in the subsurface of the earth. Today McKelvey's name is associated with a simple device for visualizing levels of uncertainty and risk associated with mineral resources: the McKelvey box.

Here is a modernized version. It helps unravel some oft-heard jargon. The basic idea is that only discovered, commercially viable deposits get to be called Reserves. Discovered but sub-commercial deposits (with today's technology and pricing) are contingent resources. Potentially producible and viable deposits that we've not yet found are called prospective resources. These are important distinctions, especially if you are a public company or a government.
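At its core, the simplified box is a two-way split: has the deposit been discovered, and is it commercially viable? A minimal sketch of that lookup follows. It uses only the three categories named above; the undiscovered, sub-commercial corner isn't named in this post, so this hypothetical function returns None there rather than inventing a label.

```python
from typing import Optional

def mckelvey_category(discovered: bool, commercial: bool) -> Optional[str]:
    """Classify a deposit using the simplified McKelvey box described above.

    discovered -- has the accumulation actually been found?
    commercial -- is it viable with today's technology and pricing?
                  (for undiscovered deposits: expected to be viable if found)
    """
    if discovered and commercial:
        return "Reserves"
    if discovered:
        return "Contingent resources"
    if commercial:
        return "Prospective resources"
    return None  # undiscovered and sub-commercial: not named in this post

for found in (True, False):
    for viable in (True, False):
        print(f"discovered={found!s:<5} commercial={viable!s:<5} -> "
              f"{mckelvey_category(found, viable)}")
```

The more elaborate modern schemes subdivide each of these cells further, but the two questions stay the same.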

Over time, this device has been reorganized and subdivided with ever more subtle distinctions and definitions. I was uninspired by the slightly fuzzy graphics in the ongoing multi-part review of reserve reporting in the CSPG Reservoir magazine (Yeo and Derochie, 2011, Reserves and resources series, CSPG Reservoir, starting August 2011). So I decided to draw my own version. To reflect the possibility that there may yet be undreamt-of plays out there, I added a category for Unimagined resources. One for the dreamers.

You can find the Scalable Vector Graphics file for this figure in SubSurfWiki. If you have ideas about other jargon to add, or ways to represent the uncertainty, please have a go at editing the wiki page, the figure, or drop us a line!

Matt Hall

Wherefore art thou, Expert?

I don't buy the notion that we should be competent at everything we do. Unless you have chosen to specialize, as a petrophysicist or geophysical analyst perhaps, you are a generalist. Perhaps you are the manager of an asset team, or a lone geophysicist in a field development project, or a junior geologist looking after a drilling program. You are certainly being handed tasks you have never done before, and being asked to think about problems you didn't even know existed this time last year. If you're anything like me, you are bewildered at least 50% of the time.

In this post, I take a look at some problems with assuming technical professionals can be experts at everything, especially in this world of unconventional plays and methods. And I even have some ideas about what geoscientists, specialists and service companies can do about them...

News of the week

Happy Canada Day! Here is the news.

Scotian Basin revival?

Geologist–reporter Susan Eaton has a nice piece in the AAPG Explorer this month, explaining why some operators still see promise in the Scotian Basin, on Canada's Atlantic margin. The recent play fairway analysis mentioned in the report, however, is long overdue and still not forthcoming. When it is, we hope the CNSOPB and government promoters fully embrace openness and get more data into the public domain.

Yet another social network!

In the wake of LinkedIn's IPO, which valued the company at over 500 times its 2010 net earnings on the first day of trading, many other social networks are starting to pop up. Last month we mentioned SEG's new Communities. Finding Petroleum is a new social network, supported by the publishers of the Digital Energy Journal, aimed at oil and gas professionals. These sites are an anti-trust anomaly, since they almost have to be monopolies to succeed, and with so much momentum carried by LinkedIn and Facebook, new entrants will struggle for attention. Most of the Communities in SEG seem to be essentially committee-based and closed, and LinkedIn micro-networks are getting chaotic, so maybe there's a gap here. Our guess is that there isn't.

The oil & gas blogosphere

Companies are increasingly turning to blogging and social media tools to expand their reach and promote their pursuits. Here are a couple of industry blogs that have caught our eye recently. If you are looking to read more about what's happening in subsurface oil and gas technology, these blogs are a good place to start.

If you use a microblogging service like Yammer, you may not know that you can also follow Twitter feeds. For example, here's a Twitter list of various companies in oil & gas.

Job security in geoscience

Historically, the oil and gas industry has followed hot and cold (or, if you prefer, boom and bust) cycles, but the US Bureau of Labor Statistics predicts that geoscience jobs will be increasingly in demand. A recent article from The Street reports on these statistics, suggesting that the earth science sector is shaping up to be genuinely recession-proof. If there is such a thing.

Agile* apps update

We're happy to report that all of Agile's apps have been updated in the last week, and we have a brand new app in the Android Market! The newest app, called Tune*, is a simple calculator for wedge modeling and estimating the amplitude tuning response of thin beds.
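For a rough sense of the numbers behind this kind of calculator, a common rule of thumb puts the tuning thickness of a thin bed at about a quarter of the dominant wavelength, where the wavelength is interval velocity divided by dominant frequency. The sketch below applies that rule. It is a generic textbook approximation, not necessarily how Tune* does its calculation, and the velocity and frequency are made-up example values.

```python
def wavelength(velocity, frequency):
    """Dominant wavelength (m) from interval velocity (m/s) and frequency (Hz)."""
    return velocity / frequency

def tuning_thickness(velocity, frequency):
    """Quarter-wavelength rule of thumb for thin-bed tuning thickness (m)."""
    return wavelength(velocity, frequency) / 4

# Hypothetical example: 3000 m/s interval velocity, 30 Hz dominant frequency.
print(f"Tuning thickness: {tuning_thickness(3000, 30):.1f} m")  # Tuning thickness: 25.0 m
```

Beds thinner than this tend to show amplitude interference rather than separate top and base reflections, which is what a wedge model makes visible.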

In our other apps, the biggest new feature is the ability to save cases or scenarios to a database on the device, so you can pull them up later.

Read more on our Apps page.

This regular news feature is for information only. Apart from Agile*, obviously, we aren't connected with any of these organizations, and don't necessarily endorse their products or services.