The Computer History Museum

Mountain View, California, looking northeast over US 101 and San Francisco Bay. The Computer History Museum sits between the Googleplex and NASA Ames. Hangar 1, the giant airship hangar, is visible on the right of the image. Imagery and map data © Google, Landsat/Copernicus.

A few days ago I was lucky enough to have a client meeting in Santa Clara, California. I had not been to Silicon Valley before, and it was more than a little exciting to drive down US Route 101 past the offices of Google, Oracle and Amazon and basically every other tech company, marvelling at Intel’s factory and the hangars at NASA Ames, and seeing signs to places like Stanford, Mountain View, and Menlo Park.

I had a spare day before I flew home, and decided to visit Stanford’s legendary geophysics department, where there was a lecture that day. With an hour or so to kill, I thought I’d take in the Computer History Museum on the way… I never made it to Stanford.

The museum

The Computer History Museum was founded in 1996, building on an ambition of über-geek Gordon Bell. It sits in the heart of Mountain View, surrounded by the Googleplex, NASA Ames, and Microsoft. It’s a modern, airy building with the museum and a small café downstairs, and meeting facilities on the upper floor. It turns out to be an easy place to burn four hours.

I saw a lot of computers that day. You can see them too because much of the collection is in the online catalog. A few things that stood out for me were:

No seismic

I had been hoping to read more about the early days of Texas Instruments, because it was spun out of a seismic company, Geophysical Service or GSI, and at least some of their early integrated circuit research was driven by the needs of seismic imaging. But I was surprised not to find a single mention of seismic processing in the place. We should help them fix this!

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

SEG-Y Rev 2 again: little-endian is legal!

Big news! Little-endian byte order is finally legal in SEG-Y files.

That's not all. I already spilled the beans on 64-bit floats. You can now have up to 18 quintillion traces (18 exatraces?) in a seismic line. And, finally, the hyphen confusion is cleared up: it's 'SEG-Y', with a hyphen. All this is spelled out in the new SEG-Y specification, Revision 2.0, which was officially released yesterday after at least five years in the making. Congratulations to Jill Lewis, Rune Hagelund, Stewart Levin, and the rest of the SEG Technical Standards Committee

Back up a sec: what's an endian?

Whenever you have to think about the order of bytes (the 8-bit chunks in a 'word' of 32 bits, for example) --- for instance when you send data down a wire, or store bytes in memory, or in a file on disk --- you have to decide if you're Roman Catholic or Church of England.

What?

In one of the more obscure satirical analogies in English literature, Jonathan Swift wrote about the ideological tussle between between two factions of Lilliputians in Gulliver's Travels (1726). The Big-Endians liked to break their eggs at the big end, while the Little-Endians preferred the pointier option. Chaos ensued.

Two hundred and fifty years later, Danny Cohen borrowed the terminology in his 1 April 1980 paper, On Holy Wars and a Plea for Peace --- in which he positioned the Big-Endians, preferring to store the big bytes first in memory, against the Little-Endians, who naturally prefer to store the little ones first. Big bytes first is how the Internet shuttles data around, so big-endian is sometimes called network byte order. The drawing (right) shows how the 4 bytes in a 32-bit 'word' (the hexadecimal codes 0A, 0B, 0C and 0D) sit in memory.

Because we write ordinary numbers big-endian style --- 2017 has the thousands first, the units last --- big-endian might seem intuitive. Then again, lots of people write dates as, say, 31-03-2017, which is analogous to little-endian order. Cohen reviews the computational considerations in his paper, but really these are just conventions. Some architectures pick one, some pick the other. It just happens that the x86 architecture that powers most desktop and laptop computers is little-endian, so people have been illegally (and often accidentally) writing little-endian SEG-Y files for ages. Now it's actually allowed.

Still other byte orders are possible. Some processors, notably ARM and other RISC architectures, are middle-endian (aka mixed endian or bi-endian). You can think of this as analogous to the month-first American date format: 03-31-2017. For example, the two halves of a 32-bit word might be reversed compared to their 'pure' endian order. I guess this is like breaking your boiled egg in the middle. Swift did not tell us which religious denomination these hapless folks subscribe to.

OK, that's enough about byte order

I agree. So I'll end with this handy SEG-Y cheatsheet. Click here for the PDF.

References and acknowledgments

Cohen, Danny (April 1, 1980). On Holy Wars and a Plea for PeaceIETF. IEN 137. "...which bit should travel first, the bit from the little end of the word, or the bit from the big end of the word? The followers of the former approach are called the Little-Endians, and the followers of the latter are called the Big-Endians." Also published at IEEE ComputerOctober 1981 issue.

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

More precise SEG-Y?

The impending SEG-Y Revision 2 release allows the use of double-precision floating point numbers. This news might leave some people thinking: "What?".

Integers and floats

In most computing environments, there are various kinds of number. The main two are integers and floating point numbers. Let's take a quick look at integers, or ints, first.

Integers can only represent round numbers: 0, 1, 2, 3, etc. They can have two main flavours: signed and unsigned, and various bit-depths, e.g. 8-bit, 16-bit, and so on. An 8-bit unsigned integer can have values between 0 and 255; signed ints go from -128 to +127 using a mathematical operation called two's complement.

As you might guess, floating point numbers, or floats, are used to represent all the other numbers — you know, numbers like 4.1 and –7.2346312 × 10¹³ — we need lots of those.

Floats in binary

OK, so we need to know about floats. To understand what double-precision means, we need to know how floats are represented in computers. In other words, how on earth can a binary number like 01000010011011001010110100010101 represent a floating point number?

It's fairly easy to understand how integers are stored in binary: the 8-bit binary number 01001101 is the integer 77 in decimal, or 4D in hexadecimal; 11111111 is 255 (base 10) or FF (base 16) if we're dealing with unsigned ints, or -1 decimal if we're in the two's complement realm of signed ints.

Clearly we can only represent a certain number of values with, say, 16 bits. This would give us 65 536 integers... but that's not enough dynamic range represent tiny or gigantic floats, not if we want any precision at all. So we have a bit of a paradox: we'd like to represent a huge range of numbers (down around the mass of an electron, say, and up to Avogadro's number), but with reasonably high precision, at least a few significant figures. This is where floating point representations come in.

Scientific notation, sort of

If you're thinking about scientific notation, you're thinking on the right lines. We raise some base (say, 10) to some integer exponent, and multiply by another integer (the mantissa, or significand). That way, we can write a huge range of numbers, with plenty of precision, using only two integers. So:

$$3.14159 = 314159 \times 10^{-5} \ \ \mathrm{and} \ \ 6.02214 \times 10^{23} = 602214 \times 10^{18}$$

If I have two bytes at my disposal (so-called 'half precision'), I could have an 8-bit int for the integer part, called the significand, and another 8-bit int for the exponent. Then we could have floats from $$0$$ up to $$255 \times 10^{255}$$. The range is pretty good, but clearly I need a way to get negative significands — maybe I could use one bit for the sign, and leave 7 bits for the exponent. I also need a way to get negative exponents — I could assign a bias of –64 the exponent, so that 127 becomes 63 and an exponent of 0 becomes –64. More bits would help, and there are other ways to apportion the bits, and we can use tricks like assuming that the significand starts with a 1, storing only the fractional part and thereby saving a bit. Every bit counts!

IBM vs IEEE floats

The IBM float and IEEE 754-2008 specifications are just different ways of splitting up the bits in a floating point representation. Single-precision (32-bit) IBM floats differ from single-precision IEEE floats in two ways: they use 7 bits and a base of 16 for the exponent. In contrast, IEEE floats — which are used by essentially all modern computers — use 8 bits and base 2 (usually) for the exponent. The IEEE standard also defines not-a-numbers (NaNs), and positive and negative infinities, among other conveniences for computing.

In double-precision, IBM floats get 56 bits for the fraction of the significand, allowing greater precision. There are still only 7 bits for the exponent, so there's no change in the dynamic range. 64-bit IEEE floats, however, use 11 bits for the exponent, leaving 52 bits for the fraction of the significand. This scheme results in 15–17 sigificant figures in decimal numbers.

The diagram below shows how four bytes (42, 6C, AD, 15) are interpreted under the two schemes. The results are quite different. Notice the extra bit for the exponent in the IEEE representation, and the different bases and biases.

IBM, IEEE, and seismic

When SEG-Y was defined in 1975, there were only IBM floats — IEEE floats were not defined until 1985. The SEG allowed the use of IEEE floating-point numbers in Revision 1 (2002), and they are still allowed in the impending Revision 2 specification. This is important because most computers these days use IEEE float representations, so if you want to read or write IBM floats, you're going to need to do some work.

The floating-point format in a particular SEG-Y files should be indicated by a flag in bytes 3225–3226. A value of 0x01 indicates IBM floats, while 0x05 indicates IEEE floats. Then again, you can't believe everything you read in headers. And, unfortunately, you can't tell an IBM float just by looking at it. Meisinger (2004) wrote a nice article in Recorder about the perils of loading IBM as IEEE and vice versa — illustrated below. You should read it.

From Meisinger, D (2004). SEGY floating point confusion. CSEG Recorder 29(7). Available online.

I wrote this post by accident while writing about endianness, the main big change in the new SEG-Y revision. Stay tuned for that post!

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

The quick green forsterite jumped over the lazy dolomite

The best-known pangram — a sentence containing every letter of the alphabet —  is probably

There are lots of others of course. If you write like James Joyce, there are probably an infinite number of others. The point is to be short, and one of the shortest, with only 29 letters (!), even has a geological flavour:

I know what you're thinking: Cool, but what's the shortest set of mineral names that uses all the letters of the alphabet? What logophiliac geologist would not wonder the same thing?

Well, we posed this question in the most recent "Riddle me this" segment on the Undersampled Radio podcast. This blog post is my solution.

The set cover problem

Finding pangrams in a list of words amounts to solving the classical set cover problem:

Our universe is the alphabet, and our $$S$$ is the list of $$m$$ mineral names. There is a slight twist in our case: the set cover problem wants the smallest subset of $$S$$ — the fewest members. But in this problem, I suspect there are several 4-word solutions (judging from my experiments), so I want the smallest total size of the members of the subset. That is, I want the fewest total letters in the solution.

The solution

The set cover problem was shown to be NP-complete in 1972. What does this mean? It means that it's easy to tell if you have an answer (do you have all the letters of the alphabet?), but the only way to arrive at a solution is — to oversimplify massively — by brute force. (If you're interested in this stuff, this edition of the BBC's In Our Time is one of the best intros to P vs NP and complexity theory that I know of.)

Anyway, the point is that if we find a better way than brute force to solve this problem, then we need to write a paper about it immediately, claim our prize, collect our turkey, then move to a sunny tax haven with good water and double-digit elevation.

So, this could take a while: there are over 95 billion ways to draw 3 words from my list of 4600 mineral names. If we need 4 minerals, there are 400 trillion combinations... and a quick calculation suggests that my laptop will take a little over 50 years to check all the combinations.

Can't we speed it up a bit?

Brute force is one thing, but we don't need to be brutish about it. Maybe we can think of some strategies to give ourselves a decent chance:

• The list is alphabetically sorted, so randomize the list before searching. (I did this.)
• Guess some 'useful' minerals and ensure that you get to them. (I did this too, with quartz.)
• Check there are at least 26 letters in the candidate words, and (if it's only records we care about) no more than 44, because I have a solution with 45 letters (see below).
• We could sort the list into word length order. That way we search shorter things first, so we should get shorter lists (which we want) earlier.
• My solution does not depend much on Python's set type. Maybe we could do more with set theory.
• Before inspecting the last word in each list, we could make sure it contains at least one letter that's so far missing.

So far, the best solution I've come up with so far has 45 letters, so there's plenty of room for improvement:

'quartz', 'kvanefjeldite', 'abswurmbachite', 'pyroxmangite'

My solution is in this Jupyter Notebook. Please put me out of my misery by improving on it.

Comment

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

Two new short courses in Calgary

We're running two one-day courses in Calgary for the CSPG Spring Education Week. One of them is a bit... weird, so I thought I'd try to explain what we're up to.

Both classes run from 8:30 till 4:00, and both of them cost just CAD 425 for CSPG members.

Get introduced to Python

The first course is Practical programming for geoscientists. Essentially a short version of our 2 to 3 day Creative geocomputing course, we'll take a whirlwind tour through the Python programming language, then spend the afternoon looking at some basic practical projects. It might seem trivial, but leaving with a machine fully loaded with all the tools you'll need, plus long list of resources and learning aids, is worth the price of admission alone.

If you've always wanted to get started with the world's easiest-to-learn programming language, this is the course you've been waiting for!

Hashtag geoscience

This is the weird one. Hashtag geoscience: communicating geoscience in the 21st century. Join me, Evan, Graham Ganssle (my co-host on Undersampled Radio) — and some special guests — for a one-day sci comm special. Writing papers and giving talks is all so 20th century, so let's explore social media, blogging, podcasting, open access, open peer review, and all the other exciting things that are happening in scientific communication today. These tools will not only help you in your job, you'll find new friends, new ideas, and you might even find new work.

I hope a lot of people come to this event. For one, it supports the CSPG (we're not getting paid, we're on expenses only). Secondly, it'll be way more fun with a crowd. Our goal is for everyone to leave burning to write a blog, record a podcast, or at least create a Twitter account.

One of our special guests will be young-and-famous geoscience vlogger Dr Chris. Coincidentally, we just interviewed him on Undersampled Radio. Here's the uncut video version; audio will be on iTunes and Google Play in a couple of days:

Comment

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

Unearthing gold in Toronto

I just got home from Toronto, the mining capital of the world, after an awesome weekend hacking with Diego Castañeda, a recent PhD grad in astrophysics that is working with us) and Anneya Golob (another astrophysicist and Diego's partner). Given how much I bang on about hackathons, it might surprise you to know that this was the first hackathon I have properly participated in, without having to order tacos or run out for more beer every couple of hours.

PArticipants being briefed by one of the problem sponsors on the first evening.

What on earth is Unearthed?

The event (read about it) was part of a global series of hackathons organized by Unearthed Solutions, a deservedly well-funded non-profit based in Australia that is seeking to disrupt every single thing in the natural resources sector. This was their fourteenth event, but their first in Canada. Remarkably, they got 60 or 70 hackers together for the event, which I know from my experience organizing events takes a substantial amount of work. Avid readers might remember us mentioning them before, especially in a guest post by Jelena Markov and Tom Horrocks in 2014.

A key part of Unearthed's strategy is to engage operating companies in the events. Going far beyond mere sponsorship, Barrick Gold sent several mentors to the event, the Chief Innovation Officer Michelle Ash, as well as two judges, Ed Humphries (head of digital transformation) and Iain Allen (head of digital mining). Barrick provided the chellenge themes, as well as data and vivid descriptions of operational challenges. The company was incredibly candid with the participants, and should be applauded for its support of what must have felt like a pretty wild idea.

Team Auger Effect: Diego and Anneya hacking away on Day 2.

What went down?

It's hard to describe a hackathon to someone who hasn't been to one. It's like trying to describe the Grand Canyon, ice climbing, or a 1985 Viña Tondonia Rioja. It's always fun to see and hear the reactions of the judges and other observers that come for the demos in the last hours of the event: disbelief at what small groups of humans can do in a weekend, for little tangible reward. It flies in the face of everything you think you know about creativity, productivity, motivation, and collaboration. Not to mention intellectual property.

As the fifteen (!) teams made their final 5-minute pitches, it was clear that every single one of them had created something unique and useful. The judges seemed genuinely blown away by the level of accomplishment. It's hard to capture the variety, but I'll have a go with a non-comprehensive list. First, there was a challenge around learning from geoscience data:

• BGC Engineering, one of the few pro teams and First Place winner, produced an impressive set of tools for scraping and analysing public geoscience data. I think it was a suite of desktop tools rather than a web application.
• Mango (winners of the Young Innovators award), Smart Miner (second place overall), Crater Crew, Aureka, and Notifyer and others presented map-based browsers for public mining data, with assistance from varying degrees of machine intelligence.
• Auger Effect (me, Diego, and Anneya) built a three-component system consisting of a browser plugin, an AI pipeline, and a social web app, for gathering, geolocating, and organizing data sources from people as they research.

The other challenge was around predictive maintenance:

• Tyrelyze, recognizing that two people a year are killed by tyre failures, created a concept for laser scanning haul truck tyres during operations. These guys build laser scanners for core, and definitely knew what they were doing.
• Decelerator (winners of the People's Choice award) created a concept for monitoring haul truck driving behaviour, to flag potentially expensive driving habits.
• Snapfix.io looked at inventory management for mine equipment maintenance shops.
• Arcana, Leo & Zhao, and others looked at various other ways of capturing maintenance and performace data from mining equipment, and used various strategies to try to predict

I will try to write some more about the thing we built... and maybe try to get it working again! The event was immensely fun, and I'm so glad we went. We learned a huge amount about mining too, which was eye-opening. Massive thanks to Unearthed and to Barrick on all fronts. We'll be back!

Brad BEchtold of Cisco (left) presenting the Young Innovator award for under-25s to Team Mango.

The winners of the People's Choice Award, Team Decelerate.

The winners of the contest component of the event, BGC Engineering, with Ed Humphries of Barrick (left).

UPDATE  View all the results and submissions from the event.

Wish there was a hackathon just for geoscientists and subsurface engineers?
You're in luck! Join us in Paris for the Subsurface Hackathon — sponsored by Dell EMC, Total E&P, NVIDIA, Teradata, and Sandstone. The theme is machine learning, and registration is open. There's even a bootcamp for anyone who'd like to pick up some skills before the hack.
Comment

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

Beyond pricing: the fine print

Earlier this week, I wrote about pricing professional services. A slippery topic, full of ifs and buts (just like geoscience!). And it was only half the story, because before commencing on a piece of work, you and your client have to agree on a lot of things besides price. To avoid confusion later, it's worth getting those things straight before you start.

Here are most of the things we try to cover in every agreement:

• Don't include expenses in your professional fees. Extras like travel expenses should always be separate. Be clear about your policies (for example, if I'm traveling more than 8 hours, I'm booking business class tickets). Do your client a favour by estimating the expenses for them, but pad everything a bit so you don't surprise them later with more than the estimate. Promise to provide receipts for everything.
• You should charge for your travel time. I usually charge this at my full day rate, but sometimes less if I know I can be somewhat productive on the journey. I've read of consultants not charging if they're traveling at the weekend, because it's not a normal work day... To me this is backwards: if I'm traveling for work at the weekend, someone better be paying for my time!
• Know your sales tax situation. I recommend getting professional help from a chartered accountant on this. Do not assume the client will know: they won't, and it's not their responsibility to, it's yours. I'm afraid you will be reading tax treaties and filling out some pretty gross forms.
• Charge more for travel outside your 'comfort zone'. For example, I add at least 20% outside the US & Canada or Western Europe, depending on the place. Travel is exhausting, you're away from home, you need vaccinations, you need visas, everything is unfamiliar. All good fun when you're on holiday, but stressful, expensive, and time-consuming when you're trying to be an awesome professional.
• Get paid as soon as possible. I've never done it, but I know people who charge a percentage up front. For longer jobs, specify that you wish to charge partial invoices, perhaps monthly. Ask for Net 15 terms (i.e. they'll have 15 days to pay your invoices), but settle for Net 30. Longer than this seems unfair to me. Add late fees and interest to overdue accounts, and reissue the invoice at every due date.
• Make it easy for people to pay you. Be specific about how to pay, and give people options. It's safe to give them your bank account details (I put them on my invoices, along with foreign banking details like SWIFT and IBAN codes), and electronic funds transfer is the best way to get paid. Put your tax number on your invoices, some clients need it. I use Stripe for accepting credit cards, but bear in mind that you probably don't want to accept credit cards over about $10k. • Get good at currencies. Remember to be very specific about currencies whenever you talk money. Use ISO4217. If you can, make things simple for people; I charge USD to US customers. I have a USD account and use a foreign exchange service (Firma) for FX. I spend a lot of money in the US too, so I also have a USD credit card, paid in USD. • Do the work you want to do, the way you want to do it. This is kind of the point of 'being your own boss', right? Of course, you have to be reasonable, and compromise when necessary (OK, if every contract is a compromise, maybe you are too particular!). Talk to your client! It's OK to negotiate, ask questions, and so on. Every time I've asked for contracts or terms to be changed, it has at least been entertained, and it usually works out. It can be awkward raising all this; you probably don't want to dump it all on someone at your first meeting. We usually put all the gory details into as short and informal a document as we can (usually part of the project description or 'scope of work', or in a 'letter of understanding'), show it to the client, answer their questions about it, and eventually reference it in the contract. Most contracts allow for some document that describes the specific conditions of the contract, but it's worth checking that those conditions don't contradict the contract. If they do, ask for the contract to be changed to reflect whatever you agree with the client. If you're just starting out on the professional services road, I hope this is all of some help to you. And I hope it's not too daunting. “Fine Print” by Damian Gadal is licensed under CC BY 2.0 1 Comment Matt Hall I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis). Pricing professional services, again I have written about this before, but in my other life as an owner of a coworking space. It's come up in Software Underground a couple of times recently, so I thought it might be time to revisit the crucial question for anyone offering services: what do I charge? Unfortunately, it's not a simple answer. And before you read any further, you also need to understand that I am no business mastermind. So you should probably ignore everything I write. (And please feel free to set me straight!) Here's a bit of the note I got recently from a blog reader: I'm planning to start doing consulting and projects of seismic interpretation and prospect generations but I don't know what's a fair price to charge for services. I sure there're many of factors. I was wondering if you can share some tips on how to calculate/determine the cost of a seismic interpreter project? Is it by sq mi of data interpreted, maps of different formations, presentations, etc.? Let's break the reply down into a few aspects: Know the price you're aiming for and don't go below it. I've let myself get beaten down once or twice, and it's not a recipe for success: you may end up resenting the entire job. One opinion on Software Underground was to start with a high price, then concede to the client during negotiations. I tend to keep a fair price fixed from the start, and negotiate on other things (scope and deliverables). Do try not to get sucked into too much itemization though: it will squeeze your margins. But what is the price you're aiming for? It depends on your fixed costs (how much do you need to get the work done and pay yourself what you need to live on?), time, complexity, your experience, how simple you want your pricing to be, and so on. All these things are difficult. I tend to go for simplicity, because I don't want the administrative overhead of many line items, keeping track of time, etc. Sometimes this bites me, sometimes (maybe) I come out ahead. Come on, be specific. If you've recently had a 'normal' job, then a good starting point is to know your "fully loaded cost" (i.e. what you really cost, with benefits, bonuses, cubicle, coffee, computer, and so on). This is typically about 2 to 2.5 times your salary(!). That's what you would like to make in about 200 days of work. You will quickly realize why consultants are apparently so expensive: people are expensive, especially people who are good at things. If I ever feel embarrassed to ask for my fee, I remind myself that when I worked at Halliburton, my list price as a young consultant was USD 2400 per day. Clients would sign year-long contracts for me at that rate. It's definitely a good idea to know what you're competing with. However, it can be very hard to find others' pricing information. If you have a good relationship with the client, they may even tell you what they are used to paying. Maybe you give them a better price, or maybe you're more expensive, because you're more awesome. Remember your other bottom lines. Money is not everything. If we get paid for work on an open source project (open code or open content), we always discount the price, often by 50%. If we care deeply about the work, we ask for less than usual. Conversely, if the work comes with added stress or administration, we charge a bit more. One thing's for sure: sometimes (often) you're leaving money on the table. Someone out there is charging (way) more for (much) lower quality. Conversely, someone is probably charging less and doing a better job. The lack of transparency around pricing and salaries in the industry doubtless contributes to this. In the end, I tend to be as open as possible with the client. Often, prices change for the next piece of work for the same client, because I have more information the second time. Opinions wanted There's no doubt, it's a difficult subject. The range of plausible prices is huge:$50 to $500 per hour, as someone on Software Underground put it. Nearer$50 to $100 for a routine programming job,$200 for professional input, \$400 for more awesomeness than you can handle. But if there's a formula, I've yet to discover it. And maybe a fair formula is impossible, because providing critical insight isn't really something you can pay for on a 'per hour' kind of basis — or shouldn't be.

I'm very open to more opinions on this topic. I don't think I've heard the same advice yet from any two people. When I asked one friend about it he said: "Keep increasing your prices until someone says No."

Then again, he doesn't drive a Porsche either.

If you found this post useful, you might like the follow-up post too: Beyond pricing: the fine print.

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

Strategies for a revolution

This must be a record. It has taken me several months to get around to recording the talk I gave last year at EAGE in Vienna — Strategies for a revolution. Rather a gradiose title, sorry about that, especially over-the-top given that I was preaching to the converted: the workshop on open source. I did, at least, blog aobut the goings on in the workshop itself at the time. I even followed it up with a slightly cheeky analysis of the discussion at the event. But I never posted my own talk, so here it is:

Too long didn't watch? No worries, my main points were:

1. It's not just about open source code. We must write open access content, put our data online, and push the whole culture towards openness and reproducibility.
2. We, as researchers, professionals, and authors, need to take responsibility for being more open in our practices. It has to come from within the community.
3. Our conferences need more tutorials, bootcamps, , hackathons and sprints. These events build skills and networks much faster than (just) lectures and courses.
4. We need something like an Open Geoscience Foundation to help streamline funding channels for open source projects and community events.

If you depend on open source software, or care about seeing more of it in our field, I'd love to hear your thoughts about how we might achieve the goal of having greater (scientific, professional, societal) impact with technology. Please leave a comment.

Matt Hall

I am a geoscientist in Nova Scotia, Canada. Founder of Agile Geoscience, co-founder of The HUB South Shore. Into geology, geophysics, programming in Python, and knowledge sharing (especially wikis).

No secret codes: announcing the winners

The SEG / Agile / Enthought Machine Learning Contest ended on Tuesday at midnight UTC. We set readers of The Leading Edge the challenge of beating the lithology prediction in October's tutorial by Brendon Hall. Forty teams, mostly of 1 or 2 people, entered the contest, submitting several hundred entries between them. Deadlines are so interesting: it took a month to get the first entry, and I received 4 in the second month. Then I got 83 in the last twenty-four hours of the contest.

How it ended

Team F1 Algorithm Language Solution
1 LA_Team (Mosser, de la Fuente) 0.6388 Boosted trees Python Notebook
2 PA Team (PetroAnalytix) 0.6250 Boosted trees Python Notebook
3 ispl (Bestagini, Tuparo, Lipari) 0.6231 Boosted trees Python Notebook
4 esaTeam (Earth Analytics) 0.6225 Boosted trees Python Notebook

The winners are a pair of graduate petroelum engineers, Lukas Mosser (Imperial College, London) and Alfredo de la Fuente (Wolfram Research, Peru). Not coincidentally, they were also one of the more, er, energetic teams — it's say to say that they explored a good deal of the solution space. They were also very much part of the discussion about the contest on GitHub.com and on the Software Underground Slack chat group, aka Swung (you're in there, right?).

I will be sending Raspberry Shakes to the winners, along with some other swag from Enthought and Agile. The second-place team will receive books from SEG (thank you SEG Book Mart!), and the third-place team will have to content themselves with swag. That team, led by Paolo Bestagini of the Politecnico di Milano, deserves special mention — their feature engineering approach was very influential, being used by most of the top-ranking teams.

Coincidentally Gram and I talked to Lukas on Undersampled Radio this week:

Back up a sec, what the heck is a machine learning contest?

To enter, a team had to predict the lithologies in two wells, given wireline logs and other data. They had complete data, including lithologies, in nine other wells — the 'training' data. Teams trained a wide variety of models — from simple nearest neighbour models and support vector machines, to sophisticated deep neural networks and random forests. These met with varying success, with accuracies ranging between about 0.4 and 0.65 (i.e., error rates from 60% to 35%). Here's one of the best realizations from the winning model:

One twist that made the contest especially interesting was that teams could not just submit their predictions — they had to submit the code that made the prediction, in the open, for all their fellow competitors to see. As a result, others were quickly able to adopt successful strategies, and I'm certain the final result was better than it would have been with secret code.

I spent most of yesterday scoring the top entries by generating 100 realizations of the models. This was suggested by the competitors themselves as a way to deal with model variance. This was made a little easier by the fact that all of the top-ranked teams used the same language — Python — and the same type of model: extreme gradient boosted trees. (It's possible that the homogeneity of the top entries was a negative consequence of the open format of the contest... or maybe it just worked better than anything else.)

What now?

There will be more like this. It will have something to do with seismic data. I hope I have something to announce soon.

I (or, preferably, someone else) could write an entire thesis on learnings from this contest. I am busy writing a short article for next month's Leading Edge, so if you're interested in reading more, stay tuned for that. And I'm sure there wil be others.