Training digital scientists

Gulp. My first post in… a while. Life, work, chaos, ideas — it all caught up with me recently. I’ve missed the blog greatly, and felt a regular pang of guilt at letting it gather dust. But I’m back! The 200+ draft posts in my backlog ain’t gonna write themselves. Thank you for returning and reading this one.


Recently I wrote about our continuing adventures in training; since I wrote that post in April, we’ve taught another 166 people. It occurred to me that while teaching scientists to code, we’ve also learned a bit about how to teach, and I wanted to share that too. Perhaps you will be inspired to share your skills, and together we can have exponential impact.

Wanting to get better

As usual, it all started with not knowing how to do something, doing it anyway, then wanting to get better.

We started teaching in 2014 as rank amateurs, both as coders and as teachers. But we soon discovered the ‘teaching tech’ subculture among computational scientists. In particular, we found Greg Wilson and the Software Carpentry movement he started. By that point, it had been around for many, many years. Incredibly, Software Carpentry has helped more than 34,000 researchers ‘go digital’. The impact on science can’t be measured.

Eager as ever, we signed up for the instructor’s course. It was fantastic. The course, taught by Greg Wilson himself, perfectly modeled the thing it was offering to teach you: “Do what I say, and what I do”. This is, of course, critically important in all things, especially teaching. We accepted the content so completely that I’m not even sure we graduated. We just absorbed it and ran with it, no doubt corrupting it on the way. But it works for us.

What to read

TTT_rules.png

I should preface what follows by telling you that I haven’t taken any other courses on the subject of teaching. For all I know, there’s nothing new here. That said, I have never experienced a course like Greg Wilson’s, so either the methods he promotes are not widely known, or they’re widely ignored, or I’ve been really unlucky.

The easiest way to get Greg Wilson’s wisdom is probably to read his book-slash-website, Teaching Tech Together. (It’s free, but you can get a hard copy if you prefer.) It’s really good. You can get the vibe — and much of the most important advice — from the ten Teaching Tech Together rules laid out on the main page of that site (box, right).

As you can probably tell, most of it is about parking your ego, plus most of your knowledge (for now), and orientating everything — every single thing — around the learner.

If you want to go deeper, I also recommend reading the excellent, if rather academic, How Learning Works, by Susan Ambrose (Northeastern University) and others. It’s strongly research-driven, and contains a lot of great advice. In particular, it does a great job of listing the factors that motivate students to learn (and those that demotivate them), and spelling out the various ways in which students acquire mastery of a subject.

How to practice

It goes without saying that you’ll need to teach. A lot. Not surprisingly, we find we get much better if we teach several courses in a short period. If you’re diligent, take a lot of notes and study them before the next class, maybe it’s okay if a few weeks or months go by. But I highly doubt you can teach once or twice a year and get good at it.

Something it took us a while to get comfortable with is what Evan calls ‘mistaking’. If you’re a master coder, you might not make too many mistakes (but your expertise means you will have other problems). If you’re not a master (join the club), you will make a lot of mistakes. Embracing everything as a learning opportunity is less awkward for you, and for the students — dealing with mistakes is a core competency for all programmers.

Reflective practice means asking for, and then acting on, student feedback — every day. We ask students to write it on sticky notes. Reading these back to the class the next morning is a good way to really read it. One of the many benefits of ‘never teach alone’ is always having someone to give you feedback from another teacher’s perspective too. Multi-day courses let us improve in real time, which is good for us and for the students.

Some other advice:

  • Keep the student:instructor ratio to no more than ten; seven or eight is better.

  • Take a packet of orange and a packet of green Post-It notes. Use them for names, as ‘help me’ flags, and for feedback.

  • When teaching programming, the more live coding — from scratch — you can do, the better. While you code, narrate your thought process. This way, students are able to make connections between ideas, code, and mistakes.

  • To explain concepts, draw on a whiteboard. Avoid slides whenever possible.

  • Our co-teacher John Leeman likes to say, “I just showed you something new, what questions do you have?” This beats “Any questions?” for opening the door to engagement.

  • “No-one left behind” is a nice idea, but it’s not always practical. If students can’t devote 100% to the class and then struggle because of it, you owe it to the others to politely suggest they pick the class up again next time.

  • Devote some time to the practical application of the skills you’re teaching, preferably in areas of the participants’ own choosing. In our 5-day class, we devote a whole day to getting students started on their own projects.

  • Don’t underestimate the importance of a nice space, natural light, good food, and frequent breaks.

  • Recognize everyone’s achievement with a small gift at the end of the class.

  • Learning is hard work. Finish early every day.

Give it a try

If you’re interested in helping people learn to code, the most obvious way to start is to offer to assist or co-teach in someone else’s class. Or simply start small, offering a half-day session to a few co-workers. Even if you only recently got started yourself, they’ll appreciate the helping hand. If you’re feeling really confident, or have been coding for a year or two at least, try something bolder — maybe offer a one-day class at a meeting or conference. You will find plenty of interest.

There are few better ways to improve your own skills than to teach. And the feeling of helping people develop a valuable skill is addictive. If you give it a try, let us know how you get on!

Feel superhuman: learning and teaching geocomputing

Diego teaching in Houston in 2018.


It’s five years since we started teaching Python to geoscientists. To be honest, it might have been premature. At the time, Evan and I were maybe only two years into serious, daily use of Python. But the first class, at the Atlantic Geological Society’s annual meeting in February 2014, was free so the pressure was not too high. And it turns out that only being a step or two ahead of your students can be an advantage. Your ‘expert blind spot’ is partially sighted not completely blind, because you can clearly remember being a noob.

Being a noob is a weird, sometimes very uncomfortable, even scary, feeling for some people. Many of us are used to feeling like experts, at least some of the time. Happily, feeling like a noob is a core competency in programming. Learning new things is a more or less hourly experience for coders. Even a mature language like Python evolves fast enough that it’s hard to keep up. Instead of feeling threatened or exhausted by this, I think the best strategy is to enjoy it. You’ll never be done, there are (way) more questions than answers, and you can learn forever!

One of the bootcamp groups at the Copenhagen hackathon in 2018


This week we’re teaching our 40th course. Last year alone we gave digital superpowers to 325 people, mostly geoscientists. Not all of them learned to code, as such — some people already could, and some found out they didn’t like it… coding really isn’t for everyone. But I think all of them learned something new about technology, and how it can serve them and their science. I hope all of them look at spreadsheets, and Petrel, and websites differently now. I think most of them want, at some point, to learn more. And everyone is excited about machine learning.

The expanding community of quantitative earth scientists

This year we’ve already spent 50 days teaching, and taught 174 people. Imagine that! I get emotional when I think about what these hundreds of new digital geoscientists and engineers will go and do with their new skills. I get really excited when I see what they are already doing — when they come to hackathons, send us screenshots, or write papers with beautiful figures. If the joy of sharing code and collaborating with peers has also rubbed off on them, there’s no telling where it could lead.

Matt teaching in Aberdeen in October 2018


The last nine months or so have been an adventure. Teaching is not supposed to be what Agile is about. We’re a consulting company, a technology company. But for now we’re mostly a training company — it’s where we’re needed. And it makes sense... Programming is fundamentally about knowledge sharing. Teaching is about helping, collaborating. It’s perfect for us.

Besides, it’s a privilege and a thrill to meet all these fantastically smart, motivated people and to hear about their projects and their plans. Sometimes I wish it didn’t mean leaving my family in Nova Scotia and flying to Houston and London and Kuala Lumpur and Kalamazoo… but mostly I wish we could do more of it. Especially when we get comments like these:

“Given how ‘dry’ programming can be, it was DYNAMIC.”
“Excellent teachers with geoscience background.”
“Great instructors, so so approachable, even for newbies like me.”
“Great course [...] Made me realize what could be done in a short time.”
“My only regret was not taking a class like this sooner.”
“Very positive, feel superhuman.”

How many times have you felt superhuman at work recently?

The courses we teach are evolving and expanding in scope. But they all come back to the same thing: growing digital skills in our profession. This is critical because using computers for earth science is really hard. Why? The earth is weird. We’ve spent hundreds of years honing conceptual models, understanding deep time, and figuring out complex spatial relationships.

If data science eats the subsurface without us, we’re all going to get indigestion. Society needs to better understand the earth — for all sorts of reasons — and it’s our duty to build and adopt the most powerful analytical tools available so that we can help.


Learning resources

If you can’t wait to get started, here are some suggestions:

Classroom courses are a big investment in dollars and time, but they can get you a long way really quickly. Our courses are built especially for subsurface scientists and engineers. As far as I know, they are the only ones of their kind. If you think you’d like to take one, talk to us, or look out for a public course. You can find out more or sign up for email alerts here >> https://agilescientific.com/training/

Last thing: I suggest avoiding DataCamp, because of sexual misconduct by an executive, compounded by total inaction, dishonest obfuscation, and basically failing spectacularly. Even their own trainers have boycotted them. Steer clear.

How good is what?

Geology is a descriptive science, which is to say, geologists are label-makers. We record observations by assigning labels to data. Labels can either be numbers or they can be words. As such, of the numerous tasks that machine learning is fit for attacking, supervised classification problems are perhaps the most accessible – the most intuitive – for geoscientists. Take data that already has labels. Build a model that learns the relationships between the data and labels. Use that model to make labels for new data. The concept is the same whether a geologist or an algorithm is doing it, and in both cases we want to test how good our classifier is at its label-making.

2d_2class_classifier_left.png

Say we have a classifier that will tell us whether a given combination of rock properties is either a dolomite (purple) or a sandstone (orange). Our classifier could be a person named Sally, who has seen a lot of rocks, or it could be a statistical model trained on a lot of rocks (e.g. this one on the right). For the sake of illustration, say we only have two tools to measure our rocks – that will make visualizing things easier. Maybe we have the gamma-ray tool that measures natural radioactivity, and the density tool that measures bulk density. Give these two measurements to our classifier, and it returns a label.

How good is my classifier?

Once you've trained your classifier – you've done the machine learning and all that – you've got yourself an automatic label maker. But that's not even the best part. The best part is that we get to analyze our system and get a handle on how good we can expect our predictions to be. We do this by seeing if the classifier returns the correct labels for samples that it has never seen before, using a dataset for which we know the labels. This dataset is called validation data.

Using the validation data, we can generate a suite of statistical scores to tell us unambiguously how this particular classifier is performing. In scikit-learn, this information is compiled into a so-called classification report, and it’s available to you with a few simple lines of code. It’s a window into the behaviour of the classifier that warrants deeper inquiry.
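As a rough sketch of what those few lines might look like, assuming scikit-learn is installed: the labels below are made up for illustration and are not the validation data shown in this post.

```python
from sklearn.metrics import classification_report

# Hypothetical true labels for eight validation samples (made up for illustration).
y_true = ["dolomite", "dolomite", "dolomite", "sandstone",
          "sandstone", "sandstone", "sandstone", "dolomite"]
# Hypothetical predictions from some classifier for the same eight samples.
y_pred = ["dolomite", "sandstone", "dolomite", "sandstone",
          "sandstone", "dolomite", "sandstone", "dolomite"]

# A text table of precision, recall, F1 score and support for each class.
print(classification_report(y_true, y_pred))

# The same numbers as a nested dictionary, which is handier for further analysis.
report = classification_report(y_true, y_pred, output_dict=True)
print(report["dolomite"])
```

In practice y_pred would come from calling your trained classifier's predict method on the validation features.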

To describe various elements in a classification report, it will be helpful to refer to some validation data:

Our two-class classifier (left) has not seen the validation data (middle). We can calculate a classification report by analyzing the intersection of the two (right).


Accuracy is not enough

When people straight up ask about a model’s accuracy, it could be that they aren't thinking deeply enough about the performance of the classifier. Accuracy is a measure of the entire classifier. It tells us nothing about how well we are doing with one class compared to another, but there are other metrics that tell us this:

metric_definitions2.png

Support — how many instances there were of that label in the validation set.

Precision — the fraction of correct predictions for a given label. Also known as positive predictive value.

Recall — the proportion of the class that we correctly predicted. Also known as sensitivity.

F1 score — the harmonic mean of precision and recall. It's a combined metric for each class.

Accuracy – the total fraction of correct predictions for all classes. You can calculate this for each class, but it will be the same value for every class.
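To make these definitions concrete, here is a minimal sketch in plain Python that counts hits, false alarms and misses for one class at a time. The function names and the labels are made up for illustration; this is not the data from the figures in this post.

```python
def class_scores(y_true, y_pred, label):
    """Treat one class as its own diagnostic test and score it."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # hits
    fp = sum(t != label and p == label for t, p in pairs)  # false alarms
    fn = sum(t == label and p != label for t, p in pairs)  # misses
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "support": tp + fn}


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct, across every class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: five dolomites and five sandstones.
y_true = ["dolomite"] * 5 + ["sandstone"] * 5
# A hypothetical classifier that calls two of the dolomites sandstone.
y_pred = ["dolomite"] * 3 + ["sandstone"] * 7

for label in ("dolomite", "sandstone"):
    print(label, class_scores(y_true, y_pred, label))
print("accuracy", accuracy(y_true, y_pred))
```

In this made-up case the two classes tell different stories: every dolomite prediction is right, but two dolomites are missed, while accuracy alone hides that asymmetry.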

DIY classification report

If you're like me and you find the grammar of true positives and false negatives confusing, it might help to treat each class within the classifier as its own mini diagnostic test, and build up data for the classification report row by row. Then it's as simple as counting hits and misses from the validation data and computing some fractions. Inspired by this diagram on the Wikipedia page for the F1 score, I've given both text and pictorial versions of the equations:

dolomite_and_classifier_report_sheet.png

Have a go at filling in the scores for the two classes above. After that, enter your answers into your own hand-drawn version of the empty table below. Notice that there is only a single score for accuracy for the entire classifier, and that there may be a richer story between the various other scores in the table. Do you want to optimize accuracy overall? Or perhaps you care about maximizing recall in one class above all else? What matters most to you? Should you penalize some mistakes more strongly than others?

clf_report.png

When data sets get larger – by either increasing the number of samples, or increasing the dimensionality of the data – even though this scoring-by-hand technique becomes impractical, the implementation stays the same. In classification problems that have more than two classes we can add in a confusion matrix to our reporting, which is something that deserves a whole other post. 
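For a taste of what a confusion matrix holds, here is a minimal sketch in plain Python. The three classes and all of the labels are made up for illustration; rows are true labels and columns are predicted labels.

```python
from collections import Counter

# Hypothetical three-class labels, made up for illustration.
y_true = ["sandstone", "sandstone", "shale", "shale",
          "dolomite", "dolomite", "shale"]
y_pred = ["sandstone", "shale", "shale", "shale",
          "dolomite", "sandstone", "shale"]

labels = sorted(set(y_true) | set(y_pred))
counts = Counter(zip(y_true, y_pred))  # (true, predicted) -> number of samples

# Rows are true labels, columns are predicted labels, both in sorted order.
matrix = [[counts[(t, p)] for p in labels] for t in labels]

print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds the correct predictions; everything off the diagonal shows which class was mistaken for which.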

Upon finishing logging a slab of core, if you were to ask Sally the stratigrapher, "How accurate are your facies?", she may dismiss your inquiry outright, or maybe point to some samples she's not completely confident in. Or she might tell you that she was extra diligent in the transition zones, or point to regions where this is very sandy sand, or this is very hydrothermally altered. Sadly, we in geoscience – emphasis on the science – seldom take the extra steps to test and report our own performance. But we totally could.

The ANSWERS. Upside Down. To two Decimal places.


Big open data... or is it?

Huge news for data scientists and educators. Equinor, the company formerly known as Statoil, has taken a bold step into the open data arena. On Thursday last week, it 'disclosed' all of its subsurface and production data for the Volve oil field, located in the North Sea. 

What's in the data package?

A lot! The 40,000-file package contains 5TB of data, that's 5,000GB!

volve_data.png

This collection is substantially larger, both deeper and broader, than any other open subsurface dataset I know of. Most excitingly, Equinor has released a broad range of data types, from reports to reservoir models: 3D and 4D seismic, well logs and real-time drilling records, and everything in between. The only slight problem is that the seismic data are bundled in very large files at the moment; we've asked for them to be split up.

Questions about usage rights

Regular readers of this blog will know that I like open data. One of the cornerstones of open data is access, and there's no doubt that Equinor have done something incredible here. It would be preferable not to have to register at all, but free access to this dataset — which I'm guessing cost more than USD500 million to acquire — is an absolutely amazing gift to the subsurface community.

Another cornerstone is the right to use the data for any purpose. This involves the owner granting certain privileges, such as the right to redistribute the data (say, for a class exercise) or to share derived products (say, in a paper). I'm almost certain that Equinor intends the data to be used this way, but I can't find anything actually granting those rights. Unfortunately, if they aren't explicitly granted, the only safe assumption is that you cannot share or adapt the data.

For reference, here's the language in the CC-BY 4.0 licence:

 

Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to:

  1. reproduce and Share the Licensed Material, in whole or in part; and
  2. produce, reproduce, and Share Adapted Material.
 

You can dig further into the requirements for open data in the Open Data Handbook.

The last thing we need is yet another industry dataset with unclear terms, so I hope Equinor attaches a clear licence to this dataset soon. Or, better still, just uses a well-known licence such as CC-BY (this is what I'd recommend). This will clear up the matter and we can get on with making the most of this amazing resource.

More about Volve

The Volve field was discovered in 1993, but not developed until 15 years later. It produced oil and gas for 8.5 years, starting on 12 February 2008 and ending on 17 September 2016, though about half of that came in the first 2 years (see below). The facility was the Maersk Inspirer jack-up rig, standing in 80 m of water, with an oil storage vessel in attendance. Gas was piped to Sleipner A. In all, the field produced 10 million Sm³ (63 million barrels) of oil, so is small by most standards, with a peak rate of 56,000 barrels per day.
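The unit arithmetic in that paragraph can be checked with a quick sketch. It uses only figures quoted in this post (10 million Sm³, the 6.29 conversion factor, the start and end dates, and the peak rate). The variable names are illustrative, and the average rate is a simple total-over-days figure, not an official number.

```python
from datetime import date

SM3_TO_BBL = 6.29  # approximate barrels per standard cubic metre, as quoted in this post

total_oil_sm3 = 10e6  # total oil produced, from the text
total_oil_bbl = total_oil_sm3 * SM3_TO_BBL
print(f"Total oil: {total_oil_bbl / 1e6:.1f} million barrels")

# Number of days between the first and last production dates given in the text.
days = (date(2016, 9, 17) - date(2008, 2, 12)).days
average_bbl_per_day = total_oil_bbl / days
print(f"Average rate: {average_bbl_per_day:,.0f} barrels per day over {days} days")

# The quoted peak rate, converted the other way into standard cubic metres per day.
peak_bbl_per_day = 56_000
print(f"Peak rate: {peak_bbl_per_day / SM3_TO_BBL:,.0f} Sm3 per day")
```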

Volve production over time in standard m³ (i.e. at 20°C). Multiply by 6.29 for barrels.


The production was from the Jurassic Hugin Formation, a shallow-marine sandstone with good reservoir properties, at a depth of about 3000 m. The top reservoir depth map from the discovery report in the data package is shown here. (I joined Statoil in 1997, not long after this report was written, and the sight of this page brings back a lot of memories.)

 

The top reservoir depth map from the discovery report. The Volve field (my label) is the small closure directly north of Sleipner East, with 15/9-19 well on it.

 

Get the data

To explore the dataset, you must register in the 'data village', which Equinor has committed to maintaining for 2 years. It only takes a moment. You can get to it through this link.

Let us know in the comments what you think of this move, and do share what you get up to with the data!

Why Python beats MATLAB for geophysics

MATLAB — the scientific computing environment which includes a programming language — is amazing. It has probably done as much for the development of new geophysical methods, and for the teaching and learning of geophysics, as any other tool or language. A purely anecdotal assertion, but it's rare to meet a geophysicist who has not at least dabbled in MATLAB, and it is used daily in geophysics labs and classrooms. Geophysics <3 MATLAB.

It's easy to see why — MATLAB definitely has some advantages.

Advantages of MATLAB

  • Matrices. MATLAB implicitly treats arrays as matrices (the name means 'matrix laboratory'). As a result, notation is quite intuitive for mathematicians. For example, a*b means standard matrix multiplication, the dot product. (Slightly confusingly, to get Python-style element-wise multiplication, add a dot: a.*b).
  • Lots of functions. MATLAB has been around for over 30 years, so there are many, many useful functions. Find them either in the core product, in one of the toolboxes, or in MATLAB Central.
  • Simulink. This block-based system design and simulation engine is much-loved by engineers. It allows users to model physical systems in an intuitive, graphical environment.
  • Easy to install. The MATLAB environment is a desktop application, so it is instantly familiar and can be managed under the same processes other software in your machine or organization is managed.
  • MATLAB is widespread in academia. Thanks to one of those generous schemes where software corporations give free software to universities, just because they're awesome and definitely not for any other reason, students and profs have easy and free access to MATLAB. Outside academia, however, you're looking at tens of thousands of dollars.
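As a point of comparison for the first bullet, here is a minimal sketch of the same two operations in Python. It assumes NumPy is installed, and the arrays are arbitrary examples. In NumPy the roles are the other way around: the plain * is element-wise, and @ does matrix multiplication.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b      # element-wise product, like MATLAB's a.*b
matrix_product = a @ b   # matrix multiplication, like MATLAB's a*b

print(elementwise)
print(matrix_product)
```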

So far so good, but it's time for geophysics to switch to Python. On the face of it, the language has a lot in common with MATLAB: they're both easy to learn, and both have broad ecosystems that make things like image processing, statistics, and signal processing easy. But Python has some special features that make it a fantastic platform for scientific computing...

Advantages of Python

  • Free and open. Thanks to one of those generous schemes where people make software and let anyone use it for any purpose for free, Python is free! Not only is it free of charge, you are free to inspect and modify the code. Open is awesome. (There are other free alternatives to MATLAB, notably GNU Octave and SciLab.)
  • General purpose. One of the things I love about Python is its flexibility. You can use it in the shell on microtasks, or interactively, or in scripts, or to write server software, or to build enterprise software with GUIs.
  • Namespaces. Everything in MATLAB lives in the main namespace, whereas Python keeps things inherently modular. To access NumPy, say, you have to import it and then use its namespace to get at its contents: numpy.array([1, 2, 3]). This has various advantages, including flexibility, readability, learnability, and portability.
  • Introspection. A powerful idea in Python, introspection means that you (or your code) can see inside every module, class, and function. You can access private variables, or write code that 'knows' about other objects' interfaces.
  • Portable. You can run your Python code on any architecture, whereas to run MATLAB code you either need all the MATLAB licenses the software uses, or another pricey toolbox to make executables.
  • Popular. Python is the 7th most popular tag in Stack Overflow, whereas MATLAB is the 58th. While programming is not a popularity contest, think of your career, or the careers of your students. Once they graduate, Python will serve them better than MATLAB. There are over 300 jobs for Pythonistas on Stack Overflow Jobs right now. MATLAB jobs? Nine.
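To illustrate the namespace and introspection bullets, here is a small sketch using only the standard library. The velocity function is made up purely as something to inspect.

```python
import inspect
import math

# Namespaces: sqrt lives inside the math module, so its origin is always explicit.
print(math.sqrt(16))


def velocity(distance, time=1.0):
    """Return average velocity; a made-up function used only for illustration."""
    return distance / time


# Introspection: code can examine other objects at runtime.
print(inspect.signature(velocity))  # the function's parameters and defaults
print(inspect.getdoc(velocity))     # its docstring
sq_names = [name for name in dir(math) if name.startswith("sq")]
print(sq_names)                     # names inside the math module starting with 'sq'
```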

So there you have it. It's time to switch to Python. If you're new to programming, there's no contest. I suppose if you're productive in MATLAB, and have access to all the toolboxes, then admittedly it's hard to say you should switch.

But I'll still say it.


I was inspired to write this post after talking to a geophysicist about using programming languages in the classroom, and by the lists in this nice post on pyzo.org. It would be interesting to hear what you use in the classroom — as an instructor or as a student. I know geophysics is being taught with the help of MATLAB (in many places), Java (e.g. at Colorado School of Mines), Mathematica (e.g. by Chris Liner). I wonder if there's anyone using JavaScript, which wouldn't be a terrible choice. Or C++? Or Fortran?? Let us know in the comments!

A coding kitchen in Stavanger

Last week, I travelled to Norway and held a two-day session of our Agile Geocomputing Training. We convened at the newly constructed Innovation Dock in Stavanger, and set up shop in an oversized, swanky kitchen. Despite the industry-wide squeeze on spending, the event still drew a modest turnout of seven geoscientists. That's way more traction than we've had in North America lately, so thumbs up to Norway! And, since our training is designed to be very active, a group of seven is plenty comfortable.

A few of the participants had some prior experience writing code in languages such as Perl, Visual Basic, and C, but the majority showed up without any significant programming experience at all. 

Skills start with syntax and structures 

The first day we covered basic principles of programming, but because Python is awesome, we dove into live coding right from the start. As an instructor, I find that doing live coding has two hidden benefits: it stops me from racing ahead, and making mistakes in the open gives students permission to do the same.

Using geoscience data right from the start, students learned about key data structures: lists, dicts, tuples, and sets, and why they might choose between them for a given job. They wrote their own mini-module containing functions and classes for getting stratigraphic tops from a text file.

Since syntax is rather dry and unsexy, I see the instructor's main role as inspiring and motivating through examples that connect to things that learners already know well. The ideal container for stratigraphic picks is a dictionary. Logs, surfaces, and seismic are best cast into 1-, 2-, and 3-dimensional NumPy arrays, respectively. And so on.
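As a flavour of the kind of exercise involved, here is a minimal sketch of reading stratigraphic tops into a dictionary. The file layout, the top names and the depths are all made up for illustration, and read_tops is a hypothetical helper, not the code from the course.

```python
import io

# A made-up tops file; real files may be laid out differently.
text = """# name, depth_m
Seabed, 90.0
Top Sand A, 1500.5
Base Sand A, 1560.0
"""


def read_tops(file_obj):
    """Read 'name, depth' lines into a dictionary keyed by top name."""
    tops = {}
    for line in file_obj:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, depth = line.rsplit(",", 1)
        tops[name.strip()] = float(depth)
    return tops


tops = read_tops(io.StringIO(text))
print(tops["Top Sand A"])

# Thickness of the interval between two picks.
print(tops["Base Sand A"] - tops["Top Sand A"])
```

A dictionary suits picks because each one is looked up by name; swapping io.StringIO(text) for open("some_file.txt") would read from disk instead.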

Notebooks inspire learning

We've seen it time and time again. People really like the format of Jupyter Notebooks (formerly IPython Notebooks). It's like there is something fittingly scientific about them: narrative, code, output, repeat. As a learning document, they aren't static — in fact they're meant to be edited. But they aren't so open-ended that learners fail to launch. Professional software developers may not 'get it', but scientists really do. Start at the start, end at the end, and you've got a complete record of your work.

You don't get that with the black-box, GUI-heavy software applications we're used to. Maybe all legitimate work should be reserved for notebooks: self-contained, fully-reproducible, and extensible. Maybe notebooks, in their modularity and granularity, will be the new go-to software for technical work.

Outcomes and feedback

By the end of day two, folks were parsing stratigraphic and petrophysical data from text files, then rendering and stylizing illustrations. A few were even building interactive animations on 3D seismic volumes. One recommendation was to create a sort of FAQ or cookbook: "How do I read a log?", "How do I read SEGY?", "How do I calculate elastic properties from a well log?". A couple of people remarked that they would have liked even more coached exercises, maybe even an extra day; a recognition of the virtue of sustained and structured practice.


Want training too?

Head to our courses page for a list of upcoming courses, or more details on how you can train your team


Photographs in this post are courtesy of Alessandro Amato del Monte via aadm on Flickr

On answering questions

On Tuesday I wrote about asking better questions. One of the easiest ways to ask better questions is to hang back a little. In a lecture, the answer to your question may be imminent. Even if it isn't, some thinking or research will help. It's the same with answering questions. Better to think about the question, and maybe ask clarifying questions, than to jump right in with "Let me explain".

Here's a slightly edited example from Earth Science Stack Exchange:

I suppose natural gas underground caverns on Earth have substantial volume and gas is in gaseous form there. I wonder how it would look like inside such cavern (with artificial light of course). Will one see a rocky sky at big distance?

The first answer was rather terse:

What is a good answer?

This answer, addressing the apparent misunderstanding the OP (original poster) has about gas being predominantly found in caverns, was the first thing that occurred to me too. But it's incomplete, and has other problems:

  • It's not very patient, and comes across as rather dismissive. Not very welcoming for this new user.
  • The reference is far from being an appropriate one, and seems to have been chosen randomly.
  • It only addresses sandstone reservoirs, and even then only 'typical' ones.

In my own answer to the question, I tried to give a more complete answer. I tried to write down my principles, which are somewhat aligned with the advice given on the Stack Exchange site:

  1. Assume the OP is smart and interested. They were smart and curious enough to track down a forum and ask a question that you're interested enough in to answer, so give them some credit. 
  2. No bluffing! If you find yourself typing something like, "I don't know a lot about this, but..." then stop writing immediately. Instead, send the question to someone you know who can give a better answer than you.
  3. If possible, answer directly and clearly in the first sentence. I usually write it in bold. This should be the closest you can get to a one-word answer, especially if it was a direct question. 
  4. Illustrate the answer with an example. A picture or a numerical example — if possible with working code in an accessible, open source language — goes a long way to helping someone get further. 
  5. Be brief but thorough. Round out your answer with some different angles on the question, especially if there's nuance in your answer. There's no need for an essay, so instead give links and references if the OP wants to know more.
  6. Make connections. If there are people in your community or organization who should be connected, connect them.

It's remarkable how much effort people are willing to put into a great answer. A question about detecting dog paw-prints on a pressure pad, posted to the programming community Stack Overflow, elicited some great answers.

The thread didn't end there. Check out these two answers by Joe Kington, a programmer–geoscientist in Houston:

  • One epic answer with code and animated GIFs, showing how to make a time-series of pawprints.
  • A second answer, with more code, introducing the concept of eigenpaws to improve paw recognition.

A final tip: writing informative answers might be best done on Wikipedia or your corporate wiki. Rather than writing a long response to the post, think about writing it somewhere more accessible and posting a link to it. 

What do you think makes a good answer to a question? Have you ever received an answer that went beyond helpful? 

On asking questions

If I had only one hour to solve a problem, I would spend up to two-thirds of that hour in attempting to define what the problem is. — Anonymous Yale professor (often wrongly attributed to Einstein)

Asking questions is a core skill for professionals. Asking questions to know, to understand, to probe, to test. Anyone can feel exposed asking questions, because they feel like they should know or understand already. If novices and 'experts' alike have trouble asking questions, if your community or organization does not foster a culture of asking, then there's a problem.

What is a good question?

There are naive questions, tedious questions, ill-phrased questions, questions put after inadequate self-criticism. But every question is a cry to understand the world. There is no such thing as a dumb question. — Carl Sagan

Asking good questions is the best way to avoid the problem of feeling silly or — worse — being thought silly. Here are some tips from my experience in Q&A forums at work and on the Internet:

  1. Do some research. Go beyond a quick Google search — try Google Scholar, ask one or two colleagues for help, look in the index of a couple of books. If you have time, stew on it for a day or two. Do enough to make sure the answer isn't widely known or trivial to find. Once you've decided to ask a network...
  2. Ask your question in the right forum. You will save yourself a lot of time by taking the trouble to find the right place, where the people most likely to be able to help you are. Avoid the shotgun approach: it's not considered good form to cross-post in multiple related forums.
  3. Make the subject or headline a direct question, with some relevant detail. This is how most people will see your question and decide whether to even read the rest of it. So "Help please" or "Interpretation question" are hopeless. Much better is something like "How do I choose seismic attribute parameters?" or "What does 'replacement velocity' mean?".
  4. Provide some detail, and ideally an image. A bit of background helps. If you have a software or programming problem, it is critical to give just enough information to reproduce it. Tell people what you've read and where your assumptions are coming from. Tell people what you think is going on.
  5. Manage the question. Make sure early comments or answers seem to get your drift. Edit your question or respond to comments to help people help you. Follow up with new questions if you need clarification, but make a whole new thread if you're moving into new territory. When you have your answer, thank those who helped you and make it clear if and how your problem was solved. If you solved your own problem, post your own answer. Let the community know what happened in the end.

If you really want to cultivate your skills of inquiry, here is some more writing on the subject...

Supply and demand

Knowledge sharing networks like Stack Exchange, or whatever you use at work, often focus too much on answers. Capturing lessons learned, for example. But you can't just push knowledge at people — the supply and demand equation has two sides — there has to be a pull too. The pull comes from questions, and an organization or community that pulls, learns.

Do you ask questions on knowledge networks? Do you have any advice for the curious? 


Don't miss the next post, On answering questions.

Corendering attributes and 2D colourmaps

We use colourmaps to help the human eye interpret the morphology of the data. There are no hard and fast rules for choosing a good colourmap, but a poorly chosen one can make you see features in your data that don't actually exist. 

Colourmaps are typically implemented in visualization software as 1D lookup tables: given a value, what colour should I plot it with? But most spatial data is multi-dimensional, and it's useful to look at more than one aspect of the data at one time. Previously, Matt asked, "how many attributes can a seismic interpreter show with colour on a single display?" He explored this by stacking up a series of semi-opaque layers, each one assigned its own 1D colourbar. 
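To make the lookup-table idea concrete, here is a minimal sketch in plain Python. The function names (`make_lut`, `lookup`) and the greyscale ramp are my own illustrative choices, not taken from any particular visualization package, which will do this with arrays and much more care.

```python
def make_lut(start, end, n=256):
    """Build an n-entry lookup table by linearly interpolating two RGB colours."""
    return [
        tuple(s + (e - s) * i / (n - 1) for s, e in zip(start, end))
        for i in range(n)
    ]


def lookup(value, vmin, vmax, lut):
    """Return the colour for a value: normalize it, clip it, index the table."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clip values outside [vmin, vmax]
    return lut[round(t * (len(lut) - 1))]


# A hypothetical black-to-white ramp with 256 entries.
greys = make_lut((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))

# Colour for an amplitude of 0.0 when the display range is -1 to +1.
colour = lookup(0.0, -1.0, 1.0, greys)
```

The point is only that a 1D colourmap answers one question per pixel: one value in, one colour out.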

Another way to add more dimensions to the display is corendering. This effectively adds another dimension to the colourmap itself: instead of a 1D colour line for a single attribute, for two attributes we're defining a colour square; for 3 attributes, a colour cube, and so on.

Let's illustrate this by looking at a time-slice through a portion of the F3 seismic volume. A simple way of displaying two attributes is to decrease the opacity of one, and lay it on top of the other. In the figure below, I'm setting the opacity of the continuity to 75% in the third panel. At first glance, this looks pretty good; you can see both attributes, and because they have different hues, they complement each other without competing for visual bandwidth. But the approach is flawed. The vividness of each dataset is diminished; we don't see the same range of colours as we do in the colour palette shown above.

Overlaying one map on top of the other is one way to look at multiple attributes within a scene. It's not ideal however.
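A small numerical sketch shows why the overlay loses vividness. It assumes ordinary alpha compositing of a semi-opaque layer over an opaque one; the two pixel colours are hypothetical values picked for illustration, and `overlay` is my own helper name.

```python
import colorsys


def overlay(top, base, opacity):
    """Composite a semi-opaque `top` colour over an opaque `base` colour."""
    return tuple(opacity * t + (1.0 - opacity) * b for t, b in zip(top, base))


# Hypothetical pixel values, chosen only for illustration.
amplitude_red = (1.0, 0.0, 0.0)    # a fully saturated colour from the amplitude map
continuity_grey = (0.5, 0.5, 0.5)  # a mid-grey from the continuity map

# Continuity laid on top of amplitude at 75% opacity.
mixed = overlay(continuity_grey, amplitude_red, opacity=0.75)

# Compare HLS saturation before and after blending.
saturation_before = colorsys.rgb_to_hls(*amplitude_red)[2]
saturation_after = colorsys.rgb_to_hls(*mixed)[2]
```

Because every output pixel is a weighted average of the two layers, the saturated colour is pulled towards the grey, so `saturation_after` comes out lower than `saturation_before`.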

Instead of overlaying maps, we can improve the result by modulating the lightness of the amplitude image according to the magnitude of the continuity attribute. This time the corendered result is one image, instead of two. I prefer it, because it preserves the original colours we see in the amplitude image. If anything, it seems to deepen the contrast:

The lightness value of the seismic amplitude time slice has been modulated by the continuity attribute. 
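Here is one way the lightness modulation could be sketched for a single pixel. I'm assuming the HLS colour model from Python's standard `colorsys` module and a continuity value already scaled to the range 0 to 1; the notebook may well do this differently, and `modulate_lightness` is a hypothetical name.

```python
import colorsys


def modulate_lightness(rgb, weight):
    """Scale the HLS lightness of an RGB colour by a weight in [0, 1].

    Hue and saturation are passed through unchanged, so the colour keeps
    its identity and only gets darker as the weight drops.
    """
    h, l, s = colorsys.rgb_to_hls(*rgb)
    return colorsys.hls_to_rgb(h, l * weight, s)


# Hypothetical amplitude colour and continuity values (1 = continuous).
amplitude_colour = (0.8, 0.2, 0.2)
unchanged = modulate_lightness(amplitude_colour, 1.0)
darkened = modulate_lightness(amplitude_colour, 0.5)
```

Unlike the overlay, nothing is averaged with a second colour here: the second attribute only moves each pixel along its own lightness axis.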

Such a composite display needs a two-dimensional colourmap for a legend. Just like a 1D colourbar, it's also a lookup table; each position in the scene corresponds to a unique pair of values in the colourmap plane.
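As a rough sketch, a 2D colourmap can be built as a small table with one axis per attribute. Here the columns come from a hypothetical three-colour ramp and the rows scale HLS lightness; both choices, and the function names, are assumptions made for illustration.

```python
import colorsys


def make_2d_colourmap(base_colours, n_levels=5):
    """Return rows of colours: one row per lightness level, one column per base colour."""
    table = []
    for j in range(n_levels):
        weight = j / (n_levels - 1)  # 0 = fully darkened, 1 = unchanged
        row = []
        for rgb in base_colours:
            h, l, s = colorsys.rgb_to_hls(*rgb)
            row.append(colorsys.hls_to_rgb(h, l * weight, s))
        table.append(row)
    return table


def lookup_2d(a, b, table):
    """Pick the colour for a pair of attribute values, each already scaled to [0, 1]."""
    n_rows, n_cols = len(table), len(table[0])
    i = round(min(max(b, 0.0), 1.0) * (n_rows - 1))  # second attribute picks the row
    j = round(min(max(a, 0.0), 1.0) * (n_cols - 1))  # first attribute picks the column
    return table[i][j]


# Hypothetical blue-white-red ramp for the first attribute.
ramp = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
legend = make_2d_colourmap(ramp, n_levels=3)
colour = lookup_2d(0.0, 1.0, legend)
```

Plotting `legend` as an image gives the square legend: read across for the first attribute and up or down for the second.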

We can go one step further. Say we want to emphasize only the largest discontinuities in the data. We can modulate the opacity with a non-linear function. In this example, I'm using a sigmoid function:
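A minimal sketch of the idea, assuming a discontinuity attribute scaled to the range 0 to 1 and a black overlay whose opacity follows a logistic curve. The `midpoint` and `steepness` values are illustrative guesses, not the ones used for the figure.

```python
import math


def sigmoid(x, midpoint=0.5, steepness=20.0):
    """Logistic curve that rises from about 0 to about 1 around `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def darken(rgb, discontinuity):
    """Blend black over a colour, with opacity set by the sigmoid of the attribute."""
    opacity = sigmoid(discontinuity)
    return tuple((1.0 - opacity) * c for c in rgb)


# Hypothetical amplitude colour under weak and strong discontinuities.
amplitude_colour = (0.8, 0.2, 0.2)
weak = darken(amplitude_colour, 0.1)
strong = darken(amplitude_colour, 0.9)
```

With a steep curve, small discontinuities leave the amplitude colour almost untouched, and only values well above the midpoint push the pixel towards black.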

In order to achieve this effect in most conventional software, you usually have to copy the attribute, colour it black, apply an opacity curve, then position it just above the base amplitude layer. Software companies call this workaround a 'workflow'. 

Are there data visualizations you want to create, but you're stuck with software limitations? In a future post, I'll recreate some cool corendering effects, like bump-mapping and hill-shading.

To view and run the code that I used in creating the images for this post, grab the IPython/Jupyter Notebook.


You can do it too!

If you're in Calgary, Houston, New Orleans, or Stavanger, listen up!

If you'd like to gear up on coding skills and explore the benefits of scientific computing, we're going to be running the 2-day version of the Geocomputing Course several times this fall in select cities. To buy tickets or for more information about our courses, check out the courses page.

None of these times or locations good for you? Consider rounding up your colleagues for an in-house training option. We'll come to your turf, we can spend more than 2 days, and customize the content to suit your team's needs. Get in touch.

Once is never

Image by ZEEVVEEZ on Flickr, licensed CC-BY. Ten points if you can tell what it is...

My eldest daughter is in grade 5, so she's getting into some fun things at school. This week the class paired off to meet a challenge: build a container to keep hot water hot. Cool!

The teams built their contraptions over the weekend, doubtless with varying degrees of rule interpretation (my daughter's involved HotHands hand warmers, which I would not have thought of), and the results were established with a side-by-side comparison. Someone (not my daughter) won. Kudos was achieved.

But this should not be the end of the exercise. So far, no-one has really learned anything. Stopping here is like grinding wheat but not making bread. Or making dough, but not baking it. Or baking it, but not making it into toast, buttering it, and covering it in Marmite...

Great, now I'm hungry.

The rest of the exercise

How could this experiment be improved?

For starters, there was a critical component missing: control. Adding a vacuum flask at one end, and an uninsulated beaker at the other would have set some useful benchmarks.

There was a piece missing from the end too: analysis. A teardown of the winning and losing efforts would have been quite instructive. Followed by a conversation about the relative merits of different insulators, say. I can even imagine building on the experience. How about a light introduction to thermodynamic theory, or a stab at simple numerical modeling? Or a design contest? Or a marketing plan?

But the most important missing piece of all, the secret weapon of learning, is iteration. The crucial next step is to send the class off to do it again, better this time. The goal: to beat the best previous attempt, perhaps even to beat the vacuum flask. The reward: $20k in seed funding and a retail distribution deal. Or a house point for Gryffindor.

Einmal ist keinmal, as they say in Germany: Once is never. What can you iterate today?