The machine learning algo zoo

One of the wonderful, but also baffling, things about machine learning is that there are so many ways to do it. At some very high level, most of them do something like this (highlighting some jargon):

  1. The human settles on a task (“Predict lithology”) and finds a bunch of data relevant to that task (say, some well logs A, B, and C). Then the human has to come up with some known instances or examples where these well log data go with those lithology labels.

  2. Stuff the logs into an equation. Not an equation like A + B + C, because there’s nothing to tweak in that equation. The equation needs parameters or coefficients, like \(\alpha A + \beta B + \gamma C\). The machine can tweak those Greek letters to change the output. At first, they’ll be random guesses.

  3. See how the output of that equation, which is the machine’s prediction, compares to the known labels. Come up with another equation whose output is a good measure of how far away the predictions are from the known labels. This distance is called the cost, and the equation used to compute it is called the cost function.

  4. Now that the machine has something to guess (the Greek parameters) and a way to know how well it’s doing (the cost function), it just needs a way to minimize the cost, or to put it another way, optimize the parameters. This optimization process is called learning.

Together, these steps constitute a learning algorithm. An algorithm with a set of optimized weights is usually referred to simply as a model.
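For example, here’s a minimal sketch of those four steps in NumPy, with made-up data and a made-up learning rate of 0.1; the loop at the end is the ‘learning’:

    import numpy as np

    # Made-up example: three 'logs' A, B, C and a numeric label y to predict.
    rng = np.random.default_rng(42)
    A, B, C = rng.normal(size=(3, 100))
    y = 2 * A - B + 0.5 * C                  # the known examples (step 1)

    params = rng.normal(size=3)              # alpha, beta, gamma: random guesses (step 2)

    for _ in range(500):
        y_pred = params[0]*A + params[1]*B + params[2]*C        # the prediction (step 2)
        cost = np.mean((y_pred - y)**2)                          # the cost function (step 3)
        grads = np.array([np.mean(2 * (y_pred - y) * x) for x in (A, B, C)])
        params = params - 0.1 * grads                            # optimize the parameters (step 4)

    print(params)   # close to [2, -1, 0.5]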

All the algorithms

Every piece of this story is worth a whole blog post on its own, but for today let’s stay high-level.

The problem is that the algorithm zoo can be overwhelming. My post last week was an attempt to compare a lot of regression algorithms, in terms of how they make sense of three synthetic datasets.

Today I’m sharing a Big Giant Spreadsheet™ that attempts to compare some of the most popular ‘shallow’ learning algorithms in terms of their most important characteristics. For example, can they predict probabilities? Are they deterministic? What are the key hyperparameters? And so on.
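Some of those characteristics you can poke at programmatically. Here’s a rough sketch (not the spreadsheet itself, just an illustration using scikit-learn) that checks two of them for a couple of algorithms:

    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    for model in (LogisticRegression(), SVC()):
        print(type(model).__name__)
        print("  Predicts probabilities:", hasattr(model, "predict_proba"))
        print("  Hyperparameters:", sorted(model.get_params()))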

Here’s a small version of the table (see the links below for other versions):

There’s a PDF version here — and here’s the original spreadsheet.

Eventually, I envisage a poster for the wall. I think it would be nice to have some equations on here. Maybe the plots from the various comparisons too (see last week’s post!). And even more advice, like which ones break when you have too many features. What else would you like to see on there?

Rocks in the Playground

It’s debatable whether neural networks should feature in an introductory course on machine learning. But it’s hard to avoid at least mentioning them, and many people are attracted to machine learning courses because they have heard so much about deep learning. So, reluctantly, we almost always get into neural nets in our Machine learning for geoscientists classes.

Our approach is to build a neural network from scratch, using only standard Python and NumPy data structures — that is, without using a specialist deep-learning framework. The code is all adapted from Gram Ganssle’s awesome Leading Edge tutorial from 2018. I like it because it lays out the components — the data, the activation function, the cost function, the forward pass, and all the steps involved in backpropagation — then combines them into a working neural network.
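For flavour, here’s the kind of thing the tutorial builds up, sketched in plain NumPy (this is not Gram’s actual code): an activation function and a single forward pass through a one-hidden-layer network.

    import numpy as np

    def sigmoid(z):
        """The activation function."""
        return 1 / (1 + np.exp(-z))

    def forward(X, W1, b1, W2, b2):
        """One forward pass: inputs -> hidden layer -> output."""
        hidden = sigmoid(X @ W1 + b1)
        return hidden @ W2 + b2

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 3))                    # 10 samples, 3 features (e.g. well logs)
    W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)   # 5 hidden units
    W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

    y_pred = forward(X, W1, b1, W2, b2)             # predictions; backprop would adjust W and b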

Figure 2 from Gram Ganssle’s 2018 tutorial in the Leading Edge. Licensed CC BY.

One drawback of our approach is that it would be quite fiddly to change some aspects of the network. For example, adding regularization, which almost all networks use, or even just adding another layer, are both beyond the scope of the class. So I like to follow it up with getting the students to build the same network using the scikit-learn library’s multilayer perceptron model. Then we build the same network using the PyTorch framework. (And one could do it in TensorFlow too, of course.) All of these libraries make it easier to play with the various options.
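For instance, the scikit-learn version of such a network boils down to a few lines like these (a sketch; X, the input logs, and y, the target, are assumed to exist already as NumPy arrays):

    from sklearn.neural_network import MLPRegressor

    net = MLPRegressor(hidden_layer_sizes=(20, 20),  # adding a layer is now a one-line change
                       activation='relu',
                       alpha=0.01,                    # L2 regularization, one keyword away
                       max_iter=2000)
    net.fit(X, y)
    y_pred = net.predict(X)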

Introducing the Rocky Playground

Now we have another tool — one that makes it even easier to change parameters, add layers, use regularization, and so on. The students don’t even have to write any code! I invite you to play with it too — check out the Rocky Playground, an interactive deep neural network you can see inside.

Part of the user interface. Click on the image to visit the site.

This tool is a fork of Google’s well-known Neural Network Playground, as described at the bottom of our tool’s page. We made a few changes:

  • Added several new real and synthetic datasets, with descriptions.

  • There are more activation functions to try, including ELU and Swish (see the sketch after this list).

  • You can change the regularization during training and watch the weights.

  • Anyone can upload their own dataset! (These stay on your computer, they are not uploaded anywhere.)

  • We added an expression of the network in Python code.
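For reference, here’s what those two newer activation functions look like, sketched in NumPy:

    import numpy as np

    def elu(x, alpha=1.0):
        """Exponential linear unit: x for x > 0, alpha*(exp(x) - 1) otherwise."""
        return np.where(x > 0, x, alpha * (np.exp(x) - 1))

    def swish(x, beta=1.0):
        """Swish: x times the sigmoid of beta*x."""
        return x / (1 + np.exp(-beta * x))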

One of the datasets we added is the same shear-sonic prediction dataset we use in the neural network class. So students can watch the same neural net they built (more or less) learn the task in real time. It’s really very cool.

I’ve written before about different expressions of mathematical ideas — words, symbols, annotations, code, etc. — and this is really just a natural extension of that thought. When people can hear and see the same idea in three — or five, or ten — different ways, it sticks. Or at least has a better chance of sticking.

What do you think? Does this tool help you? Could you use it for teaching? If you have suggestions feel free to drop them here in the comments, or submit an issue to the tool’s repo. We’d love your help to make it even more useful.

New virtual training for digital geoscience

Looking to skill up before 2022 hits us with… whatever 2022 is planning? We have quite a few training classes coming up — don’t miss out! Our classes use 100% geoscience data and examples, and are taught exclusively by earth scientists.

We’re also always happy to teach special classes in-house for you and your colleagues. Just get in touch.

Special classes for CSEG in Calgary

Public classes with timing for Americas

  • Geocomputing: week of 22 November

  • Machine Learning: week of 6 December

Public classes with timing for Europe, Africa and Middle East

  • Geocomputing: week of 27 September

  • Machine Learning: week of 8 November

So far we’ve taught 748 people on the Geocomputing class, and 445 on the Machine Learning class — this wave of new digital scientists is already doing fascinating new work and publishing new research. I’m very excited to see what unfolds over the next year or two!

Find out more about Agile’s public classes by clicking this big button:

A (useless) map of geo-mathematics

Most scientific problems involve at least a bit of maths, even if it’s just adding things up or finding averages.

But some problems require quite a bit of maths, like solving an equation, or throwing vectors around, or even a Fourier transform or two. A lot of people switch off at this point.

Yet other problems require a lot of maths. Maybe we need a finite difference model, a volume integral, or a deep neural network. Most of us back away from the problem at this point and look for a collaborator with a lot of equations on their whiteboard.

But it’s pretty hard to find new collaborators right now. And what if you run into these problems a lot? Maybe you need to be the one with the equationy whiteboard!

What then? Where do you start? We need a map!

A roadmap for learning…

…is what I set out to draw. I failed. Possibly there exists a map, with START HERE in one corner and a whiteboard full of equations in the other. But I doubt it.

I ended up drawing this:


It was fun to draw, but I highly doubt that it’s any practical use. The conclusion I came to is: there is no path. In fact, this artificially flattened projection of the n-dimensional mathiverse — no doubt reflecting my own weak grasp on half of these topics — is probably a unique, personal perspective. It reflects my interests and my nonlinear journey from A-level calculus (which I loved) to undergraduate maths (which I found very hard) to… whatever half-truths I know today.

But I want to learn, where do I start?

So if there is no path, what can you do to improve? Where should you start? How can you learn? Easy: follow your nose. Start with a project — something that interests you, something you’ll stick with. Maybe it’s a spreadsheet you have, or a plot you want to make. When you get to the maths, as you inevitably will, dig in. Read around. Google things. Get a whiteboard.

As an example, when I worked on the tricky (and unsolved!) task of recovering data from pseudocolour images, my maths journey looked something like this:

Images ➡ clustering ➡ RGB vectors ➡ distance metrics ➡ k-d trees ➡ graphs ➡ Hamiltonian paths ➡ TSP solvers

Admittedly, there’s some computer science in there too, but hey, this is applied maths.
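For example, the k-d tree step looked roughly like this (a sketch only; img, an RGB image array of shape (h, w, 3), and cmap, an array of colourmap RGB values, are assumed):

    from scipy.spatial import cKDTree

    tree = cKDTree(cmap)                                  # cmap: e.g. shape (256, 3)
    dist, idx = tree.query(img.reshape(-1, 3))            # nearest neighbour in RGB space
    data = idx.reshape(img.shape[:2]) / (len(cmap) - 1)   # approximate data values in [0, 1]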

As I described recently in Illuminated equations, there are several ways to serve, and consume, mathematical ideas: words, pictures, plots, symbols, annotations, and code (and probably some others). Seek out sources that give you three or more of these things. For me, being able to run some code makes a huge difference. Indeed, learning Python has directly led to me reading entire books on graph theory, linear algebra, deep learning, Fourier transforms, and all sorts of other things.

Well, learning Python and watching Numberphile.

I think it’s a myth that you have to be good at maths to learn to code. Instead, I think learning to code can — if you want — help make you good, or at least better, at maths. By giving you a way to try things without fully knowing what you are doing (after all, np.fft.fft(x) is pretty easy to type!), code gives you a way to peek at the answer. If you do it often enough, and follow up with some reading, understanding follows eventually.
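For instance, here’s that FFT ‘peek’ in full, using a made-up 5 Hz signal:

    import numpy as np

    t = np.arange(0, 1, 0.01)                 # one second, sampled at 100 Hz
    x = np.sin(2 * np.pi * 5 * t)             # a 5 Hz signal
    spectrum = np.abs(np.fft.fft(x))
    freqs = np.fft.fftfreq(x.size, d=0.01)
    print(freqs[np.argmax(spectrum[:x.size // 2])])   # 5.0 — there's the frequency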

Which programming language should you learn first?

The question I get asked most often is:

I want to learn to code, where should I start?

To which there’s really no perfect answer. It depends on a lot of things… Why do you want to learn to program? What domain are you in? Have you tried before? Do you like computers? Do your colleagues use anything in particular?

Undeterred by the futility, and inspired by an awesome blog post on freecodecamp.org, which advises you to learn JavaScript (not terrible advice), I thought I’d try to answer the question — for scientists. I adapted a rather old decision tree in that blog post to the specific needs of scientists. Here is the latest version:


Yes, it does have a lot of Python on it. Yes, I am biased.

These sorts of things are necessarily rather one-dimensional. In an effort to give general advice, all the interesting corners are sanded down. For instance, there is definitely some domain specificity to languages. In the subsurface domain, a lot of geophysicists learned their craft in MATLAB, but today are excited about Julia. I think environmental folks are more into R. Geologists mostly still like coloured pencils best. And of course reservoir engineers mastered VBA years ago. Clearly, if you’re learning to code to start a postgraduate degree, you should probably find out what language others in your lab are using before you crack into that old copy of FORTRAN in a Weekend.*

Anyway, this decision tree thing provoked quite a bit of discussion on Software Underground and Twitter. Some people felt challenged, although my purpose was to suggest a starting point for people, not to say “Never touch Java” (although, seriously, never touch Java). It’s natural — learning to master a language takes years and people are sensitive to perceived criticism of how they spent their time. But this misses the point a bit — programming is really just about getting things done, preferably in an open language (<cough> not MATLAB). So what’s the quickest path for a new programmer to start getting things done?

I appreciated this thoughtful comment from Kris Kuhlman:

I think it worked out more or less that way for me. I learned a bit of BASIC as a 12-year-old, and knew enough assembly to crash a BBC Micro. Then I learned awk in 1993, and used it for basically everything — including many things it certainly was not designed for. I tried and failed to learn Java in 2002, instead picking up MATLAB… which led to Python in about 2008. I was a slow learner though; it took years to be convinced that I needed NumPy. (Yes, you can load seismic as a list of lists.)

In the end, you need several tools in your belt. Several people pointed out that SQL (a so-called ‘domain specific language’ rather than a full-blown programming language) is incredibly useful to know. I think you could say the same for HTML and maybe even XML — or perhaps JSON these days. Then again, maybe these stretch the definition of ‘programming language’ a bit too far. Besides, if you write code, you’ll meet them eventually.

In the end, the point is to get things done. Every language on that tree will enable you to get things done. (Admittedly, Scratch, Processing, and ChucK have rather narrow domains.) Fortran has been around for more than 60 years and is still in the top 20 languages. So don’t sweat it — if Kris is right, you’ll need to learn 2 languages before one sticks anyway.


* There is no such book, lol.

Three books about machine learning

I recently finished a Udemy machine learning course, and wrote on LinkedIn afterwards: “While I am no [machine learning] expert, this is one step on the way to better skills with [Python]”. So which other steps have I taken along that route to learn more about machine learning?

Here I share my thoughts on three books; two of which I have read cover to cover, and the third which I can hardly put down! When students in our machine learning class ask about books, these are the ones we recommend.

The Hundred-Page Machine Learning Book

Andriy Burkov (2019). Self-published, 141 p, ISBN 978-1-9995795-0-0. List price USD35. $30.83 at Amazon.com, £25.27 at Amazon.co.uk.

Andriy Burkov states right at the start that “[This] book is distributed on the read first, buy later principle.” That is the first time I’ve seen this in a book, despite the fact that you can try a car before buying or visit a house before taking out a mortgage.

This was the first book I read that is fully dedicated to machine learning. I knew a little about the topic beforehand, but wasn’t yet ready to use any machine learning algorithm at that point, so this was a perfect introduction to the what, the why and the how of machine learning. The mathematics are introduced and explained in a way that is accessible without being overwhelming, although I acknowledge that this is of course a very subjective comment.

When I turned the last page of this book (and there are a few more than 100), I was even keener to explore further, and I still refer back to this book when I want a quick summary of a machine learning concept.

Data Science from Scratch

Joel Grus (2015). O’Reilly, 311 p, ISBN 978-1-492-04113-9. List price USD 41.99 at O’Reilly. $38.85 at Amazon.com, £27.56 at Amazon.co.uk.

I read the 1st edition of this book, which uses Python 2.7 but often refers to Python 3.4; the 2nd edition (2019) uses Python 3.6 throughout.

Joel Grus, of Ten Essays on Fizz Buzz fame amongst many other achievements, has a knack of breaking problems down to their constituent parts and gracefully rebuilding a solution. While I sometimes struggled with the level of mathematics he’s comfortable with, I never felt that I couldn’t follow his journey. This book really gave me the sequence of steps in data science, and a fantastic resource to refer back to whenever an algorithm seems too opaque to me.

Introduction to Machine Learning with Python: A Guide for Data Scientists

Andreas C. Müller and Sarah Guido (2017). O’Reilly, 384 p, ISBN 978-1-449-36941-5. $40.00 at Amazon.com, £31.45 at Amazon.co.uk

At the time of writing I am halfway through this book but I’ve already gone through Chapter 2 twice: once with the book and a second time to practice with different data sets. This is symptomatic of my experience with this book so far: it’s totally addictive. Tremendously well explained, building on the power of Jupyter notebooks thanks to all the code being available on GitHub, always explaining and illustrating the effects of only the important hyperparameters in each algorithm — this is fast turning into my go-to companion for machine learning.

If you only buy one machine learning book, or don’t know where to start, this is probably the one to go with.

We all have different technical backgrounds and abilities, and as mathematics figures prominently in the implementation of all machine learning solutions, it’s not the most approachable of subjects. I’d love to hear your comments about books you would recommend to other scientists getting started in machine learning.


These prices are Amazon's discounted prices and are subject to change. The links contain a tag that earns us a small commission, but does not change the price to you. You can almost certainly buy these books elsewhere. 

The images on this page are copyright of their respective owners and are used here in accordance with fair use doctrine.

Illuminated equations

Last year I wrote a post about annotated equations, and why they are useful teaching tools. But I never shared all the cool examples people tweeted back, and some of them are too good not to share.

Let’s start with this one from Andrew Alexander that he uses to explain complex number notation:


Paige Bailey tweeted some examples of annotated equations and code from the reinforcement learning tutorial, Building a Powerful DQN in TensorFlow by Sebastian Theiler. Here’s one of the algorithms, with slightly muted annotations:


Finally, Jesper Dramsch shared a new one today (and reminded me that I never finished this post). It links to Edward Raff’s book, Inside Deep Learning, which has some nice annotations, e.g. expressing a fundamental idea of machine learning:

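In generic notation (not necessarily Raff’s), the idea is to find the parameters \(\theta\) that minimize the average loss \(\ell\) between the model’s predictions and the known answers:

\[ \theta^{*} = \operatorname*{argmin}_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i;\, \theta),\, y_i\big) \]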

Dynamic explication

The annotations are nice, but it’s quite hard to fully explain an equation or algorithm in one shot like this. It’s easier to do, and easier to digest, over time, in a presentation. I remember a wonderful presentation by Ross Mitchell (then U of Calgary) at the also brilliant lunchtime mathematics lectures that Shell used to sponsor in Calgary. He unpeeled time-frequency analysis, especially the S transform, and I still think about his talk today.

What Ross understood is that the learner really wants to see the maths build, more or less from first principles. Here’s a nice example — admittedly in the non-ideal medium of Twitter: make sure you read the whole thread — from Darrel Francis, a cardiologist at Imperial College London:

A video is even more dynamic of course. Josef Murad shared a video in which he derives the Navier–Stokes equation:

In this video, Grant Sanderson, perhaps the equation explainer nonpareil, unpacks the Fourier transform. He creeps up on the equation, starting instead with building the intuition around frequency decomposition:

If you’d like to try making this sort of thing, you might like to know that Sanderson’s Python software, manim, is open source.


Multi-modal explication

Sanderson illustrates nicely that the teacher has several pedagogic tools at their disposal:

  • The spoken word.

  • The written word, especially the paragraph describing a function.

  • A symbolic representation of the function.

  • A graphical representation of the function.

  • A code representation of the function, which might also have a docstring, which is a formal description of the code, its inputs, and its outputs. It might also produce the graphical representation.

  • Still other modes, e.g. pseudocode (see Theiler’s example, above), a cartoon (essentially a ‘pseudofigure’), and so on.

Virtually all of these things are, or can be dynamic (in a video, on a whiteboard) and annotated. They approach the problem from different directions. The spoken and written descriptions should be rigorous and unambiguous, but this can make them clumsy. Symbolic maths can be useful to those that can read it, but authors must take care to define symbols properly and to be consistent. The code representation must be strict (assuming it works), but might be hard for non-programmers to parse. Figures help most people, but are more about building intuition than providing the detail you might need for implementation, say. So perhaps the best explanations have several modes of explication.
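As a tiny example of combining modes, here’s a made-up function with a docstring (the written word) and a plot (the graphical representation); its symbolic form would be \(f(t) = a \sin(2\pi f t)\):

    import numpy as np
    import matplotlib.pyplot as plt

    def sine_wave(t, amplitude=1.0, frequency=5.0):
        """Return a sine wave with the given amplitude and frequency (Hz) at times t (s)."""
        return amplitude * np.sin(2 * np.pi * frequency * t)

    t = np.linspace(0, 1, 500)
    plt.plot(t, sine_wave(t))      # the graphical representation
    plt.xlabel('time (s)')
    plt.show()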

In this vein of multi-modal explication, Jeremy Howard shared a nice example from his book, Deep learning for coders, of combining text, symbolic maths, and code:


Eventually I settled on calling these things, that go beyond mere annotation, illuminated equations (not to directly compare them to the beautiful works of devotion produced by monks in the 13th century, but that’s the general idea). I made an attempt to describe linear regression and the neural network equation (not sure what else to call it!) in a series of tweets last year. Here’s the all-in-one poster version (as a PDF):

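The two equations in question, written one common way (linear regression, and a single layer of a neural network):

\[ \mathbf{y} = \mathbf{X}\mathbf{w} + \boldsymbol{\varepsilon} \qquad \text{and} \qquad \hat{\mathbf{y}} = \sigma(\mathbf{W}\mathbf{x} + \mathbf{b}) \]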

There’s nothing intuitive about physics, maths, or programming. The more tricks we have for spreading intuition about these important scientific tools, the better. I think there’s something in illuminated equations for teachers to practice — and students too. In fact, Jackie Caplan-Auerbach describes coaching her students in creating ‘equation dictionaries’ in her geophysics classes. I think this is a wonderful idea.

If you’re teaching or learning maths, I’d love to hear your thoughts. Are these things worth the effort to produce? Do you have any favourite examples to share?

Learn to code in 2020

Happy New Year! I hope 2020 is going well so far and that you have audacious plans for the new decade.

Perhaps among your plans is learning to code — or improving your skills, if you’re already on the way. As I wrote in 2011, programming is more than just writing code: it’s about learning a new way to think, not just about data but about problems. It’s also a great way to quickly raise your digital literacy — something most employers value more each year. And it’s fun.

We have three public courses planned for 2020. We’re also planning some public hackathons, which I’ll write about in the next week or three. Meanwhile, here’s the lowdown on the courses:

Lausanne in March

Rob Leckenby will be teaming up with Valentin Metraux of Geo2X to teach this 3-day class in Lausanne, Switzerland. We call it Intro to Geocomputing and it’s 100% suitable for beginners and people with less than a year or so of experience in Python. By the end, you’ll be able to read and write Python, write functions, read files, and run Jupyter Notebooks. More info here.

Amsterdam in June

If you can’t make it to Lausanne, we’ll be repeating the Intro to Geocomputing class in Amsterdam, right before the Software Underground’s Amstel Hack hackathon event (and then the EAGE meeting the following week). Check out the Software Underground Slack — look for the #amstel-hack-2020 channel — to find out more about the hackathon. More info here.

Houston in June

There’s also a chance to take the class in the US. The week before AAPG (which clashes with EAGE this year, which is very weird), we’ll be teaching not one but two classes: Intro to Geocomputing, and Intro to Machine Learning. You can take either one, or both — but be aware that the machine learning class assumes you know the basics of Python and NumPy. More info here.

In-house options

We still teach in-house courses (last year we taught 37 of them!). If you have more than about 5 people to train, then in-house is probably the way to go; we’d be delighted to work with you to figure out the best curriculum for your team.

Most of our classes fall into one of the following categories:

  • Beginner classes like the ones described above, usually 3 days.

  • Machine learning classes, like the Houston class above, usually 2 or 3 days.

  • Other more advanced classes built around engineering skills (object-oriented programming, testing, packaging, and so on), usually 3 days.

  • High-level digital literacy classes for middle to upper management, usually 1 day.

We also run hackathons and design sprints for teams that are trying to solve tricky problems in the digital subsurface, but those are another story…

Get in touch if you want more info about any of these.


Whatever you want to learn in 2020, give it everything you have. Schedule time for it. The discipline will pay off. If we can help or support you somehow, please let us know — above all, we want you to succeed.

Training digital scientists

Gulp. My first post in… a while. Life, work, chaos, ideas — it all caught up with me recently. I’ve missed the blog greatly, and felt a regular pang of guilt at letting it gather dust. But I’m back! The 200+ draft posts in my backlog ain’t gonna write themselves. Thank you for returning and reading this one.


Recently I wrote about our continuing adventures in training; since I wrote that post in April, we’ve taught another 166 people. It occurred to me that while teaching scientists to code, we’ve also learned a bit about how to teach, and I wanted to share that too. Perhaps you will be inspired to share your skills, and together we can have exponential impact.

Wanting to get better

As usual, it all started with not knowing how to do something, doing it anyway, then wanting to get better.

We started teaching in 2014 as rank amateurs, both as coders and as teachers. But we soon discovered the ‘teaching tech’ subculture among computational scientists. In particular, we found Greg Wilson and the Software Carpentry movement he started. By that point, it had been around for many, many years. Incredibly, Software Carpentry has helped more than 34,000 researchers ‘go digital’. The impact on science can’t be measured.

Eager as ever, we signed up for the instructor’s course. It was fantastic. The course, taught by Greg Wilson himself, perfectly modeled the thing it was offering to teach you: “Do what I say, and what I do”. This is, of course, critically important in all things, especially teaching. We accepted the content so completely that I’m not even sure we graduated. We just absorbed it and ran with it, no doubt corrupting it on the way. But it works for us.

What to read


I should preface what follows by telling you that I haven’t taken any other courses on the subject of teaching. For all I know, there’s nothing new here. That said, I have never experienced a course like Greg Wilson’s, so either the methods he promotes are not widely known, or they’re widely ignored, or I’ve been really unlucky.

The easiest way to get Greg Wilson’s wisdom is probably to read his book-slash-website, Teaching Tech Together. (It’s free, but you can get a hard copy if you prefer.) It’s really good. You can get the vibe — and much of the most important advice — from the ten Teaching Tech Together rules laid out on the main page of that site.

As you can probably tell, most of it is about parking your ego, plus most of your knowledge (for now), and orientating everything — every single thing — around the learner.

If you want to go deeper, I also recommend reading the excellent, if rather academic, How Learning Works, by Susan Ambrose (Northeastern University) and others. It’s strongly research-driven, and contains a lot of great advice. In particular, it does a great job of listing the factors that motivate students to learn (and those that demotivate them), and spelling out the various ways in which students acquire mastery of a subject.

How to practice

It goes without saying that you’ll need to teach. A lot. Not surprisingly, we find we get much better if we teach several courses in a short period. If you’re diligent, take a lot of notes and study them before the next class, maybe it’s okay if a few weeks or months go by. But I highly doubt you can teach once or twice a year and get good at it.

Something it took us a while to get comfortable with is what Evan calls ‘mistaking’. If you’re a master coder, you might not make too many mistakes (but your expertise means you will have other problems). If you’re not a master (join the club), you will make a lot of mistakes. Embracing everything as a learning opportunity is less awkward for you, and for the students — dealing with mistakes is a core competency for all programmers.

Reflective practice means asking for, and then acting on, student feedback — every day. We ask students to write it on sticky notes. Reading these back to the class the next morning is a good way to really read it. One of the many benefits of ‘never teach alone’ is always having someone to give you feedback from another teacher’s perspective too. Multi-day courses let us improve in real time, which is good for us and for the students.

Some other advice:

  • Keep the student:instructor ratio to no more than ten; seven or eight is better.

  • Take a packet of orange and a packet of green Post-It notes. Use them for names, as ‘help me’ flags, and for feedback.

  • When teaching programming, the more live coding — from scratch — you can do, the better. While you code, narrate your thought process. This way, students are able to make connections between ideas, code, and mistakes.

  • To explain concepts, draw on a whiteboard. Avoid slides whenever possible.

  • Our co-teacher John Leeman likes to say, “I just showed you something new, what questions do you have?” This beats “Any questions?” for opening the door to engagement.

  • “No-one left behind” is a nice idea, but it’s not always practical. If students can’t devote 100% to the class and then struggle because of it, you owe it to the others to politely suggest they pick the class up again next time.

  • Devote some time to the practical application of the skills you’re teaching, preferably in areas of the participants’ own choosing. In our 5-day class, we devote a whole day to getting students started on their own projects.

  • Don’t underestimate the importance of a nice space, natural light, good food, and frequent breaks.

  • Recognize everyone’s achievement with a small gift at the end of the class.

  • Learning is hard work. Finish early every day.

Give it a try

If you’re interested in helping people learn to code, the most obvious way to start is to offer to assist or co-teach in someone else’s class. Or simply start small, offering a half-day session to a few co-workers. Even if you only recently got started yourself, they’ll appreciate the helping hand. If you’re feeling really confident, or have been coding for a year or two at least, try something bolder — maybe offer a one-day class at a meeting or conference. You will find plenty of interest.

There are few better ways to improve your own skills than to teach. And the feeling of helping people develop a valuable skill is addictive. If you give it a try, let us know how you get on!

Feel superhuman: learning and teaching geocomputing

Diego teaching in Houston in 2018.

It’s five years since we started teaching Python to geoscientists. To be honest, it might have been premature. At the time, Evan and I were maybe only two years into serious, daily use of Python. But the first class, at the Atlantic Geological Society’s annual meeting in February 2014, was free, so the pressure was not too high. And it turns out that only being a step or two ahead of your students can be an advantage. Your ‘expert blind spot’ is partially sighted, not completely blind, because you can clearly remember being a noob.

Being a noob is a weird, sometimes very uncomfortable, even scary, feeling for some people. Many of us are used to feeling like experts, at least some of the time. Happily, feeling like a noob is a core competency in programming. Learning new things is a more or less hourly experience for coders. Even a mature language like Python evolves fast enough that it’s hard to keep up. Instead of feeling threatened or exhausted by this, I think the best strategy is to enjoy it. You’ll never be done, there are (way) more questions than answers, and you can learn forever!

One of the bootcamp groups at the Copenhagen hackathon in 2018

This week we’re teaching our 40th course. Last year alone we gave digital superpowers to 325 people, mostly geoscientists. Not all of them learned to code, as such — some people already could, and some found out they didn’t like it… coding really isn’t for everyone. But I think all of them learned something new about technology, and how it can serve them and their science. I hope all of them look at spreadsheets, and Petrel, and websites differently now. I think most of them want, at some point, to learn more. And everyone is excited about machine learning.

The expanding community of quantitative earth scientists

This year we’ve already spent 50 days teaching, and taught 174 people. Imagine that! I get emotional when I think about what these hundreds of new digital geoscientists and engineers will go and do with their new skills. I get really excited when I see what they are already doing — when they come to hackathons, send us screenshots, or write papers with beautiful figures. If the joy of sharing code and collaborating with peers has also rubbed off on them, there’s no telling where it could lead.

Matt teaching in Aberdeen in October 2018

The last nine months or so have been an adventure. Teaching is not supposed to be what Agile is about. We’re a consulting company, a technology company. But for now we’re mostly a training company — it’s where we’re needed. And it makes sense... Programming is fundamentally about knowledge sharing. Teaching is about helping, collaborating. It’s perfect for us.

Besides, it’s a privilege and a thrill to meet all these fantastically smart, motivated people and to hear about their projects and their plans. Sometimes I wish it didn’t mean leaving my family in Nova Scotia and flying to Houston and London and Kuala Lumpur and Kalamazoo… but mostly I wish we could do more of it. Especially when we get comments like these:

“Given how ‘dry’ programming can be, it was DYNAMIC.”
“Excellent teachers with geoscience background.”
“Great instructors, so so approachable, even for newbies like me.”
“Great course [...] Made me realize what could be done in a short time.”
“My only regret was not taking a class like this sooner.”
“Very positive, feel superhuman.”

How many times have you felt superhuman at work recently?

The courses we teach are evolving and expanding in scope. But they all come back to the same thing: growing digital skills in our profession. This is critical because using computers for earth science is really hard. Why? The earth is weird. We’ve spent hundreds of years honing conceptual models, understanding deep time, and figuring out complex spatial relationships.

If data science eats the subsurface without us, we’re all going to get indigestion. Society needs to better understand the earth — for all sorts of reasons — and it’s our duty to build and adopt the most powerful analytical tools available so that we can help.


Learning resources

If you can’t wait to get started, here are some suggestions:

Classroom courses are a big investment in dollars and time, but they can get you a long way really quickly. Our courses are built especially for subsurface scientists and engineers. As far as I know, they are the only ones of their kind. If you think you’d like to take one, talk to us, or look out for a public course. You can find out more or sign up for email alerts here >> https://agilescientific.com/training/

Last thing: I suggest avoiding DataCamp, because of sexual misconduct by an executive, compounded by total inaction, dishonest obfuscation, and basically failing spectacularly. Even their own trainers have boycotted them. Steer clear.