EarthArXiv wants your preprints

eartharxiv.png

If you're into science, and especially physics, you've heard of arXiv, which has revolutionized how research in physics is shared. BioarXiv, SocArXiv and PaleorXiv followed, among others*.

Well get excited, because today, at last, there is an open preprint server especially for earth science — EarthArXiv has landed! 

I could write a long essay about how great this news is, but the best way to get the full story is to listen to two of the founders — Chris Jackson (Imperial College London and fellow University of Manchester alum) and Tom Narock (University of Maryland, Baltimore) — on Undersampled Radio this morning:

Congratulations to Chris and Tom, and everyone involved in EarthArXiv!

  • Friedrich Hawemann, ETH Zurich, Switzerland
  • Daniel Ibarra, Earth System Science, Standford University, USA
  • Sabine Lengger, University of Plymouth, UK
  • Andelo Pio Rossi, Jacobs University Bremen, Germany
  • Divyesh Varade, Indian Institute of Technology Kanpur, India
  • Chris Waigl, University of Alaska Fairbanks, USA
  • Sara Bosshart, International Water Association, UK
  • Alodie Bubeck, University of Leicester, UK
  • Allison Enright, Rutgers - Newark, USA
  • Jamie Farquharson, Université de Strasbourg, France
  • Alfonso Fernandez, Universidad de Concepcion, Chile
  • Stéphane Girardclos, University of Geneva, Switzerland
  • Surabhi Gupta, UGC, India

Don't underestimate how important this is for earth science. Indeed, there's another new preprint server coming to the earth sciences in 2018, as the AGU — with Wiley! — prepare to launch ESSOAr. Not as a competitor for EarthArXiv (I hope), but as another piece in the rich open-access ecosystem of reproducible geoscience that's developing. (By the way, AAPG, SEG, SPE: you need to support these initiatives. They want to make your content more relevant and accessible!)

It's very, very exciting to see this new piece of infrastructure for open access publishing. I urge you to join in! You can submit all your published work to EarthArXiv — as long as the journal's policy allows it — so you should make sure your research gets into the hands of the people who need it.

I hope every conference from now on has an EarthArXiv Your Papers party. 


* Including snarXiv, don't miss that one!

Conversation not discussion

It's a while since we had a 'conferences are broken' rant on the Agile blog!

Five or six of the sessions at this year's conference were... different. I already mentioned the Value In Geophysics session, which was a cross between a regular series of talks and a panel discussion. I went to another, The modern geoscientist, which was structured the same way. A third one, Fundamentals of Professional Career Branding, was a mini workshop with Jackie Rafter of Higher Landing. There were at least a couple of other such sessions.

It's awesome to see the societies experimenting with something outside the usual plethora of talks and posters. I hope they were well received, because we need more of this in our discipline, now more than ever. If you went to one and enjoyed it, please let the organizers know.

But... the sessions — especially the panel discussion sessions — lacked something. One thing really:

The sessions we saw were nowhere near participatory enough. Not even close.

The 'expert-panel-enlightens-audience' pattern is slowing us down, perpetuating broken models of leadership and hierarchy. There isn't an expert in Calgary or the universe that knows how or when this downturn is going to end, or what we need to do to improve our chances of continuing to contribute to society and make a living in our profession. So please, stop throwing people up on a stage, making them give 5 minute presentations, and occasionally asking for questions from the audience. That is nothing like a discussion. Tune in to a political debate show to see what those look like: rapid-fire, punchy, controversial. In short: interesting. And, from an organizer's point of view, really hard, which is why we should stop.

Real conversation

What I think is really needed right now, more than half-baked expert discussion, is conversation. Conversations happen between small groups of people, all sitting on the same plane, around a table, with napkins to draw on and time to draw on them. They connect people and spread awesome ideas like viruses. What's more, great conversations have outcomes.

I don't want to claim that Agile has all this figured out, but we have demonstrated various ways of connecting scientists in meaningful ways and with lasting outcomes. We've also written extensively on the subject (e.g. here and here and here and here). Other verticals have conducted many more experiments, and documented the results. Humans know how to do this.

So there's no excuse — it's not too dramatic to call the current 'situation' a crisis in our profession in Canada — so we need to get beyond tinkering at the edges and half-hearted attempts at change. Our societies need to pay attention to what's needed, and get on with making it happen.

Still more ranting...

We talked about this topic at some length on the Undersampled Radio podcast yesterday. Here's the uncut video version:

The quick green forsterite jumped over the lazy dolomite

The best-known pangram — a sentence containing every letter of the alphabet —  is probably

 
The quick brown fox jumped over the lazy dog.

There are lots of others of course. If you write like James Joyce, there are probably an infinite number of others. The point is to be short, and one of the shortest, with only 29 letters (!), even has a geological flavour:

 
Sphinx of black quartz, judge my vow.

I know what you're thinking: Cool, but what's the shortest set of mineral names that uses all the letters of the alphabet? What logophiliac geologist would not wonder the same thing?

Well, we posed this question in the most recent "Riddle me this" segment on the Undersampled Radio podcast. This blog post is my solution.


The set cover problem

Finding pangrams in a list of words amounts to solving the classical set cover problem:

 
Given a set of elements \(\{U_1, U_2,\ldots , U_n\}\) (called the ‘universe’) and a collection \(S\) of \(m\) sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of \(S\) whose union equals (or ‘covers’) the universe.

Our universe is the alphabet, and our \(S\) is the list of \(m\) mineral names. There is a slight twist in our case: the set cover problem wants the smallest subset of \(S\) — the fewest members. But in this problem, I suspect there are several 4-word solutions (judging from my experiments), so I want the smallest total size of the members of the subset. That is, I want the fewest total letters in the solution.

The solution

The set cover problem was shown to be NP-complete in 1972. What does this mean? It means that it's easy to tell if you have an answer (do you have all the letters of the alphabet?), but the only way to arrive at a solution is — to oversimplify massively — by brute force. (If you're interested in this stuff, this edition of the BBC's In Our Time is one of the best intros to P vs NP and complexity theory that I know of.)

Anyway, the point is that if we find a better way than brute force to solve this problem, then we need to write a paper about it immediately, claim our prize, collect our turkey, then move to a sunny tax haven with good water and double-digit elevation.

So, this could take a while: there are over 95 billion ways to draw 3 words from my list of 4600 mineral names. If we need 4 minerals, there are 400 trillion combinations... and a quick calculation suggests that my laptop will take a little over 50 years to check all the combinations. 

Can't we speed it up a bit?

Brute force is one thing, but we don't need to be brutish about it. Maybe we can think of some strategies to give ourselves a decent chance:

  • The list is alphabetically sorted, so randomize the list before searching. (I did this.)
  • Guess some 'useful' minerals and ensure that you get to them. (I did this too, with quartz.)
  • Check there are at least 26 letters in the candidate words, and (if it's only records we care about) no more than 44, because I have a solution with 45 letters (see below).
  • We could sort the list into word length order. That way we search shorter things first, so we should get shorter lists (which we want) earlier.
  • My solution does not depend much on Python's set type. Maybe we could do more with set theory.
  • Before inspecting the last word in each list, we could make sure it contains at least one letter that's so far missing.

So far, the best solution I've come up with so far has 45 letters, so there's plenty of room for improvement:

 
'quartz', 'kvanefjeldite', 'abswurmbachite', 'pyroxmangite'

My solution is in this Jupyter Notebook. Please put me out of my misery by improving on it.

Two new short courses in Calgary

We're running two one-day courses in Calgary for the CSPG Spring Education Week. One of them is a bit... weird, so I thought I'd try to explain what we're up to.

Both classes run from 8:30 till 4:00, and both of them cost just CAD 425 for CSPG members. 

Get introduced to Python

The first course is Practical programming for geoscientists. Essentially a short version of our 2 to 3 day Creative geocomputing course, we'll take a whirlwind tour through the Python programming language, then spend the afternoon looking at some basic practical projects. It might seem trivial, but leaving with a machine fully loaded with all the tools you'll need, plus long list of resources and learning aids, is worth the price of admission alone.

If you've always wanted to get started with the world's easiest-to-learn programming language, this is the course you've been waiting for!

Hashtag geoscience

This is the weird one. Hashtag geoscience: communicating geoscience in the 21st century. Join me, Evan, Graham Ganssle (my co-host on Undersampled Radio) — and some special guests — for a one-day sci comm special. Writing papers and giving talks is all so 20th century, so let's explore social media, blogging, podcasting, open access, open peer review, and all the other exciting things that are happening in scientific communication today. These tools will not only help you in your job, you'll find new friends, new ideas, and you might even find new work.

I hope a lot of people come to this event. For one, it supports the CSPG (we're not getting paid, we're on expenses only). Secondly, it'll be way more fun with a crowd. Our goal is for everyone to leave burning to write a blog, record a podcast, or at least create a Twitter account. 


One of our special guests will be young-and-famous geoscience vlogger Dr Chris. Coincidentally, we just interviewed him on Undersampled Radio. Here's the uncut video version; audio will be on iTunes and Google Play in a couple of days:

Nowhere near Nyquist

This is a guest post by my Undersampled Radio co-host, Graham Ganssle.

You can find Gram on the webLinkedInTwitterGitHub

This post is a follow up to Tuesday's post about the podcast — you might want to read that first.


Undersampled Radio was born out of a dual interest in podcasting. Matt and I both wanted to give it a shot, but we didn’t know what to talk about. We still don’t. My philosophy on UR is that it’s forumesque; we have a channel on the Software Underground where we solicit ideas, draft guests, and brainstorm about what should be on the show. We take semi-formed thoughts and give them a good think with a guest who knows more than us. Live and uncensored.

Since with words I... have not.. a way... the live nature of the show gives it a silly, laid back attitude. We attempt to bring our guests out of interview mode by asking about their intellectual curiosities in addition to their professional interests. Though the podcast releases are lightly edited, the YouTube live-stream recordings are completely raw. For a good laugh at our expense you should certainly watch one or two.

Techie deets

Have a look at the command center. It’s where all the UR magic (okay, digital trickery) happens in pre- and post-production.

It's a mess but it works!

It's a mess but it works!

We’ve migrated away from the traditional hardware combination used by most podcasters. Rather than use the optimum mic/mixer/spaghetti-of-cables preferred by podcasting operations which actually generate revenue, we’ve opted to use less hardware and do a bit of digital conditioning on the back end. We conduct our interviews via YouTube live (aka Google Hangouts on Air) then on my Ubuntu machine I record the audio through stereo mix using PulseAudio and do the filtering and editing in Audacity.

Though we usually interview guests via Google Hangouts, we have had one interviewee in my office for an in-person chat. It was an incredible episode that was filled with the type of nonlinear thinking which can only be accomplished face to face. I mention this because I’m currently soliciting another New Orleans recording session (message me if you’re interested). You buy the plane ticket to come record in the studio. I buy the beer we’ll drink while recording.

as Matt  guessed  there actually are paddle boats rolling by while I record. Here’s the view from my recording studio; note the paddle boat on the left.

as Matt guessed there actually are paddle boats rolling by while I record. Here’s the view from my recording studio; note the paddle boat on the left.

Forward projections

We have several ideas about what to do next. One is a live competition of some sort, where Matt and I compete while a guest(s) judge our performance. We’re also keen to do a group chat session, in which all the members of the Software Underground will be invited to a raucous, unscripted chat about whatever’s on their minds. Unfortunately we dropped the ball on a live interview session at the SEG conference this year, but we’d still like to get together in some sciencey venue and grab randos walking by for lightning interviews.

In accord with the remainder of our professional lives, Matt and I both conduct the show in a manner which keeps us off balance. I have more fun, and learn more information more quickly, by operating in a space outside of my realm of knowledge. Ergo, we are open to your suggestions and your participation in Undersampled Radio. Come join us!

 

Tune in to Undersampled Radio

Back in the summer I mentioned Undersampled Radio, the world's newest podcast about geoscience. Well, geoscience and computers. OK, machine learning and geoscience. And conferences.

We're now 25 shows in, having started with Episode 0 on 28 January. The show is hosted by Graham 'Gram' Ganssle, a consulting and research geophysicist based in New Orleans, and me. Appropriately enough, I met Gram at the machine-learning-themed hackathon we did at SEG in 2015. He was also a big help with the local knowledge.

I broadcast from one of the phone rooms at The HUB South Shore. Gram has the luxury of a substantial book-lined office, which I imagine has ample views of paddle-steamers lolling on the Mississippi (but I actually have no idea where it is). 

To get an idea of what we chat about, check out the guests on some recent episodes:

Better than cable

The podcast is really more than just a podcast, it's really a live TV show, broadcasting on YouTube Live. You can catch the action while it's happening on the Undersampled Radio channel. However, it's not easy to catch live because the episodes are not that predictable — they are announced about 24 hours in advance on the Software Underground Slack group (you are in there, right?). We should try to put them out on the @undrsmpldrdio Twitter feed too... 

So, go ahead and watch the very latest episode, recorded last Thursday. We spoke to Tim Hopper, a data scientist in Raleigh, NC, who works at Distil Networks, a cybersecurity firm. It turns out that using machine learning to filter web traffic has some features in common with computational geophysics...

You can subscribe to the show in iTunes or Google Play, or anywhere else good podcasts are served. Grab the RSS Feed from the UndersampledRad.io website.

Of course, we take guest requests. Who would you like to hear us talk to? 

The sound of the Software Underground

If you are a geoscientist or subsurface engineer, and you like computery things — in other words, if you read this blog — I have a treat for you. In fact, I have two! Don't eat them all at once.

Software Underground

Sometimes (usually) we need more diversity in our lives. Other times we just want a soul mate. Or at least someone friendly to ask about that weird new seismic attribute, where to find a Python library for seismic imaging, or how to spell Kirchhoff. Chat rooms are great for those occasions, Slack is where all the cool kids go to chat, and the Software Underground is the Slack chat room for you. 

It's free to join, and everyone is welcome. There are over 130 of us in there right now — you probably know some of us already (apart from me, obvsly). Just go to http://swung.rocks/ to sign up, and we will welcome you at the door with your choice of beverage.

To give you a flavour of what goes on in there, here's a listing of the active channels:

  • #python — for people developing in Python
  • #sharp-rocks — for people developing in C# or .NET
  • #open-geoscience — for chat about open access content, open data, and open source software
  • #machinelearning — for those who are into artificial intelligence
  • #busdev — collaboration, subcontracting, and other business opportunities 
  • #general — chat about anything to do with geoscience and/or computers
  • #random — everything else

Undersampled Radio

If you have a long commute, or occasionally enjoy being trapped in an aeroplane while it flies around, you might have discovered the joy of audiobooks and podcasts. You've probably wished many times for a geosciencey sort of podcast, the kind where two ill-qualified buffoons interview hyper-intelligent mega-geoscientists about their exploits. I know I have.

Well, wish no more because Undersampled Radio is here! Well, here:

The show is hosted by New Orleans-based geophysicist Graham Ganssle and me. Don't worry, it's usually not just us — we talk to awesome guests like geophysicists Mika McKinnon and Maitri Erwin, geologist Chris Jackson, and geopressure guy Mark Tingay. The podcast is recorded live every week or three in Google Hangouts on Air — the link to that, and to show notes and everything else — is posted by Gram in the #undersampled Software Underground channel. You see? All these things are connected, albeit in a nonlinear, organic, highly improbable way. Pseudoconnection: the best kind of connection.

Indeed, there is another podcast pseudoconnected to Software Underground: the wonderful Don't Panic Geocast — hosted by John Leeman and Shannon Dulin — also has a channel: #dontpanic. Give their show a listen too! In fact, here's a show we recorded together!

Don't have an hour right now? OK, you asked for it, here's a clip from that show to get you started. It starts with John Leeman explaining what Fun Paper Friday is, and moves on to one of my regular rants about conferences...

In case you're wondering, neither of these projects is explicitly connected to Agile — I am just involved in both of them. I just wanted to clear up any confusion. Agile is not a podcast company, for the time being anyway.