An update on Volve

Writing about the new almost-open dataset at Groningen yesterday reminded me that things have changed a little on Equinor’s Volve dataset in Norway. Illustrating the principle that there are more ways to get something wrong than to get them right, here’s the situation there.


In 2018, Equinor generously released a very large dataset from the decommissioned field Volve. The data is undoubtedly cool, but initially it was released with no licence. Later in 2018, a licence was added but it was a non-open licence, CC BY-NC-SA. Then, earlier this year, the licence was changed to a modified CC BY licence. Progress, sort of.

I think CC BY is an awesome licence for open data. But modifying licences is always iffy and in this case the modifications mean that the licence can no longer be called ‘open’, because the restrictions they add are not permitted by the Open Definition. For me, the problematic clauses in the modification are:

  • You can’t sell the dataset. This is almost as ambiguous as the previous “non-commercial” clause. What if it’s a small part of a bigger offering that adds massive value, for example as demo data for a software package? Or as one piece in a large data collection? Or as the basis for a large and expensive analysis? Or if it was used to train a commercial neural network?

  • The license covers all data in the dataset whether or not it is by law covered by copyright. It's a bit weird that this is tucked away in a footnote, but okay. I don't know how it would work in practice because CC licenses depend on copyright. (The whole point of uncopyrightable content is that you can't own rights in it, nevermind license it.)

It’s easy to say, “It’s fine, that’s not what Equinor meant.” My impression is that the subsurface folks in Equinor have always said, "This is open," and their motivation is pure and good, but then some legal people get involved and so now we have what we have. Equinor is an enormous company with (compared to me) infinite resources and a lot of lawyers. Who knows how their lawyers in a decade will interpret these terms, and my motivations? Can you really guarantee that I won’t be put in an awkward situation, or bankrupted, by a later claim — like some of GSI’s clients were when they decided to get tough on their seismic licenses?

Personally, I’ve decided not to touch Volve until it has a proper open licence that does not carry this risk.

Volve: not open after all

Back in June, Equinor made the bold and exciting decision to release all its data from the decommissioned Volve oil field in the North Sea. Although the intent of the release seemed clear, the dataset did not carry a license of any kind. Since you cannot use unlicensed content without permission, this was a problem. I wrote about this at the time.

To its credit, Equinor listened to the concerns from me and others, and considered its options. Sensibly, it chose an off-the-shelf license. It announced its decision a few days ago, and the dataset now carries a Creative Commons Attribution-NonCommercial-ShareAlike license.

Unfortunately, this license is not ‘open’ by any reasonable definition. The non-commercial stipulation means that a lot of people, perhaps most people, will not be able to legally use the data (which is why non-commercial licenses are not open licenses). And the ShareAlike part means that we’re in for some interesting discussion about what derived products are, because any work based on Volve will have to carry the CC BY-NC-SA license too.

Non-commercial licenses are not open

Here are some of the problems with the non-commercial clause:

NC licenses come at a high societal cost: they provide a broad protection for the copyright owner, but strongly limit the potential for re-use, collaboration and sharing in ways unexpected by many users

  • NC licenses are incompatible with CC-BY-SA. This means that the data cannot be used on Wikipedia, SEG Wiki, or AAPG Wiki, or in any openly licensed work carrying that license.

  • NC-licensed data cannot be used commercially. This is obvious, but far-reaching. It means, for example, that nobody can use the data in a course or event for which they charge a fee. It means nobody can use the data as a demo or training data in commercial software. It means nobody can use the data in a book that they sell.

  • The boundaries of the license are unclear. It's arguable whether any business can use the data for any purpose at all, because many of the boundaries of the scope have not been tested legally. What about a course run by AAPG or SEG? What about a private university? What about a government, if it stands to realize monetary gain from, say, a land sale? All of these uses would be illiegal, because it’s the use that matters, not the commercial status of the user.

Now, it seems likely, given the language around the release, that Equinor will not sue people for most of these use cases. They may even say this. Goodness knows, we have enough nudge-nudge-wink-wink agreements like that already in the world of subsurface data. But these arrangements just shift the onus onto the end use and, as we’ve seen with GSI, things can change and one day you wake up with lawsuits.

ShareAlike means you must share too

Creative Commons licenses are, as the name suggests, intended for works of creativity. Indeed, the whole concept of copyright, depends on creativity: copyright protects works of creative expression. If there’s no creativity, there’s no basis for copyright. So for example, a gamma-ray log is unlikely to be copyrightable, but seismic data is (follow the GSI link above to find out why). Non-copyrightable works are not covered by Creative Commons licenses.

All of which is just to help explain some of the language in the CC BY-NC-SA license agreement, which you should read. But the key part is in paragraph 4(b):

You may distribute, publicly display, publicly perform, or publicly digitally perform a Derivative Work only under the terms of this License

What’s a ‘derivative work’? It’s anything ‘based upon’ the licensed material, which is pretty vague and therefore all-encompassing. In short, if you use or show Volve data in your work, no matter how non-commercial it is, then you must attach a CC BY-NC-SA license to your work. This is why SA licenses are sometimes called ‘viral’.

By the way, the much-loved F3 and Penobscot datasets also carry the ShareAlike clause, so any work (e.g. a scientific paper) that uses them is open-access and carries the CC BY-SA license, whether the author of that work likes it or not. I’m pretty sure no-one in academic publishing knows this.

By the way again, everything in Wikipedia is CC BY-SA too. Maybe go and check your papers and presentations now :)

problems-dont-have.png

What should Equinor do?

My impression is that Equinor is trying to satisfy some business partner or legal edge case, but they are forgetting that they have orders of magnitude more capacity to deal with edge cases than the potential users of the dataset do. The principle at work here should be “Don’t solve problems you don’t have”.

Encumbering this amazing dataset with such tight restrictions effectively kills it. It more or less guarantees it cannot have the impact I assume they were looking for. I hope they reconsider their options. The best choice for any open data is CC-BY.