Which open licence should I choose?

I’ve written about open data a few times recently. And not-so-recently. And there’s been quite a bit of chat about open subsurface benchmarks in the Software Underground recently. As more people consider openly releasing data (or code, or other content), one question that comes up fairly often is: Which licence should I choose?

I’ll start at the beginning. I am not a lawyer, and this is going to be very high level, so do click on the links to read more.

What is copyright?

You automatically own the copyright to anything original that you create. You don’t have to register it, but the thing you made — and it must be a thing, you can’t copyright ideas — must be original. It could be a photo, a song, or a seismic interpretation. Physical measurements with no creative input, such as well logs, are not copyrightable… but a database consisting of such data is (so-called database rights). Your rights are exclusive, worldwide, and last until some years after you die (it varies).

If someone wants to use your work, even if they just found it on the Internet, they must either claim Fair Use, or seek permission from you. Giving permission means granting a licence; it can be as restrictive and arcane as you want.

If you don’t want people bothering you about licences, or if you want to actively encourage people to use and adapt your work, you can preemptively grant an open licence.

What is openness?

Before you start thinking about licences, there are two more big things to learn about:

  1. What is open? Not all licences, not even all Creative Commons licences, meet the Open Definition. In brief, this states that “Open data and content can be freely used, modified, and shared by anyone for any purpose” — you can’t restrict people based on their use case or location. So licences that forbid commercial application are not open.

  2. What is permissiveness? Once you’ve decided to go open, you need to decide where you stand on permissiveness. Some licences, notably those advocated by the GNU Free Software Movement, compel licensees (users) to preserve the openness of the work in any future redistribution. This ‘viral’ condition is sometimes called copyleft.

In some circles, a near-religious war smoulders on the permissiveness issue. You need to make up your own mind where you stand, or at least understand the issues.

By the way, granting a licence does not mean giving up your rights. In fact, you must own the copyright in order to grant the licence. Many scientists don’t realize we’ve been giving away the copyright in our work for decades, as a (completely unnecessary and made up) condition of publication.

Another source of confusion: open licences are also not the same thing as public domain. Public domain means that the work is free from copyright restrictions. In general though, it cannot be applied to a copyrighted work (though CC0 tries to relinquish copyright where possible). For example, On The Origin of Species is public domain, as is most work produced by the United States government (for example, by the USGS).

One last thing: an often overlooked aspect of licensing is protection for you, the licensor. All common licences include language that disclaims warranties and limits your liability for misuse or misinterpretation of your work. So be careful about putting your stuff ‘out there’ with anything other than a standard licence: you may be leaving yourself open to liability issues later.

Open licences

Rather than writing a lot of stuff that’s been written by smarter people than me, I thought I’d draw a diagram to try to explain the differences between some common licences (there are certainly a lot more than the ones I mention here).

Just to re-iterate: there are a lot more licences than the ones mentioned here, these are just examples.

What do I recommend?

For content, my personal belief is that CC-BY most aptly captures the way science works. Scientists 'build on the shoulders of giants' by re-using the work of others with fastidious attribution, usually by citation. Accordingly, CC-BY protects the licensor and ensures attribution, and that's it. If you prefer copyleft licences, the equivalent licence is CC-BY-SA.

But Creative Commons recommend against using CC licences for source code, so what should you do then?

For code, the permissive licences closest to CC-BY are the MIT/BSD/Apache family, of which only the Apache 2.0 licence offers some specific protections with respect to patents (in particular, it protects licensees from ‘upstream’ patent infringements). The equivalent copyleft licences are the GPL (for applications) and LGPL (for libraries).

For data, I tend to use CC-BY, but there are some specialist data licences (beware, they are poorly named in my opinion: the seemingly ‘vanilla’ ODbL is copyleft; the permissive equivalent is ODC-By).
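As a rough summary of the recommendations above, here is a minimal Python sketch that writes them down as a lookup table. The function name `suggest_licence` and the category labels ("content", "code", "library", "data") are made up for this illustration and do not come from any real tool. The table only encodes the examples mentioned in this post, so treat it as a memory aid rather than advice.

```python
# Hypothetical lookup table for the recommendations in this post.
# Keys are (kind of work, whether you want a copyleft licence).
RECOMMENDATIONS = {
    ("content", False): "CC-BY",
    ("content", True): "CC-BY-SA",
    ("code", False): "Apache 2.0",
    ("code", True): "GPL",
    ("library", False): "Apache 2.0",
    ("library", True): "LGPL",
    ("data", False): "CC-BY or ODC-By",
    ("data", True): "ODbL",
}


def suggest_licence(kind: str, copyleft: bool = False) -> str:
    """Return the licence suggested in this post for a kind of work."""
    try:
        return RECOMMENDATIONS[(kind, copyleft)]
    except KeyError:
        # Anything outside the table needs a human (ideally a lawyer).
        raise ValueError(f"No suggestion for kind={kind!r}") from None


print(suggest_licence("content"))
print(suggest_licence("data", copyleft=True))
```

Note the naming trap mentioned above: asking for copyleft data returns ODbL, while the permissive option is ODC-By.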

What about mixed content, like a Jupyter Notebook? You have to be practical; maybe it depends on whether you consider your notebooks to be 'content' or 'source code'. I sometimes put at the bottom of a notebook something like "Open source content. Text is CC-BY, code is Apache 2.0", and I think this makes my intent clear.
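If you have a lot of notebooks, you could add that notice programmatically. Notebook files are JSON, so the sketch below uses only the standard library. The helper name `add_licence_cell` is hypothetical, and the in-memory notebook stands in for a real `.ipynb` file that you would load with `json.load` and write back with `json.dump`.

```python
import json

# The notice quoted above; change the wording to suit your own licences.
NOTICE = "Open source content. Text is CC-BY, code is Apache 2.0."


def add_licence_cell(notebook: dict, notice: str = NOTICE) -> dict:
    """Return a copy of the notebook with a markdown licence cell appended."""
    cell = {"cell_type": "markdown", "metadata": {}, "source": [notice]}
    updated = dict(notebook)
    updated["cells"] = list(notebook.get("cells", [])) + [cell]
    return updated


# A tiny in-memory notebook, standing in for a real .ipynb file.
nb = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
nb = add_licence_cell(nb)
print(json.dumps(nb["cells"][-1], indent=2))
```

The function returns a new dictionary rather than changing the one you pass in, so the original notebook data is untouched until you decide to save.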

Tools

There are some tools around to help you make a choice of licence:

Last thing

Note that open licences are just one piece of the jigsaw puzzle of reproducible science and reusable content. You also need to think about open and accessible data formats (e.g. CSV not XLS), accessible content (DOIs and open indexes), and documentation.

Although insufficient on their own, open licences are a necessary component. And while licences can be changed, they cannot be revoked… so it’s worth putting some thought into your choices before you start pushing your content out into the world.

If it seems hard to navigate, do get in touch; we’d be happy to help if and where we can (notwithstanding IANAL). If your situation is at all complicated I recommend seeking professional legal advice, but do go out of your way to find a lawyer who understands both the motivation for, and the legal issues around, open licensing.

Why I don't flout copyright

Lots of people download movies illegally. Or spoof their IP addresses to get access to sports fixtures. Or use random images they found on the web in publications and presentations (I've even seen these with the watermark of the copyright owner on them!). Or download PDFs for people who aren't entitled to access (#icanhazpdf). Or use sketchy Russian paywall-crumbling hacks. It's kind of how the world works these days. And I realize that some of these things don't even sound illegal.

This might surprise some people, because I go on so much about sharing content, open geoscience, and so on. But I am an annoying stickler for copyright rules. I want people to be able to re-use any content they like, without breaking the law. And if people don't want to share their stuff, then I don't want to share it.

Maybe I'm just getting old and cranky, but FWIW here are my reasons:

  1. I'm a content producer. I would like to set some boundaries to how my stuff is shared. In my case, the boundaries amount to nothing more than attribution, which is only fair. But still, it's my call, and I think that's reasonable, at least until the material is, say, 5 years old. But some people don't understand that open is good, that shareable content is better than closed content, that this is the way the world wants it. And that leads to my second reason:
  2. I don't want to share closed stuff as if it was open. If someone doesn't openly license their stuff, they don't deserve the signal boost — they told the world to keep their stuff secret. Why would I give them the social and ethical benefits of open access while they enjoy the financial benefits of closed content? This monetary benefit comes from a different segment of the audience, obviously. At least half the people who download a movie illegally would not, I submit, have bought the movie at a fair price.

So make a stand for open content! Don't share stuff that the creator didn't give you permission to share. They don't deserve your gain filter.

What is Creative Commons?

Not a comprehensive answer either, but much more brilliant

I just found myself typing a long email in reply to the question, "What is a Creative Commons license and how do I use it?" Instead, I thought I'd post it here. Note: I am not a lawyer, and this is not a comprehensive answer.

Creative Commons depends on copyright

There is no relinquishment of copyright. This is important. In the case of 52 Things, Agile Geoscience is the copyright holder. In the case of an article, it's the authors themselves, unless the publisher gets them to sign a form relinquishing it. Copyright is automatic (under the Berne Convention), and includes a moral right to be identified as the authors of the work ('attribution').

Most copyright holders grant licenses to re-use their work. For instance, you can pay hundreds of dollars to reproduce a couple of pages from an SPE manual for a classroom of students (if you are insane). You can reprint material from a book or journal article — again, usually for a fee. These licenses are usually non-exclusive, non-transferable, and use-specific. And the licensee must (a) ask and (b) pay the licensor (that is, the copyright holder). This is 'the traditional model'.

Obscurity is a greater threat than piracy

Some copyright holders are even more awesome. They recognize that (a) it's a hassle to always have to ask, and (b) they'd rather have the work spread than charge for it and stop it spreading. They wish to publish 'open' content. It's exactly like open source software. Creative Commons is one very widespread set of licenses you can apply to your work so that other people (a) don't have to ask to re-use it, and (b) don't have to pay. You can impose various restrictions if you must.

Creative Commons licenses are everywhere. You can apply Creative Commons licenses at will, to anything you like. For example, you can CC-license your YouTube videos or Flickr photos. We CC-license our blog posts. Almost everything in Wikipedia is CC-licensed. You could CC-license a single article in a magazine (in fact, I somewhat sneakily did this last February). You could even CC-license a scientific journal (imagine!). Just look at all the open content in the world!

Creative Commons licenses are easy to use. Using the license is very easy: you just tell people. There is no cost or process. Look at the footer of this very page, for example. In print, you might just add the line "This article is licensed under a Creative Commons Attribution license. You may re-use this work without permission. See http://creativecommons.org/licenses/by/3.0/ for details." (If you choose another license, you'd use different wording.)
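If you need that line in several places, a few lines of Python can generate it. This is only a sketch: the function name `license_statement` and the small `CC_NAMES` table are my own illustration, the wording is the sentence quoted above, and the URL simply follows the pattern of the address given there. Check the Creative Commons site for the exact name and address of the license you actually choose.

```python
# Hypothetical helper: short codes mapped to license names.
# Only two licenses are listed; extend the table as needed.
CC_NAMES = {
    "by": "Attribution",
    "by-sa": "Attribution-ShareAlike",
}


def license_statement(code: str = "by", version: str = "3.0") -> str:
    """Build a one-line license notice for a Creative Commons license."""
    name = CC_NAMES[code]
    url = f"http://creativecommons.org/licenses/{code}/{version}/"
    return (
        f"This article is licensed under a Creative Commons {name} license. "
        f"You may re-use this work without permission. See {url} for details."
    )


print(license_statement())
```

Called with no arguments, it reproduces the CC-BY 3.0 sentence quoted above.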

Creative_Commons.jpg

I recommend CC-BY licenses. There are lots of open licenses, but CC-BY strikes a good balance between being well-documented and trusted, and being truly open (though it is not recognized as such, on a technicality, by copyfree.org). The main point is that they are very open, allowing anyone to use the work in any way, provided they attribute it to the author and copyright holder — it's just like scientific citation, in a way.

Do you openly license your work? Or do you wish more people did? Do you notice open licenses?

Creative Commons graphic by Flickr user Michael Porter. The cartoon is from Nerdson, and licensed CC-BY. 'Obscurity is a greater threat than piracy' is paraphrased from a quote by Tim O'Reilly, publishing 2.0 legend.