x lines of Python: web scraping and web APIs

The Web is obviously an incredible source of information, and sometimes we'd like access to that information from within our code. Indeed, if the information keeps changing — like the price of natural gas, say — then we really have no alternative.

Fortunately, Python provides tools to make it easy to access the web from within a program. In this installment of x lines of Python, I look at getting information from Wikipedia and requesting natural gas prices from Yahoo Finance. All that in 10 lines of Python — total.

As before, there's a completely interactive, live notebook version of this post for you to run, right in your browser. Quick tip: Just keep hitting Shift+Enter to run the cells. There's also a static repo if you want to run it locally.

Geological ages from Wikipedia

Instead of writing the sentences that describe the code, I'll just show you the code. Here's how we can get the duration of the Jurassic period fresh from Wikipedia:

import re
import requests

url = "http://en.wikipedia.org/wiki/Jurassic"
r = requests.get(url)  # keep the Response object; the page's HTML is in r.text
start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+)&#160;million</i>', r.text).groups()
duration = float(start) - float(end)
print("According to Wikipedia, the Jurassic lasted {:.2f} Ma.".format(duration))

The output:

According to Wikipedia, the Jurassic lasted 56.30 Ma.

There's the opportunity for you to try writing a little function to get the age of any period from Wikipedia. I've given you a spot of help, and you can even complete it right in your browser — just click here to launch your own copy of the notebook.
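Here's one possible shape for that function. It's a minimal sketch, not the notebook's own solution. It assumes every period's article renders its age range in the same <i>START–END&#160;million</i> form that the Jurassic page does, which may not hold for all pages. The names parse_duration and get_duration are made up for illustration.

```python
import re

# Pattern taken from the Jurassic example; assumes the article shows the
# age range as <i>START–END&#160;million</i>.
AGE_PATTERN = re.compile(r'<i>([\.0-9]+)–([\.0-9]+)&#160;million</i>')


def parse_duration(html):
    """Return the duration in Ma found in a page's HTML, or None if no match."""
    match = AGE_PATTERN.search(html)
    if match is None:
        return None
    start, end = match.groups()
    return float(start) - float(end)


def get_duration(period):
    """Fetch the Wikipedia article for `period` and return its duration in Ma."""
    import requests  # imported here so parse_duration can be used offline
    url = "http://en.wikipedia.org/wiki/" + period
    return parse_duration(requests.get(url).text)
```

Calling get_duration("Cretaceous") would then fetch that page and return a float, or None if the page doesn't match the pattern. Splitting the parsing from the fetching means you can check the regular expression against a saved snippet of HTML without hitting the network.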

Gas price from Yahoo Finance

import requests

url = "http://download.finance.yahoo.com/d/quotes.csv"
params = {'s': 'HHG17.NYM', 'f': 'l1'}  # s is the symbol; f is the field (l1 is the last trade price)
r = requests.get(url, params=params)
price = float(r.text)
print("Henry Hub price for Feb 2017: ${:.2f}".format(price))

Again, the output is fast, and pleasingly up-to-the-minute:

Henry Hub price for Feb 2017: $2.86

I've added another little challenge in the notebook. Give it a try... maybe you can even adapt it to find other live financial information, such as stock prices or interest rates.
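As a starting point for adapting it, here's a rough sketch of asking for several symbols at once. It rests on two assumptions I haven't verified here: that the service accepts a comma-separated list in the s parameter, and that it replies with one price per line in the order requested. It also assumes the endpoint still responds as it does above. The helper names and the second symbol are only for illustration.

```python
def build_params(symbols, field='l1'):
    """Build the query parameters for one request covering several symbols."""
    # Assumption: the service accepts comma-separated symbols in 's'.
    return {'s': ','.join(symbols), 'f': field}


def parse_prices(text, symbols):
    """Pair each requested symbol with the price on the matching response line."""
    # Assumption: one price per line, in the same order as the request.
    lines = text.strip().splitlines()
    return {symbol: float(line) for symbol, line in zip(symbols, lines)}
```

You would pass build_params(symbols) as the params argument to requests.get, exactly as in the example above, and then hand r.text to parse_prices along with the same list of symbols.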

What would you like to see in x lines of Python? Requests welcome!

What I learned at Wikimania

As you may know, I like going to conferences outside the usual subsurface circuit. For this year's amusement, I spent part of last week at the annual Wikimania conference, which this year was in London, UK. I've been to Wikimania before, but this year the conference promised to be bigger and/or better than ever. And I was looking for an excuse to visit the motherland...

What is Wikimania?

Wikipedia, one of humanity's greatest achievements, has lots of moving parts:

  • All the amazing content on Wikipedia.org — the best encyclopedia the world has ever seen (according to a recent study by Rodrigues and Silvério).
  • The huge, diverse, distributed community of contributors and editors that writes and maintains the content.
  • The free, open source software it runs on, MediaWiki, and the community of developers that built it.
  • The family of sister projects: Wikimedia Commons for images, Wikidata for facts, WikiSource for references, Wiktionary for definitions, and several others.
  • The Wikimedia Foundation, the non-profit that makes all this amazing stuff happen.

Wikimania is the gathering for all of these pieces. And this year the event drew over 2000 employees of the Foundation, software contributors, editors, and consultants like me. I can't summarize it all, so here are a few highlights...

Research reviews

My favourite session, The state of Wikimedia scholarship, was hosted by Benjamin Mako Hill, Tilman Bayer, and Aaron Shaw. These guys are academics conducting research into the sociological side of wikis. They took it upon themselves to survey most of the 800 papers that appeared in the last 12 months, and to pick out a few themes and highlights for everyone. A little like the Geophysics Bright Spots column in The Leading Edge, but for the entire discipline. Very cool — definitely an idea to borrow!

A definition of community

Communities are one thing, but what sets the Wikimania community apart is its massive productivity. It has created one of the premier intellectual works in history, and done so in under 10 years, and without a leader or a Gantt chart. So it's interesting to hear about what makes this community work. What would you guess? Alignment? Collaboration? Altruism?

No, it seems to be conflict. Conflict, centered firmly on content—specifically sources, wording, accuracy, and article structure—is more prevalent in the community than collaboration (Kim Osman, WikiSym 2013). It's called 'generative friction', and it underlines something I think is intuitively obvious: communities thrive on diversity, not homogeneity.

How to make a difference

The most striking talk, illustrating perfectly how the world today is a new and wonderful place, was by one of the most inspiring leaders I've ever seen in action: Clare Sutcliffe. In 2012, she discovered that kids weren't getting a chance to give computers instructions (other than 'post this', or 'buy that') in most UK primary schools. Instead of writing a paper about it, or setting up a research institute, or indeed blogging about it, she immediately started doing something about it. Her program, Code Club, is now running in more than 2000 schools. Today, less than 3 years after starting, Code Club is teaching teachers too, and has spread internationally. Amazing and inspiring.

Amusingly, here's a (paraphrased) comment she got from a computer science professor at the end:

I teach computer science at university, where we have to get the kids to unlearn all the stuff they think they know about programming. What are you teaching them about computer science and ethics, or is it all about making games?

Some people are beyond help.

The product is not the goal

I'll finish off with a remark by the new Executive Director of the Wikimedia Foundation, Lila Tretikov. Now that Wikipedia's quality issues are well and truly behind it, the enemy is bias. At least 87% of edits are by men. She wondered if it might be time to change the goal of the community from 'the greatest possible article', to 'the greatest possible participation'. By definition, the greatest article is also presumably unbiased.

In other words, instead of imagining a world where everyone has free access to the sum of all human knowledge, she is asking us to imagine a world where everyone contributes to the sum of all human knowledge. If you can think of a more profound idea than this — let's hear it in the comments!

The next Wikimania will be in Mexico City, in July 2015. See you there!

Here's a thought. All this stuff is free — yay! But happy thoughts aren't enough to get stuff done. So if you value this corner of the Internet, please consider donating to the Foundation. Better still, if your company values it — maybe it uses the MediaWiki software for its own wiki — then it can help with the software's development by donating. Instead of giving Microsoft $1M for a rubbish SharePoint pseudowiki, download MediaWiki for free and donate $250k to the foundation. It's a win-win... and it's tax-deductible!

Atlantic geology hits Wikipedia

WikiProject Geology is one of the gathering places for geoscientists in Wikipedia.

Regular readers of this blog know that we're committed to open scientific communication, and that we're champions of wikis as one of the venues for that communication, and that we want to see more funky stuff happen at conferences. In this spirit, we hosted a Wikipedia editing session at the Atlantic Geoscience Society Colloquium in Wolfville, Nova Scotia, this past weekend.

As typically happens with these funky sessions, it wasn't bursting at the seams: The Island of Misfit Toys is not overcrowded. There were only 7 of us: three Agilistas, another consultant, a professor, a government geologist, and a student. But it's not the numbers that matter (I hope), it's the spirit of the thing. We were a keen bunch and we got quite a bit done. Here are the articles we started or built upon:

The birth of the Atlantic Geoscience Society page gave the group an interesting insight into Wikipedia's quality control machine. Within 10 minutes of publishing it, the article was tagged for speedy deletion by an administrator. This sort of thing is always a bit off-putting to noobs, because Wikipedia editors can be a bit, er, brash, or at least impersonal. This is not that surprising when you consider that new pages are created at a rate of about one a minute some days. Just now I resurrected a stripped-down version of the article, and it has already been reviewed. Moral: don't let anyone tell you that Wikipedia is a free-for-all.

All of these pages are still (and always will be) works in progress. But we added 5 new pages and a substantial amount of material with our 28 or so hours of labour. Considering most of those who came had never edited a wiki before, I'm happy to call this a resounding success. 

Many of my notes from the event could be adapted to any geoscience wiki editing session — use them as a springboard to get some champions of open-access science together at your next gathering. If you'd like our help, get in touch.

January linkfest

Time for the quarterly linkfest! Got stories for next time? Contact us.

BP's new supercomputer, reportedly capable of about 2.2 petaflops, is about as fast as Total's Pangea machine in Paris, which booted up almost a year ago. These machines are pretty amazing — Pangea has over 110,000 cores, and 442 terabytes of memory — but BP claims to have bested that with 1 petabyte of RAM. Remarkable. 

Leo Uieda's open-source modeling tool Fatiando a Terra got an upgrade recently and hit version 0.2. Here's Leo himself demonstrating a forward seismic model:

I'm a geoscientist, get me out of here is a fun-sounding new educational program from the European Geosciences Union, which has recently been the very model of a progressive technical society (the AGU is another great example). It's based on the British outreach program, I'm a scientist, get me out of here, and if you're an EGU member (or want to be), I think you should go for it! The deadline: 17 March, St Patrick's Day.

Darren Wilkinson writes a great blog about some of the geekier aspects of geoscience. You should add it to your reader (I'm using The Old Reader to keep up with blogs since Google Reader was marched out of the building). He wrote recently about this cool tool — an iPad controller for desktop apps. I have yet to try it, but it seems a good fit for tools like ArcGIS and Adobe Illustrator.

Speaking of big software, check out Joe Kington's Python library for GeoProbe volumes — I wish I'd had this a few years ago. Brilliant.

And speaking of cool tools, check out this great new book by technology commentator and philosopher Kevin Kelly. Self-published and crowd-sourced... and drawn from his blog, which you can obviously read online if you don't like paper. 

If you're in Atlantic Canada, and coming to the Colloquium next weekend, you might like to know about the wikithon on Sunday 9 February. We'll be looking for articles relevant to geoscientists in Atlantic Canada to improve. Tim Sherry offers some inspiration. I would tell you about Evan's geocomputing course too... but it's sold out.

Heard about any cool geostuff lately? Let us know in the comments. 

October linkfest

From Hart (2013). ©SEG/AAPG

It's the Hallowe'en linkfest! Just the good bits from our radar...

If you're a member of SEG or AAPG, you can't have missed their new joint-venture journal, Interpretation. Issue 2 just came out. My favourite article so far has been Bruce Hart's Whither seismic stratigraphy in Issue 1. It included these excellent little forward models from an earlier paper of his — it's so important for interpreters to think in this space where geological architecture and geophysical imaging overlap.

Muon tomography is in the news again, this time in relation to monitoring CCS repositories (last time it was volcanoes). Jon Gluyas, author of the textbook Petroleum Geoscience, is the investigator at Durham in the UK (my alma mater). I do love the concept — imaging the subsurface with cosmic rays — but I'm only just getting to grips with sound waves.

If you read this blog regularly, you probably have some geeky tendencies. We've linked to a couple of these blogs before, but they're must-reads for anyone into technology and geoscience, with lots of code and workflow examples:

Continuing the geeky theme, we've been getting more and more into building things recently. Check out our fiddling in GitHub (a code repository) — an easy way in is code.agilegeoscience.com. Watch this space!

Speaking of fiddling with code, you already know about the hackathon we hosted in Houston last month. But there's talk of repeating the fun at the AAPG Annual Convention, also in Houston, in April next year. Brian Romans has started a list of potential projects around digital stratigraphy — please leave a comment there or here to add to it. Where's the gap in your workflow?

A few more quick hits:

If you want these nuggets fresh, you can follow me on Twitter or glance at my pinboard. If you have stuff to share, use the comments or get in touch. Over and out.

The seismic models are from Hart, BS (2013). Whither seismic stratigraphy? Interpretation, volume 1 (1), and are copyright of SEG and AAPG. The image from the Trowel Blazers event is licensed CC-BY-SA by Wikipedia user Mrjohncummings.

Ten ways to make a difference

After reading my remarks yesterday about geoscience wikis, perhaps you're itching to share some of what you know. Below are ten quick ways to get started. And if you're going to SEG next week, you're in luck: you'll find a quick way to get started.

Ten things you can do

First, if you really just want to dive in, here are ten easy things you can do in almost any wiki. Let's use SEG Wiki as an example — but this applies equally well to SubSurfWiki, PetroWiki, or Wikipedia.

  1. Read it — find a page or category that interests you, and start exploring the content
  2. Edit it — nothing tricky, but if you find a typo or other small error, hit Edit and fix it (you can do this without logging in on Wikipedia, but most other wikis require you to make an account first. This isn't usually a deliberate effort to put you off — allowing anonymous editing results in an amazing amount of robot spam. Yes, robot spam.)
  3. Share it — like most of the web, wikis need to be shared to survive. When you find something useful, share it.
  4. Add a profile — if you're an SEG member, you already have an account on SEG Wiki. Why not add some info about yourself? Go log in to SEG.org then click this link. Here's mine.
  5. Add a sandbox — Edit your user page, add this: [[/Sandbox/]], then save your page. You'll see a red link. Click on it. Try some editing — you can do anything you like here. Again, here's mine — click Edit and copy my code. 
  6. Fix equations — most of the equations in the SEG Encyclopedic Dictionary are poorly formatted. If you know LaTeX, you can help fix them. Here's one that's been fixed. Here's a bad one (if it looks OK, someone beat you to it :)
  7. Add references — Just like technical papers, wikis need citations and references if they are to be useful and trusted. Most articles in SEG Wiki have citations, but the references are on another page. Here's one I've fixed. 
  8. Add a figure — Again, the figures are mostly divorced from their articles. The Q article shows one way to integrate them. Some articles have lots of figures. 
  9. Improve a definition — Many of the Dictionary definitions are out of date or unhelpfully terse. Long articles probably belong in the 'main' namespace (that is, not the Dictionary part) — so for example I split Spectral decomposition into a main article, apart from the short dictionary definition.
  10. Add an article — This may seem like a big step, but don't be shy. Be bold! We can worry later if the new article needs to be split or combined or renamed or reformatted. The point is to start.

Wiki markup takes a little getting used to, but you can get a very long way with a little know-how. This wiki markup cheatsheet will give you a head start.

One place you can start

At the SEG Annual Meeting next week, I'll be hanging about the Press Room from 11 am till 1 pm every day, with John Stockwell, Karl Schleicher and some other wiki enthusiasts. We'd be happy to answer any questions or help you get started.

Bring your laptop! Spread the word! Bring a friend! See you there!

Wiki world of geoscience

This weekend, I noticed that there was no Wikipedia article about Harry Wheeler, one of the founders of theoretical stratigraphy. So I started one. This brings the number of biographies I've started to 3:

  • Karl Zoeppritz — described waves almost perfectly, but died at the age of 26
  • Johannes Walther — started as a biologist, but later preferred rocks
  • Harry Wheeler — if anyone has a Wheeler diagram to share, please add it!

Many biographies of notable geoscientists are still missing (there are hundreds, but here are three): 

  • Larry Sloss — another pioneer of modern stratigraphy
  • Oz Yilmaz — prolific seismic theoretician and practitioner
  • Brian Russell — entrepreneur and champion of seismic analysis

It's funny, Wikipedia always seems so good — it has deep and wide content on everything imaginable. I think I must visit it 20 or 30 times a day. But when you look closely, especially at a subject you know a bit about, there are lots of gaps (I wonder if this is one of the reasons people sometimes deride it?). There is a notability requirement for biographies, but for some reason this doesn't seem to apply to athletes or celebrities. 

I was surprised the Wheeler page didn't exist, but once you start reading, there are lots of surprises:

I run a geoscience wiki, but this is intended for highly esoteric topics that probably don't really belong in Wikipedia, e.g. setting parameters for seismic autopickers, or critical reviews of subsurface software (both on my wish list). I am currently working on a wiki for AAPG — is that the place for 'deep' petroleum geoscience? I also spend time on SEG Wiki... With all these wikis, I worry that we risk spreading ourselves too thinly. What do you think?

In the meantime, can you give 10 minutes to improve a geoscience article in Wikipedia? Or perhaps you have a classful of students to unleash on an assignment?

Tomorrow, I'll tell you about an easy way to help improve some geophysics content.

News of the month

Another month flies by, and it's time for our regular news round-up! News tips, anyone?

Knowledge sharing

At the start of the month, SPE launched PetroWiki. The wiki has been seeded with one part of the 7-volume Petroleum Engineering Handbook, a tome that normally costs over $600. They started with Volume 2, Drilling Engineering, which includes lots of hot topics, like fracking (right). Agile was involved in the early design of the wiki, which is being built by Knowledge Reservoir.

Agile stuff

Our cheatsheets are consistently some of the most popular things on our site. We love them too, so we've been doing a little gardening — there are new, updated editions of the rock physics and geophysics cheatsheets.

Thank you so much to the readers who've let us know about typos! 


Nothing else really hit the headlines this month — perhaps people are waiting for SEG. Here are some nibbles...

  • We just upgraded a machine from Windows to Linux, sadly losing Spotfire in the process. So we're on the lookout for another awesome analytics tool. VISAGE isn't quite what we need, but you might like these nice graphs for oil and gas.
  • Last month we missed the newly awarded exploration licenses in the inhospitable Beaufort Sea [link opens a PDF]. Franklin Petroleum of the UK might have been surprised by the fact that they don't seem to have been bidding against anyone, as they picked up all six blocks for little more than the minimum bid.
  • It's the SEG Annual Meeting next week... and Matt will be there. Look out for daily updates from the technical sessions and the exhibition floor. There's at least one cool new thing this year: an app!

This regular news feature is for information only. We aren't connected with any of these organizations, and don't necessarily endorse their products or services. 

Wiki maniacs wanted

Jimmy Wales, saluting the crowd at Wikimania 2012

Jimmy Wales (right) believes profoundly in the Wikimedia Foundation's mission:

Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That's what we're doing.

If that mission sounds a bit grand, that's because it is. The amazing thing about this crusade, possibly the most altruistic and ambitious goal ever undertaken, is that you can help. The grand mission, should you choose to accept it, belongs to you—and to every other highly privileged, highly educated person you know.

Wikipedia needs you

One of the most surprising things I heard last week at Wikimania was that the number of active editors is falling: it is down 4000 since 2011, to 85 000. You can help fix it:

  • Create an account to watch pages, change the look and behaviour of Wikipedia, and edit articles without revealing your IP address.
  • Next time you see something wrong or incomplete, edit it! Just click Edit.
  • Help improve articles on your home town, your hobbies, and your profession.
  • Pick a subject you care about (Well logging?) and look for red links, which are articles in need of creation.
  • Join a project like WikiProject:Geology to collaborate with other editors.
  • The Wikimedia Foundation runs on donations. Donate!
  • If you want somewhere to practise, use your Wikipedia Sandbox (requires an account), or poke around on SEGwiki or SubSurfWiki, where you're always welcome.

Imagine a world in which you can contribute to the sum of all human knowledge. That's what we have.

Wiki maniacs unite

Last year, we decided to go to at least one non-geoscience conference every year. The idea is to meet other communities, learn about other fields, have some new ideas, and find more ways to be useful. So far, Evan and I have been to symposiums on mathematics, geothermal energy, being more awesome, and science online. Continuing in this vein, I just got home from Wikimania 2012 — the international conference about all things wiki.

Strictly speaking, Wikimania is about the Wikimedia movement, the global effort to "give to every single person on the planet free access to the sum of all human knowledge". This quest is supported by the Wikimedia Foundation, a non-profit organization of professional enthusiasts. Their most conspicuous project is Wikipedia, but it's far from being the only one. Have you heard of Wikimedia Commons? Wikisource? Wikibooks? Read all about them.

The conference was unlike anything I've ever been to. Despite attracting over 950 delegates, it felt more like a meeting of colleagues and friends than a conference of professionals and strangers. I've never felt such a strong undertow of common purpose, and quiet, deliberate action. The phrase intentional community was made for this group.

In short, Wikipedia looks even more awesome from the inside than it does from the outside.

If you too are a Wikipedia enthusiast, here are some things I learned:

  • The number of active editors has fallen by 4000 since 2011, to 85k
  • During the conference, the number of articles in English Wikipedia passed 4 million
  • Developers are working hard to make Wikipedia easier to edit, and big changes are coming
  • Wikipedia Zero is an important effort to make the site available to everyone
  • Developers are working on making Wikipedia available via SMS and other channels
  • Wikis—both private and public—are everywhere: schools, museums, libraries, galleries, academia, government, societies, and corporations

Next time, I'll list a few ways you can get more involved.

The photo is from Wikimedia Commons, licensed CC-BY-SA by User:Awersowy