x lines of Python: web scraping and web APIs
The Web is obviously an incredible source of information, and sometimes we'd like access to that information from within our code. Indeed, if the information keeps changing (like the price of natural gas, say) then we really have no alternative.
Fortunately, Python provides tools to make it easy to access the web from within a program. In this installment of x lines of Python, I look at getting information from Wikipedia and requesting natural gas prices from Yahoo Finance. All that in 10 lines of Python — total.
As before, there's a completely interactive, live notebook version of this post for you to run, right in your browser. Quick tip: Just keep hitting Shift+Enter to run the cells. There's also a static repo if you want to run it locally.
Geological ages from Wikipedia
Instead of writing the sentences that describe the code, I'll just show you the code. Here's how we can get the duration of the Jurassic period fresh from Wikipedia:
    import re
    import requests

    url = "http://en.wikipedia.org/wiki/Jurassic"
    r = requests.get(url)  # keep the Response object; the page HTML is in r.text
    start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+) million</i>', r.text).groups()
    duration = float(start) - float(end)
    print("According to Wikipedia, the Jurassic lasted {:.2f} Ma.".format(duration))
The output:
According to Wikipedia, the Jurassic lasted 56.30 Ma.
There's the opportunity for you to try writing a little function to get the age of any period from Wikipedia. I've given you a spot of help, and you can even complete it right in your browser — just click here to launch your own copy of the notebook.
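If you'd like a head start, here's a minimal sketch of the parsing half of such a function. It takes the page HTML as a string, so it runs without a network connection. The name period_duration and the sample string are hypothetical, and the sketch assumes every period's page marks up its age range the same way the Jurassic page does, which may not hold.

```python
import re

# Same pattern as the Jurassic example, but with \s+ so that a
# non-breaking space before "million" also matches.
AGE_RANGE = re.compile(r'<i>([.0-9]+)–([.0-9]+)\s+million</i>')

def period_duration(html):
    """Return the duration in Ma parsed from page HTML, or None if not found."""
    match = AGE_RANGE.search(html)
    if match is None:
        return None
    start, end = match.groups()
    return float(start) - float(end)

# Hypothetical snippet standing in for a real Wikipedia page.
sample = "<p>The Jurassic spans <i>201.3–145 million</i> years ago.</p>"
duration = period_duration(sample)
print("{:.2f} Ma".format(duration))
```

To use it on a live page, pass in the text of a response, for example period_duration(requests.get(url).text), and check for None before formatting the result.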
Gas price from Yahoo Finance
    import requests

    url = "http://download.finance.yahoo.com/d/quotes.csv"
    params = {'s': 'HHG17.NYM', 'f': 'l1'}  # s: symbol, f: fields (l1 is the last trade price)
    r = requests.get(url, params=params)
    price = float(r.text)
    print("Henry Hub price for Feb 2017: ${:.2f}".format(price))
Again, the output is fast, and pleasingly up-to-the-minute:
Henry Hub price for Feb 2017: $2.86
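To see what that request amounts to, here's a minimal sketch that builds the same URL by hand. It uses the standard library's urllib.parse instead of requests, so nothing is actually sent. The sample_response string is a made-up stand-in for the one-line body that comes back, not real data.

```python
from urllib.parse import urlencode

url = "http://download.finance.yahoo.com/d/quotes.csv"
params = {'s': 'HHG17.NYM', 'f': 'l1'}

# Encode the dict as a query string and append it to the base URL.
full_url = url + "?" + urlencode(params)
print(full_url)

# Hypothetical response body: a single number followed by a newline.
sample_response = "2.86\n"
price = float(sample_response.strip())
print("${:.2f}".format(price))
```

The params dictionary is just a tidier way of writing the ?s=...&f=... part of the URL, and the response is plain text that float() can read directly.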
I've added another little challenge in the notebook. Give it a try... maybe you can even adapt it to find other live financial information, such as stock prices or interest rates.
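One way to start adapting it is to build the symbol rather than hard-coding it. This sketch assumes HHG17.NYM breaks down as a root (HH), a standard futures month code (G for February), a two-digit year, and an exchange suffix, which fits the "Feb 2017" label above. The helper futures_symbol is hypothetical, and there's no guarantee that any given symbol returns data.

```python
# Standard futures month codes, January to December.
MONTH_CODES = "FGHJKMNQUVXZ"

def futures_symbol(root, month, year, exchange="NYM"):
    """Build a symbol such as 'HHG17.NYM' from its parts (month is 1 to 12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    code = MONTH_CODES[month - 1]
    return "{}{}{:02d}.{}".format(root, code, year % 100, exchange)

symbol = futures_symbol("HH", 2, 2017)
print(symbol)
```

The result can go straight into the 's' entry of the params dictionary.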
What would you like to see in x lines of Python? Requests welcome!