x lines of Python: read and write a shapefile

Shapefiles are a sort-of-open format for geospatial vector data. They can encode points, lines, and polygons, plus attributes of those objects, optionally bundled into groups. I say 'sort-of-open' because the format is well-known and widely used, but it is maintained and policed, so to speak, by ESRI, the company behind ArcGIS. It's a slightly weird (annoying) format because 'a shapefile' is actually a collection of files, only one of which is the eponymous SHP file. 

Today we're going to read a SHP file, change its Coordinate Reference System (CRS), add a new attribute, and save a new file in two different formats. All in x lines of Python, where x is a small number. To do all this, we need to add a new toolbox to our xlines virtual environment: geopandas, which is a geospatial flavour of the popular data management tool pandas.

Here's the full rundown of the workflow, where each item is a line of Python:

  1. Open the shapefile with fiona (i.e. not using geopandas yet).
  2. Inspect its contents.
  3. Open the shapefile again, this time with geopandas.
  4. Inspect the resulting GeoDataFrame in various ways.
  5. Check the CRS of the data.
  6. Change the CRS of the GeoDataFrame.
  7. Compute a new attribute.
  8. Write the new shapefile.
  9. Write the GeoDataFrame as a GeoJSON file too.

By the way, if you have not come across EPSG codes yet for CRS descriptions, they are the only way to go. This dataset is initially in EPSG 4267 (NAD27 geographic coordinates) but we change it to EPSG 26920 (NAD83 UTM20N projection).

Several bits of our workflow are optional. The core part of the code, items 3, 6, 7, and 8, are just a few lines of Python:

    import geopandas as gpd
    gdf = gpd.read_file('data_in.shp')
    gdf = gdf.to_crs({'init': 'epsg:26920'})
    gdf['seafl_twt'] = 2 * 1000 * gdf.Water_Dept / 1485
    gdf.to_file('data_out.shp')

That's it! 

As in all these posts, you can follow along with the code in the Jupyter Notebook.