10. GeoPandas#

Open In Colab

10.1. Overview#

GeoPandas is an open-source Python library that simplifies working with geospatial data by extending Pandas data structures. It seamlessly integrates geospatial operations with a pandas-like interface, allowing for the manipulation of geometric types such as points, lines, and polygons. GeoPandas combines the functionalities of Pandas and Shapely, enabling geospatial operations like spatial joins, buffering, intersections, and projections with ease.

10.2. Learning Objectives#

By the end of this lecture, you should be able to:

  • Understand the basic data structures in GeoPandas: GeoDataFrame and GeoSeries.

  • Create GeoDataFrames from tabular data and geometric shapes.

  • Read and write geospatial data formats like Shapefile and GeoJSON.

  • Perform common geospatial operations such as measuring areas, distances, and spatial relationships.

  • Visualize geospatial data using Matplotlib and GeoPandas’ built-in plotting functions.

  • Work with different Coordinate Reference Systems (CRS) and project geospatial data.

10.3. Concepts#

The core data structures in GeoPandas are GeoDataFrame and GeoSeries. A GeoDataFrame extends the functionality of a Pandas DataFrame by adding a geometry column, allowing spatial data operations on geometric shapes. The GeoSeries handles geometric data (points, polygons, etc.).

A GeoDataFrame can have multiple geometry columns, but only one is considered the active geometry at any time. All spatial operations are applied to this active geometry, accessible via the .geometry attribute.

10.4. Installing and Importing GeoPandas#

Before we begin, make sure you have geopandas installed. You can install it using:

# %pip install geopandas

Once installed, import GeoPandas and other necessary libraries:

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

10.5. Creating GeoDataFrames#

A GeoDataFrame is a tabular data structure that contains a geometry column, which holds the geometric shapes. You can create a GeoDataFrame from a list of geometries or from a pandas DataFrame.

# Creating a GeoDataFrame from scratch
data = {
    "City": ["Tokyo", "New York", "London", "Paris"],
    "Latitude": [35.6895, 40.7128, 51.5074, 48.8566],
    "Longitude": [139.6917, -74.0060, -0.1278, 2.3522],
}

df = pd.DataFrame(data)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude))
gdf
City Latitude Longitude geometry
0 Tokyo 35.6895 139.6917 POINT (139.6917 35.6895)
1 New York 40.7128 -74.0060 POINT (-74.006 40.7128)
2 London 51.5074 -0.1278 POINT (-0.1278 51.5074)
3 Paris 48.8566 2.3522 POINT (2.3522 48.8566)

10.6. Reading and Writing Geospatial Data#

GeoPandas allows reading and writing a variety of geospatial formats, such as Shapefiles, GeoJSON, and more. We’ll use a GeoJSON dataset of New York City borough boundaries.

10.6.1. Reading a GeoJSON File#

We’ll load the New York boroughs dataset from a GeoJSON file hosted online.

url = "https://github.com/opengeos/datasets/releases/download/vector/nybb.geojson"
gdf = gpd.read_file(url)
gdf.head()
BoroCode BoroName Shape_Leng Shape_Area geometry
0 5 Staten Island 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227....
1 4 Queens 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957...
2 3 Brooklyn 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100...
3 1 Manhattan 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940....
4 2 Bronx 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278...

This GeoDataFrame contains several columns, including BoroName, which represents the names of the boroughs, and geometry, which stores the polygons for each borough.

10.6.2. Writing to a GeoJSON File#

GeoPandas also supports saving geospatial data back to disk. For example, we can save the GeoDataFrame as a new GeoJSON file:

output_file = "nyc_boroughs.geojson"
gdf.to_file(output_file, driver="GeoJSON")
print(f"GeoDataFrame has been written to {output_file}")
GeoDataFrame has been written to nyc_boroughs.geojson

Similarly, you can write GeoDataFrames to other formats, such as Shapefiles, GeoPackage, and more.

output_file = "nyc_boroughs.shp"
gdf.to_file(output_file)
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 1623819823.80999994 of field Shape_Area of feature 0 not successfully written. Possibly due to too larger number with respect to field width
  ogr_write(
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 3045212795.19999981 of field Shape_Area of feature 1 not successfully written. Possibly due to too larger number with respect to field width
  ogr_write(
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 1937478507.6099999 of field Shape_Area of feature 2 not successfully written. Possibly due to too larger number with respect to field width
  ogr_write(
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 636471539.774000049 of field Shape_Area of feature 3 not successfully written. Possibly due to too larger number with respect to field width
  ogr_write(
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 1186924686.49000001 of field Shape_Area of feature 4 not successfully written. Possibly due to too larger number with respect to field width
  ogr_write(
output_file = "nyc_boroughs.gpkg"
gdf.to_file(output_file, driver="GPKG")

10.7. Simple Accessors and Methods#

Now that we have the data, let’s explore some simple GeoPandas methods to manipulate and analyze the geometric data.

10.7.1. Measuring Area#

We can calculate the area of each borough. GeoPandas automatically calculates the area of each polygon:

# Set BoroName as the index for easier reference
gdf = gdf.set_index("BoroName")

# Calculate the area
gdf["area"] = gdf.area
gdf
BoroCode Shape_Leng Shape_Area geometry area
BoroName
Staten Island 5 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227.... 1.623822e+09
Queens 4 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957... 3.045214e+09
Brooklyn 3 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100... 1.937478e+09
Manhattan 1 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940.... 6.364712e+08
Bronx 2 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278... 1.186926e+09

10.7.2. Getting Polygon Boundaries and Centroids#

To get the boundary (lines) and centroid (center point) of each polygon:

# Get the boundary of each polygon
gdf["boundary"] = gdf.boundary

# Get the centroid of each polygon
gdf["centroid"] = gdf.centroid

gdf[["boundary", "centroid"]]
boundary centroid
BoroName
Staten Island MULTILINESTRING ((970217.022 145643.332, 97022... POINT (941639.45 150931.991)
Queens MULTILINESTRING ((1029606.077 156073.814, 1029... POINT (1034578.078 197116.604)
Brooklyn MULTILINESTRING ((1021176.479 151374.797, 1021... POINT (998769.115 174169.761)
Manhattan MULTILINESTRING ((981219.056 188655.316, 98094... POINT (993336.965 222451.437)
Bronx MULTILINESTRING ((1012821.806 229228.265, 1012... POINT (1021174.79 249937.98)

10.7.3. Measuring Distance#

We can also measure the distance from each borough’s centroid to a reference point, such as the centroid of Manhattan.

# Use Manhattan's centroid as the reference point
manhattan_centroid = gdf.loc["Manhattan", "centroid"]

# Calculate the distance from each centroid to Manhattan's centroid
gdf["distance_to_manhattan"] = gdf["centroid"].distance(manhattan_centroid)
gdf[["centroid", "distance_to_manhattan"]]
centroid distance_to_manhattan
BoroName
Staten Island POINT (941639.45 150931.991) 88247.742789
Queens POINT (1034578.078 197116.604) 48401.272479
Brooklyn POINT (998769.115 174169.761) 48586.299386
Manhattan POINT (993336.965 222451.437) 0.000000
Bronx POINT (1021174.79 249937.98) 39121.024479

10.7.4. Calculating Mean Distance#

We can calculate the mean distance between the borough centroids and Manhattan:

mean_distance = gdf["distance_to_manhattan"].mean()
print(f"Mean distance to Manhattan: {mean_distance} units")
Mean distance to Manhattan: 44871.26782659276 units

10.8. Plotting Geospatial Data#

GeoPandas integrates with Matplotlib for easy plotting of geospatial data. Let’s create some maps to visualize the data.

10.8.1. Plotting the Area of Each Borough#

We can color the boroughs based on their area and display a legend:

gdf.plot("area", legend=True, figsize=(10, 6))
plt.title("NYC Boroughs by Area")
plt.show()
../../_images/c7debc38d2426ac7811433cb53a5773d483147e7500dbe19f2c030e396573ebf.png

10.8.2. Plotting Centroids and Boundaries#

We can also plot the centroids and boundaries:

# Plot the boundaries and centroids
ax = gdf["geometry"].plot(figsize=(10, 6), edgecolor="black")
gdf["centroid"].plot(ax=ax, color="red", markersize=50)
plt.title("NYC Borough Boundaries and Centroids")
plt.show()
../../_images/cef60a62b5c6600185d5b05e50181c70811552f269fb97b3fed7999ba3cc3925.png

You can also explore your data interactively using GeoDataFrame.explore(), which behaves in the same way plot() does but returns an interactive map instead.

gdf.explore("area", legend=False)
Make this Notebook Trusted to load map: File -> Trust Notebook