10. GeoPandas#
10.1. Overview#
GeoPandas is an open-source Python library that extends Pandas data structures to work with geospatial data. It provides a pandas-like interface for manipulating geometric types such as points, lines, and polygons. By combining Pandas with Shapely, GeoPandas supports geospatial operations such as spatial joins, buffering, intersections, and projections.
10.2. Learning Objectives#
By the end of this lecture, you should be able to:
Understand the basic data structures in GeoPandas: GeoDataFrame and GeoSeries.
Create GeoDataFrames from tabular data and geometric shapes.
Read and write geospatial data formats like Shapefile and GeoJSON.
Perform common geospatial operations such as measuring areas, distances, and spatial relationships.
Visualize geospatial data using Matplotlib and GeoPandas’ built-in plotting functions.
Work with different Coordinate Reference Systems (CRS) and project geospatial data.
10.3. Concepts#
The core data structures in GeoPandas are GeoDataFrame and GeoSeries. A GeoDataFrame extends the functionality of a Pandas DataFrame by adding a geometry column, allowing spatial data operations on geometric shapes. The GeoSeries handles geometric data (points, polygons, etc.).
A GeoDataFrame can have multiple geometry columns, but only one is considered the active geometry at any time. All spatial operations are applied to this active geometry, accessible via the .geometry attribute.
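The idea of an active geometry can be illustrated with a minimal sketch. The site names, coordinates, and the "buffered" column below are hypothetical and chosen only for illustration; the point is that the same attribute (here, area) gives different results depending on which geometry column is active.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hypothetical sites; their point locations are the initial active geometry
gdf = gpd.GeoDataFrame(
    {"name": ["site_a", "site_b"]},
    geometry=[Point(0, 0), Point(10, 0)],
)

# Store a second geometry column: a 1-unit buffer around each point
gdf["buffered"] = gdf.geometry.buffer(1.0)

# Spatial attributes use the active geometry (points have zero area)
print(gdf.geometry.name)
print(gdf.area.tolist())

# Switch the active geometry; the same attribute now measures the buffers
gdf_buffered = gdf.set_geometry("buffered")
print(gdf_buffered.geometry.name)
print(gdf_buffered.area.round(2).tolist())
```

Note that set_geometry returns a new GeoDataFrame here, so the original gdf keeps its point geometry as the active column.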
10.4. Installing and Importing GeoPandas#
Before we begin, make sure you have geopandas installed. If it is missing, uncomment and run the following line in a notebook cell:
# %pip install geopandas
Once installed, import GeoPandas and other necessary libraries:
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
10.5. Creating GeoDataFrames#
A GeoDataFrame is a tabular data structure that contains a geometry column, which holds the geometric shapes. You can create a GeoDataFrame from a list of geometries or from a pandas DataFrame.
# Creating a GeoDataFrame from scratch
data = {
"City": ["Tokyo", "New York", "London", "Paris"],
"Latitude": [35.6895, 40.7128, 51.5074, 48.8566],
"Longitude": [139.6917, -74.0060, -0.1278, 2.3522],
}
df = pd.DataFrame(data)
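# Build points from longitude (x) and latitude (y); the coordinates are assumed to be WGS 84 (EPSG:4326)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326")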
gdf
| | City | Latitude | Longitude | geometry |
|---|---|---|---|---|
| 0 | Tokyo | 35.6895 | 139.6917 | POINT (139.6917 35.6895) |
| 1 | New York | 40.7128 | -74.0060 | POINT (-74.006 40.7128) |
| 2 | London | 51.5074 | -0.1278 | POINT (-0.1278 51.5074) |
| 3 | Paris | 48.8566 | 2.3522 | POINT (2.3522 48.8566) |
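The other route mentioned above, building a GeoDataFrame directly from a list of geometries, might look like the following sketch. The labels and the rough bounding box are made up for illustration, and the CRS is an assumption (WGS 84 longitude/latitude).

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# A mixed list of Shapely geometries (coordinates are x=longitude, y=latitude)
geometries = [
    Point(2.3522, 48.8566),
    LineString([(-0.1278, 51.5074), (2.3522, 48.8566)]),
    Polygon([(0, 48), (3, 48), (3, 52), (0, 52)]),
]

shapes = gpd.GeoDataFrame(
    {"label": ["Paris", "London to Paris", "rough bounding box"]},
    geometry=geometries,
    crs="EPSG:4326",  # assumption: WGS 84 longitude/latitude
)

# Each row can hold a different geometry type
print(shapes.geom_type.tolist())
```

A single geometry column can mix points, lines, and polygons, although most file formats and analyses work best when each layer holds one geometry type.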
10.6. Reading and Writing Geospatial Data#
GeoPandas allows reading and writing a variety of geospatial formats, such as Shapefiles, GeoJSON, and more. We’ll use a GeoJSON dataset of New York City borough boundaries.
10.6.1. Reading a GeoJSON File#
We’ll load the New York boroughs dataset from a GeoJSON file hosted online.
url = "https://github.com/opengeos/datasets/releases/download/vector/nybb.geojson"
gdf = gpd.read_file(url)
gdf.head()
| | BoroCode | BoroName | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|---|
| 0 | 5 | Staten Island | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... |
| 1 | 4 | Queens | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... |
| 2 | 3 | Brooklyn | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... |
| 3 | 1 | Manhattan | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... |
| 4 | 2 | Bronx | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... |
This GeoDataFrame contains several columns, including BoroName, which represents the names of the boroughs, and geometry, which stores the polygons for each borough.
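After loading a dataset, it is often worth checking its CRS, geometry types, and extent before doing any analysis. The sketch below uses a small stand-in GeoDataFrame rather than the downloaded file so that it runs offline; the district names, rectangles, and CRS are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

# Stand-in for a loaded dataset: two hypothetical rectangular districts
districts = gpd.GeoDataFrame(
    {"name": ["west", "east"]},
    geometry=[box(0, 0, 10, 5), box(10, 0, 25, 5)],
    crs="EPSG:32618",  # assumption: a projected CRS picked only for illustration
)

print(districts.crs)                 # coordinate reference system
print(districts.geom_type.unique())  # geometry types present in the layer
print(districts.total_bounds)        # [minx, miny, maxx, maxy]
print(districts.shape)               # (rows, columns)
```

The same four checks can be run on any GeoDataFrame returned by gpd.read_file.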
10.6.2. Writing to a GeoJSON File#
GeoPandas also supports saving geospatial data back to disk. For example, we can save the GeoDataFrame as a new GeoJSON file:
output_file = "nyc_boroughs.geojson"
gdf.to_file(output_file, driver="GeoJSON")
print(f"GeoDataFrame has been written to {output_file}")
GeoDataFrame has been written to nyc_boroughs.geojson
Similarly, you can write GeoDataFrames to other formats, such as Shapefiles, GeoPackage, and more.
output_file = "nyc_boroughs.shp"
gdf.to_file(output_file)
/home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/pyogrio/raw.py:723: RuntimeWarning: Value 1623819823.80999994 of field Shape_Area of feature 0 not successfully written. Possibly due to too larger number with respect to field width
ogr_write(
The same warning is repeated for features 1 through 4. The Shapefile format stores attributes in fixed-width fields, so large values such as those in Shape_Area may not fit and are flagged when the file is written.
output_file = "nyc_boroughs.gpkg"
gdf.to_file(output_file, driver="GPKG")
10.7. Simple Accessors and Methods#
Now that we have the data, let’s explore some simple GeoPandas methods to manipulate and analyze the geometric data.
10.7.1. Measuring Area#
We can calculate the area of each borough with the area attribute. The values are expressed in the squared units of the dataset's CRS, so they are only meaningful when the CRS is projected:
# Set BoroName as the index for easier reference
gdf = gdf.set_index("BoroName")
# Calculate the area
gdf["area"] = gdf.area
gdf
| BoroName | BoroCode | Shape_Leng | Shape_Area | geometry | area |
|---|---|---|---|---|---|
| Staten Island | 5 | 330470.010332 | 1.623820e+09 | MULTIPOLYGON (((970217.022 145643.332, 970227.... | 1.623822e+09 |
| Queens | 4 | 896344.047763 | 3.045213e+09 | MULTIPOLYGON (((1029606.077 156073.814, 102957... | 3.045214e+09 |
| Brooklyn | 3 | 741080.523166 | 1.937479e+09 | MULTIPOLYGON (((1021176.479 151374.797, 102100... | 1.937478e+09 |
| Manhattan | 1 | 359299.096471 | 6.364715e+08 | MULTIPOLYGON (((981219.056 188655.316, 980940.... | 6.364712e+08 |
| Bronx | 2 | 464392.991824 | 1.186925e+09 | MULTIPOLYGON (((1012821.806 229228.265, 101278... | 1.186926e+09 |
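Because area is reported in squared CRS units, converting to a more readable unit is a simple scaling step. The sketch below assumes a CRS whose unit is the metre (UTM zone 18N is used here purely as an example) and two hypothetical rectangular parcels whose areas are easy to verify by hand.

```python
import geopandas as gpd
from shapely.geometry import box

parcels = gpd.GeoDataFrame(
    {"parcel": ["a", "b"]},
    geometry=[box(0, 0, 1000, 2000), box(0, 0, 500, 500)],
    crs="EPSG:32618",  # assumption: UTM zone 18N, coordinates in metres
)

# Check the linear unit of the CRS before interpreting the numbers
print(parcels.crs.axis_info[0].unit_name)

parcels["area_m2"] = parcels.area                  # square metres
parcels["area_km2"] = parcels["area_m2"] / 1e6     # square kilometres
print(parcels[["parcel", "area_m2", "area_km2"]])
```

For a dataset in a different unit (for example feet), the conversion factor changes accordingly, so it is worth checking the CRS first.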
10.7.2. Getting Polygon Boundaries and Centroids#
To get the boundary (lines) and centroid (center point) of each polygon:
# Get the boundary of each polygon
gdf["boundary"] = gdf.boundary
# Get the centroid of each polygon
gdf["centroid"] = gdf.centroid
gdf[["boundary", "centroid"]]
| BoroName | boundary | centroid |
|---|---|---|
| Staten Island | MULTILINESTRING ((970217.022 145643.332, 97022... | POINT (941639.45 150931.991) |
| Queens | MULTILINESTRING ((1029606.077 156073.814, 1029... | POINT (1034578.078 197116.604) |
| Brooklyn | MULTILINESTRING ((1021176.479 151374.797, 1021... | POINT (998769.115 174169.761) |
| Manhattan | MULTILINESTRING ((981219.056 188655.316, 98094... | POINT (993336.965 222451.437) |
| Bronx | MULTILINESTRING ((1012821.806 229228.265, 1012... | POINT (1021174.79 249937.98) |
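To see what boundary and centroid return on shapes that can be checked by hand, here is a small sketch with two hypothetical squares. A simple polygon without holes yields a LineString boundary, whereas the multi-part borough polygons above yield MultiLineStrings.

```python
import geopandas as gpd
from shapely.geometry import box

# Two squares with side lengths 2 and 4
squares = gpd.GeoSeries([box(0, 0, 2, 2), box(10, 10, 14, 14)])

boundaries = squares.boundary   # outline of each polygon as a line geometry
centroids = squares.centroid    # geometric center of each polygon

print(boundaries.geom_type.tolist())
print(boundaries.length.tolist())        # perimeter of each square
print([(p.x, p.y) for p in centroids])   # center coordinates
```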
10.7.3. Measuring Distance#
We can also measure the distance from each borough’s centroid to a reference point, such as the centroid of Manhattan.
# Use Manhattan's centroid as the reference point
manhattan_centroid = gdf.loc["Manhattan", "centroid"]
# Calculate the distance from each centroid to Manhattan's centroid
gdf["distance_to_manhattan"] = gdf["centroid"].distance(manhattan_centroid)
gdf[["centroid", "distance_to_manhattan"]]
| BoroName | centroid | distance_to_manhattan |
|---|---|---|
| Staten Island | POINT (941639.45 150931.991) | 88247.742789 |
| Queens | POINT (1034578.078 197116.604) | 48401.272479 |
| Brooklyn | POINT (998769.115 174169.761) | 48586.299386 |
| Manhattan | POINT (993336.965 222451.437) | 0.000000 |
| Bronx | POINT (1021174.79 249937.98) | 39121.024479 |
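The distance method computes planar (Euclidean) distance in CRS units between each geometry and the reference geometry. A minimal sketch with hypothetical points on a 3-4-5 triangle makes the result easy to verify:

```python
import geopandas as gpd
from shapely.geometry import Point

points = gpd.GeoSeries(
    [Point(0, 0), Point(3, 4), Point(6, 8)],
    index=["origin", "near", "far"],
)

# Use one geometry as the reference, as with Manhattan's centroid above
reference = points.loc["origin"]
distances = points.distance(reference)
print(distances)
```

Passing a single Shapely geometry compares every row against that one geometry; the result is a pandas Series that keeps the original index.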
10.7.4. Calculating Mean Distance#
We can calculate the mean distance from the borough centroids to Manhattan's centroid. Note that Manhattan's own distance of 0 is included in this average:
mean_distance = gdf["distance_to_manhattan"].mean()
print(f"Mean distance to Manhattan: {mean_distance} units")
Mean distance to Manhattan: 44871.26782659276 units
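If the reference borough should not count toward the average, one option is to drop its row first. The sketch below re-enters the distances from the table above by hand into a plain pandas Series (so it runs without the dataset) and compares the two means.

```python
import pandas as pd

# Distances copied from the table above, in CRS units
distance_to_manhattan = pd.Series(
    {
        "Staten Island": 88247.742789,
        "Queens": 48401.272479,
        "Brooklyn": 48586.299386,
        "Manhattan": 0.0,
        "Bronx": 39121.024479,
    }
)

mean_all = distance_to_manhattan.mean()                       # includes the zero
mean_others = distance_to_manhattan.drop("Manhattan").mean()  # excludes the reference

print(f"Mean over all five boroughs: {mean_all:.2f}")
print(f"Mean over the other four boroughs: {mean_others:.2f}")
```

Which version is appropriate depends on the question being asked; excluding the reference gives the typical distance of the other boroughs from Manhattan.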
10.8. Plotting Geospatial Data#
GeoPandas integrates with Matplotlib for easy plotting of geospatial data. Let’s create some maps to visualize the data.
10.8.1. Plotting the Area of Each Borough#
We can color the boroughs based on their area and display a legend:
gdf.plot("area", legend=True, figsize=(10, 6))
plt.title("NYC Boroughs by Area")
plt.show()
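The same kind of choropleth can be customized by creating the Matplotlib axes explicitly and passing extra styling options. The sketch below uses three hypothetical rectangular zones instead of the borough data; the colormap, edge color, and colorbar label are illustrative choices rather than required settings.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Three hypothetical zones of increasing size
zones = gpd.GeoDataFrame(
    {"zone": ["a", "b", "c"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 3, 1), box(3, 0, 6, 1)],
)
zones["area"] = zones.area

fig, ax = plt.subplots(figsize=(10, 3))
zones.plot(
    column="area",          # values used to color each polygon
    cmap="viridis",
    edgecolor="black",
    legend=True,            # adds a colorbar for a numeric column
    legend_kwds={"label": "Area (squared CRS units)"},
    ax=ax,
)
ax.set_title("Hypothetical zones by area")
ax.set_axis_off()
```

In a notebook the figure renders at the end of the cell; in a script, finish with plt.show() or fig.savefig(...).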

10.8.2. Plotting Centroids and Boundaries#
We can also plot the centroids and boundaries:
# Plot the boundaries and centroids
ax = gdf["geometry"].plot(figsize=(10, 6), edgecolor="black")
gdf["centroid"].plot(ax=ax, color="red", markersize=50)
plt.title("NYC Borough Boundaries and Centroids")
plt.show()
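Layering works by drawing each dataset onto the same Matplotlib axes. As a self-contained sketch of that pattern, the following draws two hypothetical rectangles and then their centroids on top, reusing one axes object for both layers.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

zones = gpd.GeoSeries([box(0, 0, 2, 2), box(2, 0, 5, 2)])

fig, ax = plt.subplots(figsize=(8, 4))

# First layer: polygons as a base map
zones.plot(ax=ax, color="lightgray", edgecolor="black")

# Second layer: centroids drawn on the same axes
zones.centroid.plot(ax=ax, color="red", markersize=50)

ax.set_title("Hypothetical zones with centroids")
```

Layers are drawn in the order they are plotted, so the base layer generally goes first.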

You can also explore your data interactively using GeoDataFrame.explore(), which behaves in the same way plot() does but returns an interactive map instead.
gdf.explore("area", legend=False)