15. Whitebox#

Open In Colab

15.1. Overview#

In this lecture, we will explore the use of WhiteboxTools, a powerful open-source library for performing geospatial analysis. Specifically, we will focus on two key applications: watershed analysis and LiDAR data analysis. You will learn how to manipulate geospatial data using Python, conduct hydrological analysis, and derive digital elevation models (DEMs) and canopy height models (CHMs) from LiDAR data.

This lecture is structured into two main sections:

  1. Watershed Analysis: Using DEMs and hydrological tools to delineate watersheds, calculate flow accumulation, and extract stream networks.

  2. LiDAR Data Analysis: Processing LiDAR point cloud data to derive DEMs, DSMs, and CHMs while removing outliers and improving data quality.

By the end of this session, you will have gained hands-on experience with WhiteboxTools and leafmap, allowing you to perform a wide range of geospatial and hydrological analyses.

15.2. Learning Objectives#

By the end of this lecture, you will be able to:

  • Install and configure WhiteboxTools and leafmap for geospatial analysis.

  • Create interactive maps to visualize basemaps and geospatial datasets.

  • Perform watershed analysis by delineating watersheds, flow directions, and stream networks.

  • Manipulate and analyze Digital Elevation Models (DEMs) to conduct hydrological modeling.

  • Process and analyze LiDAR data to generate Digital Surface Models (DSMs), Digital Elevation Models (DEMs), and Canopy Height Models (CHMs).

  • Integrate WhiteboxTools with Python workflows to automate geospatial analysis.

15.3. Introduction#

Below is a brief introduction to Whitebox, a powerful open-source library for geospatial analysis. For more information, please refer to the whiteboxgeo website: https://www.whiteboxgeo.com.

15.3.1. What is Whitebox?#

Whitebox is geospatial data analysis software originally developed at the University of Guelph‘s Geomorphometry and Hydrogeomatics Research Group (GHRG) directed by Dr. John Lindsay. Whitebox contains over 550 tools for processing many types of geospatial data. It has many great features such as its extensive use of parallel computing, it doesn’t need other libraries to be installed (e.g., GDAL), it can be used from scripting environments, and it easily plugs into other geographical information system (GIS) software. The Whitebox Toolset Extension provides even more power.

15.3.2. What can Whitebox do?#

Whitebox can be used to perform common GIS and remote sensing analysis tasks. Whitebox also contains advanced tooling for spatial hydrological analysis and LiDAR data processing. Whitebox is not a cartographic or spatial data visualization package; instead it’s meant to serve as an analytical back-end for other data visualization software, like QGIS and ArcGIS.

15.3.3. How is Whitebox different?#

Whitebox doesn’t compete with QGIS, ArcGIS/Pro, and ArcPy but rather it extends them. You can plug WhiteboxTools into QGIS and ArcGIS and it’ll provide hundreds of additional tools for analyzing all kinds of geospatial data. You can also call Whitebox functions from Python scripts using Whitebox Workflows (WbW). Combine WbW with ArcPy to more effectively automate your data analysis workflows and streamline your geoprocessing solutions.

There are many tools in Whitebox that you won’t find elsewhere. You can think of Whitebox as a portable, cross-platform GIS analysis powerhouse, allowing you to extend your preferred GIS or to embed Whitebox capabilities into your automated scripted workflows. Oh, and it’s fast, really fast!

15.4. Useful Resources for Whitebox#

15.5. Installation#

To get started, we need to install the required packages, such as leafmap, and whitebox. Uncomment the following lines to install the packages.

# %pip install "leafmap[raster]" "leafmap[lidar]" mapclassify
# %pip install numpy==1.26.4

15.6. Import libraries#

import os
import leafmap
import numpy as np

Some of the raster datasets generated by whitebox will be int32 type with a nodata value of -32768. To make it easier to visualize these datasets, we set the nodata value as an environment variable, which will be used by leafmap to set the nodata value when visualizing the raster datasets.

os.environ["NODATA"] = "-32768"

15.7. Part 1: Watershed Analysis#

15.7.1. Create Interactive Maps#

To perform watershed analysis, we first create an interactive map using leafmap. This step allows us to visualize different basemaps.

m = leafmap.Map()
m.add_basemap("OpenTopoMap")
m.add_basemap("USGS 3DEP Elevation")
m.add_basemap("USGS Hydrography")
m

15.7.2. Download Watershed Data#

Next, we download watershed data for the Calapooia River basin in Oregon. We’ll use the latitude and longitude of a point in the basin to extract watershed boundary data.

lat = 44.361169
lon = -122.821802

m = leafmap.Map(center=[lat, lon], zoom=10)
m.add_marker([lat, lon])
m

Download the watershed data and visualize it:

geometry = {"x": lon, "y": lat}
gdf = leafmap.get_wbd(geometry, digit=10, return_geometry=True)
gdf.explore()
Make this Notebook Trusted to load map: File -> Trust Notebook

Save the watershed boundary to a GeoJSON file:

gdf.to_file("basin.geojson")

15.7.3. Download and Display DEM#

We download a Digital Elevation Model (DEM) from the USGS 3DEP Elevation service to analyze the terrain of the watershed. The DEM will be used to delineate watersheds, calculate flow accumulation, and extract stream networks. The leafmap.get_3dep_dem() function returns the DEM as an xarray.DataArray object. Optionally, you can save the DEM to a GeoTIFF file by setting the output parameter.

array = leafmap.get_3dep_dem(
    gdf,
    resolution=30,
    output="dem.tif",
    dst_crs="EPSG:3857",
    to_cog=True,
    overwrite=True,
)
array
Reading input: /home/runner/work/geog-312/geog-312/book/geospatial/dem.tif

Adding overviews...
Updating dataset tags...
Writing output to: /home/runner/work/geog-312/geog-312/book/geospatial/dem.tif
<xarray.DataArray 'elevation' (y: 1511, x: 2804)> Size: 17MB
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * x            (x) float64 22kB -1.371e+07 -1.371e+07 ... -1.362e+07
  * y            (y) float64 12kB 5.547e+06 5.547e+06 ... 5.498e+06 5.498e+06
    band         int64 8B 1
    spatial_ref  int64 8B 0
Attributes:
    scale_factor:         1.0
    add_offset:           0.0
    units:                m
    vertical_datum:       NAVD88
    vertical_resolution:  0.001
    _FillValue:           nan

Visualize the DEM on the map:

m.add_raster("dem.tif", palette="terrain", nodata=np.nan, layer_name="DEM")
m

15.7.4. Get DEM metadata#

We can get the metadata of the DEM, such as the spatial resolution, bounding box, and coordinate reference system (CRS).

metadata = leafmap.image_metadata("dem.tif")
metadata
{'bounds': {'left': -123.146944,
  'bottom': 44.209447,
  'right': -122.321296,
  'top': 44.5275},
 'minzoom': 9,
 'maxzoom': 12,
 'band_metadata': [('b1', {})],
 'band_descriptions': [('b1', 'elevation')],
 'dtype': 'float32',
 'nodata_type': 'Nodata',
 'colorinterp': ['gray'],
 'scales': [1.0],
 'offsets': [0.0],
 'colormap': None,
 'driver': 'GTiff',
 'count': 1,
 'width': 2804,
 'height': 1511,
 'overviews': [2, 4],
 'nodata_value': nan,
 'nodata': nan,
 'crs': 'PROJCS["WGS 84 / Pseudo-Mercator",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Mercator_1SP"],PARAMETER["central_meridian",0],PARAMETER["scale_factor",1],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"],AUTHORITY["EPSG","3857"]]',
 'transform': [32.77843952299033,
  0.0,
  -13708655.097672503,
  0.0,
  -32.7784395229904,
  5547440.188351146,
  0.0,
  0.0,
  1.0]}

Get a summary statistics of the DEM.

15.7.5. Add colorbar#

Add a colorbar to the map to show the elevation values. Use the image_min_max() function to get the minimum and maximum values of the DEM.

leafmap.image_min_max("dem.tif")
(69.4413833618164, 1564.8543701171875)
m.add_colormap(cmap="terrain", vmin="60", vmax=1500, label="Elevation (m)")

15.7.6. Initialize WhiteboxTools#

Initialize the WhiteboxTools class.

wbt = leafmap.WhiteboxTools()
Downloading WhiteboxTools pre-compiled binary for first time use ...
Downloading WhiteboxTools binary from https://www.whiteboxgeo.com/WBT_Linux/WhiteboxTools_linux_amd64.zip
Decompressing WhiteboxTools_linux_amd64.zip ...
WhiteboxTools package directory: /home/runner/work/geog-312/geog-312/.venv/lib/python3.12/site-packages/whitebox
Downloading testdata ...

Check the WhiteboxTools version.

wbt.version()
"WhiteboxTools v2.4.0 (c) Dr. John Lindsay 2017-2023\n\nWhiteboxTools is an advanced geospatial data analysis platform developed at\nthe University of Guelph's Geomorphometry and Hydrogeomatics Research \nGroup (GHRG). See www.whiteboxgeo.com for more details.\n"

Display the WhiteboxTools interface, which lists all available tools. You can use this interface to search for specific tools. You can also run any tool using the interface. However, we will use the Python API to run the tools.

leafmap.whiteboxgui()

15.7.7. Set working directory#

Set the working directory to save the intermediate and output files. Set wbt.version=False to suppress the Whitebox processing log.

wbt.set_working_dir(os.getcwd())
wbt.verbose = False

15.7.8. Smooth DEM#

All WhiteboxTools functions will return 0 if they are successful, and 1 if they are not.

Smoothing the DEM enhances the quality of subsequent hydrological analysis.

wbt.feature_preserving_smoothing("dem.tif", "smoothed.tif", filter=9)
0

Visualize the smoothed DEM and watershed boundary on the map.

Visualize the smoothed DEM:

m = leafmap.Map()
m.add_basemap("Satellite")
m.add_raster("smoothed.tif", colormap="terrain", layer_name="Smoothed DEM")
m.add_geojson("basin.geojson", layer_name="Watershed", info_mode=None)
m.add_basemap("USGS Hydrography", show=False)
m

15.7.9. Create hillshade#

Create a hillshade from the smoothed DEM.

wbt.hillshade("smoothed.tif", "hillshade.tif", azimuth=315, altitude=35)
0

Overlay the hillshade on the smoothed DEM with transparency.

m.add_raster("hillshade.tif", layer_name="Hillshade")
m.layers[-1].opacity = 0.6

15.7.10. Find no-flow cells#

Find cells with undefined flow, i.e. no valid flow direction, based on the D8 flow direction algorithm.

wbt.find_no_flow_cells("smoothed.tif", "noflow.tif")
0

Visualize the no-flow cells on the map.

m.add_raster("noflow.tif", layer_name="No Flow Cells")

15.7.11. Fill depressions#

First, we fill any depressions in the DEM to ensure proper flow simulation.

wbt.fill_depressions("smoothed.tif", "filled.tif")
0

Alternatively, you can use depression breaching to fill the depressions.

wbt.breach_depressions("smoothed.tif", "breached.tif")
0
wbt.find_no_flow_cells("breached.tif", "noflow2.tif")
0
m.layers[-1].visible = False
m.add_raster("noflow2.tif", layer_name="No Flow Cells after Breaching")
m

15.7.12. Delineate flow direction#

Next, we delineate the flow direction based on the D8 algorithm.

wbt.d8_pointer("breached.tif", "flow_direction.tif")
0

15.7.13. Calculate flow accumulation#

Now, calculate flow accumulation to understand how water collects across the landscape.

wbt.d8_flow_accumulation("breached.tif", "flow_accum.tif")
0
m.add_raster("flow_accum.tif", layer_name="Flow Accumulation")

15.7.14. Extract streams#

Extract the stream network using the flow accumulation data.

wbt.extract_streams("flow_accum.tif", "streams.tif", threshold=5000)
0
m.layers[-1].visible = False
m.add_raster("streams.tif", layer_name="Streams")

15.7.15. Calculate distance to outlet#

wbt.distance_to_outlet(
    "flow_direction.tif", streams="streams.tif", output="distance_to_outlet.tif"
)
0
m.add_raster("distance_to_outlet.tif", layer_name="Distance to Outlet")

15.7.16. Vectorize streams#

wbt.raster_streams_to_vector(
    "streams.tif", d8_pntr="flow_direction.tif", output="streams.shp"
)
0

The raster_streams_to_vector tool has a bug. The output vector file is missing the coordinate system. Use leafmap.vector_set_crs() to set the coordinate system.

leafmap.vector_set_crs(source="streams.shp", output="streams.shp", crs="EPSG:3857")
m.add_shp(
    "streams.shp",
    layer_name="Streams Vector",
    style={"color": "#ff0000", "weight": 3},
    info_mode=None,
)
m

You can turn on the USGS Hydrography basemap to visualize the stream network and compare it with the extracted stream network.

15.7.17. Delineate the longest flow path#

You can delineate the longest flow path in the watershed.

wbt.basins("flow_direction.tif", "basins.tif")
0
wbt.longest_flowpath(
    dem="breached.tif", basins="basins.tif", output="longest_flowpath.shp"
)
0

Select only the longest flow path.

leafmap.select_largest(
    "longest_flowpath.shp", column="LENGTH", output="longest_flowpath.shp"
)
m.add_shp(
    "longest_flowpath.shp",
    layer_name="Longest Flowpath",
    style={"color": "#ff0000", "weight": 3},
)
m

15.7.18. Generate a pour point#

To delineate a watershed, you need to specify a pour point. You can use the outlet of the longest flow path as the pour point or specify a different point. Use the drawing tool to place a marker on the map to specify the pour point. If no marker is placed, the default pour point below will be used.

if m.user_roi is not None:
    m.save_draw_features("pour_point.shp", crs="EPSG:3857")
else:
    lat = 44.284642
    lon = -122.611217
    leafmap.coords_to_vector([lon, lat], output="pour_point.shp", crs="EPSG:3857")
    m.add_marker([lat, lon])

15.7.19. Snap pour point to stream#

Snap the pour point to the nearest stream.

wbt.snap_pour_points(
    "pour_point.shp", "flow_accum.tif", "pour_point_snapped.shp", snap_dist=300
)
0

Visualize the snapped pour point on the map.

m.add_shp("pour_point_snapped.shp", layer_name="Pour Point", info_mode=False)

15.7.20. Delineate watershed#

Delineate the watershed using a pour point and the flow direction data.

wbt.watershed("flow_direction.tif", "pour_point_snapped.shp", "watershed.tif")
0

Visualize the watershed boundary on the map.

m.add_raster("watershed.tif", layer_name="Watershed")
m

15.7.21. Convert watershed raster to vector#

You can convert the watershed raster to a vector file.

wbt.raster_to_vector_polygons("watershed.tif", "watershed.shp")
0

Visualize the watershed boundary on the map.

m.layers[-1].visible = False
m.add_shp(
    "watershed.shp",
    layer_name="Watershed Vector",
    style={"color": "#ffff00", "weight": 3},
    info_mode=False,
)

15.8. Part 2: LiDAR Data Analysis#

In this section, we will process LiDAR data to derive Digital Surface Models (DSMs), Digital Elevation Models (DEMs), and Canopy Height Models (CHMs). We will also remove outliers and improve the quality of the LiDAR data.

15.8.1. Set up whitebox#

First, we set up the WhiteboxTools and leafmap libraries.

import leafmap
wbt = leafmap.WhiteboxTools()
wbt.set_working_dir(os.getcwd())
wbt.verbose = False

15.8.2. Download a sample dataset#

We download a sample LiDAR dataset for Madison.

url = "https://github.com/opengeos/datasets/releases/download/lidar/madison.zip"
filename = "madison.las"
leafmap.download_file(url, "madison.zip", quiet=True)
'/home/runner/work/geog-312/geog-312/book/geospatial/madison.zip'

15.8.3. Read LAS/LAZ data#

Load and inspect the LiDAR data:

laz = leafmap.read_lidar(filename)
laz
<LasData(1.3, point fmt: <PointFormat(1, 0 bytes of extra dims)>, 4068294 points, 2 vlrs)>
str(laz.header.version)
'1.3'

15.8.4. Upgrade file version#

Upgrade the LAS file version to 1.4.

las = leafmap.convert_lidar(laz, file_version="1.4")
str(las.header.version)
'1.4'

15.8.5. Write LAS data#

Save the LAS data to a new file.

leafmap.write_lidar(las, "madison.las")

15.8.6. Histogram analysis#

Plot the histogram of the LiDAR data. The histogram shows the distribution of the LiDAR points based on their elevation values. The tool generates a histogram of the LiDAR data and saves it to an HTM file. You can open the HTM file in a web browser to view the histogram.

wbt.lidar_histogram("madison.las", "histogram.html")
0

15.8.7. Visualize LiDAR data#

Run the view_lidar() function to visualize the LiDAR data in 3D. Note that the view_lidar() function may not work in some environments, such as Google Colab.

leafmap.view_lidar("madison.las")
Something went wrong.
PyVista must be installed. Try `pip install pyvista`

15.8.8. Remove outliers#

Remove outliers from the LiDAR dataset:

wbt.lidar_elevation_slice("madison.las", "madison_rm.las", minz=0, maxz=450)
0

15.8.9. Visualize LiDAR data after removing outliers#

We can visualize the LiDAR data after removing the outliers.

leafmap.view_lidar("madison_rm.las", cmap="terrain")
Something went wrong.
PyVista must be installed. Try `pip install pyvista`

15.8.10. Create DSM#

Using the LiDAR data to create a Digital Surface Model (DSM).

wbt.lidar_digital_surface_model(
    "madison_rm.las", "dsm.tif", resolution=1.0, minz=0, maxz=450
)
0

The DSM generated by whitebox is missing the coordinate system. Use leafmap.raster_set_crs() to set the coordinate system.

leafmap.add_crs("dsm.tif", epsg=2255)

15.8.11. Visualize DSM#

Visualize the DSM on the map.

m = leafmap.Map()
m.add_basemap("Satellite")
m.add_raster("dsm.tif", colormap="terrain", layer_name="DSM")
m

15.8.12. Create DEM#

We can create a bare-earth DEM from the DSM. The tool is typically applied to LiDAR DEMs which frequently contain numerous off-terrain objects (OTOs) such as buildings, trees and other vegetation, cars, fences and other anthropogenic objects.

wbt.remove_off_terrain_objects("dsm.tif", "dem.tif", filter=25, slope=15.0)
0

15.8.13. Visualize DEM#

Visualize the bear-earth DEM on the map.

m.add_raster("dem.tif", palette="terrain", layer_name="DEM")
m

15.8.14. Create CHM#

We can a Canopy Height Model (CHM) by subtracting the DEM from the DSM.

chm = wbt.subtract("dsm.tif", "dem.tif", "chm.tif")

Visualize the CHM on the map.

m.add_raster("chm.tif", palette="gist_earth", layer_name="CHM")
m.add_layer_manager()
m

15.9. Summary#

This lecture provided a comprehensive introduction to WhiteboxTools, an open-source geospatial analysis library with a focus on hydrological and LiDAR data analysis. Through this session, students learned to install and configure WhiteboxTools in Python, integrate it with leafmap for interactive mapping, and apply it to specific geospatial tasks.

Key takeaways from this lecture include:

  • Watershed Analysis: The lecture covered watershed delineation using Digital Elevation Models (DEMs), including techniques like flow direction, flow accumulation, stream extraction, and watershed boundary delineation.

  • LiDAR Data Processing: Students were introduced to LiDAR data manipulation, including the derivation of Digital Surface Models (DSMs), Digital Elevation Models (DEMs), and Canopy Height Models (CHMs), along with methods for outlier removal and terrain quality improvement.

By completing the hands-on exercises, students gained practical experience with WhiteboxTools for geospatial processing, preparing them to apply these techniques in varied real-world GIS workflows.