13. Rioxarray#

13.1. Overview#

rioxarray is an extension of the powerful Python library Xarray that focuses on geospatial raster data. It provides easy access to georeferencing information and geospatial transforms using Xarray’s labeled, multi-dimensional data structures, which makes it an ideal tool for working with geospatial data like satellite imagery or climate data.

The key feature of rioxarray is its seamless integration of rasterio’s geospatial data handling capabilities (such as CRS and affine transforms) with Xarray’s efficient multi-dimensional array handling. This allows you to manipulate, analyze, and visualize raster data with ease.

13.2. Learning Objectives#

By the end of this lecture, you should be able to:

  • Understand how rioxarray extends Xarray for geospatial data handling.

  • Load and inspect georeferenced raster datasets using rioxarray.

  • Perform basic geospatial operations, such as clipping, reprojection, and masking, using rioxarray.

  • Use rioxarray to manage CRS and spatial dimensions in raster datasets.

  • Export and visualize geospatial raster datasets.

13.3. Installing rioxarray#

To use rioxarray, you’ll need to install it along with rasterio and its dependencies. You can install it via pip by uncommenting the following line:

# %pip install rioxarray rasterio

13.4. Importing rioxarray#

You can start by importing rioxarray and other necessary libraries:

import rioxarray
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

xr.set_options(keep_attrs=True, display_expand_data=False)
<xarray.core.options.set_options at 0x7fc3ba9458b0>

13.5. Loading Georeferenced Raster Data#

One of the core functionalities of rioxarray is the ability to load georeferenced raster data, including its CRS and geospatial transformations. You can load a raster file (e.g., a GeoTIFF file) directly using rioxarray:

# Load a raster dataset using rioxarray
url = "https://github.com/opengeos/datasets/releases/download/raster/LC09_039035_20240708_90m.tif"
data = rioxarray.open_rasterio(url)
data
<xarray.DataArray (band: 7, y: 2563, x: 2485)> Size: 178MB
[44583385 values with dtype=float32]
Coordinates:
  * band         (band) int64 56B 1 2 3 4 5 6 7
  * x            (x) float64 20kB 5.824e+05 5.825e+05 ... 8.059e+05 8.06e+05
  * y            (y) float64 21kB 4.106e+06 4.105e+06 ... 3.875e+06 3.875e+06
    spatial_ref  int64 8B 0
Attributes: (12/96)
    ALGORITHM_SOURCE_SURFACE_REFLECTANCE:  LaSRC_1.6.0
    ALGORITHM_SOURCE_SURFACE_TEMPERATURE:  st_1.5.0
    CLOUD_COVER:                           0
    CLOUD_COVER_LAND:                      0
    COLLECTION_CATEGORY:                   T1
    COLLECTION_NUMBER:                     2
    ...                                    ...
    WRS_TYPE:                              2
    AREA_OR_POINT:                         Area
    _FillValue:                            -inf
    scale_factor:                          1.0
    add_offset:                            0.0
    long_name:                             ('SR_B1', 'SR_B2', 'SR_B3', 'SR_B4...

Here, rioxarray.open_rasterio loads the raster data into an Xarray DataArray and automatically attaches the geospatial metadata, including CRS, affine transformations, and spatial coordinates.

13.5.1. Inspecting the Dataset#

You can easily inspect the loaded dataset, including its dimensions, coordinates, and attributes:

# View the structure of the DataArray
data.dims  # Dimensions (e.g., band, y, x)
('band', 'y', 'x')
data.coords  # Coordinates (e.g., y, x in geographic or projected CRS)
Coordinates:
  * band         (band) int64 56B 1 2 3 4 5 6 7
  * x            (x) float64 20kB 5.824e+05 5.825e+05 ... 8.059e+05 8.06e+05
  * y            (y) float64 21kB 4.106e+06 4.105e+06 ... 3.875e+06 3.875e+06
    spatial_ref  int64 8B 0
print(data.attrs)  # Metadata (including CRS)
{'ALGORITHM_SOURCE_SURFACE_REFLECTANCE': 'LaSRC_1.6.0', 'ALGORITHM_SOURCE_SURFACE_TEMPERATURE': 'st_1.5.0', 'CLOUD_COVER': 0, 'CLOUD_COVER_LAND': 0, 'COLLECTION_CATEGORY': 'T1', 'COLLECTION_NUMBER': 2, 'DATA_SOURCE_AIR_TEMPERATURE': 'VIIRS', 'DATA_SOURCE_ELEVATION': 'GLS2000', 'DATA_SOURCE_OZONE': 'VIIRS', 'DATA_SOURCE_PRESSURE': 'Calculated', 'DATA_SOURCE_REANALYSIS': 'GEOS-5 IT', 'DATA_SOURCE_WATER_VAPOR': 'VIIRS', 'DATE_ACQUIRED': '2024-07-08', 'DATE_PRODUCT_GENERATED': 1720562288000, 'DATUM': 'WGS84', 'EARTH_SUN_DISTANCE': 1.016687, 'ELLIPSOID': 'WGS84', 'GEOMETRIC_RMSE_MODEL': 3.828, 'GEOMETRIC_RMSE_MODEL_X': 2.822, 'GEOMETRIC_RMSE_MODEL_Y': 2.586, 'GEOMETRIC_RMSE_VERIFY': 1.342, 'GRID_CELL_SIZE_REFLECTIVE': 30, 'GRID_CELL_SIZE_THERMAL': 30, 'GROUND_CONTROL_POINTS_MODEL': 1015, 'GROUND_CONTROL_POINTS_VERIFY': 244, 'GROUND_CONTROL_POINTS_VERSION': 5, 'IMAGE_QUALITY_OLI': 9, 'IMAGE_QUALITY_TIRS': 9, 'L1_DATE_PRODUCT_GENERATED': '2024-07-08T23:15:25Z', 'L1_LANDSAT_PRODUCT_ID': 'LC09_L1TP_039035_20240708_20240708_02_T1', 'L1_PROCESSING_LEVEL': 'L1TP', 'L1_PROCESSING_SOFTWARE_VERSION': 'LPGS_16.4.0', 'L1_REQUEST_ID': 190852400045, 'LANDSAT_PRODUCT_ID': 'LC09_L2SP_039035_20240708_20240709_02_T1', 'LANDSAT_SCENE_ID': 'LC90390352024190LGN00', 'LICENSE': 'https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC09_C02_T1_L2#terms-of-use', 'MAP_PROJECTION': 'UTM', 'NADIR_OFFNADIR': 'NADIR', 'ORIENTATION': 'NORTH_UP', 'OVR_RESAMPLING_ALG': 'NEAREST', 'PROCESSING_LEVEL': 'L2SP', 'PROCESSING_SOFTWARE_VERSION': 'LPGS_16.4.0', 'REFLECTANCE_ADD_BAND_1': -0.2, 'REFLECTANCE_ADD_BAND_2': -0.2, 'REFLECTANCE_ADD_BAND_3': -0.2, 'REFLECTANCE_ADD_BAND_4': -0.2, 'REFLECTANCE_ADD_BAND_5': -0.2, 'REFLECTANCE_ADD_BAND_6': -0.2, 'REFLECTANCE_ADD_BAND_7': -0.2, 'REFLECTANCE_MULT_BAND_1': 2.75e-05, 'REFLECTANCE_MULT_BAND_2': 2.75e-05, 'REFLECTANCE_MULT_BAND_3': 2.75e-05, 'REFLECTANCE_MULT_BAND_4': 2.75e-05, 'REFLECTANCE_MULT_BAND_5': 2.75e-05, 'REFLECTANCE_MULT_BAND_6': 2.75e-05, 'REFLECTANCE_MULT_BAND_7': 2.75e-05, 'REFLECTIVE_LINES': 7721, 'REFLECTIVE_SAMPLES': 7591, 'REQUEST_ID': 190888500045, 'ROLL_ANGLE': 0, 'SATURATION_BAND_1': 'N', 'SATURATION_BAND_2': 'N', 'SATURATION_BAND_3': 'N', 'SATURATION_BAND_4': 'Y', 'SATURATION_BAND_5': 'Y', 'SATURATION_BAND_6': 'Y', 'SATURATION_BAND_7': 'Y', 'SATURATION_BAND_8': 'N', 'SATURATION_BAND_9': 'N', 'SCENE_CENTER_TIME': '18:14:55.5609230Z', 'SENSOR_ID': 'OLI_TIRS', 'SPACECRAFT_ID': 'LANDSAT_9', 'STATION_ID': 'LGN', 'SUN_AZIMUTH': 119.00639877, 'SUN_ELEVATION': 66.21066536, 'system-asset_size': 944562046, 'system-index': 'LC09_039035_20240708', 'system-time_end': 1720462495560, 'system-time_start': 1720462495560, 'TARGET_WRS_PATH': 39, 'TARGET_WRS_ROW': 35, 'TEMPERATURE_ADD_BAND_ST_B10': 149, 'TEMPERATURE_MAXIMUM_BAND_ST_B10': 372.999941, 'TEMPERATURE_MINIMUM_BAND_ST_B10': 149.003418, 'TEMPERATURE_MULT_BAND_ST_B10': 0.00341802, 'THERMAL_LINES': 7721, 'THERMAL_SAMPLES': 7591, 'UTM_ZONE': 11, 'WRS_PATH': 39, 'WRS_ROW': 35, 'WRS_TYPE': 2, 'AREA_OR_POINT': 'Area', '_FillValue': np.float32(-inf), 'scale_factor': 1.0, 'add_offset': 0.0, 'long_name': ('SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7')}

13.5.2. Checking CRS and Transform Information#

rioxarray integrates CRS and affine transform metadata into the Xarray object:

# Check the CRS of the dataset
data.rio.crs
CRS.from_epsg(32611)
# Check the affine transformation (mapping pixel coordinates to geographic coordinates)
data.rio.transform()
Affine(90.0, 0.0, 582390.0,
       0.0, -90.0, 4105620.0)

Sometimes, raster data may not have a CRS, or the CRS could be incorrect. You can assign a CRS manually if necessary:

# If the CRS is missing or incorrect, assign a CRS
data = data.rio.write_crs("EPSG:32611", inplace=True)

13.6. Basic Geospatial Operations#

13.6.1. Reprojecting a Dataset#

Reprojecting raster data to another CRS is common in geospatial analysis. For example, you may want to reproject the dataset from its native projection to the WGS84 geographic coordinate system (EPSG:4326):

# Reproject the dataset to WGS84 (EPSG:4326)
data_reprojected = data.rio.reproject("EPSG:4326")
print(data_reprojected.rio.crs)
EPSG:4326

13.6.2. Clipping a Raster#

Clipping a raster dataset is useful when you only want to focus on a specific geographic area. You can clip a dataset using a bounding box in the same CRS as the data:

# Define a bounding box (in the same CRS as the dataset)
bbox = [-115.391, 35.982, -114.988, 36.425]

# Clip the raster to the bounding box
clipped_data = data_reprojected.rio.clip_box(*bbox)
clipped_data.shape
(7, 492, 447)

Alternatively, you can clip the raster using a vector dataset containing polygon geometries:

import geopandas as gpd

# Load a geojson with regions of interest
geojson_path = "https://github.com/opengeos/datasets/releases/download/places/las_vegas_bounds_utm.geojson"
bounds = gpd.read_file(geojson_path)

# Clip the raster to the shape
clipped_data2 = data.rio.clip(bounds.geometry, bounds.crs)
clipped_data2.shape
(7, 522, 514)

13.7. Working with Spatial Dimensions#

rioxarray supports operations on spatial dimensions (latitude/longitude or x/y coordinates) like resampling, reducing, or slicing.

13.7.1. Resampling#

To resample the raster dataset to a different resolution (e.g., 1 km), use the rio.resample method:

# Resample to 1km resolution (using an average resampling method)
resampled_data = data.rio.reproject(data.rio.crs, resolution=(1000, 1000))
resampled_data.shape
(7, 231, 224)

13.7.2. Extracting Spatial Subsets#

You can extract spatial subsets of the dataset by selecting specific coordinate ranges:

# Select a subset of the data within a lat/lon range
min_x, max_x = -115.391, -114.988
min_y, max_y = 35.982, 36.425
subset = data_reprojected.sel(
    x=slice(min_x, max_x), y=slice(max_y, min_y)
)  # Slice y in reverse order
subset.shape
(7, 491, 447)

13.8. Visualization of Georeferenced Data#

Once you have performed operations on the data, you can visualize it using matplotlib. For example, to plot a multi-band image using bands 4, 3, and 2:

# Plot the raster data
plt.figure(figsize=(8, 8))
data_reprojected.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("Landsat Image covering Las Vegas")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
../../_images/125a2a9dea5fb975919a040430486f08d185773fba5b05cf11a6410bf11fee10.png

You can also visualize clipped or masked data in the same way:

# Plot the raster data
plt.figure(figsize=(8, 8))
clipped_data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("Landsat Image covering Las Vegas")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
../../_images/0a464cfab016e81ebb7ca9ec75c30938bfcd67a84a9137a43d262e4ac0594dec.png

For more advanced plots, such as overlaying a vector dataset on the raster data, you can combine rioxarray with geopandas and matplotlib:

# Plot raster with GeoJSON overlay
fig, ax = plt.subplots(figsize=(8, 8))
data.attrs["long_name"] = "Surface Reflectance"  # Update the long_name attribute
data.sel(band=4).plot.imshow(ax=ax, vmin=0, vmax=0.4, cmap="gray")
bounds.boundary.plot(ax=ax, color="red")
plt.title("Raster with Vector Overlay")
plt.show()
../../_images/79ee821c008a74947587075c76bb05c0941724b0ee59513192a2f477b6f56d7d.png

13.9. Saving Data#

Just like loading data, you can export rioxarray datasets to disk. For example, you can save the modified or processed raster data as a GeoTIFF file:

# Save the DataArray as a GeoTIFF file
data.rio.to_raster("output_raster.tif")

13.10. Handling NoData Values#

If your dataset contains NoData values, you can manage them using the following functions:

# Assign NoData value
data2 = data.rio.set_nodata(-9999)

# Remove NoData values (mask them)
data_clean = data2.rio.write_nodata(-9999, inplace=True)

13.11. Reproject to Multiple CRS#

You can reproject the dataset to multiple CRS and compare them. For instance:

# Reproject to WGS 84 (EPSG:4326)
data = data.rio.reproject("EPSG:4326")
print(data.rio.crs)
EPSG:4326
# Reproject to EPSG:3857 (Web Mercator)
mercator_data = data.rio.reproject("EPSG:3857")
print(mercator_data.rio.crs)
EPSG:3857
# Plot the raster data in WGS84
plt.figure(figsize=(6, 6))
data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("EPSG:4326")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
../../_images/63513d7ca8b786c6205bb6cc4d888bb92bb65ac4bb91a87cc2d35e3c5a8374de.png
# Plot the raster data in Web Mercator
plt.figure(figsize=(6, 6))
mercator_data.sel(band=[4, 3, 2]).plot.imshow(vmin=0, vmax=0.3)
plt.title("EPSG:3857")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
../../_images/d1057f824fd9b3507e7fe748c054ffaf2df2bbb1336cb4a6381076cebffea705.png

13.12. Basic Band Math (NDVI Calculation)#

Band math enables us to perform computations across different bands. A common application is calculating the Normalized Difference Vegetation Index (NDVI), which is an indicator of vegetation health.

NDVI is calculated as:

NDVI = (NIR - Red) / (NIR + Red)

We can compute and plot the NDVI as follows:

# Select the red (band 4) and NIR (band 5) bands
red_band = data.sel(band=4)
nir_band = data.sel(band=5)

# Calculate NDVI
ndvi = (nir_band - red_band) / (nir_band + red_band)
ndvi = ndvi.clip(min=-1, max=1)  # Clip values to the range [-1, 1]
ndvi.attrs["long_name"] = "NDVI"

To visualize the NDVI, we can plot it using matplotlib:

# Plot the NDVI values
ndvi.plot(cmap="RdYlGn", vmin=-1, vmax=1)
plt.title("NDVI of the Landsat Image")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
../../_images/5e5a0f008e666ca1a79f9468fea44f045914ce4ca13c3f1751bc11483fc9c093.png

You can also mask out non-vegetated areas or areas with invalid NDVI values (such as water or urban regions) by applying a threshold:

# Mask out non-vegetated areas (NDVI < 0.2)
ndvi_clean = ndvi.where(ndvi > 0.2)
ndvi_clean.plot(cmap="Greens", vmin=0.2, vmax=0.5)
plt.title("Cleaned NDVI (non-vegetated areas masked)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
../../_images/75f40f282a6770053bfb099fb03d015beb44951fbd22f9b9b26e0c99e1b58314.png

13.13. Exercises#

13.13.1. Sample Dataset#

For the exercises, we will use a sample GeoTIFF raster dataset of Libya, which is available at the following URL: opengeos/datasets

13.13.2. Exercise 1: Load and Inspect a Raster Dataset#

  1. Use rioxarray to load the GeoTIFF raster file.

  2. Inspect the dataset by printing its dimensions, coordinates, and attributes.

  3. Check and print the CRS and affine transformation of the dataset.

13.13.3. Exercise 2: Reproject the Raster to a New CRS#

  1. Reproject the loaded raster dataset from its original CRS to EPSG:4326 (WGS84).

  2. Print the new CRS and check the dimensions and coordinates of the reprojected data.

  3. Plot the original and reprojected datasets for comparison.

13.13.4. Exercise 3: Clip the Raster Using a Bounding Box#

  1. Define a bounding box (e.g., xmin, ymin, xmax, ymax) that covers the land area of Libya.

  2. Clip the raster dataset using this bounding box.

  3. Plot the clipped data to visualize the result.

13.13.5. Exercise 4: Mask the Raster Using a Vector Dataset#

  1. Load the GeoJSON file at opengeos/datasets using geopandas.

  2. Use the GeoJSON to mask the raster dataset, keeping only the data within the GeoJSON boundaries.

  3. Plot the masked raster data.

13.13.6. Exercise 5: Resample the Raster to a Different Resolution#

  1. Resample the raster dataset to a 3m resolution, using an average resampling method.

  2. Check the new dimensions and coordinates after resampling.

  3. Save the resampled raster dataset as a new GeoTIFF file.

13.14. Summary#

In this lecture, we have explored the basic functionality of rioxarray, a powerful extension of Xarray designed for geospatial raster data. Key points include:

  • Loading and inspecting geospatial raster data with CRS and transform metadata.

  • Performing essential geospatial operations, such as reprojection, clipping, and masking.

  • Visualizing and exporting raster data.

  • Working with spatial dimensions (x, y) using slicing, resampling, and other operations.

By integrating Xarray’s multi-dimensional data handling capabilities with rasterio’s geospatial features, rioxarray makes it easier to manage and analyze geospatial raster datasets. It is a versatile tool for anyone working with geospatial data in scientific computing, environmental analysis, or remote sensing.