Benchmark Dataset Fills the Gaps in Glacial Mass Modeling

Calving might be the most important natural phenomenon you’ve never heard of. It may sound like an exercise you missed on leg day, but don’t let the unassuming moniker deceive you. It’s a glacial process, yet it is anything but slow! Understanding calving is essential to modeling Earth’s climate systems and monitoring climate change.

Using computer vision, a team of researchers from Friedrich-Alexander-Universität Erlangen-Nürnberg in Germany is pushing the boundaries of our ability to map literal glacial boundaries (known as calving fronts) with their new dataset, CaFFe. (Not to be confused with the deep learning framework with a similar name!)

In this article, we’ll cover:

What is calving
The challenges of mapping calving fronts
Overview of the CaFFe Dataset
How to work with the CaFFe Dataset

Glacial Calving

Credit: Wikipedia - Marianocecowski

Calving, also known as ice calving or glacial calving, is a process in which chunks of ice break off at the edge of a marine-terminating (as opposed to a land-terminating) glacier. These renegade chunks of ice can vary wildly in size: Antarctic ice shelves can calve icebergs that are themselves more than 80 kilometers long!

Calving events can also be violent. Even a relatively small initial calving can set off a chain reaction, leading larger icebergs to split off in rapid succession. When the chunks of ice split off from a marine-terminating glacier and fall into the ocean, their sheer mass and volume can trigger enormous waves. These events can be jaw-dropping. Here is just one instance that was caught on camera.

https://youtu.be/hC3VTgIPoGU
Credit: "CHASING ICE" captures largest glacier calving ever filmed - OFFICIAL VIDEO - Exposure Labs

Along with melting, sublimation, and evaporation, calving is one of the core processes that make up ice ablation, the loss of mass from glaciers. Indeed, calving alone may account for up to 30% of the ice lost at glacial fronts! This makes glacial calving a major driver of the sea level rise caused by ice mass loss.

Modeling climate systems is a notoriously difficult task, and glacial calving, along with its contribution to sea level rise, is no exception. For example, marine ice sheets, which are grounded below sea level, are thought to be susceptible to an instability that could lead to accelerated sea level rise. To predict sea level rise and track changes in our climate, it is therefore essential to measure glacial calving accurately over time.

Finding the Fronts

To parameterize their computer models of glacial behavior, scientists look at the calving fronts of glaciers: the boundaries between calving glaciers and the surrounding water. By tracking the positions of these fronts over time, scientists can measure the rate at which calving occurs and how the shape of the front evolves as a result.

For the last few decades, climate scientists have used optical satellite imagery (taken with cameras that operate in the visible light spectrum) to document glacial changes. Expert annotators delineate calving fronts in these typically high-resolution images, and the annotated data is used to train a machine learning model. The trained model can then be applied to images taken over time to quantify frontal ablation. This approach has been demonstrated to work for multiple glaciers! However, automated pipelines based on optical imagery are inherently limited by one essential ingredient: sunlight. Because optical satellites rely on reflected sunlight, they cannot capture meaningful information at night, including during the polar night. These satellites are also hampered by cloud cover, snow cover, and other weather conditions.

Synthetic aperture radar (SAR) provides an intriguing alternative to optical imagery, with its own set of pros and cons. SAR imagery typically has lower resolution and lower spatial precision than optical satellite imagery. However, SAR sensors emit their own radio (microwave) pulses and record the echoes, so they do not depend on daylight and can see through clouds. Consequently, SAR images can be taken with greater frequency and regularity, capturing data year-round.
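As a toy illustration of turning front positions into a rate, here is a minimal sketch. It assumes each front has already been reduced to a single position in meters along a glacier flowline. That reduction step, the sign convention, and the function name `retreat_rate_m_per_year` are all hypothetical and not part of CaFFe. The function fits an ordinary least-squares line to position versus time and returns its slope.

```python
from datetime import date


def retreat_rate_m_per_year(observations: list[tuple[date, float]]) -> float:
    """Least-squares slope of front position (m) versus time (years).

    Each observation is a (date, position_m) pair, with position measured
    along a flowline (hypothetical convention: larger = further seaward).
    A negative result means the front retreated on average.
    """
    if len(observations) < 2:
        raise ValueError("need at least two dated front positions")
    # Convert dates to fractional years relative to the first observation
    t0 = observations[0][0]
    xs = [(d - t0).days / 365.25 for d, _ in observations]
    ys = [pos for _, pos in observations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("observations must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A least-squares fit is used rather than simply differencing the first and last positions so that every observation contributes, which dampens the effect of a single noisy front delineation.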

CaFFeinating the Search for Calving Fronts

With global stores of SAR imagery proliferating in recent years thanks to satellites like Sentinel-1, the allure of SAR-centered calving front ML pipelines has only grown. And as with any new machine learning or computer vision task, evaluating model performance is only possible when there is a high-quality benchmark to evaluate against. In “Calving fronts and where to find them,” Nora Gourmelon et al. introduce a new benchmark dataset for calving front delineation in SAR images: CaFFe. CaFFe consists of 681 SAR images of marine-terminating glaciers in the Arctic, Greenland, and Antarctic ice sheets. The images, which were taken by multiple satellites over multiple missions, span two and a half decades, from 1995 to 2020. The images also vary in spatial resolution and include information about which orbit the satellite was in at the time of capture.

Images of Mapple Glacier taken by TSX, sorted by date taken

CaFFe is also diverse in a variety of other ways. According to the dataset’s authors, “it is the first dataset to provide long-term calving front information...using SAR imagery from multiple satellites which introduces new challenges such as different penetration depths or sensitivity to surface changes; different signal-to-noise ratios; and different geometries, topographic effects, shading, and overlay effects.”

SAR images from CaFFe filtered by spatial resolution

In addition to the SAR images and their associated metadata, CaFFe comes primed with segmentation masks for different types of zones (“ocean”, “rock outcrops”, “glacier area”, and “no information available”), as well as single-pixel-wide segmentation masks delineating the calving front in each image.
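Single-pixel front masks are usually scored by how far a predicted front lies from the annotated one, rather than by pixel overlap, since two one-pixel-wide lines almost never coincide exactly. Here is a simplified, brute-force sketch of a symmetric mean-distance measure in that spirit. It is our own illustration, not the benchmark's official evaluation code, and the function names are hypothetical. Masks are plain 2D lists, and `meters_per_pixel` is an assumed scale factor (for instance, a CaFFe image's spatial resolution, if you treat it as meters per pixel).

```python
import math


def front_pixels(mask):
    """Return (row, col) coordinates of all nonzero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def symmetric_mean_distance(pred_mask, true_mask, meters_per_pixel=1.0):
    """Average nearest-neighbor distance between two front masks.

    Distances are taken in both directions (pred -> true and true -> pred),
    pooled, averaged, and scaled by the pixel size. Brute force: fine for
    sparse fronts, too slow for large dense masks.
    """
    pred = front_pixels(pred_mask)
    true = front_pixels(true_mask)
    if not pred or not true:
        raise ValueError("both masks must contain at least one front pixel")

    def nearest(p, others):
        # Euclidean distance from p to the closest pixel in `others`
        return min(math.dist(p, q) for q in others)

    dists = [nearest(p, true) for p in pred] + [nearest(q, pred) for q in true]
    return meters_per_pixel * sum(dists) / len(dists)
```

Measuring in both directions matters: a prediction that covers only a short piece of the true front would look perfect if you only checked predicted-to-true distances.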

Annotated zones and calving fronts projected onto SAR images in the CaFFe dataset.

How to Work with the CaFFe Dataset

If you’d like to explore the dataset in your browser (no install required!), log into try.fiftyone.ai/datasets/caffe. When you’re ready to download the dataset and explore it locally, click on the “Download dataset” link here to download the zip file, then unzip it. The resulting data_raw folder is a little under 3 GB and has four subfolders:

bounding_boxes: bounding boxes encapsulating the region with the dynamic calving front
fronts: full-image binary masks (as PNGs) delineating the calving fronts of glaciers
sar_images: the synthetic aperture radar (SAR) images themselves
zones: semantic segmentation masks classifying regions as ocean, rock outcrops, glacier area, and other
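The zone masks encode each class as an integer pixel value. Below is a minimal sketch for summarizing how much of an image each zone covers. It assumes the values 0, 64, 127, and 254 stand for no information, rock outcrop, glacier, and ocean, respectively. That mapping is our assumption, so check it against the dataset's documentation (or inspect `np.unique` on a mask) before relying on it. The names `ZONE_CLASSES` and `zone_fractions` are hypothetical.

```python
from collections import Counter

# ASSUMPTION: pixel values used in CaFFe zone masks; verify against the
# dataset's documentation before relying on them.
ZONE_CLASSES = {0: "no information", 64: "rock outcrop", 127: "glacier", 254: "ocean"}


def zone_fractions(mask):
    """Fraction of pixels in each zone class for a 2D mask (list of rows).

    Any pixel value not listed in ZONE_CLASSES is counted as "unknown".
    """
    counts = Counter(v for row in mask for v in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    fractions = {name: counts.get(value, 0) / total for value, name in ZONE_CLASSES.items()}
    known = sum(counts.get(value, 0) for value in ZONE_CLASSES)
    fractions["unknown"] = (total - known) / total
    return fractions
```

A nonzero "unknown" fraction is a quick signal that the assumed value mapping does not match your copy of the data. Once verified, the same value-to-name dictionary could also be assigned to FiftyOne's `dataset.default_mask_targets` so the App shows class names instead of raw values.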

To manage and visualize the data, we will be using the open source library FiftyOne. If you haven’t already done so, you can install FiftyOne with the following command:

pip install fiftyone

In a Python session, let’s import all of the libraries we will need:

from datetime import date
from glob import glob
import numpy as np
from PIL import Image
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

Now let’s create an empty `Dataset` named “CaFFe” and make it persistent, so that it is kept in the underlying database instead of being deleted when your Python session ends:

dataset = fo.Dataset("CaFFe", persistent=True)

Loading the Raw Glacier Images

The next step is to add our images to the dataset as samples. If sar_images were just a plain old directory of images, we could add them to our dataset with:

dataset.add_images_dir("sar_images")

However, in the CaFFe dataset, information about a sample is encoded in its filepath! We want to capture this rich information. Let’s look at an example. The first filepath in sar_images/test is:

"data_raw/sar_images/test/COL_2011-06-18_TDX_7_1_024.png"

From this filepath, we can extract the split (test or train), the glacier where the image was taken, the satellite used to take it, and the date on which it was taken, among other attributes. This function creates a FiftyOne Sample from a filepath:

```python
def create_sample_from_filepath(filepath):
    # The parent folder name ("train" or "test") is the split
    split = filepath.split("/")[-2]
    # Filename without extension, e.g. "COL_2011-06-18_TDX_7_1_024"
    fname = filepath.split("/")[-1].split(".")[0]
    parts = fname.split("_")
    glacier, satellite, orbit = parts[0], parts[2], parts[5]
    # The acquisition date is stored as YYYY-MM-DD
    date_str = parts[1]
    date_taken = date(*map(int, date_str.split("-")))
    spatial_resolution = int(parts[3])
    quality_factor = int(parts[4])
    # Some filenames carry an optional seventh component (modality)
    modality = parts[6] if len(parts) == 7 else None
    sample = fo.Sample(
        filepath=filepath,
        glacier=glacier,
        satellite=satellite,
        orbit=orbit,
        date_taken=date_taken,
        spatial_resolution=spatial_resolution,
        quality_factor=quality_factor,
        modality=modality,
        tags=[split],
    )
    return sample
```
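If you'd like to check the naming convention before involving FiftyOne at all, here is a stdlib-only sketch of the same parsing logic that returns a plain dictionary. The function name `parse_caffe_filename` is hypothetical, and the field positions simply mirror the function above; it additionally rejects filenames that don't have six or seven underscore-separated parts.

```python
from datetime import date
from pathlib import Path


def parse_caffe_filename(filepath):
    """Parse the split and acquisition metadata from a CaFFe SAR image path."""
    path = Path(filepath)
    parts = path.stem.split("_")
    if len(parts) not in (6, 7):
        raise ValueError(f"unexpected filename format: {path.name}")
    year, month, day = map(int, parts[1].split("-"))
    return {
        "split": path.parent.name,  # "train" or "test"
        "glacier": parts[0],
        "date_taken": date(year, month, day),
        "satellite": parts[2],
        "spatial_resolution": int(parts[3]),
        "quality_factor": int(parts[4]),
        "orbit": parts[5],  # kept as a string to preserve leading zeros
        "modality": parts[6] if len(parts) == 7 else None,
    }
```

Using `pathlib` also sidesteps the hard-coded "/" separator, which matters if you ever run this on Windows.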

We can then loop over files in the sar_images folder, adding the samples created from each filepath to the dataset:

```python
samples = []
for fp in glob("data_raw/sar_images/*/*.png"):
    sample = create_sample_from_filepath(fp)
    samples.append(sample)
dataset.add_samples(samples)
```
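Before (or after) adding the samples, it can be worth sanity-checking what the glob actually matched. Here is a minimal stdlib sketch using a hypothetical helper, `count_by_split_and_glacier`. You could pass it `glob("data_raw/sar_images/*/*.png")` and compare the total against the 681 images the paper reports.

```python
from collections import Counter
from pathlib import Path


def count_by_split_and_glacier(filepaths):
    """Count images per (split, glacier) pair.

    The split is taken from the parent folder name and the glacier from the
    first underscore-separated token of the filename.
    """
    counts = Counter()
    for fp in filepaths:
        path = Path(fp)
        glacier = path.stem.split("_")[0]
        counts[(path.parent.name, glacier)] += 1
    return counts
```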

Now launch the FiftyOne App to view these images before adding the front and zone masks:

session = fo.launch_app(dataset)

Adding Front and Zone Masks

CaFFe’s regular structure makes it easy to retrieve the filepaths for the front and zone mask images: just replace “sar_images” with “fronts” (or “zones”) and append “_front” (or “_zones”) to the filename before the file extension. From there, we can use PIL.Image to open the mask images and NumPy to convert them to arrays for use in FiftyOne Segmentation labels. For simplicity, I’ve split the code into two functions, get_front() and get_zones(), which retrieve and format the segmentation masks we will add to our samples:

```python
## Fronts
def get_front(sample):
    split = sample.filepath.split("/")[-2]
    filename = sample.filename
    front_filepath = f"data_raw/fronts/{split}/{filename.replace('.png', '_front.png')}"
    # Binary front masks store 0 and 255; integer division maps them to 0/1
    front_mask = np.array(Image.open(front_filepath)) // 255
    return fo.Segmentation(mask=front_mask)

## Zones
def get_zones(sample):
    split = sample.filepath.split("/")[-2]
    filename = sample.filename
    zone_filepath = f"data_raw/zones/{split}/{filename.replace('.png', '_zones.png')}"
    # Zone masks already hold one integer value per zone class
    zone_mask = np.array(Image.open(zone_filepath))
    return fo.Segmentation(mask=zone_mask)
```
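The functions above rebuild mask paths from a hard-coded "data_raw/" prefix. As an alternative, here is a small stdlib sketch that applies the rename rule directly to the SAR image's own path (swap the "sar_images" folder, append "_front" or "_zones" to the stem), so it also works with the absolute paths FiftyOne stores. The helper name `mask_path` is hypothetical.

```python
from pathlib import Path


def mask_path(sar_path, kind):
    """Map a CaFFe SAR image path to its "fronts" or "zones" mask path."""
    suffixes = {"fronts": "_front", "zones": "_zones"}
    if kind not in suffixes:
        raise ValueError(f"kind must be one of {sorted(suffixes)}")
    path = Path(sar_path)
    parts = list(path.parts)
    # Swap the "sar_images" folder for the mask folder (ValueError if absent)
    parts[parts.index("sar_images")] = kind
    # Append the suffix to the stem, keeping the original extension
    new_name = path.stem + suffixes[kind] + path.suffix
    return str(Path(*parts[:-1], new_name))
```

For example, `mask_path(sample.filepath, "fronts")` could replace the f-string inside get_front().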

Then we can iterate over our samples, adding these masks as front and zones fields, respectively, and automatically saving each sample as we go:

```python
for sample in dataset.iter_samples(autosave=True, progress=True):
    sample["front"] = get_front(sample)
    sample["zones"] = get_zones(sample)
```

💡 For better efficiency, you can also add these masks during the initial sample creation, so you don’t need a second pass over the samples in the dataset.

Adding Thumbnail Images

If you’ve worked with satellite imagery before, you know that SAR images can be massive. In CaFFe, images are up to 4,000 pixels wide and up to 4,000 pixels tall, so a single image can contain as many as 16 million pixels!

💡 Want to know how big your images are? Run dataset.compute_metadata()!

No matter how nice your GPU is, rendering a grid of images this large will inevitably be slow. One way to speed up dataset visualization and exploration is to compute lower-resolution “thumbnail” images for the samples in the dataset. When you scroll through the grid, these smaller images load instead of the originals, which reduces loading time, but when you click into the modal for an image, the App still renders the full-resolution image. You get the best of both worlds!

FiftyOne has an image utility that makes thumbnail generation incredibly easy. First, let’s create the directory where we will store the thumbnails:
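To put those dimensions in perspective, here is the back-of-the-envelope arithmetic as a tiny helper (the name `uncompressed_size_bytes` is hypothetical). Assuming single-channel, 8-bit pixels, a 4,000 by 4,000 image holds 16,000,000 bytes of raw pixel data before any PNG compression, and a viewer that decoded it to 4-channel RGBA would need four times that.

```python
def uncompressed_size_bytes(width, height, channels=1, bytes_per_channel=1):
    """Bytes needed to hold an image's raw pixels in memory (no compression)."""
    return width * height * channels * bytes_per_channel
```

A 256-pixel-tall thumbnail of the same square image, by comparison, needs only 256 * 256 = 65,536 bytes at one byte per pixel.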

mkdir caffe_thumbnails

Then we import the image utilities and use the transform_images() function to create a downsized copy of each image in our dataset that preserves its aspect ratio:

import fiftyone.utils.image as foui
foui.transform_images(
dataset,
size=(-1, 256),
output_field="thumbnail_path",
output_dir="caffe_thumbnails",
)

Here the size=(-1, 256) means that the thumbnails will have a height of 256 pixels, and whatever width they need to have in order to preserve aspect ratio. Last, all we have to do is tell the FiftyOne App to look at this new thumbnail_path field for the images to display in the grid. We do so by setting the grid_media_field attribute in the dataset’s App config:
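The arithmetic behind size=(-1, 256) is simple: fix the height at 256 and scale the width by the same factor. The sketch below (hypothetical helper `thumbnail_size`) shows that calculation. Rounding with Python's round() is an assumption on our part, and FiftyOne's own resizing may round slightly differently.

```python
def thumbnail_size(width, height, target_height=256):
    """Width and height of a thumbnail that keeps the original aspect ratio.

    Mirrors the intent of size=(-1, 256): the height is fixed and the width
    is derived from it. Uses Python's round(), so exact .5 ties go to the
    nearest even integer; the width is never allowed to drop below 1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height
```

For a 4,000 by 2,000 image, this gives a 512 by 256 thumbnail.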

# Modify the dataset's App config
dataset.app_config.media_fields = ["filepath", "thumbnail_path"]
dataset.app_config.grid_media_field = "thumbnail_path"
dataset.save() # must save after edits

Conclusion

Glaciers are an important feature of the natural environment, and we still have much to learn about them if we want to understand how climate systems are evolving over time. From traditional modalities like optical imagery to burgeoning modalities like synthetic aperture radar, computer vision promises to provide a powerful toolset in this quest. By establishing a benchmark for SAR calving front delineation, CaFFe gets us a step closer to this goal!
