Tree counting from satellite imagery sounds simple until you actually try it. Three sensors, three continents, wildly different ground sample distances, and (the real headache) training labels that range from expert-annotated to "pretty noisy pseudolabel, use with caution." That's the exact problem TreeMatch (ECCV 2026) tackles: an optimal-transport method for learning tree density estimation from a mix of strong and weak point annotations, benchmarked on a new dataset called TinyTrees. In this post, we load TinyTrees and TreeMatch's pretrained checkpoints into FiftyOne to see what the benchmark table can't: where the model fails, what the label noise actually looks like, and how far predictions drift across sensors.
What a benchmark table can't show you about TreeMatch
This is the kind of problem FiftyOne was built for. FiftyOne is an open-source tool for annotating, visualizing, curating, and evaluating computer vision datasets and models — the layer between "I have a dataset and a model" and "I actually understand what's happening in them." Instead of writing one-off scripts to plot predictions or scroll through a folder of images, you get a real dataset object with a query language, an interactive App for browsing labels and predictions side by side, and a Brain for things like embeddings and similarity search.
TreeMatch posts the best RMSE on all three sensors: on Gaofen-2, for example, it scores 60.6 versus 65.0 for DM-Count.
Research code gives you numbers. A benchmark table tells you TreeMatch beats DM-Count by some margin on some sensor. What it doesn't tell you is where the model fails, how noisy the weak labels actually look next to the real ones, or whether a model trained on 3m PlanetScope imagery has any business being pointed at 0.8m Gaofen-2 tiles.
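To make that concrete, here is a minimal sketch in plain Python of how two models can share an identical RMSE while failing in very different ways. The per-tile errors are made up for illustration and are not TinyTrees results. The sketch also assumes RMSE is computed over per-tile count errors, which is the usual convention for counting benchmarks; the paper's exact protocol may differ.

```python
import math


def rmse(errors):
    """Root-mean-square of per-tile count errors (predicted minus true)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical per-tile errors for two models on the same four tiles.
uniform_errors = [3, -3, 3, -3]  # off by a little on every tile
spiky_errors = [0, 0, 0, 6]      # perfect on three tiles, badly wrong on one

rmse_uniform = rmse(uniform_errors)
rmse_spiky = rmse(spiky_errors)

# Same aggregate score, very different worst case.
worst_uniform = max(abs(e) for e in uniform_errors)
worst_spiky = max(abs(e) for e in spiky_errors)

print(rmse_uniform, rmse_spiky)    # 3.0 3.0
print(worst_uniform, worst_spiky)  # 3 6
```

Both lists land on the same number in a table. Only a per-tile view shows that one of them hides a single catastrophic tile.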
FiftyOne closes that gap. Point annotations become Keypoints. Predicted density maps become Heatmap overlays sitting directly on top of the ground truth. Per-tile count error becomes a sortable field. Suddenly you're not reading an abstract — you're looking at the failure mode.
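The conversions behind those three fields are small. The sketch below uses plain Python and a hypothetical 4x4 tile rather than the companion notebook's code. It assumes annotations arrive as pixel coordinates and that the predicted count is the sum of the density map, which is the standard reading of a density estimator. The helper names are illustrative.

```python
def to_relative_points(points_px, width, height):
    """Convert (x, y) pixel coordinates to relative [0, 1] coordinates."""
    return [(x / width, y / height) for x, y in points_px]


def predicted_count(density_map):
    """A density map integrates to the estimated count, so sum every cell."""
    return sum(sum(row) for row in density_map)


def count_abs_error(density_map, gt_points):
    """Absolute difference between predicted and annotated tree counts."""
    return abs(predicted_count(density_map) - len(gt_points))


# Hypothetical 4x4 tile with three annotated trees (pixel coordinates).
gt_points_px = [(1, 1), (2, 3), (3, 0)]

# Hypothetical predicted density map for the same tile (sums to 2.0).
density = [
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
]

rel_points = to_relative_points(gt_points_px, width=4, height=4)
error = count_abs_error(density, gt_points_px)

print(rel_points)  # [(0.25, 0.25), (0.5, 0.75), (0.75, 0.0)]
print(error)       # 1.0
```

In FiftyOne, `rel_points` would become the `points` of a `fo.Keypoint` (which expects relative coordinates), the density array would be passed as the `map` of a `fo.Heatmap` (as a NumPy array), and `error` would be stored as a float field such as `count_abs_error` on the sample.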
21GB, one zip — hosted as a single archive on Hugging Face, no partial downloads
Georeferenced — every tile is a real GeoTIFF with a coordinate reference system
CC BY-NC 4.0 — research and education only, not commercial
Rwanda, China, France — PlanetScope, Gaofen-2, and SPOT-6, unified into one dataset instead of three separate mental models
Ground truth + predictions, pixel-aligned — keypoints and density heatmaps built from the same tensor the model actually sees, so nothing drifts out of registration
Weak vs. strong, same region, one click — flip between pseudolabels and expert annotations on the same tiles and see the label noise the paper is actually about
count_abs_error, sorted — jump straight to the model's worst tiles instead of guessing
Cross-sensor embeddings — a UMAP view showing whether Rwanda/China/France cluster apart, i.e. domain shift you can actually point at
Saved views — nine of them, persisted on the dataset, ready for a live walkthrough with zero re-querying
None of this required touching TreeMatch's own code. It's the official data loaders and pretrained checkpoints, wrapped with a visualization layer.
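Once `count_abs_error` exists per tile, ranking the worst tiles and comparing sensors is a sort and a group-by. The sketch below shows that logic in plain Python on hypothetical records; the tile names and error values are invented for illustration. On a FiftyOne dataset, the ranking step would correspond to `dataset.sort_by("count_abs_error", reverse=True)`.

```python
from collections import defaultdict

# Hypothetical per-tile records; values are illustrative, not TinyTrees results.
tiles = [
    {"tile": "rwanda_001", "sensor": "PlanetScope", "count_abs_error": 12.0},
    {"tile": "rwanda_002", "sensor": "PlanetScope", "count_abs_error": 4.0},
    {"tile": "china_001", "sensor": "Gaofen-2", "count_abs_error": 2.0},
    {"tile": "china_002", "sensor": "Gaofen-2", "count_abs_error": 30.0},
    {"tile": "france_001", "sensor": "SPOT-6", "count_abs_error": 7.0},
]

# Worst tiles first, so the failures are the first thing you look at.
worst_first = sorted(tiles, key=lambda t: t["count_abs_error"], reverse=True)

# Mean absolute count error per sensor, a rough first look at domain shift.
errors_by_sensor = defaultdict(list)
for t in tiles:
    errors_by_sensor[t["sensor"]].append(t["count_abs_error"])
mean_error = {s: sum(v) / len(v) for s, v in errors_by_sensor.items()}

print([t["tile"] for t in worst_first[:2]])  # ['china_002', 'rwanda_001']
print(mean_error)
```

The per-sensor mean is deliberately crude. It tells you which sensor to open first, and the sorted view tells you which tiles to open within it.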
Try it yourself: paper, code, dataset, and notebook
Scale past the demo subsample to the full ~21GB dataset, swap in the other TreeMatch baselines (DM-Count, CenterNet, P2PNet) as sibling prediction fields for a real model bake-off, or point the same pipeline at your own region's imagery and see what a domain-shifted model actually gets wrong before you retrain it.
Note: TinyTrees imagery is licensed CC BY-NC 4.0 — research and education use only.
FAQ: TreeMatch, TinyTrees, and FiftyOne
What is TreeMatch?
TreeMatch is an optimal-transport method for learning tree density estimation from satellite imagery, published at ECCV 2026. Its key contribution is training from a mix of strong and weak point annotations, so expert-labeled data and noisy pseudolabels can be used together.
What is the TinyTrees dataset?
TinyTrees is the benchmark dataset introduced with TreeMatch. It contains roughly 11.7 million point-annotated trees across three sensors and three continents: PlanetScope (Rwanda), Gaofen-2 (China), and SPOT-6 (France), with resolutions from 0.8m to 3m GSD. It ships as a single 21GB archive of georeferenced GeoTIFFs on Hugging Face.
What's the difference between strong and weak labels in TinyTrees?
Strong labels are expert-annotated tree points. Weak labels are noisier pseudolabels derived from canopy height models (CHMs). Handling both in one training setup is the core problem TreeMatch addresses.
Can I use TinyTrees for commercial projects?
No. TinyTrees imagery is licensed CC BY-NC 4.0, a non-commercial license, and is intended for research and education use only.
How does FiftyOne help evaluate a model like TreeMatch?
FiftyOne turns the benchmark into something inspectable: point annotations load as Keypoints, predicted density maps become Heatmap overlays on the ground truth, per-tile count error becomes a sortable field, and embeddings reveal cross-sensor domain shift. That surfaces where the model fails, not just the aggregate score.
Do I need to modify TreeMatch's code to explore it in FiftyOne?
No. The companion notebook uses TreeMatch's official data loaders and pretrained checkpoints, wrapped with a FiftyOne visualization layer.