Watching a Self-Driving Car Think: Building a Synced Frame + Map Plugin in FiftyOne

Jun 29, 2026
Autonomous driving data is gloriously, frustratingly multimodal. A single moment in time is captured by six cameras, a spinning LiDAR, an HD map, a GPS trace, and a swarm of 3D object annotations, all stamped to the same instant and each telling a different part of the same story. To understand a scene, you want to see that data the way the car saw it.
This post is about a small plugin we built in FiftyOne to do exactly that: step through a driving clip and watch the camera frame, a bird's-eye LiDAR view, and a live map of the ego vehicle and every detected agent all move in lockstep. It was directly inspired by an animation on the nuReasoning project page, and it turned into a nice showcase of why FiftyOne's plugin framework is such a quietly powerful thing.
Key Takeaways:
  • FiftyOne supports grouped datasets, where one logical sample holds multiple sensor slices from the same moment in time. This makes it possible to query and visualize camera images, LiDAR, and metadata together without stitching anything manually.
  • The FiftyOne plugin framework lets you build custom panels and operators that extend the App with bespoke visualizations. Panels are React components; operators handle backend logic. Together they're how the Frame + Map panel stays in sync across three views.
  • The Frame + Map panel syncs a camera image, a bird's-eye LiDAR view, and a live GPS map — advancing all three together keyframe by keyframe. It lives inside the FiftyOne App, so synchronized playback is one step away from your existing filters and queries.
  • The synced multimodal playback pattern extends well beyond autonomous driving. Robotics, drones, aerial mapping, and any sensor rig where spatial and temporal alignment matters can use the same approach.
  • A reproducible tutorial notebook is available, so you can follow along and build the Frame + Map panel from scratch.

What is FiftyOne?

FiftyOne is an open-source tool for building high-quality datasets and computer vision models. If you work with image, video, or 3D data, it's the layer that sits between "a folder of files and some annotations" and "a dataset you actually understand."
You load your data into a Dataset, and FiftyOne gives you a queryable Python SDK plus a rich visual App for exploring it. You can filter, slice, sort, and search your samples; overlay labels like detections, polylines, and 3D cuboids; evaluate model predictions against ground truth; surface duplicates, mislabels, and edge cases; and do it all across modalities — images, video, point clouds, and metadata — in one place. It supports Python 3.10–3.12 and installs with a single pip install fiftyone.
Crucially for autonomous driving, FiftyOne has first-class support for grouped datasets: a single logical "sample" can contain multiple slices — say, six camera images plus a LiDAR point cloud — that represent the same moment from different sensors. That grouping is the backbone of everything below.
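To make the grouping idea concrete, here is a minimal plain-Python sketch of the data shape: flat sensor records are bucketed by keyframe so that each group holds one entry per slice. It models the concept only and is not FiftyOne's API; the record fields (`keyframe`, `sensor`, `path`) and the function name `group_by_keyframe` are hypothetical.

```python
from collections import defaultdict


def group_by_keyframe(records):
    """Bucket flat sensor records into {keyframe: {slice_name: path}}."""
    groups = defaultdict(dict)
    for rec in records:
        groups[rec["keyframe"]][rec["sensor"]] = rec["path"]
    # Return the groups in keyframe order so playback can iterate directly
    return dict(sorted(groups.items()))


# Hypothetical records, deliberately out of order
records = [
    {"keyframe": 1, "sensor": "CAM_FRONT", "path": "kf1/cam_front.jpg"},
    {"keyframe": 0, "sensor": "LIDAR_TOP", "path": "kf0/lidar.pcd"},
    {"keyframe": 0, "sensor": "CAM_FRONT", "path": "kf0/cam_front.jpg"},
    {"keyframe": 1, "sensor": "LIDAR_TOP", "path": "kf1/lidar.pcd"},
]
groups = group_by_keyframe(records)
```

In FiftyOne itself, the same relationship is expressed by giving each slice's sample a shared `fo.Group`, with `group.element(slice_name)` assigning the slice.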

Your app, your rules: the FiftyOne plugin framework

Out of the box, the FiftyOne App is great. But the reason it scales to wildly different domains — from retail to robotics to medical imaging — is that you're not limited to what ships in the box. The plugin framework lets you extend the App with your own functionality, written in Python, JavaScript, or both.
There are two building blocks worth knowing:
  • Operators are units of logic. They can read the current view, the selected samples, dataset info, and more, then compute something and hand it back to the App. Think of them as the bridge between your Python backend and the browser.
  • Panels are custom UI surfaces that live right inside the App, in the grid or the sample modal. A panel is a full JavaScript/React component with access to the App's state, so you can render anything the web can render: charts, maps, 3D scenes, custom interactive tools.
The combination is what's powerful. A panel can call an operator to fetch exactly the data it needs (per-frame image URLs, GPS points, projected boxes), and then render a completely bespoke experience on top of it. As the Voxel51 team likes to put it: if you can dream it up, you can build it in the FiftyOne App.
That's not marketing fluff — it's the literal reason the next part was possible.

The inspiration: nuReasoning's long-tail driving clips

nuReasoning is a recent, large-scale dataset from Motional — the team behind nuScenes and nuPlan. It contains 20,000 real-world driving clips, each about 20 seconds long, totaling roughly 105 hours of curated long-tail driving data collected across multiple U.S. cities including Las Vegas, Pittsburgh, Los Angeles, and Boston. "Long-tail" is the key word: these are the rare, weird, decision-critical edge cases — unusual pedestrian behavior, ambiguous right-of-way, the situations that are hard to anticipate and even harder to train for. Each clip ships with synchronized multi-camera images, LiDAR, HD maps, object annotations, ego state, and human-verified reasoning annotations.
What caught our eye, though, was the animation on their landing page. It plays a clip with all the cameras tiled together, an "ego state" readout in the middle, and — off to the right — a clean top-down map showing the ego vehicle, the lanes, the crosswalks, and every surrounding agent as a colored box, all moving in perfect sync. It's a beautiful, legible way to watch a scene unfold from the car's perspective.
We wanted that experience, but living inside FiftyOne, driven by a real grouped dataset, where the synced view is one click away from all the filtering, querying, and label inspection the App already gives you.

Riding shotgun: the frame + map panel

Using the FiftyOne plugin framework and a grouped nuScenes dataset (the open lineage that nuReasoning builds on — six cameras plus a top LiDAR per keyframe), we built a panel called Frame + Map. When you open a driving scene, the panel shows three synchronized views side by side:
  • The camera frame, with 3D object cuboids projected onto the image and labeled by class, so you see the truck, the pedestrians, and the construction vehicle exactly where the perception stack sees them.
  • A bird's-eye LiDAR view (BEV), rendering each detected object as an oriented box with heading, over range rings at 20/40/60 meters and an ego triangle at the center: the top-down "what's around me right now" view.
  • A live map (via Mapbox) showing the ego vehicle's GPS trajectory as a route line, an animated marker for the car's current position, the crosswalks and lanes near the route, and a colored dot for every agent detected in the current keyframe.
Hit Play, and all three advance together, keyframe by keyframe, with Pause and Reset controls. The car slides along its route on the map while the camera frame, the surrounding-agent dots, and the BEV boxes all update to that same instant. A "show all-scene" toggle overlays the full-clip detection cloud as faint context when you want to see the whole drive at once.
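The reason the three views cannot drift apart is that they all read from one keyframe index. The real panel is a React component, so the following is only a Python sketch of that state logic, with hypothetical names (`Playback`, `views_for`) and a hypothetical payload shape.

```python
from dataclasses import dataclass


@dataclass
class Playback:
    """Single source of truth for the current keyframe index."""
    n_frames: int
    index: int = 0
    playing: bool = False

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def reset(self):
        self.playing = False
        self.index = 0

    def tick(self):
        """Advance one keyframe if playing; stop on the last keyframe."""
        last = max(self.n_frames - 1, 0)
        if self.playing and self.index < last:
            self.index += 1
        if self.index >= last:
            self.playing = False
        return self.index


def views_for(payload, index):
    """Every view is derived from the same keyframe entry."""
    frame = payload[index]
    return {
        "camera": frame["image_url"],
        "bev": frame["boxes"],
        "map": frame["ego_latlon"],
    }


# Hypothetical two-keyframe payload
payload = [
    {"image_url": "kf0.jpg", "boxes": [], "ego_latlon": (42.0, -71.0)},
    {"image_url": "kf1.jpg", "boxes": [], "ego_latlon": (42.0001, -71.0)},
]
pb = Playback(n_frames=len(payload))
pb.play()
current = views_for(payload, pb.tick())
```

Keeping the index in one place means Pause and Reset only touch that state, and each view re-renders from whatever the index currently is.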
Under the hood, this is the operator-plus-panel pattern doing its job: a Python operator computes the per-keyframe payload — ordered frames, image URLs, projected boxes, GPS points, map overlays, and per-keyframe agent positions converted to latitude/longitude — and the React panel renders the three synchronized views and runs the playback loop.
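As a rough illustration of the payload step, the sketch below orders keyframes by timestamp and converts agent positions to latitude/longitude. It assumes agent positions arrive as east/north offsets in meters from a known georeferenced origin and uses a flat-earth approximation; the post does not specify how the plugin performs this conversion, and the field names, origin coordinates, and function names are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius


def offset_to_latlon(lat0, lon0, east_m, north_m):
    """Shift (lat0, lon0) by small east/north offsets given in meters.

    Flat-earth approximation: reasonable over a few hundred meters,
    not for long distances or near the poles.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat0)))
    )
    return lat0 + dlat, lon0 + dlon


def build_payload(keyframes, origin):
    """Order keyframes by timestamp and attach agent lat/lon for the map."""
    lat0, lon0 = origin
    payload = []
    for kf in sorted(keyframes, key=lambda k: k["timestamp"]):
        payload.append({
            "timestamp": kf["timestamp"],
            "image_url": kf["image_url"],
            "agents_latlon": [
                offset_to_latlon(lat0, lon0, east, north)
                for east, north in kf["agents_en"]
            ],
        })
    return payload


# Hypothetical keyframes (out of order) and a made-up map origin
keyframes = [
    {"timestamp": 2, "image_url": "kf2.jpg", "agents_en": [(10.0, 0.0)]},
    {"timestamp": 1, "image_url": "kf1.jpg",
     "agents_en": [(0.0, 25.0), (-5.0, 5.0)]},
]
payload = build_payload(keyframes, origin=(42.3601, -71.0589))
```

Computing this once per scene on the Python side keeps the browser loop cheap: playback only has to index into a precomputed list.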

The payoff: from pretty playback to real debugging

The "wow" of synchronized playback is nice, but the practical value is in what it lets you do:
Sanity-check sensor fusion and calibration. When cuboids are projected onto the camera image and the same objects appear as oriented boxes in the BEV and as dots on the map, mismatches jump out immediately. A box floating off its object, an agent on the map that isn't where the camera says it is — these are calibration and projection bugs you'd otherwise hunt for in spreadsheets.
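One way to put a number on "a box floating off its object" is reprojection error: project a 3D point into the image and measure how far it lands from where the object appears. This sketch assumes an ideal pinhole camera with no lens distortion and a point already expressed in the camera frame (x right, y down, z forward); the intrinsics and pixel values are made up, and the function names are hypothetical.

```python
import math


def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera, nothing to draw
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(point_cam, observed_px, intrinsics):
    """Pixel distance between the projected point and an observed pixel."""
    projected = project_point(point_cam, *intrinsics)
    if projected is None:
        return None
    return math.hypot(projected[0] - observed_px[0],
                      projected[1] - observed_px[1])


# Made-up intrinsics: (fx, fy, cx, cy) for a 1600x900 image
intrinsics = (1000.0, 1000.0, 800.0, 450.0)
center_px = project_point((1.0, 0.0, 10.0), *intrinsics)
error_px = reprojection_error((1.0, 0.0, 10.0), (903.0, 454.0), intrinsics)
```

A consistently large error across many objects in one camera points at calibration, while a large error on a single object is more likely an annotation problem.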
Understand long-tail scenes the way nuReasoning intends. Reasoning over a rare scenario requires understanding spatial relationships and agent interactions. Seeing the camera, the BEV, and the map together is exactly the context a human (or a model) needs to reason about why a decision was made.
Debug tracking and detection over time. Because everything is keyed to the keyframe and plays back in sync, you can watch an agent's dot move along the map while its box moves through the camera and BEV — making it obvious when a track drops, jumps, or swaps identity.
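The same checks you make by eye can be approximated in code to shortlist scenes worth watching. The sketch below flags missing keyframes ("drop") and physically implausible motion ("jump") in a single track. It assumes each track is a list of `(keyframe_index, x_m, y_m)` tuples sorted by keyframe, a keyframe spacing of 0.5 s (nuScenes keyframes are annotated at 2 Hz), and an arbitrary speed threshold; the function name and defaults are hypothetical.

```python
import math


def find_track_anomalies(track, max_speed_mps=40.0, keyframe_dt=0.5):
    """Flag gaps and implausible jumps in one agent's track.

    track: list of (keyframe_index, x_m, y_m), sorted by keyframe_index.
    """
    anomalies = []
    for (k0, x0, y0), (k1, x1, y1) in zip(track, track[1:]):
        gap = k1 - k0
        if gap > 1:
            # The agent was missing for at least one keyframe
            anomalies.append(("drop", k0, k1))
        speed = math.hypot(x1 - x0, y1 - y0) / (gap * keyframe_dt)
        if speed > max_speed_mps:
            # Moving faster than is plausible: likely a jump or ID swap
            anomalies.append(("jump", k0, k1))
    return anomalies


# Hypothetical track: keyframe 3 is missing, then the agent leaps 60 m
track = [(0, 0.0, 0.0), (1, 5.0, 0.0), (2, 10.0, 0.0),
         (4, 20.0, 0.0), (5, 80.0, 0.0)]
anomalies = find_track_anomalies(track)
```

A heuristic like this only narrows the search; the synced playback is still where you confirm whether a flagged track is a real failure.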
Triage and curate datasets faster. The panel lives inside the App, so it's one step away from FiftyOne's filtering and search. Find the scenes you care about, then watch them, without exporting anything or switching tools.
The use cases that get the most value share a profile: multimodal, time-series, spatial data where the relationship between sensors matters. Autonomous driving is the obvious one, but the same pattern applies to robotics (arm camera + depth + world frame), drones and aerial mapping (camera + GPS track + terrain), maritime and rail, smart-city and traffic analysis, and any sensor rig where "what did all the sensors see at this instant, and where was the platform?" is the question you keep asking.

Limitations and design trade-offs

Two things worth being upfront about, because they shaped the design. First, the App's own 3D LiDAR viewer renders in its own pane and isn't something a panel can smoothly puppeteer frame-by-frame on every build — which is exactly why the plugin includes its own bird's-eye LiDAR view that it fully controls and keeps in sync. Second, the quality of the projected cuboids is bounded by the underlying dataset's localization; on a public mini-split, edge-of-frame boxes can look slightly off. Neither is a dealbreaker, and both are the kind of trade-off the plugin framework lets you make deliberately rather than fight.

Next steps

If you want to build something like this — or just start using FiftyOne — here's the on-ramp. You can find the notebook to reproduce the panel here.
  • Get started with FiftyOne
  • Learn the plugin framework
  • Work with this kind of data
  • Join the community
If you build a synced multimodal viewer of your own — for driving, robotics, or anything with a moving platform and a pile of sensors — we'd love to see it. The framework is ready for it. The rest is just deciding what you want to watch.

FAQ

What is FiftyOne?

FiftyOne is an open-source multimodal data platform for building high-quality datasets and evaluating visual AI models. It provides a Python SDK and a visual App for filtering, labeling, and inspecting images, video, point clouds, and metadata, all in one place.

What is a grouped dataset in FiftyOne?

A grouped dataset lets you represent a single moment in time as multiple "slices" — for example, six camera angles plus a LiDAR point cloud from the same keyframe. Each slice is linked within one logical sample, making it easy to query and visualize multi-sensor data together.

What is the FiftyOne plugin framework?

The plugin framework lets you extend the FiftyOne App with custom logic and UI. Operators handle backend computation (fetching data, running inference, computing projections), while Panels are React components that render inside the App and can call those operators to display bespoke visualizations.

How do you visualize autonomous driving data in FiftyOne?

Load your driving data as a grouped dataset with camera images, LiDAR, and metadata as slices. From there, you can use the built-in 3D viewer for point clouds, overlay 3D cuboid annotations on camera frames, and — as shown in this post — build a plugin panel that syncs all views with a live GPS map during playback.

Does this work with nuScenes or nuReasoning data?

Yes. The Frame + Map panel was built using a grouped nuScenes dataset (six cameras plus top LiDAR per keyframe). nuReasoning, which builds on nuScenes, uses the same sensor setup and format, making it directly compatible.

Can I build this myself?

Yes — follow the nuScenes trajectory panel tutorial notebook to reproduce the full Frame + Map panel.