Frequently Asked Questions

Question 1

What annotation types does FiftyOne support for video?

Accepted Answer

FiftyOne supports bounding boxes, SAM2-powered object tracking, temporal events, frame-level classifications, clip-level classifications, segmentation masks, and polygons for video data.

Question 2

What's the difference between frame-level and clip-level annotations?

Accepted Answer

Frame-level annotations are applied per-frame and can change across the timeline — for example, a bounding box that follows a moving object or a classification that tracks changing weather or lighting conditions. Clip-level annotations apply to the entire video — for example, tagging a scene type or classifying overall visibility. FiftyOne supports both in the same ontology.
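
To make the distinction concrete, here is a minimal sketch of how the two label scopes might sit side by side on one video. The `VideoSample` class, its field names, and the file name `drive_001.mp4` are hypothetical and chosen for illustration only; this is plain Python, not the FiftyOne API.

```python
from dataclasses import dataclass, field


@dataclass
class VideoSample:
    """Hypothetical container for illustration; not a FiftyOne class."""
    filepath: str
    clip_labels: dict = field(default_factory=dict)   # one value for the whole video
    frame_labels: dict = field(default_factory=dict)  # frame number -> labels for that frame


video = VideoSample(filepath="drive_001.mp4")  # hypothetical file name

# Clip-level: a single label that applies to the entire video.
video.clip_labels["scene_type"] = "highway"

# Frame-level: labels are stored per frame and can change along the timeline.
# Boxes are (label, [x, y, width, height]) in relative coordinates.
video.frame_labels[1] = {"weather": "sunny", "boxes": [("car", [0.10, 0.40, 0.20, 0.15])]}
video.frame_labels[2] = {"weather": "sunny", "boxes": [("car", [0.12, 0.40, 0.20, 0.15])]}
video.frame_labels[3] = {"weather": "cloudy", "boxes": [("car", [0.14, 0.41, 0.20, 0.15])]}

# Read the frame-level classification back in timeline order.
weather_by_frame = [video.frame_labels[n]["weather"] for n in sorted(video.frame_labels)]
```

In this sketch the scene type is stored once, while the weather label and the car's box are stored per frame and change between frames 2 and 3.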

Question 3

What are temporal events and when do I need them?

Accepted Answer

Temporal events label what happens and when — with a start timestamp and an end timestamp. They're the right label type for action recognition, activity detection, surgical workflow annotation, and any task where your model needs to understand event boundaries rather than just object location.
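
As a rough illustration, a temporal event can be modeled as a label plus a start and end time, which then maps to a range of frames once the frame rate is known. The `TemporalEvent` class and its `frame_support` method below are hypothetical names for this sketch, and the convention used (1-based frame numbers, inclusive range, end timestamp treated as exclusive) is an assumption rather than a statement about how FiftyOne stores events.

```python
import math
from dataclasses import dataclass


@dataclass
class TemporalEvent:
    """Hypothetical event record: a label with start/end timestamps in seconds."""
    label: str
    start_s: float
    end_s: float

    def frame_support(self, fps: float) -> tuple:
        """Return (first_frame, last_frame), 1-based and inclusive.

        Assumes frame k covers the time interval [(k - 1) / fps, k / fps).
        """
        first = math.floor(self.start_s * fps) + 1
        # The end timestamp is treated as exclusive; never return an empty range.
        last = max(first, math.ceil(self.end_s * fps))
        return first, last


event = TemporalEvent(label="suturing", start_s=2.0, end_s=3.5)
support = event.frame_support(fps=30.0)
```

With a 30 fps video, the event from 2.0 s to 3.5 s covers frames 61 through 105 under this convention.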

Question 4

What is Agentic Labeling?

Accepted Answer

Agentic Labeling lets you describe what you need labeled in natural language. FiftyOne generates example outputs; you review them, refine the prompt, and then save the configuration to auto-label datasets at scale as a background task.

Question 5

How does SAM2 tracking work for video annotation?

Accepted Answer

You initialize the object on one frame, and SAM2 automatically tracks it across subsequent frames, assigning a persistent unique ID so your model learns to associate the same entity across time. If SAM2 loses its lock due to occlusion or motion blur, you add a corrective keyframe and tracking resumes. Annotators focus on reviewing and correcting tracks rather than labeling every frame from scratch.
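
The keyframe workflow can be sketched without running SAM2 at all. The example below swaps in plain linear interpolation between keyframes as a stand-in for the tracker, purely to show how one track ID persists across frames and how a corrective keyframe changes the boxes around it. The `Track` class and its methods are hypothetical and are not part of FiftyOne or SAM2.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical track: one persistent ID plus user-supplied keyframes."""
    track_id: int
    keyframes: dict = field(default_factory=dict)  # frame number -> [x, y, w, h]

    def add_keyframe(self, frame, box):
        self.keyframes[frame] = list(box)

    def box_at(self, frame):
        """Box at `frame`, linearly interpolated between surrounding keyframes.

        Linear interpolation is only a stand-in for a real tracker such as SAM2.
        Requires at least one keyframe.
        """
        frames = sorted(self.keyframes)
        if frame <= frames[0]:
            return self.keyframes[frames[0]]
        if frame >= frames[-1]:
            return self.keyframes[frames[-1]]
        for lo, hi in zip(frames, frames[1:]):
            if lo <= frame <= hi:
                t = (frame - lo) / (hi - lo)
                a, b = self.keyframes[lo], self.keyframes[hi]
                return [a_i + t * (b_i - a_i) for a_i, b_i in zip(a, b)]


track = Track(track_id=7)
track.add_keyframe(1, [0, 0, 10, 10])     # initial keyframe
track.add_keyframe(11, [100, 0, 10, 10])  # later keyframe

before = track.box_at(6)                  # estimate halfway between the keyframes

# The estimate at frame 6 is off, so the annotator adds a corrective keyframe.
track.add_keyframe(6, [20, 0, 10, 10])
after = track.box_at(6)
```

The track ID stays the same throughout; only the box estimates between keyframes change once the correction is added.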

Question 6

How does FiftyOne handle occlusion for video annotation?

Accepted Answer

Occlusion happens when an object is temporarily hidden behind another object in the scene. FiftyOne keeps the track active and marks it as occluded during those frames, which teaches your model that the object is still present even when it is not visible. This is critical for training spatially aware models that need to handle real-world footage.
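
One way to picture this is a track that keeps an entry for every frame, with a per-frame occlusion flag instead of a gap. The sketch below uses a hypothetical `TrackFrame` record and `occluded_spans` helper, not FiftyOne's own schema, and assumes the frames are sorted and contiguous.

```python
from dataclasses import dataclass


@dataclass
class TrackFrame:
    """Hypothetical per-frame record for a single tracked object."""
    frame: int
    box: list            # [x, y, w, h] in pixels
    occluded: bool = False


# The object is hidden on frames 3 and 4, but the track has no gap.
frames = [
    TrackFrame(1, [10, 20, 30, 40]),
    TrackFrame(2, [12, 20, 30, 40]),
    TrackFrame(3, [14, 20, 30, 40], occluded=True),
    TrackFrame(4, [16, 20, 30, 40], occluded=True),
    TrackFrame(5, [18, 20, 30, 40]),
]


def occluded_spans(frames):
    """Group consecutive occluded frames into (first, last) spans."""
    spans, start, prev = [], None, None
    for f in frames:
        if f.occluded:
            if start is None:
                start = f.frame
            prev = f.frame
        elif start is not None:
            spans.append((start, prev))
            start = None
    if start is not None:
        spans.append((start, prev))
    return spans


spans = occluded_spans(frames)
visible_frames = [f.frame for f in frames if not f.occluded]
```

Because the occluded frames remain part of the track, a training pipeline can choose to keep them, weight them differently, or filter them out.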

Question 7

How is FiftyOne video annotation different from other tools?

Accepted Answer

Most annotation tools treat video as a stack of images. FiftyOne is built around the temporal structure of video — tracking objects across time, labeling events with start/end precision, and integrating annotation directly with Smart Data Selection and model evaluation so your team can close the loop without switching platforms.

Frame-level precision for video annotation

SMARTER VIDEO LABELING

Go from raw video footage to training-ready datasets, faster

Automated tracking from a single keyframe

Bounding Boxes

Classifications

Segmentation Masks

Temporal Events

Polygons

Agentic Video Labeling

Automate manual video labeling with agents

FiftyOne Video Annotation

Built for teams that want precise video labels at scale

ML Research

Auto-labeling rivals human performance

Data labeling platform

Improve model performance with an end-to-end video labeling platform

Video annotation project management with configurable review workflows

Standardize video annotation schemas and ontologies
