Physical AI teams are working with more data than ever, across more modalities and at greater complexity. Although large-scale data collection is often seen as the time and cost bottleneck in building high-performing AI systems, the bigger challenge is ensuring that the right data gets labeled, and that it gets labeled correctly and efficiently.
To cut wasted time and cost, teams need a strategic shift in data annotation: treating annotation as a data flywheel that reduces rework and delivers higher-quality labels with every iteration. This shift helps teams discover issues earlier and ship AI faster, with fewer resources and lower costs.
Voxel51 is releasing comprehensive labeling capabilities in FiftyOne Annotation. With this addition, annotation now lives alongside data curation and model evaluation in a single platform, giving annotation teams and ML engineers a shared understanding of the data without lengthy handoffs or lost context. The release includes:
Project management with configurable multi-stage labeling workflows to coordinate annotation teams at scale.
2D, 3D, and video labeling across the full range of label types: bounding boxes, polygons, polylines, segmentation masks, SAM2 click-to-segment, lidar cuboids, video object tracking, and temporal events.
Agentic Labeling to automate the first pass using VLMs, so human effort can focus on review and correction.
Smart Data Selection to select and prioritize the most impactful data for labeling.
Intelligent Review, powered by embeddings and model evaluation, to catch annotation mistakes before they reach training.
In this post, we'll walk through how to use these capabilities together, from selecting the right data to label and setting up the labeling workflow to closing the loop with model analysis and data improvement.
Create annotation workflows that drive high-throughput and high-quality labels in FiftyOne
For teams of annotators and reviewers working in parallel across large datasets, workflow structure determines the consistency and quality of final labels.
FiftyOne includes project management and multi-stage labeling workflows built for annotation teams at scale.
Configurable multi-stage workflows define how samples move through the annotation pipeline. A standard workflow routes samples from an annotation stage to a review stage, with rejected samples looped back to the annotator. Workflows can be extended to include intermediate QA stages or agentic labeling passes before human review begins.
Ontologies define the label schema once and apply it consistently across the project. Classes, attributes, and annotation types are configured centrally, so every annotator works from the same specification. Ontologies can also be reused across multiple projects, which is useful for teams running recurring annotation work.
Role-based task assignment routes work to the right people. Annotators see only the samples assigned to their task. Reviewers see only samples that have been submitted for review. Project managers have visibility across the full workflow, including per-stage progress.
Project managers can view progress at every annotation stage.
Reusable custom workflow templates let teams clone proven workflow configurations. An automotive detection workflow, a surgical video annotation workflow, and a lidar labeling workflow can each be instantiated for new datasets without rebuilding the pipeline each time.
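To make the standard annotate, review, and reject loop concrete, here is a minimal sketch of it as a small state machine in plain Python. It is an illustration only: the stage names, the `SampleTask` class, and the `stage_counts` helper are hypothetical and are not FiftyOne identifiers or its API.

```python
from dataclasses import dataclass, field

# Hypothetical stage names for illustration; not FiftyOne identifiers.
ANNOTATE, REVIEW, DONE = "annotate", "review", "done"

# Allowed transitions: (current stage, action) -> next stage.
TRANSITIONS = {
    (ANNOTATE, "submit"): REVIEW,
    (REVIEW, "approve"): DONE,
    (REVIEW, "reject"): ANNOTATE,  # rejected samples loop back to the annotator
}


@dataclass
class SampleTask:
    sample_id: str
    stage: str = ANNOTATE
    history: list = field(default_factory=list)

    def apply(self, action: str) -> str:
        """Move the sample to its next stage, or raise if the action is invalid."""
        key = (self.stage, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed in stage {self.stage!r}")
        self.history.append(key)
        self.stage = TRANSITIONS[key]
        return self.stage


def stage_counts(tasks):
    """Per-stage progress, the kind of summary a project manager would watch."""
    counts = {ANNOTATE: 0, REVIEW: 0, DONE: 0}
    for task in tasks:
        counts[task.stage] += 1
    return counts


tasks = [SampleTask("img_001"), SampleTask("img_002"), SampleTask("img_003")]
tasks[0].apply("submit")
tasks[0].apply("approve")
tasks[1].apply("submit")
tasks[1].apply("reject")  # sent back to annotation
tasks[2].apply("submit")
print(stage_counts(tasks))  # {'annotate': 1, 'review': 1, 'done': 1}
```

Extending the workflow with an intermediate QA stage or an agentic pass would, in this sketch, amount to adding entries to the transition table rather than changing the task logic.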
Select and prioritize the right data for annotation
The most expensive annotation mistake is labeling data that does not improve your model. Despite significant investment in labeling, 1 in 3 organizations report that more than half the data they pay to annotate never reaches production.
Source: 2026 State of Visual and Physical AI Report
Traditional workflows sample randomly or label everything. Both waste budget on redundant samples while missing the rare, high-signal examples that move model performance. Teams applying a curate-first approach routinely ship models with 60 to 80 percent less annotated data than those that label without selection.
FiftyOne's Smart Data Selection techniques help teams identify which unlabeled samples are worth labeling before annotation begins. Several complementary methods work together:
Embedding-based workflows map your dataset into a visual embedding space, making it easy to see data clusters and potential gaps. Samples that sit far from existing clusters represent novel information and are the ones worth prioritizing.
Auto-tagging and metadata-based sampling enrich raw data with semantic attributes such as scene types, weather conditions, or object presence, using zero-shot or VLM models. Teams can filter and balance across these attributes to ensure the labeled dataset reflects the distribution the model needs to learn.
Similarity search lets you query by visual content, so you can bring specific scenarios into your annotation queue. For example, after a first round of model evaluation surfaces a failure mode, you can search for visually similar samples across your entire dataset and bring more examples of that scenario into the next annotation round.
Few-shot retrieval extends similarity search to edge cases: provide a handful of example images and retrieve visually similar samples at scale, without writing structured queries against metadata fields.
Zero-shot coreset selection applies a systematic approach to data selection. Using pre-trained foundation models, it scores each unlabeled sample based on the unique information it contributes and filters out redundant examples.
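One way to picture the idea of prioritizing novel samples and filtering out redundant ones is greedy farthest-point (k-center) selection over embeddings. The sketch below is a generic textbook technique written in plain Python, not FiftyOne's selection algorithm; the function name and the toy 2D embeddings are made up for illustration, and real embeddings would come from a pre-trained model.

```python
import math


def farthest_point_selection(embeddings, budget):
    """Greedy k-center selection: repeatedly pick the sample farthest from
    everything already selected, so near-duplicates are picked last."""
    if not embeddings or budget <= 0:
        return []
    selected = [0]  # seed with the first sample; any seed works
    chosen = {0}
    # Distance from each sample to its nearest already-selected sample.
    nearest = [math.dist(e, embeddings[0]) for e in embeddings]
    while len(selected) < min(budget, len(embeddings)):
        candidates = (i for i in range(len(embeddings)) if i not in chosen)
        idx = max(candidates, key=lambda i: nearest[i])
        selected.append(idx)
        chosen.add(idx)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], math.dist(e, embeddings[idx]))
    return selected


# Toy data: a tight cluster of three, a pair, and one isolated sample.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # redundant cluster
    (5.0, 5.0), (5.1, 5.0),              # second cluster
    (10.0, 0.0),                         # isolated, novel sample
]
print(farthest_point_selection(embeddings, budget=3))  # [0, 5, 3]
```

With a budget of three, the sketch picks one sample from each region of the space instead of spending the budget on the redundant cluster.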
Set up Agentic Labeling in FiftyOne to train VLMs for label generation
For large datasets where manually annotating every sample is not practical, agentic labeling automates the first pass. Agentic labeling in FiftyOne uses an agent powered by vision-language models (VLMs) and iteratively trains it to generate labels for your dataset.
Users describe what needs to be labeled in natural language and run it on a small batch of data. They review the generated examples and refine the prompt until the labeled data matches what they intend.
The trained VLM agent is then applied to the full dataset, populating labels across thousands of samples without the need to interact with every data sample.
Agentic labeling reduces intensive manual labeling work. Human labelers start from a labeled baseline rather than a blank canvas, concentrating their effort on reviewing and correcting the agent's output rather than drawing every label from scratch.
Annotate 2D, 3D, and video data in FiftyOne
With the workflow configured, annotation for each data modality happens directly in FiftyOne, in the same environment as the rest of the data work.
Video annotation
FiftyOne's video annotation provides frame-by-frame editing with tracked detections and object attributes that propagate across frames. Rather than labeling every frame individually, annotators mark keyframes at points where the object moves or changes, and the system interpolates the frames in between automatically.
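To illustrate keyframe interpolation, here is a minimal sketch that fills in bounding boxes between annotated keyframes. It assumes simple linear interpolation and an `(x, y, width, height)` box format; the post does not state which interpolation scheme FiftyOne uses, and the function name is hypothetical.

```python
def interpolate_boxes(keyframes):
    """Linearly interpolate boxes between keyframes.

    keyframes maps frame number -> (x, y, width, height).
    Returns a box for every frame from the first to the last keyframe.
    """
    if not keyframes:
        return {}
    frames = sorted(keyframes)
    boxes = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)  # 0.0 at start, approaching 1.0 at end
            boxes[f] = tuple(av + t * (bv - av) for av, bv in zip(a, b))
    last = frames[-1]
    boxes[last] = tuple(float(v) for v in keyframes[last])
    return boxes


# The annotator marks frames 0 and 10; the object moves 20 pixels to the right.
keyframes = {0: (10, 20, 50, 40), 10: (30, 20, 50, 40)}
boxes = interpolate_boxes(keyframes)
print(boxes[5])  # (20.0, 20.0, 50.0, 40.0)
```

Two keyframes here yield eleven labeled frames, which is where the time savings over frame-by-frame labeling come from.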
Temporal event labels extend this further, marking the start and end of actions, activities, or scene states along the video timeline. This unlocks action recognition and temporal captioning workflows, useful for training vision-language-action models that need to understand not just what is in a frame, but what is happening across a sequence of frames.
2D annotation
FiftyOne supports the full range of label types for image data, including classification, bounding boxes, and segmentation masks. Segmentation masks provide pixel-level precision, which matters most in high-stakes domains such as medical imaging and in dense scenes where exact object boundaries are critical.
Generating segmentation masks is now faster and more accurate. Use SAM2 click-to-segment to generate pixel-accurate instance segmentation masks with a single click, running directly in the browser. Teams can also bring fine-tuned SAM2 models to adapt segmentation behavior to their specific domain and object types.
Annotate irregularly shaped objects and linear structures such as lane markings, road boundaries, or pipeline routes using polygons and polylines.
3D annotation
FiftyOne's 3D annotation supports cuboid and polyline labels on lidar point clouds, with depth-of-field controls and real-time rendering. Annotators can move between 2D camera views and 3D point cloud views within the same interface, with full spatial context preserved across both. This is especially important for tasks where label accuracy depends on understanding the relationship between what the camera sees and what the lidar captures.
Intelligent Review and QA: Find mistakes before they reach training
Labeling errors are inevitable, but efficient QA can maximize label quality.
FiftyOne's Intelligent Review uses embeddings, model predictions, and auto-label comparison to surface annotation errors automatically. Rather than inspecting labels one by one, teams can identify where mistakes are concentrated across entire batches, prioritize the highest-risk labels for human review, and resolve quality issues in bulk. Systematic errors that would otherwise reach training are caught at the source. When a mistake is found, annotators fix it directly in FiftyOne without exporting data or losing context.
Embedding-based review
Within a well-annotated dataset, samples with the same label cluster together in embedding space because they share visual characteristics. Outliers, i.e., samples that land far from their class cluster, often signal mislabeled data. FiftyOne's embeddings panel visualizes these clusters directly, letting reviewers lasso outlier groups, send only those samples to the review queue, and fix the labels. A single outlier found by visual inspection can become a filter that surfaces dozens of similar errors across the full dataset.
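The same intuition can be sketched numerically: compute a centroid per class and rank samples by how far they sit from the centroid of their own label. This is a simplified stand-in for the visual lasso workflow, written in plain Python with made-up 2D embeddings and hypothetical function names; it is not how FiftyOne's embeddings panel is implemented.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Mean embedding per class label."""
    groups = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        groups[label].append(emb)
    return {
        label: tuple(sum(dim) / len(members) for dim in zip(*members))
        for label, members in groups.items()
    }


def rank_outliers(embeddings, labels, top_k):
    """Indices of the samples farthest from their own class centroid."""
    centroids = class_centroids(embeddings, labels)
    distances = [
        math.dist(emb, centroids[label]) for emb, label in zip(embeddings, labels)
    ]
    order = sorted(range(len(embeddings)), key=lambda i: distances[i], reverse=True)
    return order[:top_k]


# Sample 3 looks like the dogs but carries a "cat" label.
embeddings = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.1, 4.9),
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2),
]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog"]
print(rank_outliers(embeddings, labels, top_k=1))  # [3]
```

In practice the top-ranked samples would go to a review queue rather than being relabeled automatically, since a distant sample can also be a legitimate rare example.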
Scenario-based QA with model evaluation
Comparing labels against a reference model's predictions adds a second, independent signal. Where the model and the label disagree significantly, one of them is likely wrong. FiftyOne's scenario analysis tools let teams slice this disagreement by scene type, object class, lighting condition, or any metadata attribute, surfacing systematic labeling failures.
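As a rough sketch of slicing label-versus-model disagreement by a metadata attribute, the example below flags a sample when the labeled box and the predicted box overlap with an IoU below a threshold, then reports the flagged rate per slice. The record fields, the 0.5 threshold, and the function names are assumptions made for illustration and do not reflect FiftyOne's scenario analysis API.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def disagreement_by_slice(records, attribute, iou_threshold=0.5):
    """Fraction of samples per attribute value where label and prediction disagree."""
    totals, flagged = defaultdict(int), defaultdict(int)
    for rec in records:
        key = rec[attribute]
        totals[key] += 1
        if iou(rec["label_box"], rec["pred_box"]) < iou_threshold:
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in totals}


# Hypothetical records: one labeled box and one predicted box per sample.
records = [
    {"lighting": "day", "label_box": (0, 0, 10, 10), "pred_box": (0, 0, 10, 10)},
    {"lighting": "day", "label_box": (0, 0, 10, 10), "pred_box": (1, 1, 10, 10)},
    {"lighting": "night", "label_box": (0, 0, 10, 10), "pred_box": (5, 5, 15, 15)},
    {"lighting": "night", "label_box": (0, 0, 10, 10), "pred_box": (20, 20, 30, 30)},
    {"lighting": "night", "label_box": (0, 0, 10, 10), "pred_box": (0, 0, 10, 9)},
    {"lighting": "night", "label_box": (0, 0, 10, 10), "pred_box": (6, 0, 16, 10)},
]
print(disagreement_by_slice(records, "lighting"))  # {'day': 0.0, 'night': 0.75}
```

A slice with a much higher disagreement rate than the rest, such as the night samples here, is a candidate for a systematic labeling problem rather than a handful of isolated mistakes.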
Human vs. auto-label comparison
When agentic labeling generates the first pass, comparing those labels against a human-annotated reference subset directly identifies where the agent's behavior diverges from team guidelines. Samples where the two disagree point to prompts that need refinement, or to classes where the agent's interpretation does not yet match the annotation specification. Resolving those discrepancies in the prompt produces a better-configured agent for the next labeling run.
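A minimal sketch of that comparison for class labels: for each class in the human reference subset, compute how often the agent agrees and which classes it substitutes when it does not. The sample IDs, class names, and the `per_class_agreement` helper are hypothetical.

```python
from collections import Counter, defaultdict


def per_class_agreement(human, agent):
    """Compare agent labels to a human reference subset.

    human and agent map sample ID -> class label. Only samples present in
    the human reference are scored.
    """
    totals, matches = Counter(), Counter()
    confusions = defaultdict(Counter)
    for sample_id, human_label in human.items():
        agent_label = agent.get(sample_id)  # None if the agent produced no label
        totals[human_label] += 1
        if agent_label == human_label:
            matches[human_label] += 1
        else:
            confusions[human_label][agent_label] += 1
    return {
        cls: {
            "agreement": matches[cls] / totals[cls],
            "confused_with": dict(confusions[cls]),
        }
        for cls in totals
    }


human = {"s1": "car", "s2": "car", "s3": "truck", "s4": "truck", "s5": "truck", "s6": "truck"}
agent = {"s1": "car", "s2": "car", "s3": "car", "s4": "truck", "s5": "car", "s6": "truck"}
report = per_class_agreement(human, agent)
print(report["truck"])  # {'agreement': 0.5, 'confused_with': {'car': 2}}
```

Here the agent handles cars well but labels half of the trucks as cars, which suggests the prompt needs a clearer description of what separates the two classes.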
Close the loop with model evaluation
Annotation is not a one-time event. Every model training run produces new information about which scenarios the model handles well and which it fails on. That information is the input to the next round of data selection and annotation. This iterative loop only produces compounding returns if the evaluation connects back to the data.
FiftyOne's model evaluation capabilities surface exactly where a model fails. Scenario analysis slices performance metrics by scenarios such as scene type, weather condition, object size, occlusion level, or any metadata attribute, exposing the specific data distributions the model struggles with.
From a failing scenario view, teams navigate directly to the samples causing the failures, identify what those samples have in common, and use similarity search or embedding analysis to find more examples of the same scenario in unlabeled data.
Those samples become the input to the next curation pass. They feed the next annotation round, which produces the training data that addresses the identified gap.
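The retrieval step in this loop can be sketched as a cosine-similarity search: take the embedding of a failing sample and rank the unlabeled pool by similarity to it. This is a generic nearest-neighbor sketch in plain Python, not FiftyOne's similarity search implementation; the sample IDs and 2D embeddings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, pool, k):
    """IDs of the k pool samples most similar to the query embedding."""
    ranked = sorted(
        pool, key=lambda sid: cosine_similarity(query, pool[sid]), reverse=True
    )
    return ranked[:k]


# Embedding of a sample from a failing scenario (hypothetical values).
failure_embedding = (1.0, 0.1)

# Unlabeled pool: sample ID -> embedding.
unlabeled_pool = {
    "u1": (0.9, 0.2),
    "u2": (0.0, 1.0),
    "u3": (1.0, 0.0),
    "u4": (-1.0, 0.0),
}
print(most_similar(failure_embedding, unlabeled_pool, k=2))  # ['u3', 'u1']
```

The returned IDs are the candidates for the next annotation round. At real dataset sizes a vector index would replace the full sort, but the ranking logic is the same.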
This tight loop is how ML engineers and data annotation teams need to operate to increase data quality and coverage and ship models faster.
Data flywheel: Curate, annotate, and evaluate your data and models in a single unified platform
What this means for your team
Bringing data visualization, selection, annotation, and model evaluation into a single platform changes what annotation teams and ML engineers can accomplish at every stage of model development.
Produce high-quality labels. Intelligent Review identifies incorrect labels in bulk and prioritizes the samples that need human attention, so reviewers focus where it matters. The result is higher-quality labels.
Label data faster. Agentic labeling workflows powered by VLMs automate the first pass across your dataset, so annotation teams start from a labeled baseline instead of starting from scratch.
Reduce wasted annotation spend. Smart data selection ensures annotation budget targets samples that carry new and valuable information for the model. Teams routinely achieve equivalent model performance with a fraction of the labeled data compared to undirected labeling.
Scale annotation without losing consistency. Configurable workflows, shared ontologies, and role-based task assignment give annotation project managers the structure needed to coordinate annotation teams without managing every label individually.
Catch label errors early and iterate faster. Embedding-based QA and model comparison surface systematic labeling mistakes before training begins, preventing the compounding cost of retraining on bad data. When curation, annotation, QA, and evaluation share a platform, failure modes discovered in model evaluation feed directly into the next annotation pass, tightening the feedback loop with every iteration.
To try all of these capabilities, including annotation project management, multi-stage workflows, and agentic labeling, which are available in the enterprise version of FiftyOne, talk to our computer vision experts about your annotation and ML pipeline.