Behind the math: how we built the annotation savings calculator
Jun 4, 2025
8 min read
When we launched Verified Auto Labeling we kept hearing the same two questions:
  1. “How much cheaper is auto-labeling than hiring humans for the same job?”
  2. “Where do those numbers actually come from?”
To answer both at once, we published an open-source spreadsheet and a web-based Annotation Savings Estimator. This post walks through how the estimator works, covering every user input, assumption, and formula behind the calculator so you can audit or adapt the model for your own projects.

The inputs: what drives labeling cost?

The Annotation Savings Estimator asks for three key inputs:
  • Number of images to label (to determine scale for costs and throughput)
  • Task type (Classification, Detection, or Segmentation)
  • Task complexity (Simple, Moderate, Complex, or Custom) based on the number of objects per image
Task complexity is defined as follows:
  • 1–5 objects/image → Simple task
    Example: binary or small multi-class detection of basic objects, such as in the CIFAR-10 dataset
  • 6–20 objects/image → Moderate task
    Example: multi-label objects and scenes, such as in the Cityscapes or Pascal VOC datasets
  • 21–200+ objects/image → Complex task
    Example: fine-grained objects in highly cluttered scenes, or scenes requiring specific domain knowledge such as medical imaging or aerial views. Example datasets include Open Images V4 and LVIS
The estimator also provides a “Custom” input field for users who have a good estimate of the average number of objects per image in their dataset. The tool maps that value to a complexity tier based on object density for a more precise estimate.
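For reference, here is a minimal sketch of that mapping in Python. The thresholds follow the object-density tiers listed above; the function itself is illustrative, not the estimator's actual code.

```python
def complexity_tier(avg_objects_per_image: float) -> str:
    """Map average object density to the estimator's complexity tiers (illustrative)."""
    if avg_objects_per_image <= 5:
        return "Simple"      # 1-5 objects/image, e.g. CIFAR-10-style tasks
    if avg_objects_per_image <= 20:
        return "Moderate"    # 6-20 objects/image, e.g. Cityscapes, Pascal VOC
    return "Complex"         # 21-200+ objects/image, e.g. Open Images V4, LVIS
```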

Benchmark data sources

All references are enumerated in-sheet so you can trace every calculation.

Human labeling base cost and time per annotation
  • Source for base time benchmark: research paper
  • Source for base price benchmark: AWS Mechanical Turk ground truth pricing

Auto-labeling compute cost

The experiments conducted for the Verified Auto Labeling research paper use NVIDIA L40S GPUs for compute, rented at $0.93/hour.
The foundation models used for benchmarking label generation are YOLOE, YOLO-World, and Grounding DINO.

Human labeling estimator model

For each task type and complexity level, we use benchmarked human annotation time and standard labeling service costs to estimate:
  • Time per label (in seconds)
  • Total number of objects to annotate
  • Total human labeling hours
  • Total human labeling cost

For classification tasks

Human labeling time (hrs) = base_time x number of images to label / 3600
Human labeling cost (USD) = base_price x number of images to label
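As a minimal sketch (the function and parameter names are ours; base_time_s and base_price_usd stand in for the per-image benchmark values in the spreadsheet):

```python
def classification_human_estimate(n_images: int, base_time_s: float, base_price_usd: float):
    """Classification: one label per image, so both time and cost scale with image count."""
    hours = base_time_s * n_images / 3600   # seconds -> hours
    cost = base_price_usd * n_images
    return hours, cost
```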

For detection tasks

Based on the task complexity (Simple, Moderate, or Complex), we use the lower and upper bounds on objects per image to calculate a range for labeling time and cost. If the user provides an average number of objects per image, we use that value directly in the equations. See the estimated objects per image for each complexity tier listed above.
Total number of objects = number of objects per image x number of images to label
Human labeling time (hrs) = total number of objects x base_time / 3600
Human labeling cost (USD) = total number of objects x base_cost
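A sketch of the detection estimate, returning a low/high range when only a complexity tier is given. The tier bounds mirror the object-density ranges above; capping the Complex tier at 200 objects/image is our assumption.

```python
# Objects-per-image bounds per complexity tier (Complex upper bound assumed to be 200).
TIER_BOUNDS = {"Simple": (1, 5), "Moderate": (6, 20), "Complex": (21, 200)}

def detection_human_estimate(n_images, base_time_s, base_cost_usd, tier=None, avg_objects=None):
    """Detection: effort scales with the total number of objects, not the number of images."""
    low, high = (avg_objects, avg_objects) if avg_objects else TIER_BOUNDS[tier]
    estimates = []
    for objects_per_image in (low, high):
        total_objects = objects_per_image * n_images
        hours = total_objects * base_time_s / 3600
        cost = total_objects * base_cost_usd
        estimates.append({"hours": hours, "cost_usd": cost})
    return estimates  # [low-bound estimate, high-bound estimate]
```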

For segmentation tasks

The effort for instance and semantic segmentation depends not only on the number of images to label but also on scene complexity: how dense the scene is, how intricate the textures are, whether objects are well defined, and whether there is background noise or varying lighting.
We derive heuristic estimates of the scene complexity factor from image segmentation research studies and our own empirical observations.
If the user provides an estimated number of objects per image, we use it to tier the scene complexity.
Human labeling time (hrs) = base_time x number of images to label x scene_complexity / 3600
Human labeling cost (USD) = base_price x number of images to label x scene_complexity
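A sketch with illustrative scene-complexity multipliers; the estimator's actual factors come from the segmentation research and empirical observations mentioned above, so treat the numbers below as placeholders.

```python
# Placeholder scene-complexity multipliers (the real factors live in the spreadsheet).
SCENE_COMPLEXITY = {"Simple": 1.0, "Moderate": 2.0, "Complex": 4.0}

def segmentation_human_estimate(n_images, base_time_s, base_price_usd, tier):
    """Segmentation: scale the per-image time and price by a scene-complexity factor."""
    factor = SCENE_COMPLEXITY[tier]
    hours = base_time_s * n_images * factor / 3600
    cost = base_price_usd * n_images * factor
    return hours, cost
```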

Verified Auto Labeling estimator model

We use the benchmark data for object detection from our Verified Auto Labeling research paper. The paper covers auto-labeling benchmarks on datasets of varying classes, number of images, and complexities. We also use additional segmentation experiments conducted by the Voxel51 ML researchers to benchmark the auto-labeling time and costs.
Using the benchmark numbers reported in the paper as a baseline, we can compute time and cost from the user inputs.

For classification and detection tasks

We observed that for classification and detection tasks, the cost per image scales almost linearly with dataset size. We fit a simple least-squares regression across the four test sets and summarized it into a single equation for calculating the cost and time of these tasks.
Auto-Labeling cost (USD) = 3.8541 x 10^-6 x number of images to label + 0.0011187
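The fitted line drops straight into a one-line function (coefficients exactly as published above):

```python
def auto_label_cost_cls_det(n_images: int) -> float:
    """Auto-labeling cost (USD) for classification/detection, from the least-squares fit."""
    return 3.8541e-6 * n_images + 0.0011187
```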

For segmentation tasks

For segmentation tasks, we apply a similar estimation to the data above but bucket it by dataset size, with a threshold of 20,000 images, so that costs for smaller datasets (fewer than 20,000 images) are estimated more accurately.
For small datasets (<= 20,000 images):
Auto-Labeling cost (USD) = 0.309 x 10^-5 x number of images to label
For larger datasets (> 20,000 images):
Auto-Labeling cost (USD) = 0.309 x 10^-5 x number of images to label
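And a sketch of the segmentation estimate; since the published per-image coefficient is the same for both buckets, a single expression covers both, with the threshold noted in the docstring.

```python
def auto_label_cost_segmentation(n_images: int) -> float:
    """Auto-labeling cost (USD) for segmentation.

    The estimator buckets datasets at 20,000 images; the published per-image
    coefficient is the same for both buckets, so one expression suffices here.
    """
    return 0.309e-5 * n_images
```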

Putting it together: an example

Let’s walk through an example so you can see the comparison of human labeling versus auto-labeling costs side by side.
Say your task is to annotate a driving dataset by drawing bounding boxes around common city-street objects such as cars, trucks, bicycles, stop signs, pedestrians, and traffic signs.
User inputs:
  • Number of images to label: 100,000
  • Task type: Detection
  • Task complexity: Custom
  • Average number of objects/image: 12 (moderate task complexity)
Total number of objects = number of objects per image x number of images to label = 12 x 100,000 = 1,200,000
Human labeling cost = total number of objects x base_cost = 1,200,000 x $0.036 = $43,200
Verified Auto Labeling cost = 3.8541 x 10^-6 x number of images to label + 0.0011187 = 3.8541 x 10^-6 x 100,000 + 0.0011187 ≈ $0.3865
Savings factor = human labeling cost / auto-labeling cost ≈ 111,781x
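As a quick sanity check, the same arithmetic in a few lines of Python (values copied from the inputs above):

```python
n_images = 100_000
avg_objects_per_image = 12
base_cost_per_object = 0.036  # USD per bounding box (benchmark price used above)

human_cost = avg_objects_per_image * n_images * base_cost_per_object  # -> $43,200
auto_cost = 3.8541e-6 * n_images + 0.0011187                          # -> ~$0.3865
savings_factor = human_cost / auto_cost                               # ~1.1 x 10^5, in line with the figure above

print(f"Human: ${human_cost:,.0f}  Auto: ${auto_cost:.4f}  Savings: {savings_factor:,.0f}x")
```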

The hidden costs of human annotation

Human annotation carries several hidden costs beyond the visible per-label pricing. These include onboarding annotators, designing detailed labeling guidelines, implementing quality assurance (QA) and rework cycles, managing communication and oversight, and building or maintaining annotation tools and infrastructure. Especially in complex tasks, these overheads can double or even triple the base cost of labeling.
To keep it simple, our estimator does not include these hidden costs, but know that they are there and can add another factor to the true cost of human data annotation.

Takeaways

  • Manual annotation becomes disproportionately expensive at scale and complexity.
  • Verified Auto Labeling provides substantial time and cost savings: up to 100,000x lower cost and 5,000x less time, depending on the number of labels.
Whether you’re labeling 1,000 images or 1 million, the estimator can help you get ballpark ROI numbers to justify investment in automation.
Try it out and see how much time and money you could save with Verified Auto Labeling.