
Announcing the Voxel51 & V7 Partnership 

Two data-centric AI providers integrate, enabling AI engineers to get the most out of their data

Nothing limits computer vision model performance more than bad data. But datasets today are huge, reaching hundreds of millions or even billions of samples, which makes it impractical to inspect every sample by hand and catch errors quickly. Moreover, improving the quality of a dataset depends largely on two components: high-quality ground truth annotations, and the ability to curate a dataset with class balance and representative coverage of your data distribution. The good news is that there are tools to help you combat bad data by improving and optimizing your datasets so you can deliver exceptional AI products into production.

We’re excited to announce a new partnership between Voxel51 and V7 to empower customers to build smart, scalable, and high-quality dataset management pipelines!

Who are Voxel51 and V7?

If you’re not yet familiar, we are Voxel51, the company behind FiftyOne, the leading toolkit for building high-quality datasets and computer vision models. FiftyOne is where real AI work happens. AI teams around the world rely on FiftyOne to visualize, curate, manage, and QA data, and to automate the workflows that make enterprise machine learning possible. Plus, FiftyOne was designed with extreme flexibility and extensibility in mind, which includes the ability to integrate naturally with other AI/ML tools you know and love.

V7 is a powerful AI data engine enabling better AI products to reach the market faster. Used by enterprise customers worldwide, including Continental, Wanzl, and Boston Scientific, V7’s unique workflows enable 10x faster labeling. Features such as auto-annotation, model visualization, advanced video labeling, bespoke workflow design, intelligent QA, and elite labeling task forces converge to offer a scalable solution that prioritizes impactful AI development.

Partnership highlights

At the heart of the partnership is the integration between Voxel51’s FiftyOne and V7 Darwin. The integration is currently in beta. Continue reading to learn how these two platforms provide customers with cutting-edge solutions primed to deliver top-tier AI products.

Dataset curation for smarter annotation

For most machine learning projects, the first step is to collect a suitable dataset for the task. That data then needs to be labeled before it can continue through the ML pipeline. With large collections containing millions or billions of samples, annotation can quickly become cost-prohibitive. The question is: how can you create smaller, carefully curated subsets that are the most impactful for your ML project, so that you get the most out of your annotation budget while boosting model performance?

FiftyOne provides a variety of cutting-edge tools and workflows that enable you to:

  • explore and balance your datasets by class and metadata distribution
  • visualize, de-duplicate, sample, and pre-label your data distributions using embeddings
  • perform automated pre-labeling with off-the-shelf or custom models
  • and more!
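To make the class-balancing idea concrete, here is a minimal, library-free sketch of picking a balanced subset to send for annotation. It is not FiftyOne's API: the `balanced_subset` helper, the `(sample_id, label)` pair layout, and the toy pool are all hypothetical, and the labels are assumed to come from pre-labels or existing metadata.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(samples, per_class, seed=51):
    """Pick up to `per_class` samples for each label.

    `samples` is a list of (sample_id, label) pairs. Both the pair layout
    and this helper are hypothetical, not part of any FiftyOne API.
    """
    rng = random.Random(seed)

    # Group sample ids by their (pre-)label.
    by_label = defaultdict(list)
    for sample_id, label in samples:
        by_label[label].append(sample_id)

    # Randomly keep at most `per_class` ids from each group.
    chosen = []
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        chosen.extend((sample_id, label) for sample_id in ids[:per_class])
    return chosen


# A skewed toy pool: 90 "car", 8 "bike", 2 "bus".
pool = (
    [(f"car_{i}", "car") for i in range(90)]
    + [(f"bike_{i}", "bike") for i in range(8)]
    + [(f"bus_{i}", "bus") for i in range(2)]
)

subset = balanced_subset(pool, per_class=5)
print(Counter(label for _, label in pool))    # class counts before
print(Counter(label for _, label in subset))  # class counts after
```

Capping each class rather than sampling uniformly keeps rare classes fully represented in the subset while trimming the dominant ones, which is usually the point of curating before spending annotation budget.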

The new integration between FiftyOne and V7 Darwin allows users to send subsets of their datasets from FiftyOne to V7 Darwin for annotation. The annotated data from V7 can then be imported back into FiftyOne for review and refinement, before ultimately being used to train your model.

Annotation review & QA

In many ML projects, a dataset already exists and is being used to train models. In such cases, the best use of time is likely to improve the quality of the dataset, which often provides greater performance gains than similar effort put into optimizing the model architecture.

FiftyOne enables powerful image- and object-level annotation review and QA workflows. Use FiftyOne’s embeddings visualization, compatible with both off-the-shelf and custom models, to highlight likely annotation mistakes and outliers. Use FiftyOne’s sample- and label-level tags, as well as saved views, to easily mark samples for reannotation back in V7.
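One simple way to see why embeddings help surface annotation mistakes: a sample whose label disagrees with most of its nearest neighbors in embedding space is worth a second look. The sketch below illustrates that heuristic in plain Python. It is an assumption made for illustration, not how FiftyOne computes anything; the `suspicious_labels` helper and the 2D toy embeddings are hypothetical.

```python
import math
from collections import Counter


def suspicious_labels(items, k=3):
    """Return ids whose label disagrees with the majority of their k nearest neighbors.

    `items` is a list of (sample_id, embedding, label) tuples (hypothetical layout).
    """
    flagged = []
    for sample_id, emb, label in items:
        # Brute-force k nearest neighbors by Euclidean distance.
        neighbors = sorted(
            (other for other in items if other[0] != sample_id),
            key=lambda other: math.dist(emb, other[1]),
        )[:k]
        majority, _ = Counter(n[2] for n in neighbors).most_common(1)[0]
        if majority != label:
            flagged.append(sample_id)
    return flagged


items = [
    ("a", (0.0, 0.0), "cat"),
    ("b", (0.1, 0.0), "cat"),
    ("c", (0.0, 0.1), "cat"),
    ("d", (0.1, 0.1), "dog"),  # labeled "dog" but sits inside the "cat" cluster
    ("e", (5.0, 5.0), "dog"),
    ("f", (5.1, 5.0), "dog"),
    ("g", (5.0, 5.1), "dog"),
    ("h", (5.1, 5.1), "dog"),
]

print(suspicious_labels(items))
```

In a real workflow the flagged ids would be the ones you tag for reannotation; the brute-force neighbor search here is only suitable for small examples.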

Model evaluation

Once a model is trained, you can easily run inference, load the model predictions back into FiftyOne, and evaluate them (including regressions, classifications, detections, polygons, instance and semantic segmentations, on both image and video datasets) against the ground truth annotations. This makes it possible to highlight areas of improvement for your annotations, as well as identify classes of difficult samples for training set augmentation. For targeted dataset augmentation, FiftyOne’s built-in similarity search functionality can be leveraged with a variety of vector database backends.
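For readers new to detection evaluation, the core idea of comparing predictions against ground truth can be sketched as IoU-based matching. This is a simplified, library-free illustration under stated assumptions (greedy matching by confidence, a single IoU threshold, boxes as `(x1, y1, x2, y2)`), not FiftyOne's evaluation implementation; the `iou` and `evaluate` helpers and the sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions (highest confidence first) to ground truth.

    predictions: list of (label, confidence, box)
    ground_truth: list of (label, box)
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(range(len(ground_truth)))
    tp = fp = 0
    for label, _, box in sorted(predictions, key=lambda p: p[1], reverse=True):
        best, best_iou = None, iou_threshold
        for idx in unmatched:
            gt_label, gt_box = ground_truth[idx]
            overlap = iou(box, gt_box)
            # Only same-class boxes above the threshold can match.
            if gt_label == label and overlap >= best_iou:
                best, best_iou = idx, overlap
        if best is None:
            fp += 1
        else:
            tp += 1
            unmatched.remove(best)
    # Ground truth objects that were never matched are false negatives.
    return tp, fp, len(unmatched)


ground_truth = [("car", (0, 0, 10, 10)), ("person", (20, 20, 30, 40))]
predictions = [
    ("car", 0.9, (1, 1, 10, 10)),       # overlaps the car box
    ("person", 0.8, (50, 50, 60, 60)),  # overlaps nothing
]

tp, fp, fn = evaluate(predictions, ground_truth)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, precision, recall)
```

The false positives and false negatives are the interesting output here: they point at either a model weakness or an annotation problem, which is what makes evaluation useful for finding samples to reannotate or augment.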

Seamless data transfers

The integration makes it smooth and easy to send data back and forth between FiftyOne and V7 Darwin via an API. The integration also allows for seamless conversion of all data formats, retaining all existing annotations (including labels made in other tools).

Additional capabilities for FiftyOne Teams customers

FiftyOne Teams combines the features you know and love in open source FiftyOne with additional capabilities for secure, real-time multi-user collaboration—all backed by world-class customer support. With the Voxel51 and V7 partnership, there are three additional capabilities to highlight for FiftyOne Teams customers.

Dataset versioning

With dataset versioning in FiftyOne Teams, every annotation and model run can now be captured and versioned in a history of dataset snapshots. No more complex naming conventions or manual tracking of versions—dataset snapshots in FiftyOne Teams can be created, browsed, linked to, and re-materialized with ease in the App or SDK.

Loading cloud-backed media

If you’re a FiftyOne Teams customer and work with cloud-backed media, you will be able to connect your cloud-backed media to FiftyOne Teams and V7 Darwin in order to directly load items from your cloud storage.

Collaborating with humans in the loop

Because training datasets are generally too large for a single person to process, teams of data annotators and QA professionals often come together to do this work and ensure dataset quality. However, without the right tooling, it’s hard to visualize a large amount of data for different tasks of various shapes and sizes, and difficult to leave meaningful feedback that can easily be acted on. 

FiftyOne Teams makes it possible to safely and securely collaborate both inside and outside your organization. Create and share datasets and dataset views within and across QA teams, instantly load up datasets in your browser, easily scroll through samples, and leave sample- and label-level tags on any aspects of the dataset that are not of sufficient quality (such as an incorrect annotation or a blurry image).

How to get started with FiftyOne and V7 Darwin

We collaborated with the amazing team over at V7 to create this getting started tutorial for the FiftyOne and V7 Darwin integration.

In the tutorial you’ll learn how to:

  • Set up FiftyOne
  • Configure V7 Darwin
  • Load example data in FiftyOne
  • Annotate the data in V7
  • Continuously improve your training dataset to boost model performance

Check it out!

Conclusion and next steps

We’re excited to partner with V7 to bring value to joint customers, and we’d love to hear what you think!

Here are a few next steps to take if you’re interested in the integration: