How Zefr works with half a billion samples in FiftyOne

Key Results

2x more classification models shipped per quarter
More than half a billion data samples indexed and searchable through the integrated FiftyOne-Qdrant workflow
Rapid dataset curation from hundreds of millions of social media impressions per day
Higher annotation velocity with bulk labeling for faster throughput
Streamlined workflow that consolidates vector search, visualization, curation, annotation, and evaluation capabilities in one platform

"Before deploying FiftyOne, we lacked the right tool that allowed us to visualize, track datasets, and find the relevant media to train classifiers. Everything was slow and painful. FiftyOne solved this for us at scale and enabled us to work with more than half a billion samples." — Zachary McPherson, Director of Machine Learning, Zefr

Introduction: Zefr

Zefr is a leading technology provider for contextual advertising and brand safety on walled-garden platforms. Its platform ensures that ads appear alongside safe, suitable, and contextually relevant content across major social media channels. As a trusted partner to Meta, YouTube, TikTok, and Snapchat, Zefr's AI system manages advertising risk in real time, processing hundreds of millions of social media impressions across many content types every day. With Zefr, brands maintain a 99%+ safety rating on social media platforms.

Challenge: Accelerating classifier development with massive data volumes

Zefr measures campaign performance and identifies brand risk through advanced machine learning classifiers. These classifiers detect sensitive or high-risk content such as weapons, violence, military conflict, misinformation, unsafe environments, and other platform-specific risk vectors.

Before adopting FiftyOne, Zefr's ML team faced major hurdles as they scaled their brand safety models.

Slow data discovery: Hundreds of millions of social media impressions per day created massive amounts of visual content to filter, curate, and prepare for training. Finding examples of weapons, conflict imagery, or other brand risk categories within their massive data lake was like searching for needles in a haystack.

Low annotation throughput: Legacy annotation workflows processed images one by one, severely limiting the speed at which the team could create training datasets.

Lack of visualization and dataset understanding: It was hard to see what their dataset actually contained and check if the data matched the rules each classifier was supposed to follow. Script-based approaches made it hard to get quick visual feedback on dataset quality.

Before integrating vector search capabilities with FiftyOne, the team lacked an efficient way to surface semantically similar content from their vast data repositories. The process of sourcing training data, managing datasets, and tracking model versions across multiple tools created operational overhead and slowed development.

Why Zefr Chose FiftyOne

When evaluating solutions, Zefr's team prioritized tools that could address their specific pain points while scaling with their growth. Several key factors drove the decision to adopt FiftyOne:

Easy visual dataset understanding: FiftyOne made it simple to visualize massive datasets and quickly check whether the content fell within a classifier's scope (e.g., weapons, military conflict, misinformation).

Bulk annotation velocity: Unlike their previous tool's one-at-a-time approach, FiftyOne's bulk annotation capabilities for classification tasks gave Zefr's internal labeling team the speed boost they needed to keep pace with new classifiers.

Flexibility and extensibility: The mature SDK and plugin framework allowed Zefr to customize FiftyOne for its unique workflows. Integrations with vector databases such as Qdrant, model frameworks such as PyTorch and ONNX, experiment tracking, and cloud storage for data lineage were key factors.

Solution: A unified visual data pipeline for computer vision operations

Zefr implemented FiftyOne as the central hub for ingesting, annotating, and evaluating image-based datasets. The team integrated FiftyOne with the Qdrant vector database and PyTorch to build an ML pipeline that runs from data discovery to deployment.

Zefr ingests new impression data from social media streams in real time. They maintain a sliding window of fresh media content by taking a sample of images, generating embeddings, and indexing them into Qdrant. This ensures the classifiers train on current, representative data that reflects evolving content trends across platforms.
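
The sliding-window ingestion described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not Zefr's pipeline: `embed()` is a hypothetical stand-in for a real image encoder, an in-memory dict stands in for the Qdrant collection, and `SlidingWindowIndex`, the window length, and the per-day sample size are all made up for illustration.

```python
import random
from collections import deque
from datetime import date, timedelta


def embed(image_id: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a real image encoder (e.g. a CLIP-style model).

    Returns a deterministic pseudo-embedding so the sketch runs without a model.
    """
    rng = random.Random(image_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class SlidingWindowIndex:
    """Hypothetical index that keeps embeddings only for images sampled
    within the last `window_days` days.

    The `vectors` dict is an in-memory stand-in for a vector database collection.
    """

    def __init__(self, window_days: int, sample_size: int, seed: int = 0):
        self.window_days = window_days
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.batches = deque()  # (day, [image ids sampled that day]), oldest first
        self.vectors = {}       # image id -> embedding
        self.last_seen = {}     # image id -> most recent day it was sampled

    def ingest_day(self, day: date, impressions: list[str]) -> None:
        # 1. Sample a subset of the day's impressions rather than indexing all of them.
        sampled = self.rng.sample(impressions, min(self.sample_size, len(impressions)))
        # 2. Embed and upsert the sampled images.
        for image_id in sampled:
            self.vectors[image_id] = embed(image_id)
            self.last_seen[image_id] = day
        self.batches.append((day, sampled))
        # 3. Evict batches that have slid out of the window, unless an image
        #    was re-sampled on a more recent day.
        cutoff = day - timedelta(days=self.window_days - 1)
        while self.batches and self.batches[0][0] < cutoff:
            _, old_ids = self.batches.popleft()
            for image_id in old_ids:
                if image_id in self.last_seen and self.last_seen[image_id] < cutoff:
                    del self.vectors[image_id]
                    del self.last_seen[image_id]


# Usage: five days of 10 impressions each, a 3-day window, 2 samples per day.
index = SlidingWindowIndex(window_days=3, sample_size=2)
start = date(2024, 1, 1)
for d in range(5):
    index.ingest_day(start + timedelta(days=d), [f"day{d}-img{i}" for i in range(10)])
print(len(index.vectors))  # 6 (days 2-4, two samples each)
```

Sampling before embedding keeps the indexing cost bounded per day, while eviction keeps the searchable pool focused on recent content.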

How Zefr uses FiftyOne

Vector-powered data discovery: FiftyOne connects to Zefr's Qdrant vector database to perform image-to-image and text-to-image similarity search across half a billion indexed impressions. For example, when building a weapons classifier, users can query "weapons" directly from the FiftyOne App to retrieve the most semantically similar images from their massive corpus of data.
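
At its core, this kind of search ranks stored image embeddings by how close they are to a query embedding. The sketch below shows that ranking with brute-force cosine similarity in plain Python. The 3-dimensional vectors and the embedding for the text "weapons" are made-up toy values; a real setup would get both text and image embeddings from a shared model such as CLIP and let Qdrant run approximate nearest-neighbor search. In FiftyOne itself, the workflow is exposed through the Brain's `compute_similarity()` (with `backend="qdrant"`) and the `sort_by_similarity()` method; the code below does not use those APIs.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_similar(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Brute-force top-k image ids by cosine similarity to the query.

    A vector database replaces this linear scan with an approximate index.
    """
    return heapq.nlargest(k, index, key=lambda image_id: cosine(query_vec, index[image_id]))


# Toy image embeddings (hypothetical values).
index = {
    "img_rifle": [0.9, 0.1, 0.0],
    "img_knife": [0.8, 0.3, 0.1],
    "img_beach": [0.0, 0.2, 0.9],
    "img_puppy": [0.1, 0.9, 0.2],
}
query_weapons = [1.0, 0.2, 0.0]  # hypothetical text embedding for "weapons"
print(top_k_similar(query_weapons, index, k=2))  # ['img_rifle', 'img_knife']
```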

Deduplication workflow: Viral social content gets reposted thousands of times, flooding the data with duplicates. Zefr uses FiftyOne to identify and remove these duplicates so that classifiers do not overfit to repeated content.
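
One common way to catch reposts is to compare image embeddings and treat pairs above a similarity threshold as duplicates. The sketch below is an assumption about how such a pass could work, not Zefr's or FiftyOne's method: the hypothetical `deduplicate()` function greedily keeps a sample only if it is not too similar to any sample already kept. The 0.95 threshold and the toy vectors are illustrative. The pairwise loop is quadratic, so at half-billion scale the neighbor lookups would go through the vector index instead.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(embeddings: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Greedy near-duplicate removal.

    Walks samples in insertion order and keeps one only if its similarity to
    every already-kept sample is below `threshold`.
    """
    kept: list[str] = []
    for sample_id, vec in embeddings.items():
        if all(cosine(vec, embeddings[k]) < threshold for k in kept):
            kept.append(sample_id)
    return kept


# Toy embeddings: two reposts of one viral image plus one unrelated image.
samples = {
    "post_original": [0.70, 0.70, 0.10],
    "repost_1": [0.71, 0.69, 0.10],
    "repost_2": [0.69, 0.71, 0.12],
    "unrelated": [0.10, 0.20, 0.95],
}
print(deduplicate(samples))  # ['post_original', 'unrelated']
```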

Model evaluation and performance analysis: After training classifiers using PyTorch/ONNX, Zefr uses FiftyOne to evaluate model performance. By comparing predictions against ground truth, the team pinpoints where each classifier goes wrong and uses those findings to improve its key performance metrics.
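
For a single-class brand safety classifier, comparing predictions against ground truth comes down to counting true positives, false positives, and false negatives. The sketch below computes precision, recall, and F1 in plain Python; the `binary_metrics()` helper, the label names, and the toy lists are hypothetical. FiftyOne offers this kind of analysis through `evaluate_classifications()`, which also records per-sample results that can be filtered in the App. The code below does not use that API.

```python
from collections import Counter


def binary_metrics(ground_truth: list[str], predictions: list[str], positive: str) -> dict[str, float]:
    """Precision, recall, and F1 for one positive class (e.g. "weapons")."""
    counts = Counter()
    for gt, pred in zip(ground_truth, predictions):
        if pred == positive and gt == positive:
            counts["tp"] += 1  # correctly flagged
        elif pred == positive:
            counts["fp"] += 1  # flagged, but actually safe
        elif gt == positive:
            counts["fn"] += 1  # missed risky content
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): one miss and one false alarm out of six samples.
gt = ["weapons", "weapons", "safe", "safe", "weapons", "safe"]
pred = ["weapons", "safe", "weapons", "safe", "weapons", "safe"]
metrics = binary_metrics(gt, pred, positive="weapons")
print({k: round(v, 3) for k, v in metrics.items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```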

"If we didn't have FiftyOne, we'd need to rebuild a ton of tooling just to keep up. FiftyOne removes that burden and lets us focus on building the models that matter." — Zachary McPherson, Director of Machine Learning, Zefr

Results: 2x more models shipped

Since implementing FiftyOne, Zefr's computer vision pipeline has enabled it to serve clients with more comprehensive brand safety coverage.

2x models shipped: Zefr doubled the number of classification models it ships per quarter.

Scaled to 500M+ image embeddings: With daily updates from fresh social media impressions, the FiftyOne–Qdrant integration manages half a billion images to ensure comprehensive brand safety coverage across platforms.

Annotation velocity: The shift from one-image-at-a-time annotation to bulk labeling for classification tasks in FiftyOne cut down sourcing and annotation time, directly contributing to faster dataset curation and model development.

"With FiftyOne, we've been able to ship 2x more model classifiers per quarter. It makes it really easy to go through a dataset and get a sense of whether it aligns with the policy of a classifier or not." — Zachary McPherson, Director of Machine Learning, Zefr

Integrating FiftyOne into its computer vision operations has enabled Zefr to build a scalable, efficient system for developing brand safety classifiers. With technology that protects major brands across the world's largest social media platforms, Zefr has improved its detection of unsuitable content and maintained a 99%+ brand safety rate.