TL;DR: Unlabeled video search is notoriously hard. This post outlines a novel workflow that combines FiftyOne, Twelve Labs, and Databricks Vector Search to unlock fast, semantic video search and exploration, no manual labels required.
Find the exact video clip in seconds
Navigating massive video datasets to find the right clip for your machine learning problem can feel like a daunting task. Most videos come without labels or metadata, making it nearly impossible to locate the samples you need, especially when dealing with thousands or even millions of hours of footage. And traditional tools fall short when it comes to video search and curation.
What if there were a smarter way to search videos? Something that understands content without needing manual labels or tags and surfaces the exact clip you’re looking for?
Recent advances in AI and data infrastructure are making this possible by combining:
- Rich video embeddings from foundation models that capture semantic content
- Fast similarity search over those embeddings to find matches across large datasets
- User-friendly visualization for exploring and curating the results
In this blog, we introduce a high-level workflow that addresses these challenges by combining three tools: FiftyOne, Twelve Labs, and Databricks Vector Search. This stack streamlines video understanding in ML workflows and enables users to index video content, search it by example or by natural language, and visualize the findings with ease. The result is a single, smooth pipeline for working with video data at scale, from ingestion and curation to semantic search and analysis.
Before we go into the details, let’s get a bit more familiar with these tools.
FiftyOne – visualize and curate your video data
FiftyOne is a toolset for building high-quality computer vision datasets and models from Voxel51. It acts as a refinery for visual data so development teams can analyze model performance, uncover data gaps, find edge cases, and curate better datasets.
With FiftyOne, users can interactively load and explore videos, inspect frames, and refine datasets with precision. It supports vector search to run semantic queries directly in the UI and integrates with embedding tools and vector DBs, making it the perfect hub for this workflow.
Twelve Labs – multimodal embeddings
Twelve Labs is a platform that provides state-of-the-art multimodal video understanding through its foundation models and APIs. It turns raw videos into rich vector embeddings that capture the full semantic essence of your data: visuals, actions, audio, and even text.
With Twelve Labs, you can search videos with natural language or images. Looking for clips of “a car stopping at a red light”? Just type in this query. Twelve Labs handles the understanding and returns the matching results, without requiring you to train anything.
Databricks Vector Search – scalable indexing
Fully integrated into the Databricks Data Intelligence Platform, Databricks Vector Search serves as the scalable embedding index: you feed in embeddings (such as video vectors from Twelve Labs) and retrieve the most relevant matches by querying with a vector.
Putting it together in a workflow
In our workflow:
- Twelve Labs provides the multimodal intelligence needed to understand video data and handles the heavy lifting of analyzing frames, audio, and text in videos to return embeddings.
- Databricks Vector Search serves as the scalable backend: it receives the Twelve Labs embeddings and indexes them for fast similarity search.
- FiftyOne acts as the central hub, orchestrating the entire workflow: loading and curating video datasets, sending queries to Databricks Vector Search, and visualizing the search matches.
One of the most impressive aspects of this workflow is its setup simplicity, thanks to the tight integration of the three tools. In the past, implementing something like this might have required stitching together code from multiple libraries, setting up your own vector database, and dealing with a lot of glue logic. Here, much of the heavy lifting is abstracted away.
The section below provides an overview of all the steps, but you can find detailed step-by-step instructions in this example notebook.
- Connect to Databricks and create a similarity index
Establish a connection from your environment (e.g. a notebook or script) to your Databricks workspace where Vector Search is enabled. In Databricks, you create a new Vector Search endpoint that will host the index storing the video embeddings, as sketched below. Refer to the Databricks Vector Search integration docs.
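For illustration, here is a minimal sketch of that step using the `databricks-vectorsearch` Python client; the endpoint name is a placeholder, and your workspace credentials are assumed to be available as environment variables:

```python
# A minimal sketch using the databricks-vectorsearch client; assumes
# DATABRICKS_HOST and DATABRICKS_TOKEN are set in your environment,
# and the endpoint name below is a placeholder
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Create the endpoint that will host the video similarity index
client.create_endpoint(
    name="video-search-endpoint",
    endpoint_type="STANDARD",
)
```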
- Load your video dataset into FiftyOne
Using FiftyOne’s dataset abstractions, you import the videos and any available metadata or labels. In FiftyOne, you can explore the videos, inspect frames, and do basic curation (filtering, viewing label distributions, etc.).
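For example, loading a directory of videos takes just a few lines (the directory path and dataset name below are placeholders):

```python
import fiftyone as fo

# Load a directory of videos as a FiftyOne dataset
# (the path and name below are placeholders)
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/videos",
    dataset_type=fo.types.VideoDirectory,
    name="my-videos",
)

# Explore the videos and frames interactively in the FiftyOne App
session = fo.launch_app(dataset)
```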
- Generate video embeddings with Twelve Labs
Using the Twelve Labs integration, it only takes a single function call to compute embeddings for all samples in the FiftyOne dataset. These embeddings can later be compared with embeddings of text queries or images, enabling cross-modal search. You can also retrieve the embeddings directly in code.
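As a rough sketch of what the underlying embedding call looks like with the Twelve Labs Python SDK (exact method names can vary between SDK versions, so treat this as an assumption and check the current docs; the file path is a placeholder):

```python
import os
from twelvelabs import TwelveLabs

client = TwelveLabs(api_key=os.environ["TWELVE_LABS_API_KEY"])

# Kick off an embedding task for one video (path is a placeholder);
# exact method names may differ between SDK versions
task = client.embed.task.create(
    model_name="Marengo-retrieval-2.7",
    video_file="/path/to/videos/clip.mp4",
)
task.wait_for_done(sleep_interval=5)

# Fetch the per-segment embedding vectors once the task completes
task = client.embed.task.retrieve(task.id)
for segment in task.video_embedding.segments:
    print(len(segment.embeddings_float))  # embedding dimensionality
```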
- Index those vectors in Databricks
Create a similarity index in Databricks and upload the vectors through FiftyOne’s API (see the integration docs for details). From here on, you don’t need to loop over all videos to search; you can simply query the index!
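A hedged sketch of this step with FiftyOne Brain, assuming your Twelve Labs vectors are stored in an `embedding` field on each sample and the Databricks backend is configured as described in the integration docs (the brain key is a placeholder):

```python
import fiftyone.brain as fob

# Index the precomputed vectors in Databricks via FiftyOne Brain;
# assumes each sample stores its Twelve Labs vector in an "embedding"
# field, with the Databricks backend configured per the integration docs
fob.compute_similarity(
    dataset,
    embeddings="embedding",        # sample field holding the vectors
    backend="databricks",          # route the index to Databricks Vector Search
    brain_key="databricks_index",  # placeholder handle used when querying
)
```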
- Search by image or text
With the index in place, you can now perform semantic search queries against your video data by image or text. Pick a sample frame and find similar clips across your dataset or search using natural language like “person walking a dog at night.”
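For instance, a natural-language query might look like the sketch below, reusing the hypothetical brain key from the previous step:

```python
# Query the index by natural language, reusing the brain key above.
# If your index was built from precomputed vectors only, embed the query
# text with Twelve Labs first and pass the resulting vector instead
view = dataset.sort_by_similarity(
    "person walking a dog at night",
    brain_key="databricks_index",
    k=25,  # number of closest matches to return
)
print(view)
```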
- Visualize and refine results in FiftyOne
The final step brings everything together in FiftyOne’s interactive UI, where you can review, refine, and iterate on search results. Extract insights from your visual data by flagging false positives, adjusting queries, and re-running these steps, all in one place.
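A short sketch of that review loop, continuing from the view returned by the search above:

```python
import fiftyone as fo

# Open the search results in the App for visual review
session = fo.launch_app(view)

# After selecting bad matches in the UI, tag them as false positives
dataset.select(session.selected).tag_samples("false_positive")
```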
Next steps
If working with video data has been a challenge, try this integrated approach.
Here’s how you can get started.
- Clone the example notebook
- Get your free API keys from Twelve Labs
- Load your dataset into FiftyOne
- Run your first query
Conclusion
Working with video at scale doesn’t have to be overwhelming. With the combined power of FiftyOne, Twelve Labs, and Databricks, you can easily search and explore video content. Try the workflow, check out the docs, and see how it can level up your video understanding.
We’re excited to see what the community will build and discover!