
A Google Search Experience for Computer Vision Data

How to Use Vector Search Engines, NLP, and OpenAI’s CLIP in FiftyOne

Image, object, and natural language similarity search in FiftyOne

Have you ever wanted to find the images most similar to an image in your dataset? What if you haven’t picked out an illustrative image yet, but you can describe what you are looking for using natural language? And what if your dataset contains millions, or tens of millions of images?

With today’s release of FiftyOne 0.20, you can now natively search through your computer vision data, at scale, with either images or text! What’s more, these new querying capabilities interface seamlessly with FiftyOne’s existing semantic slicing operations. Restrict your queries to filtered and matched views of your data with ease, and compose queries however you’d like.

FiftyOne 0.20 brings these exciting vector search capabilities:

  • Native integration with Pinecone vector search database
  • Native integration with Qdrant vector search engine
  • SDK support for querying by sample ID, query vector, or text prompt
  • UI support for querying with images extended to use vector search engines
  • UI support for querying with natural language with vector search engines

In this blog post, we show you how to apply the new vector search functionality across a range of workflows, from image and text queries to object patches and video frames.

Read on to learn how you can take your dataset exploration, understanding, and curation to the next level with vector search!

FiftyOne has long supported tasks that help you understand the relationships between images in your dataset, such as finding near-duplicate images, surfacing the most unique samples, and visualizing embeddings.

At the heart of these methods lie embeddings. Because embeddings are foundational to all of the new vector search features in FiftyOne 0.20, it’s worth taking a moment to summarize what embeddings are and how you can work with them in FiftyOne.

Embeddings are vectors (typically with between 512 and 2048 entries), generated one per sample, that serve as a compact representation of your data. For example, a 512-dimensional embedding vector can represent a 12MP image. To compute the similarity between images – and perform the subsequent similarity search – you must specify how to embed your samples.

In FiftyOne, you can do so by passing either the embedding vectors themselves, or the model with which to compute them, into compute_similarity(), which creates the “similarity index”, or vector index, containing embeddings for your images.

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

## get 5000 random images from Open Images V7
dataset = foz.load_zoo_dataset(
    "open-images-v7",
    max_samples=5000,
    shuffle=True
)

## view dataset in the FiftyOne App
session = fo.launch_app(dataset)

## load a model from the FiftyOne Model Zoo
model_name = "mobilenet-v2-imagenet-torch"
model = foz.load_zoo_model(model_name)

Any of the following options work:

  • Store embeddings in a field: compute embeddings in a field on your samples, and point to that field.

dataset.compute_embeddings(
    model=model,
    embeddings_field="embeddings"
)
fob.compute_similarity(dataset, embeddings="embeddings")

  • BYO embeddings: compute embeddings however you like, and pass these in as a two-dimensional array.

embeddings = dataset.compute_embeddings(model)
fob.compute_similarity(dataset, embeddings=embeddings)

  • Pass in a model from the FiftyOne Model Zoo:

fob.compute_similarity(dataset, model=model)

  • Pass in the name of a zoo model:

fob.compute_similarity(dataset, model=model_name)

Once you have generated your similarity index, you can query your dataset by sample ID in Python with sort_by_similarity(). To find the 25 most similar images to the first sample in the dataset, for instance:

query = dataset.first().id
top_k_view = dataset.sort_by_similarity(
    query,
    k=25,
    brain_key="mobilenet"
)

session.view = top_k_view

Alternatively, we can perform the same operation in the FiftyOne App. When we tick the checkbox in the upper left corner of the first sample, the menu bar above it expands with new options, including a new image icon. Click on this icon and hit enter. It really is that easy!

Image similarity search for 25 most similar images

If we wanted a view containing just the ten most similar images, we could set this by clicking on the gear icon and replacing 25 (the default) with 10.
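
For reference, the equivalent result in Python just changes the k argument in sort_by_similarity():

## same query as above, but only the 10 most similar images
top_10_view = dataset.sort_by_similarity(
    query,
    k=10,
    brain_key="mobilenet"
)
session.view = top_10_view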

Image similarity search for 10 most similar images

When creating our similarity index, we can also choose the metric used to determine closeness via the metric keyword argument. Additionally, whatever we specify as the brain_key can later be used to retrieve the similarity index. If we had used:

fob.compute_similarity(
    dataset,
    model=model,
    brain_key="mobilenet"
)

then we could load the similarity index with:

sim_index = dataset.load_brain_results("mobilenet")

Once you have loaded a similarity index, you can use it to find the most unique samples in your dataset with sim_index.find_unique(), or to find duplicate samples with sim_index.find_duplicates().
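
For example, here is a minimal sketch of both workflows; the count and distance threshold below are illustrative, not prescriptive:

## mark the 500 most unique samples according to the index
sim_index.find_unique(500)
unique_view = dataset.select(sim_index.unique_ids)

## mark near-duplicate samples within an embedding distance threshold
sim_index.find_duplicates(thresh=0.02)
dup_view = dataset.select(sim_index.duplicate_ids)

## inspect either view in the App
session.view = unique_view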

Building on our recent blog post, where we showed how you can “find images with words” with CLIP embeddings, natural language queries are now natively supported, both in the FiftyOne software development kit and in the FiftyOne App. 

Any model in the FiftyOne Model Zoo that can embed both language and images (such as CLIP) can be used to search through an image dataset with text prompts. Simply pass the name of the model into your call to compute_similarity():

fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="clip",
    metric="cosine",
)

Here, we use cosine similarity as our metric for determining closeness, and we store the similarity index with brain_key="clip".

In Python, we can now query our dataset with a text prompt by passing a string into sort_by_similarity() as a query. The following Python query creates a DatasetView with the 25 images that most resemble the prompt “a piece of pie”, as determined by our similarity metric:

pie_view = dataset.sort_by_similarity(
    "a piece of pie",
    k=25,
)

## view these images in the FiftyOne App
session.view = pie_view

Alternatively, we can perform the same search in the FiftyOne App without writing a single line of code. If we reset the session (session = fo.launch_app(dataset)), we can recreate pie_view by clicking on the magnifying glass, typing our query directly into the search field that appears, and hitting enter. We can then click the bookmark icon to convert this into a view.

Natural language image search in the FiftyOne App

If you reset the view (by clicking on the magnifying glass and then clicking the reset button), and instead click on the gear icon after the magnifying glass, you will see a radio button with a single option: "clip". This tells you that FiftyOne is performing the natural language query using the similarity index you stored with brain_key="clip".

In this example, we only have one similarity index on the dataset with a model that supports prompts, but we have a great deal of freedom in how we name and populate these similarity indexes. We can have different indexes for the same model with different metrics, multiple models that support text prompts, or both. If you click on the gear icon after the image icon, you will see that the radio button has two options, because there are now two similarity indexes that support image similarity searches. You can also load a model from the FiftyOne Model Zoo with custom weights, or even add your own model to the zoo!
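
For example, here is a sketch of adding a second text-capable index under a different brain_key and metric, then choosing which index powers a query (the names below are just for illustration):

## a second CLIP index, this time using Euclidean distance
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="clip_euclidean",
    metric="euclidean",
)

## list all brain runs (including similarity indexes) on the dataset
print(dataset.list_brain_runs())

## brain_key determines which index is used for the query
view = dataset.sort_by_similarity(
    "a piece of pie",
    k=25,
    brain_key="clip_euclidean",
)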

By default, FiftyOne uses scikit-learn to compute the nearest neighbors of embedding vectors and construct the similarity index. However, the larger a dataset gets, the harder it becomes to find the best matches to a particular vector search query. It can become so difficult, in fact, that no known exact method scales favorably in preprocessing time, search time, and memory consumption all at once. When your datasets start to reach millions of samples, exact nearest neighbor search can become infeasible.

But often – especially when the datasets become that large – retrieving approximately the best matches is all you need. That’s where vector search engines like Pinecone and Qdrant come in with approximate nearest neighbor (ANN) search. In particular, these vector search engines implement the hierarchical navigable small world (HNSW) algorithm.

In Finding Images with Words and Nearest Neighbor Embeddings Classification with Qdrant, we just scratched the surface of what FiftyOne plus Pinecone or Qdrant can do for your computer vision workflows. With today’s FiftyOne 0.20 release, you can now use these backends to generate your similarity indexes in FiftyOne, accelerating your dataset exploration and evaluation.

You can now pass a backend argument into compute_similarity() to tell FiftyOne which vector search engine to use to create the similarity index: backend="qdrant" for Qdrant, and backend="pinecone" for Pinecone.

Depending on the backend, there are a variety of optional arguments you can pass in, which allow you to customize the replication factor, sharding, and the degree of approximation in approximate nearest neighbor search. For a complete discussion of these arguments, see our Qdrant integration and Pinecone integration docs. You can also write these kwargs – including your credentials – to your FiftyOne Brain config file, instead of passing them in to compute_similarity() each time you want to create a new similarity index.

Qdrant backend

To get started using Qdrant in FiftyOne, pull the pre-built Docker image from Docker Hub and run the container:

docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant

You’ll also need to install the Qdrant Python client:

pip install qdrant-client

Then create a similarity index, passing in the name of the Qdrant collection to be created:

qdrant_index = fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="qdrant_clip",
    backend="qdrant",
    metric="cosine",
    collection_name="open-images-5000-clip"
)
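
Queries against the Qdrant-backed index then work exactly as before; for instance, with an example text prompt:

## natural language query served by Qdrant
view = dataset.sort_by_similarity(
    "a bicycle leaning against a wall",
    k=25,
    brain_key="qdrant_clip",
)
session.view = view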

Pinecone backend

To get started using Pinecone in FiftyOne, set up a Pinecone account if you don’t have one already, and copy an API key from your account. Then install the Pinecone Python client:

pip install -U pinecone-client

Just like that, you can use Pinecone vector search as a backend, specifying an index_name if desired:

pinecone_index = fob.compute_similarity(
    dataset,
    model="mobilenet-v2-imagenet-torch",
    brain_key="pinecone_mobile",
    backend="pinecone",
    metric="cosine",
    index_name="open-images-5000-mobilenet",
    api_key="MY_API_KEY",
    environment="us-west1-gcp"
)

Note: Pinecone’s free tier only allows users a single vector index at a time. You may need to delete your existing vector index before creating one for your FiftyOne dataset.
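
Once created, the Pinecone-backed index is queried just like any other similarity index; since MobileNet does not embed text, here is an example query by sample ID:

## image similarity query served by Pinecone
query_id = dataset.first().id
view = dataset.sort_by_similarity(
    query_id,
    k=25,
    brain_key="pinecone_mobile",
)
session.view = view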

Mix, match, frame, and patch

Native vector search on your computer vision data becomes even more valuable when you can interleave these queries with other semantic slicing operations like matching, filtering, sorting, and shuffling your data. In FiftyOne, all of these logical operations – including vector search queries – are represented as view stages, and can be combined in whatever order you’d like! 

Here are just a few examples of what you can do with vector search in FiftyOne:

Find race car tires in a general purpose dataset

There are many workflows you could use for this, for instance:

  1. Filter the dataset for images containing positive Car labels
  2. Visually identify a race car and use image similarity search to find other images with race cars
  3. Convert to object patches
  4. Use natural language search to query for object patches with the text prompt “tire”

With our random subset of Open Images, all we need to do to enable this workflow is run compute_similarity() on the detection object patches with a model that supports prompts, such as CLIP (a Python sketch of steps 3 and 4 follows the snippet below):

fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    patches_field="detections",
    brain_key="qdrant_clip_patches",
    backend="qdrant",
    metric="cosine",
    collection_name="fiftyone-patches"
)
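
With that index in place, a rough Python sketch of steps 3 and 4 might look like this (the value of k is arbitrary):

## step 3: convert to an object patches view (one sample per detection)
patches = dataset.to_patches("detections")

## step 4: natural language search over the patches
tire_view = patches.sort_by_similarity(
    "tire",
    k=25,
    brain_key="qdrant_clip_patches",
)
session.view = tire_view
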
Using image, object, and text-based search in FiftyOne to find race car tires

Find video frames with cars in an intersection

Again, there are many ways to do this. Perhaps the simplest is to use FiftyOne’s ToFrames ViewStage to generate a view containing one image per frame across all video samples. You can then construct a similarity index for this view using compute_similarity(), and query with the text prompt “cars in an intersection”:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")
res = fob.compute_similarity(
    dataset.to_frames(sample_frames=True),  ## sample frames to disk and index one image per frame
    model="clip-vit-base32-torch",
    brain_key="qdrant_clip_video",
    backend="qdrant",
    metric="cosine",
)

session = fo.launch_app(dataset)
Natural language video search in FiftyOne to find video frames with cars in an intersection
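
To recreate the query in Python rather than the App, a sketch along these lines should work, reusing the frames view and the index built above:

## one sample per video frame
frames = dataset.to_frames(sample_frames=True)

## natural language search over the frames
intersection_view = frames.sort_by_similarity(
    "cars in an intersection",
    k=25,
    brain_key="qdrant_clip_video",
)
session.view = intersection_view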

Conclusion

In this blog post, we’ve introduced just a few of the myriad ways you can leverage vector search natively in FiftyOne to accelerate your computer vision workflows. Whether you want to query your computer vision data with images, natural language, or raw numerical vectors, FiftyOne has you covered. 

We’ve made it easy to search at scale with Qdrant or Pinecone, and you can query images, video frames, or object patches, intertwining vector search queries with filtering and matching operations to your heart’s desire. 

Explore your computer vision data like never before!

Join the FiftyOne community!

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!