
The Computer Vision Interface for Vector Search

Search through a billion images with a single line of code

There’s too much data. Data lakes and data warehouses; vast pastures of pixels and oceans teeming with text. Finding the right data is like searching for a needle in a haystack!

Vector search engines solve this problem by transforming complex data (raw pixel values of an image, characters in a text document) into entities called embedding vectors. These numerical vectors are then indexed, so that you can efficiently search against the raw data. It’s no surprise that vector search engines like Qdrant, Pinecone, LanceDB, and Milvus have become essential components in almost any new AI application.
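
To make this concrete, here is a toy sketch of what a vector search engine does under the hood, using plain NumPy with random vectors standing in for real embeddings. Production engines replace this brute-force scan with approximate nearest neighbor indexes, but the scoring idea is the same:

import numpy as np

## toy data: 1,000 items with 512-dimensional embeddings, plus a query vector
embeddings = np.random.rand(1000, 512)
query = np.random.rand(512)

## cosine similarity between the query and every stored vector
scores = embeddings @ query / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
)

## indices of the 5 most similar items
top_k = np.argsort(-scores)[:5]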

If you’re working with image or video data and you want to incorporate vector search into your workflows, there can be quite a bit of overhead: 

  • How do you implement cross-modal retrieval like searching for images with text? 
  • How do you incorporate traditional search filters like confidence thresholds or class labels? 
  • What about searching over the objects (people, cats, dogs, cars, bikes, …) within your images? 

These are just a few of the many challenges you will encounter.

Wait. Stop. Hold your horses. There’s a better way…

FiftyOne is the computer vision interface for vector search. The FiftyOne open source toolkit now features native integrations with Qdrant, Pinecone, LanceDB, and Milvus so you can use your preferred vector search engine to efficiently search your visual data in a single line of code.

Want to find 25 images most similar to the second sample in your dataset with one click? Want to find images of traffic that contain at least one person and one bicycle with a click? You can!

How Does It Work?

1. Load your dataset.

For the purposes of illustration, we’ll load a subset of the MS COCO validation split.

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=1000,
)
session = fo.launch_app(dataset)

2. Generate the similarity index.

In order to search against our media, we first need to index the data. In FiftyOne, we can do this via the compute_similarity() function: specify the model you want to use to generate the embedding vectors and which vector search engine you want to use on the backend. You can also give the similarity index a name, which is useful if you want to run vector searches against multiple indexes.

## setup lancedb
pip install lancedb

## generate a similarity index
## with default model embeddings
## using LanceDB backend
fob.compute_similarity(
    dataset,
    brain_key="lancedb_index",
    backend="lancedb",
)

## setup milvus
## download and start the docker container
pip install pymilvus

## generate a similarity index
## with CLIP model embeddings
## using Milvus backend
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="milvus_clip_index",
    backend="milvus",
    metric="dotproduct",
)
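
Because each index lives under its own brain_key, you can keep several indexes on the same dataset and choose between them at query time. If you want a quick sanity check that both runs above were created, you can list the dataset's brain runs (an extra step, not required for searching):

## list the similarity indexes (brain runs) on the dataset
print(dataset.list_brain_runs())
## e.g. ['lancedb_index', 'milvus_clip_index']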

3. Search against the index.

Now you can run image searches across your entire dataset with a single line of code using the sort_by_similarity() method. To find the 25 most similar images to the second image in our dataset, we can pass in the ID of the sample, the number of results we want returned, and the name of the index we want to search against:

## get ID of the second sample
query = dataset.skip(1).first().id

## find 25 most similar images with LanceDB backend 
sim_view = dataset.sort_by_similarity(
    query,
    k=25,
    brain_key="lancedb_index"
)

## display results
session = fo.launch_app(sim_view)

You can also do this entirely via the UI in the FiftyOne App.

Semantic Search Made Simple

Gone is the hassle of handling multimodal data. If you want to semantically search your images using natural language, you can use the exact same syntax! Use a multimodal model like CLIP to create your index embeddings, and then pass in a text query instead of a sample ID:

## semantic query
query = "kites flying in the sky"

## find 30 most similar images with Milvus backend 
kites_view = dataset.sort_by_similarity(
    query,
    k=30,
    brain_key="milvus_clip_index"
)

## display results
session = fo.launch_app(kites_view)

This can be especially useful for exploring unstructured data and for digging deeper into your data than existing labels would otherwise allow.
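
For instance, once a semantic query has surfaced an interesting slice of your data, you might tag the matched samples so you can return to them later. This is an illustrative follow-up rather than part of the search itself:

## tag the matched samples for later review
kites_view.tag_samples("kites")

## the tag can then be queried like any other metadata
print(dataset.match_tags("kites").count())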

The semantic search itself can also be run entirely in the FiftyOne App.

Pass Along the Prefilters

Running vector searches on specific subsets of your data typically involves writing complicated prefilters: filters which are passed into the vector search engine to be applied to the dataset prior to vector search.

FiftyOne’s vector search integrations take care of these details for you!

If you want to find images that look like “traffic”, but only want the search applied to images containing both a person and a bicycle, you can do this by calling sort_by_similarity() on the filtered view:

## create filtered view: images containing both a person and a bicycle
view = dataset.match_labels(filter=F("label") == "person").match_labels(
    filter=F("label") == "bicycle"
)

## search against this view
traffic_view = view.sort_by_similarity(
    "traffic",
    k=25,
    brain_key="milvus_clip_index"
)
session = fo.launch_app(traffic_view)

Get Your Things in Order

All of the aforementioned functionality also works out of the box with object detection patches! 

When generating a similarity index, all you need to do is pass in the patches_field argument — naming the label field where the “objects” can be found — and compute_similarity() will generate embedding vectors for each object across all of your images. The vector database indexes these patch embeddings so that you can sort these detections by similarity to a reference object, or a natural language query:

## setup qdrant
## pull and start the docker container
pip install qdrant-client

## create a similarity index for ground truth patches
## with CLIP model, indexed with Qdrant vector database
fob.compute_similarity(
    dataset,
    patches_field="ground_truth",
    model="clip-vit-base32-torch",
    brain_key="qdrant_gt_index",
    backend="qdrant"
)

## search for the 25 objects that look most like a tennis racket
tennis_view = dataset.to_patches("ground_truth").sort_by_similarity(
    "tennis racket",
    k=25,
    brain_key="qdrant_gt_index",
)

session = fo.launch_app(tennis_view)
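
When you no longer need an index, you can remove it from both FiftyOne and the backing vector database. A sketch of that cleanup flow for the qdrant_gt_index created above might look like this:

## load the index and delete its collection from the Qdrant backend
results = dataset.load_brain_results("qdrant_gt_index")
results.cleanup()

## remove the record of the brain run from the dataset
dataset.delete_brain_run("qdrant_gt_index")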

Conclusion

No matter how many images or videos you have, you need to be using vector search. FiftyOne’s native vector search integrations will make your life easier. With FiftyOne, similarity searches are as straightforward as applying more traditional filter and query operations. Mix and match vector search queries with metadata queries to your heart’s content. 
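
As an illustration, reusing the milvus_clip_index from earlier, you could restrict a text search to only the busiest scenes in the dataset:

## only consider images with at least 5 ground truth objects,
## then rank those by similarity to a natural language query
busy_view = (
    dataset
    .match(F("ground_truth.detections").length() >= 5)
    .sort_by_similarity("a crowded street", k=20, brain_key="milvus_clip_index")
)
session = fo.launch_app(busy_view)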

Next Steps

If you’re interested in vector search and computer vision, come to the Virtual Computer Vision Meetup on July 13th at 10AM PT, which will be focused entirely on vector search! You can register here.

For comprehensive guides on working with each vector search backend, check out FiftyOne’s integration docs for Qdrant, Pinecone, LanceDB, and Milvus.

For general information about vector search in FiftyOne, check out sorting by similarity in the FiftyOne App, and the FiftyOne Brain User Guide on similarity.

If you like the open source machine learning library FiftyOne, show your support by giving the project a ⭐ on GitHub (3,900 stars and counting!) 

Thanks to the Qdrant and Pinecone teams for contributing integrations to FiftyOne 0.20, and thank you to Ayush Chaurasia and the LanceDB team, and Filip Haltmayer and the Milvus team, for contributing vector search engine integrations to the FiftyOne ecosystem in FiftyOne 0.21.3!