Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. This week we’ll cover embeddings.
Wait, What’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
A primer on embeddings
In computer vision, embeddings are used to represent visual features of images or videos as numerical vectors. These vectors, which are typically lower in dimension than the raw source media, capture underlying characteristics of the visual data and can be used for downstream tasks such as image classification, object detection, and image retrieval.
In FiftyOne, embeddings are managed by the FiftyOne Brain, a library that provides powerful machine learning techniques designed to transform how you curate your data from an art into a measurable science. Using embeddings makes it easy to analyze and visualize datasets, and accelerates your ML workflows.
Continue reading for some tips and tricks to help you master embeddings in FiftyOne!
Compute sample properties (uniqueness, similarity, and more) from embeddings
If you want to harness the power of embeddings in computer vision but aren’t sure where to start, FiftyOne has you covered. Any FiftyOne Brain method that utilizes embeddings is equipped with a default general purpose embedding model.
This means that in many cases you can use Brain methods out of the box to assess properties of the samples in your dataset.
For instance, when training a model, unique samples help your model learn the underlying distributions in your dataset. To aid in this process, the FiftyOne Brain provides a method to compute the “uniqueness” of images, helping you answer questions such as “What data should I select to annotate?”. Here’s how easy it is to compute the uniqueness of images in the quickstart dataset:
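A minimal sketch of that workflow, loading the quickstart dataset from the FiftyOne Dataset Zoo and relying on the default embedding model:

```python
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob

# Load the quickstart dataset from the FiftyOne Dataset Zoo
dataset = foz.load_zoo_dataset("quickstart")

# Compute a uniqueness score for every sample using the default embedding model
fob.compute_uniqueness(dataset)

# Each sample now has a scalar `uniqueness` field you can sort and filter by
rank_view = dataset.sort_by("uniqueness", reverse=True)
print(rank_view.first().uniqueness)
```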
In a similar vein, when constructing a dataset or training a model, have you ever wanted to find similar examples to an image or object of interest? The FiftyOne Brain makes computing visual similarity easy, too. You can compute the similarity of samples in your dataset using the default (MobileNetV2) embedding model and store the results in the brain key image_sim:
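A sketch of that call, reusing the dataset loaded above:

```python
import fiftyone.brain as fob

# Index the images by visual similarity using the default embedding model
fob.compute_similarity(dataset, brain_key="image_sim")
```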
You can then sort your samples by similarity or use this information to find potential duplicate images.
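For example, here is one way to surface the samples most similar to a query image (the query sample and the value of k below are illustrative):

```python
# Pick a query sample (here, just the first sample in the dataset)
query_id = dataset.first().id

# View the 25 most similar images to the query
similar_view = dataset.sort_by_similarity(query_id, k=25, brain_key="image_sim")

session = fo.launch_app(similar_view)
```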
Visualize embeddings
Another application of embeddings in computer vision is visualization. By pairing embeddings with a dimensionality reduction technique such as UMAP or t-SNE, you can plot your data in a low-dimensional space, allowing you to identify clusters and better understand the patterns in your data.
In FiftyOne, all of the logic of combining embeddings, dimensionality reduction techniques, and plotting is wrapped in the easy-to-use interface of the FiftyOne Brain’s compute_visualization() method.
If we want to generate default embeddings for the quickstart dataset and use them to visualize our dataset, employing UMAP to reduce the dimensionality of the embedding vectors down to two dimensions, we can run:
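A sketch of that step (UMAP requires the umap-learn package; the brain key name here is our choice):

```python
import fiftyone.brain as fob

# Compute default embeddings and reduce them to 2D with UMAP
results = fob.compute_visualization(
    dataset,
    method="umap",
    num_dims=2,
    brain_key="embeddings_viz",
)
```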
Then we can visualize these results, coloring by label:
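A minimal sketch of the plotting step; here we color by the uniqueness scores computed earlier, but any sample-level field works, and for a classification dataset you could pass a label field such as "ground_truth.label":

```python
# Launch the App and show an interactive scatterplot of the embeddings
session = fo.launch_app(dataset)

plot = results.visualize(labels="uniqueness")
plot.show()

# Attach the plot so selections stay in sync with the App
session.plots.attach(plot)
```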
Use your own embedding model
If you’re working with more specialized or application-specific computer vision data, it might be beneficial to use your own, custom model to generate embeddings. Fortunately, all of the Brain methods that utilize embeddings, from compute_uniqueness() and compute_similarity() to compute_visualization(), support this!
Any FiftyOne Brain method that works with embeddings allows you to specify your own embeddings in three ways.
The first two ways involve the optional embeddings argument, which is useful if you have already precomputed your embeddings. You can either pass the precomputed embeddings explicitly as a num_samples x num_embedding_dims array, or you can pass in the name of a field in the dataset containing the embeddings:
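A sketch of both options; the embeddings array below is a random placeholder standing in for your own precomputed vectors, and the field and brain key names are our choices:

```python
import numpy as np
import fiftyone.brain as fob

# Option 1: pass precomputed embeddings directly as an array
# (placeholder array; in practice these come from your own model)
embeddings = np.random.rand(len(dataset), 512)
fob.compute_visualization(dataset, embeddings=embeddings, brain_key="viz_array")

# Option 2: store the embeddings in a sample field and pass the field's name
dataset.set_values("my_embeddings", [e for e in embeddings])
fob.compute_visualization(dataset, embeddings="my_embeddings", brain_key="viz_field")
```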
As an alternative, if you have not yet computed your embeddings, you can pass in a model for FiftyOne to use to compute the embeddings:
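For example, a sketch that loads a model from the FiftyOne Model Zoo and lets the Brain compute the embeddings on the fly:

```python
import fiftyone.zoo as foz
import fiftyone.brain as fob

# Load a model from the Model Zoo (any model that exposes embeddings works)
model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")

# The Brain uses the model to embed each sample before reducing dimensionality
fob.compute_visualization(dataset, model=model, brain_key="viz_model")
```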
Any model that has logits should work, and you can check if the model supports embeddings by verifying that print(model.has_embeddings) returns True.
Compute embeddings for object patches
Just as you can compute embeddings for images, FiftyOne allows you to compute embeddings for object patches, with the built-in method compute_patch_embeddings() functioning in analogous fashion to compute_embeddings(). To compute embeddings for ground truth objects, we just use "ground_truth" as the patches field:
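A sketch using the zoo model loaded above; the embeddings field name is our choice:

```python
# Compute an embedding for every ground truth object patch
dataset.compute_patch_embeddings(
    model,
    "ground_truth",
    embeddings_field="gt_patch_embeddings",
)
```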
Alternatively, Brain methods like compute_similarity() and compute_visualization() work natively with patches when you specify the patches_field argument, telling the Brain which labels to use:
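For instance, a sketch that indexes and visualizes the ground truth patches (the brain key names are our choices):

```python
import fiftyone.brain as fob

# Index the ground truth object patches by similarity
fob.compute_similarity(dataset, patches_field="ground_truth", brain_key="gt_sim")

# Compute a 2D visualization of the patch embeddings
fob.compute_visualization(dataset, patches_field="ground_truth", brain_key="gt_viz")
```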
Perform zero-shot classification with CLIP
Over the past few years, progress in embeddings has enabled a suite of new applications and use cases in computer vision. One of the most significant advances in this area has been the development and honing of techniques which embed multimodal data into a common embedding space.
In particular, with OpenAI’s CLIP model pioneering contrastive language-image pre-training, it is possible to transfer knowledge from one domain to another. This paved the way for dramatically improved zero-shot computer vision models.
Here’s an example of performing zero-shot classification on COCO validation data with custom class labels:
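A sketch of one way to do this with the CLIP model from the FiftyOne Model Zoo; the class list, text prompt, and sample count below are illustrative:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a slice of the COCO-2017 validation split
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=100,
)

# Load CLIP with a text prompt and custom class labels
model = foz.load_zoo_model(
    "clip-vit-base32-torch",
    text_prompt="A photo of a",
    classes=["person", "dog", "cat", "bird", "car", "bicycle"],
)

# Apply zero-shot classification and store the predictions
dataset.apply_model(model, label_field="clip_predictions")

session = fo.launch_app(dataset)
```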
Join the FiftyOne community!
Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!