
Concept Traversal Plugin for FiftyOne

Navigate The Space of Concepts with Text and Images

Welcome to week nine of Ten Weeks of Plugins. During these ten weeks, we will be building a FiftyOne plugin (or multiple!) each week and sharing the lessons learned!

If you’re new to them, FiftyOne Plugins provide a flexible mechanism for anyone to extend the functionality of their FiftyOne App.


Ok, let’s dive into this week’s FiftyOne Plugin – Concept Traversal!

💪The Concept Traversal Plugin was a joint effort between myself and the amazing Ibrahim Manjra 💪

Concept Traversal Plugin 🧠🗺️🌌

Right before the Ten Weeks of Plugins journey began, I created a “Concept Interpolation” Plugin. This plugin made it possible to specify two concepts via text prompts, e.g. “rainy” and “sunny”, and interpolate between them by changing the value of a slider. Behind the scenes, executing the operation generated embedding vectors for the two text prompts, combined them into a new vector, normalized the vector, and performed a similarity search on the dataset with this vector.
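
Concretely, the logic behind the slider looked roughly like the following sketch. This is my reconstruction, not the plugin’s exact code; it assumes a dataset with a CLIP similarity index under the brain key "clip_sim", like the one we create below:

import numpy as np
import fiftyone.zoo as foz

# A rough reconstruction of the interpolation logic; assumes `dataset`
# already has a similarity index with brain key "clip_sim"
model = foz.load_zoo_model("clip-vit-base32-torch")

alpha = 0.3  # slider position in [0, 1]
rainy = np.asarray(model.embed_prompt("rainy"))
sunny = np.asarray(model.embed_prompt("sunny"))

# Combine the two prompt embeddings, renormalize, and query the index
query = (1 - alpha) * rainy + alpha * sunny
query /= np.linalg.norm(query)
view = dataset.sort_by_similarity(query, k=25, brain_key="clip_sim")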

While Concept Interpolation itself is cool, the lasting impact of this Week 0 plugin was that it opened the floodgates to new multimodal ways of interacting with data in FiftyOne. If we could combine text concepts to move along one “axis” in concept space, why not allow arbitrary navigation, or traversal, in concept space?

Here’s a simple, if silly, example: start from a picture of a Siberian husky, and add text concepts like 10% chihuahua and 15% golden lab, and retrieve the images from the dataset that best match this newly formed vector.

To this end, Ibrahim and I built a Concept Traversal Plugin. Select an initial image from your dataset as the starting point, and iteratively add concepts in different quantities, moving around in the space of multimodal embeddings as you go!

💡This is an experimental plugin. The mathematics of vector addition for multimodal embeddings are not guaranteed to be meaningful. Rather, the plugin is meant to inspire members of the community to try new approaches to interacting with their data.
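
With that caveat in mind, the husky example above might look roughly like this in code. Again, this is a reconstruction rather than the plugin’s exact implementation; husky_sample, dataset, and the "clip_sim" index are assumed to exist:

import numpy as np
import fiftyone.zoo as foz
from PIL import Image

# Sketch of the traversal math described above (a reconstruction);
# `husky_sample`, `dataset`, and the "clip_sim" index are assumed
model = foz.load_zoo_model("clip-vit-base32-torch")

# Start from the embedding of the chosen image...
image = np.asarray(Image.open(husky_sample.filepath))
query = np.asarray(model.embed(image), dtype=float)

# ...then nudge it toward each text concept by its relative strength
for prompt, strength in [("chihuahua", 0.10), ("golden lab", 0.15)]:
    query += strength * np.asarray(model.embed_prompt(prompt))

# Renormalize and retrieve the samples nearest to the new vector
query /= np.linalg.norm(query)
view = dataset.sort_by_similarity(query, k=25, brain_key="clip_sim")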

Plugin Overview & Functionality

The Concept Traversal Plugin is a joint Python/JavaScript plugin with three operators:

  • open_traversal_panel: opens the Concept Traversal Panel.
  • traverse: runs the traversal on the dataset with the specified concepts from the given starting image.
  • get_sample_url: returns the URL for a sample from its Sample ID.

For this walkthrough, I’ll be using the test split of the COCO 2017 dataset.
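
If you’d like to follow along, you can load the data from the FiftyOne Dataset Zoo. The full test split is large, so the max_samples argument below keeps the download manageable:

import fiftyone as fo
import fiftyone.zoo as foz

# Load a slice of the COCO 2017 test split from the Dataset Zoo
dataset = foz.load_zoo_dataset("coco-2017", split="test", max_samples=1000)
session = fo.launch_app(dataset)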

Creating the Similarity Index

As with FiftyOne’s core similarity search functionality, to run reverse image search on your dataset, you first need to have a similarity index. Importantly, the model your similarity index uses must support both image and text embeddings!

You can generate a similarity index by running compute_similarity() on your dataset from Python, specifying a model from the FiftyOne Model Zoo and a vector search backend with which to construct the index. Here we use a CLIP model to compute embeddings, and Milvus as our vector database:

# Download and launch a standalone Milvus server with Docker, and
# install its Python client
!wget https://github.com/milvus-io/milvus/releases/download/v2.2.11/milvus-standalone-docker-compose.yml -O docker-compose.yml
!sudo docker compose up -d
!pip install pymilvus

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# Index the images with CLIP embeddings, stored in a Milvus collection
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="clip_sim",
    backend="milvus",
)
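
As a quick sanity check that your chosen model supports text prompts, FiftyOne zoo models expose a can_embed_prompts property (to the best of my knowledge; consult the docs for your FiftyOne version):

import fiftyone.zoo as foz

# CLIP embeds both images and text prompts, so this should print True
model = foz.load_zoo_model("clip-vit-base32-torch")
print(model.can_embed_prompts)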

Alternatively, you can compute similarity from within the FiftyOne App.

Traversing Concept Space

Once you have generated a multimodal similarity index on your dataset, you can open the Concept Traversal Panel and begin your exploration! 

Select the name of the similarity index you’d like to use (if you only have one, this will be the only option), and set the number of retrieval results you would like returned by the vector search engine each time you update your position in concept space.

Next, select an image from your dataset to use as the starting point from which you will traverse. Once you set the image, a preview of the image should appear in the panel, and should stay there even when that image is no longer visible in the sample grid.

Now you are ready to move in concept space! Add a text concept in the first text box, and set a relative strength for the concept — how far you want to move in that direction. When you begin typing in this box, another row will appear below, where you can add another concept. Feel free to add as many concepts as you’d like, each with their own relative strengths.

At the bottom, you will see an absolute strength for the text concepts. This scales the total distance traversed in the embedding space. Depending on your dataset and model, you may need to play around with this multiplier to get reasonable results. 

As you update your position in the concept space (any time you change one of the text prompts or scaling factors), a vector search will be performed on the dataset using this updated query vector.
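
Putting the pieces together, one plausible form for the query vector update is sketched below; the exact weighting the plugin uses may differ, and the names here are mine:

import numpy as np

# Hypothetical sketch: each concept's relative strength weights its own
# embedding, and the absolute strength `text_scale` scales the entire
# text contribution before the result is renormalized
def build_query(image_emb, concept_embs, strengths, text_scale):
    text_part = sum(s * np.asarray(e) for s, e in zip(strengths, concept_embs))
    query = np.asarray(image_emb) + text_scale * text_part
    return query / np.linalg.norm(query)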

Lessons Learned

Displaying Sample Images in Panels

For last week’s Reverse Image Search Plugin, I created an imagePreview React component which displayed a preview of the image we were running the reverse image search against. This worked whether we were searching against an image in our local filesystem or one specified via URL.

Displaying an image selected from a sample in our dataset, however, is a bit more complex. When you select a sample in the sample grid, the sample’s ID is accessible to your operator: in Python operators it is present in ctx.selected, and in JavaScript operators it is available as fos.selected (with import * as fos from "@fiftyone/state"). The difficulty lies in retrieving the sample itself from this state. Fortunately, Ibrahim came to the rescue to help make this happen.

The solution involves passing the Sample ID to a Python operator, get_sample_url. This operator queries the dataset for the sample, extracts the filepath from the sample, and then maps this filepath to a (locally hosted) URL.
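
Here is a minimal sketch of that pattern. The operator name matches the plugin, but the body and the media URL scheme are my illustrative assumptions:

import urllib.parse

import fiftyone.operators as foo

class GetSampleURL(foo.Operator):
    @property
    def config(self):
        return foo.OperatorConfig(name="get_sample_url", label="Get sample URL")

    def execute(self, ctx):
        # Look up the sample by ID and map its filepath to a URL served
        # by the locally hosted App (port and route are assumptions)
        sample = ctx.dataset[ctx.params["sample_id"]]
        filepath = urllib.parse.quote(sample.filepath)
        return {"url": f"http://localhost:5151/media?filepath={filepath}"}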

Creating a Flexible API

When building a combined Python/JavaScript plugin for FiftyOne, typically it makes sense to handle the interactivity and UI in JavaScript, and the data and compute-heavy processing in Python. In practice, that means writing a Python operator which takes a dictionary of inputs, and then writing JavaScript code to execute that Python operator.

Here’s an example of what the connection between Python operator and JavaScript code looked like for the Concept Interpolation plugin:

import { useOperatorExecutor } from "@fiftyone/operators";

...

const operatorExecutor = useOperatorExecutor(
  "@jacobmarks/concept_interpolation/interpolator"
);

<Button
  variant="contained"
  onClick={() => {
    operatorExecutor.execute({
      left_extreme: leftExtremeValue,
      right_extreme: rightExtremeValue,
      slider_value: sliderValue,
      index: brainRunValue,
    });
  }}
>
  Execute operator
</Button>

At first, I thought this meant that the parameters passed from JavaScript had to be very tightly constrained.

It turns out that this connection is a little more flexible than I originally thought. In particular, you can pass lists of arbitrary length from JavaScript into Python! Because the type of the variable is the same regardless of the list’s length, you can encode different quantities of information depending on how you fill the list. In the Concept Traversal Plugin, I used this flexibility to allow for arbitrarily many text concepts:

import { useOperatorExecutor } from "@fiftyone/operators";

const operatorExecutor = useOperatorExecutor(
  "@jacobmarks/concept_space_traversal/traverser"
);

...

operatorExecutor.execute({
  sample: firstSampleId,
  concepts: formattedSliders,
  text_scale: scaleSlider.valueOf(),
  index: brainRunValue,
});
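
On the Python side, the operator can then iterate over however many concepts arrive. Here is a small sketch with assumed field names, not the plugin’s exact schema:

# Sketch of unpacking the variable-length `concepts` list on the Python
# side (field names are assumptions)
def parse_traversal_params(params):
    prompts = [c["concept"] for c in params["concepts"]]
    strengths = [c["strength"] for c in params["concepts"]]
    return params["sample"], prompts, strengths, params["text_scale"]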

Conclusion

Over the past nine weeks, we’ve covered a lot of ground. We’ve built plugins to automate data ingestion, accelerate your data labeling workflows, and enable new modes of interacting with and exploring your data. This is by far the most experimental plugin we have built, and it feels fitting that a plugin centered around traversing the space of concepts pushes us to reconceptualize the plugin itself. Plugins aren’t just a medium for incorporating existing workflows into FiftyOne; they provide a platform for testing out new workflows!

Tune in next week for the final installment of Ten Weeks of Plugins! You can track our journey in our ten-weeks-of-plugins repo — and even though these ten weeks are coming to an end, I encourage you to fork the repo and start your own journey!