
VoxelGPT: Your AI Assistant for Computer Vision

Want to surface interesting insights about your image and video datasets without writing code? Now you can with VoxelGPT!

VoxelGPT combines the power of large language models (LLMs) with FiftyOne’s flexible computer vision query language, making it easier than ever to semantically slice computer vision datasets and build better machine learning models.

This breakthrough gives you unprecedented control over your data via natural language. You can filter your image and video datasets, uncover new insights, and become a better computer vision engineer, all without writing a single line of code!  

Yes, you read that right: VoxelGPT translates your English queries into Python code that filters your datasets for you!

Even better, VoxelGPT is completely open source and free to use. The code is located at https://github.com/voxel51/voxelgpt and you can try it out live at gpt.fiftyone.ai. Want to use VoxelGPT locally on your own data? You can install the plugin locally (read on for instructions).

What is VoxelGPT?

VoxelGPT is an LLM-powered application built with FiftyOne, GPT-3.5, and LangChain. It provides a chat-like interface for interacting with your computer vision dataset by translating natural language queries into FiftyOne Python syntax and constructing the resulting DatasetView.

This is a game-changer because mastering the FiftyOne query language, with all of its flexibility, can have a steep learning curve. With VoxelGPT, you can immediately wield the full power of FiftyOne to semantically slice your data without any prior knowledge of the query language.

What’s more, VoxelGPT can also answer specific FiftyOne usage questions and machine learning questions more broadly.

You can use VoxelGPT via a simple Python API, or you can use it natively inside the FiftyOne App by installing it as a FiftyOne Plugin.

Try VoxelGPT for yourself at gpt.fiftyone.ai

VoxelGPT won’t do your computer vision work for you, but it can certainly increase your speed and efficiency. It is a pair programmer, a translator, and an educational tool all rolled into one.

VoxelGPT capabilities

VoxelGPT can handle any of the following types of queries:

  • Dataset queries
  • FiftyOne docs queries
  • General computer vision queries

When you ask VoxelGPT a question, it will interpret your intent, and determine which type of query you are asking. If VoxelGPT is unsure, it will ask you to clarify.

Dataset queries

How does it work? VoxelGPT interprets your query, translates it into Python code in the FiftyOne query language, and displays the resulting view. It works with both image and video datasets and, based on the set of examples it was developed on, fully supports one- and two-stage queries.

When interpreting your query, VoxelGPT will do the following:

  • Recognize field and class names: VoxelGPT selects the appropriate fields and class names based on the natural language query and information about the specific dataset. It uses named entity recognition to identify field and class names, plus semantic matching of class names for fields with fewer than 1,000 classes.
  • Infer relevant computations: VoxelGPT determines whether Brain runs or Evaluation runs might be relevant to a natural language query and, if they are, automatically selects the relevant runs.
  • Print helpful messages: If VoxelGPT determines that a computation may need to be run on the dataset, or that the query does not contain all information required for conversion to ViewStages, it will respond with a message indicating this.
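To get a feel for the class name matching step, here is a toy sketch. It is not VoxelGPT's implementation: it swaps in plain string similarity from Python's standard library for the semantic matching described above, and the helper name `match_class_name` and the class list are made up for illustration.

```python
from difflib import get_close_matches
from typing import List, Optional


def match_class_name(term: str, class_names: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the class name most similar to `term`, or None if nothing is close.

    Hypothetical helper: string similarity is only a stand-in for the
    semantic matching that VoxelGPT is described as using.
    """
    matches = get_close_matches(term.lower(), class_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical label classes for a detection field
classes = ["airplane", "person", "dog", "traffic light"]

print(match_class_name("airplanes", classes))  # a plural form still maps to "airplane"
print(match_class_name("zebra", classes))      # nothing similar enough, so None
```

String similarity only catches near-identical spellings. Mapping a word like "people" to the class "person" is the kind of case where semantic matching earns its keep.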

Here are some examples of dataset queries you can ask VoxelGPT—try them out live at gpt.fiftyone.ai:

  • Retrieve 10 random samples
  • Display the most unique images with a false positive prediction
  • Just the images with at least two people detected with high confidence
  • Show the 25 images that are most similar to the first image with a dog

FiftyOne docs queries

VoxelGPT is not just a pair programmer; it is also an educational tool. The model has access to the entire FiftyOne docs, including the tutorials, user guide, and API reference, and can use this information to answer your questions.

Here are some examples of documentation queries you can ask VoxelGPT:

  • How do I load a dataset from the FiftyOne Zoo?
  • Docs: What does the match() stage do?
  • Can I export my dataset in COCO format?

By effortlessly switching between dataset queries and docs queries, you can use VoxelGPT to better understand how the FiftyOne query language works. 

General computer vision queries

VoxelGPT can also answer general questions in computer vision, machine learning, and data science. It can help you to understand basic concepts and overcome data quality issues.

Here are some examples of computer vision queries you can ask VoxelGPT:

  • What is the difference between precision and recall?
  • How can I detect faces in my images?
  • What are some ways I can reduce redundancy in my dataset?
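To make the first question concrete, here is a minimal sketch of how precision and recall fall out of raw counts. The helper `precision_recall` and the example numbers are hypothetical and are not part of FiftyOne or VoxelGPT.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true positive, false positive,
    and false negative counts. Each metric is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical detector results: 8 correct detections, 2 spurious, 4 missed
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.80, recall=0.67
```

Precision divides by everything the model predicted, while recall divides by everything that was actually there, which is why the two can move in opposite directions.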

What VoxelGPT cannot do

While VoxelGPT is powerful, we have purposely limited its scope to provide a focused user experience while also leaving the door open for high value upgrades in the future.

VoxelGPT is currently not able to:

  • Have general conversations: VoxelGPT is not a generic chatbot. If your query is deemed out of scope, VoxelGPT will prompt you for a new query.
  • Perform computations: Some computations, such as generating a vector similarity index, may be expensive and time-consuming. We don’t (yet) give VoxelGPT the power to do these computations on your behalf, but VoxelGPT will recognize when they may need to be run, and will tell you as much.
  • Make permanent changes: In a similar vein, VoxelGPT is not yet able to permanently change your data. For instance, it cannot delete samples from the underlying dataset, copy a dataset, or move any of your media files.
  • Delegate tasks: At present, VoxelGPT is not equipped with any HuggingGPT-style dispatching capabilities. It cannot delegate computer vision tasks to other ML models.

Generalizability and feedback

VoxelGPT’s current implementation is based on a limited set of examples, so it may not generalize well to all data. The more specific your query, the better the results will be. If you have a more involved task, see if you can split it up into multiple natural language queries and combine VoxelGPT’s results.

We’d love to hear from you about how VoxelGPT performs on your use cases. Bugs, feature requests, and surprising findings that VoxelGPT uncovered are all welcome. Please drop us a line!

Getting up and running

Live demo

If you want to experience VoxelGPT and see for yourself how the model turns natural language into computer vision insights, check out the live demo at gpt.fiftyone.ai, where you can use VoxelGPT natively in the FiftyOne App on a few example datasets.

Install VoxelGPT locally

To interact with VoxelGPT programmatically in Python or to work with your own datasets, install it locally by following the instructions below.

Install FiftyOne 

If you haven’t already, install FiftyOne:

pip install fiftyone

Provide an OpenAI API key

Next, provide an OpenAI API key:

export OPENAI_API_KEY=XXXXXXXX

If you do not have an OpenAI key, you will need to create one.

A single query typically costs only $0.01 in OpenAI API calls.
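Because VoxelGPT needs the key at runtime, it can be handy to confirm that the variable is visible to Python before you start. The snippet below is a small sketch using only the standard library; the helper name `openai_key_is_set` is made up for illustration and is not part of VoxelGPT.

```python
import os
from typing import Mapping


def openai_key_is_set(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty in `env`."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_is_set():
    print("OPENAI_API_KEY is not set; export it before using VoxelGPT.")
```

Note that this only checks that a value is present. It does not verify that the key is valid.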

App-only use

If you only want to use VoxelGPT in the FiftyOne App, you can install it as a plugin by running:

fiftyone plugins download https://github.com/voxel51/voxelgpt
fiftyone plugins requirements @voxel51/voxelgpt --install

Python use

Alternatively, if you want to interact with VoxelGPT programmatically, or you want to contribute to the project, clone the voxelgpt repository:

git clone https://github.com/voxel51/voxelgpt
cd voxelgpt

and install the requirements:

pip install -r requirements.txt

To make the plugin available within the FiftyOne App as well, you can symlink it into your FiftyOne plugins directory:

ln -s "$(pwd)" "$(fiftyone config plugins_dir)/voxelgpt"

Using VoxelGPT in the App

With VoxelGPT installed, you can use it natively within the FiftyOne App with any dataset:

import fiftyone as fo
import fiftyone.zoo as foz

# Load the quickstart dataset.
# To load another dataset from the Zoo, ask VoxelGPT:
# "what datasets are in the FiftyOne Dataset Zoo?"

dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)

In the FiftyOne App, simply:

  • Click on the OpenAI icon above the grid, or
  • Press the + icon next to the Samples tab and select VoxelGPT

Pro tip: use the keyword "now" in a prompt to incorporate your previous prompts as context for your next query!

Using VoxelGPT in Python

If you opted for a Python/development install, you can also interact with VoxelGPT programmatically in Python. 

Interactive mode

You can use VoxelGPT in interactive mode via ask_voxelgpt_interactive(). In interactive mode, you can converse with VoxelGPT via input() prompts.

To get started, just pass in a dataset and an App session:

import fiftyone as fo
import fiftyone.zoo as foz

from voxelgpt import ask_voxelgpt_interactive

dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)

ask_voxelgpt_interactive(dataset, session=session)

As usual, you can prompt VoxelGPT with any combination of dataset, documentation, and general computer vision queries. For example, a conversation might look like:

You:

    what is the difference between precision and recall?

VoxelGPT:

    Precision and recall are two important metrics used to evaluate the
    performance of a machine learning model in classification tasks. Precision
    measures the proportion of true positive predictions among all positive
    predictions made by the model. Recall, on the other hand, measures the
    proportion of true positive predictions among all actual positive instances
    in the dataset.

    In other words, precision is the ability of the model to correctly identify
    positive instances, while recall is the ability of the model to identify
    all positive instances. A high precision means that the model is making
    very few false positive predictions, while a high recall means that the
    model is able to identify most of the positive instances in the dataset.

    It is important to balance precision and recall when evaluating a model's
    performance. A model with high precision but low recall may be missing
    important positive instances, while a model with high recall but low
    precision may be making too many false positive predictions.

    In the context of the FiftyOne library, precision and recall can be
    computed using the `fiftyone.utils.metrics` module, which provides
    functions for computing various classification metrics. For example, the
    `fiftyone.utils.metrics.binary_classification` function can be used to
    compute precision and recall for binary classification tasks.

You:

    does FiftyOne integrate with Label Studio?

VoxelGPT:

    Yes, FiftyOne integrates with Label Studio, which is an open-source data
    labeling tool with a user-friendly UI. The integration allows you to easily
    upload your data directly from FiftyOne to Label Studio for labeling. You
    can get started with Label Studio through a simple pip install to get a
    local server up and running. FiftyOne provides simple setup instructions
    that you can use to specify the necessary account credentials and server
    endpoint to use. FiftyOne also provides an API to create projects, upload
    data, define label schemas, and download annotations using Label Studio,
    all programmatically in Python.

You:

    show me predicted airplanes

VoxelGPT:

    Identified potential algorithms: evaluation

    No evaluation runs found.

    Identified potential fields: predictions

    Identified potential label classes: {'predictions': ['airplane']}

    Identified potential view stages: ['match', 'filter_labels', 'match_labels', 'exclude_labels', 'filter_field']

    Okay, I'm going to load dataset.filter_labels("predictions",F("label") == "airplane")

You:

    now only show me the first 10 samples

VoxelGPT:

    Identified potential algorithms: evaluation

    No evaluation runs found.

    Identified potential fields: predictions

    Identified potential label classes: {'predictions': ['airplane']}

    Identified potential view stages: ['match', 'limit', 'limit_labels', 'skip', 'sort_by']

    Okay, I'm going to load dataset.match(F("predictions.detections.label").contains("airplane")).limit(10)

You: exit

In interactive mode, VoxelGPT automatically loads any views it creates in the App, and you can access them via your session object:

print(session.view.count("predictions.detections"))

Single query mode

If you just want to run a single query, you can use ask_voxelgpt():

from voxelgpt import ask_voxelgpt

ask_voxelgpt("Does FiftyOne integrate with CVAT?")

    Yes, FiftyOne integrates with CVAT, which is an open-source image and video
    annotation tool. You can upload your data directly from FiftyOne to CVAT to
    add or edit labels. FiftyOne provides simple setup instructions that you can
    use to specify the necessary account credentials and server endpoint to use.
    CVAT provides three levels of abstraction for annotation workflows: projects,
    tasks, and jobs.

If you pass a dataset along with your query and VoxelGPT interprets your prompt as a request to load a view into your dataset, the resulting view is returned to you:

import fiftyone as fo
import fiftyone.zoo as foz

from voxelgpt import ask_voxelgpt

dataset = foz.load_zoo_dataset("quickstart")
view = ask_voxelgpt("show me 10 random samples", dataset)

From there, you can interact with or further refine the view as you would any other DatasetView in FiftyOne:

# count_values() returns a dict mapping each label to its number of occurrences
label_counts = view.count_values("ground_truth.detections.label")

Conclusion

VoxelGPT brings the power of large language models to your computer vision datasets. It is entirely open source and you can try it out today at gpt.fiftyone.ai.

If you like VoxelGPT, consider showing your support by giving voxelgpt a star on GitHub. And while you’re at it, give fiftyone a star too; it’s all open source! We appreciate your support 🙂

Join the FiftyOne community

Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!