Recapping the Journey from Zero to Plugin Hero
Back in June, with the release of FiftyOne 0.21, the open source computer vision library FiftyOne became more flexible, customizable, and extensible than ever with the introduction of operators. With this simple, yet powerful feature, we unlocked limitless possibilities for anyone to customize their FiftyOne experience exactly the way they want.
These operators form the basis for the plugin system in FiftyOne. As a user, you can write a FiftyOne plugin in Python, JavaScript, or both, in order to tailor your experience in FiftyOne for your specific use cases. If you can dream it up, you can make it happen in the FiftyOne App – the sky’s the limit!
Ten weeks ago, I embarked on a journey to understand and explore the depths of the FiftyOne plugin framework. I knew the framework was powerful — it had enabled the creation of VoxelGPT — but I didn’t know what else was possible. At the start of this journey, I’d never written any type of plugin before, never mind a FiftyOne plugin. And I’d never written any JavaScript or TypeScript code!
Nevertheless, I set out with the intention of building one plugin each week for ten weeks. My initial goal was to build the custom functionality I wanted in the FiftyOne App, wrapping many of my commonly executed code blocks in Python operators so I could run these commands without digging up each snippet every time I wanted to use it. However, my Ten Weeks of Plugins journey turned into so much more.
In this post, I want to summarize all of the plugins I managed to build throughout this journey, and share the lessons I learned along the way. The plugins below are presented in chronological order (week by week). Very roughly, the plugins and lessons learned build upon each other, but if you’d like, feel free to skip ahead to any topic that interests you.
- 🌩️Image Quality Issues
- 📈Concept Interpolation
- 🎨Build Your Own AI Art Gallery
- 📲Data Ingestion with Twilio
- ❔Visual Question Answering
- 📺In-App YouTube Player
- 🪞Image Deduplication
- 🔡Optical Character Recognition with PyTesseract
- 🔑Keyword Search
- 🔮Zero-Shot Prediction
- 🏃Active Learning
- ⏪Reverse Image Search
- 🌌Concept Space Traversal
- 🛠️Plugin Management and Development
- 🔊Audio Retrieval with ImageBind
- 🔎Semantic Document Search
🌩 Image Quality Issues (Week 0)
GitHub Repo: https://github.com/jacobmarks/image-quality-issues
Plugin Overview
One of the most pervasive problems in computer vision is constructing a high-quality dataset to use for model training. Some aspects of dataset construction are subtle. But some issues are clear as day.
This plugin allows you to find common image quality issues plaguing your dataset, including low entropy (low information content), low or high contrast, bright and dark images, and images that are blurry. You can specify the threshold for designating an image as having a certain issue, or use preset defaults. Relevant computations for given diagnoses will be run if necessary.
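To make the "low entropy" issue concrete, here is a minimal sketch of the kind of metric such a plugin might compute; the function names are illustrative, not the plugin's actual API, and a real implementation would operate on image arrays rather than flat pixel lists:

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy (in bits) of a sequence of grayscale pixel
    values (0-255). Low entropy suggests low information content,
    e.g. a nearly flat or uniform image."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def has_low_entropy(pixels, threshold=3.0):
    """Flag an image as a potential quality issue if its entropy falls
    below a user-chosen threshold (e.g. set via a slider in the App)."""
    return image_entropy(pixels) < threshold
```

Exposing `threshold` as an input is exactly the kind of interactivity the lessons below advocate.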
Lessons Learned
- Streamline Common Workflows: whether you are computing entropy, brightness, or a custom metric, you are best served by wrapping this code in a Python operator which you can access from the FiftyOne App!
- Interactivity Retains Flexibility: instead of choosing a set threshold for diagnosing an image with a certain issue, it is much more powerful and flexible to use a slider (or some other input mechanism) to let the user specify the value.
📈 Concept Interpolation (Week 0)
GitHub Repo: https://github.com/jacobmarks/concept-interpolation
Plugin Overview
With natural language image search, you can type in a text prompt and find the images in your dataset that most closely match that description. Under the hood, this type of query leverages a multimodal embedding model like OpenAI’s CLIP, which can embed text and images in the same vector space.
Concept Interpolation takes this a step further and allows you to interpolate between two “concepts” by way of their embeddings. Pick two text prompts — for instance “sunny” and “rainy”, and choose a point along the line connecting them. The Concept Interpolation plugin generates the weighted average of the embeddings for the two prompts, and searches for the images whose embeddings most closely match this new vector.
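The weighted-average step can be sketched in a few lines; this is a hedged illustration of the idea, not the plugin's actual code:

```python
import numpy as np

def interpolate_concepts(emb_a, emb_b, alpha):
    """Weighted average of two embedding vectors, re-normalized to unit
    length. alpha=0 returns (normalized) emb_a, alpha=1 returns emb_b;
    intermediate values slide along the line connecting the two concepts."""
    v = (1 - alpha) * np.asarray(emb_a, dtype=float) + alpha * np.asarray(emb_b, dtype=float)
    return v / np.linalg.norm(v)
```

The resulting vector is then used as the query for a standard embedding similarity search.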
Lessons Learned
- Operators for Opening Panels: when building joint Python/JavaScript plugins for FiftyOne, a useful paradigm is to have one operator specifically for opening the plugin’s JavaScript panel. This is the operator that will be executed when clicking on a button in the UI. Use different operators to perform computations.
- Custom Operator Icons: you can use any SVG as an icon for an operator in FiftyOne. Open the SVG file with a text editor to edit its contents, specify the fill colors, and remove unwanted text.
🎨 Build Your Own AI Art Gallery (Week 1)
GitHub Repo: https://github.com/jacobmarks/ai-art-gallery
Plugin Overview
Text-to-image models like Stable Diffusion and DALLE-2 can create some stunning AI-generated artwork. Once you generate these images, however, the process of curating your own AI art gallery — keeping track of the creation date, model, prompt, and hyperparameters — can be quite laborious.
This plugin automates all of this gruntwork. Run a variety of text-to-image models directly from the FiftyOne App, including Stable Diffusion, SDXL, Latent Consistency Models, and DALLE-2, and have the resulting images and the relevant metadata automatically added to your dataset/collection!
Lessons Learned
- Custom Component Properties: you can pass in a dictionary of componentProps as an argument to any view. This essentially puts the power of Material UI Components in your hands.
- Plugins and Environment Variables: you can gracefully handle different cases based on whether or not the user has certain packages installed. Use importlib’s `find_spec`, check environment variables with `os.environ`, and lazily import with FiftyOne’s `lazy_import`.
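The `find_spec` and `os.environ` checks look roughly like this; the package and variable names here are just examples of what a plugin might gate on:

```python
import importlib.util
import os

def package_available(name):
    """Check whether an optional dependency is importable, without
    actually importing it (and without crashing if it is missing)."""
    return importlib.util.find_spec(name) is not None

def api_token_configured(var="REPLICATE_API_TOKEN"):
    """Check whether an API token is set before exposing model choices
    that depend on it."""
    return bool(os.environ.get(var))
```

An operator can then hide or disable the corresponding input options when a dependency or token is absent, rather than failing at execution time.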
For more details, check out the blog post, Build Your Own AI Art Gallery.
📲 Data Ingestion with Twilio (Week 1)
GitHub Repo: https://github.com/jacobmarks/twilio-automation-plugin
Plugin Overview
Have you ever happened across a real-life scene you wish you could capture and add into one of your datasets? With FiftyOne and Twilio (the programmable communication company), now you can! Take a picture (or multiple), and send them to your Twilio phone number. Next time you open the FiftyOne App to your desired dataset, use the `add_twilio_images` operator to add the photos you sent directly to the dataset. You can also filter by message content, only adding images from messages whose text contains a given text string!
Lessons Learned
- Python Plugins Can Be Simple: I built the majority of this plugin within two hours at the Twilio Signal Creator Summit Hackathon. Since then, I’ve given it a few upgrades, but the core code has remained the same. FiftyOne Python plugins are easy enough to build that you can create a single-purpose plugin within a few hours.
❔Visual Question Answering (Week 2)
GitHub Repo: https://github.com/jacobmarks/vqa-plugin
Plugin Overview
Do you ever wish you could have a conversation with the images in your dataset? With this Visual Question Answering (VQA) plugin, you can ask your images anything. Select an image from the sample grid, enter a question (text prompt), and receive a text-based response from your model of choice. This plugin gives you access to four cutting edge multimodal models: Adept’s Fuyu-8b, Llava-13b, BLIP2, and a Vision-Language Transformer (ViLT) fine-tuned on the VQA2 dataset.
Lessons Learned
- Using Selected Samples: within an operator’s `resolve_input()`, `resolve_output()`, and `execute()` methods, `ctx.selected` will give you access to the list of currently selected samples in the FiftyOne App — the same as `session.selected`.
- Returning Informative Outputs: you can return a JSON-style dictionary from `execute()` and use these variables in `resolve_output()`.
For more details, check out the blog post, Ask Your Images Anything.
📺 In-App YouTube Player (Week 3)
GitHub Repo: https://github.com/jacobmarks/fiftyone-youtube-panel-plugin/
Plugin Overview
On the surface, this plugin is about enabling you to play YouTube videos in a custom panel within the FiftyOne App. Choose from the preset list of videos, or enter the URL for another YouTube video, and the in-app YouTube embed will update automatically. This could be useful if you want to watch a FiftyOne tutorial and follow along in real time, without leaving the app.
Delving deeper, however, this plugin is meant to showcase the simplicity of building custom user journeys and interactive workflows within the FiftyOne App! You can define custom panels as React components and fill the canvas of these panels with whatever Material UI components and interactive logic you’d like!
Lessons Learned
- Icon Rendering: you can set icons for the operator, any buttons that call the operator, and the icon that appears at the top of JavaScript panels. These can all be the same or different. You may need to fiddle with the `viewBox` for the icons to appear centered and sized as desired.
- Trigger Composes Operators: the `ctx.trigger()` method within an operator’s `execute()` method allows you to execute other operators — whether they are defined in Python or JavaScript, in the same plugin or a different plugin.
- No JavaScript? No Problem!: I’m not a JavaScript developer, but the FiftyOne plugin docs combined with the React Material UI component docs and ChatGPT were enough for me to begin creating custom UIs!
For more details, check out the blog post, Build Custom Computer Vision Applications.
🪞Image Deduplication (Week 4)
GitHub Repo: https://github.com/jacobmarks/image-deduplication-plugin
Plugin Overview
Duplicate data can lead to a plethora of problems, from longer than necessary training times, to models over-indexing on specific samples rather than learning larger patterns. But deduplicating your dataset doesn’t have to be complicated!
This image deduplication plugin makes it easy to find both exact duplicates (via file hashes) and approximate duplicates (via image embeddings). Once you’ve found these duplicates, the plugin provides operators for displaying groups of duplicates, removing all duplicates, and deduplicating — removing all except one representative from each group.
Lessons Learned
- Splitting Code into Submodules: separation of concerns is good. To extract logically related sections of code into separate Python modules and import them in a FiftyOne plugin, you can use FiftyOne’s `add_sys_path` utility in conjunction with the `os.path` methods to locate the module to be imported.
- Loading a View: you can trigger another operator from an operator’s `execute()` method. To load a view, use the built-in `load_view` operator, passing in a serialized representation of the `DatasetView` to be loaded.
- Icons for Operators: setting the `icon` attribute on an operator’s `OperatorConfig` allows a custom icon to appear next to the operator in the operator list that shows up when you press “`”.
For more details, check out the blog post, Double Trouble: Eliminate Image Duplicates with FiftyOne.
🔡 Optical Character Recognition with PyTesseract (Week 5)
GitHub Repo: https://github.com/jacobmarks/pytesseract-ocr-plugin
Plugin Overview
Optical Character Recognition (OCR) is the task of identifying and localizing text (characters) in a document. It can be useful for extracting insights from everything from license plates to legal documents, medical records, and even handwritten notes.
This plugin leverages the most popular OCR engine, Tesseract, via its Python bindings (PyTesseract), allowing you to recognize text in the images in your dataset. Choose whether to save word predictions, contiguous blocks of text, or both, and whether to execute in real-time or delegate as a job. OCR text predictions are converted to and stored in `Detection` labels on samples.
Lessons Learned
- Menu Buttons in Python Plugins: you can turn your Python operator into a button in the FiftyOne App with the `resolve_placement()` method. You don’t need to write any JavaScript.
For more details, check out the blog post, Optical Character Recognition with PyTesseract.
🔑 Keyword Search (Week 5)
GitHub Repo: https://github.com/jacobmarks/keyword-search-plugin
Plugin Overview
Motivated by the need to search through the blocks of text generated by the OCR plugin, the Keyword Search plugin makes it easy to search your dataset for specific text strings without writing any custom code. The plugin finds all top-level strings and lists of strings, as well as strings and lists of strings within detection labels, and applies the appropriate view stage to search for the query text within the specified field. It acts as a wrapper for the `match()` and `match_labels()` view stages with the `contains_str()` view expression, exposing the option for case sensitive or insensitive search.
The plugin uses a cache manager to cache the name of the last field queried, and sets that as the default field to be queried moving forward. This serves as a proof of principle for caching user preferences in Python plugins.
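The cache-manager pattern is simple: module-level state in a separate file survives operator reloads. A minimal sketch (function names are illustrative, not the plugin's actual API):

```python
# cache.py — lives alongside the plugin's __init__.py; module-level state
# here survives operator reloads, whereas state in __init__.py does not
_cache = {}

def set_last_field(field_name):
    """Remember the most recently queried field."""
    _cache["last_field"] = field_name

def get_last_field(default=None):
    """Return the cached field name, to use as the next default input."""
    return _cache.get("last_field", default)
```

The operator's `resolve_input()` can then pre-populate the field selector with the cached value.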
Lessons Learned
- Caching in Python Plugins: you can’t do any caching within the `__init__.py` file itself, because this file gets reloaded and its variables reset every time the operator is used. Create a separate Python file for caching and import the caching function from this module in `__init__.py`.
- Plugging in Your Plugins: the keyword search plugin and the OCR plugin together are way more valuable than each is on its own. Make your plugins flexible enough to work well together, so you can create complex workflows.
For more details, check out the blog post, Optical Character Recognition with PyTesseract.
🔮 Zero-Shot Prediction (Week 6)
GitHub Repo: https://github.com/jacobmarks/zero-shot-prediction-plugin
Plugin Overview
One of the most costly and time-consuming parts of machine learning and computer vision work is the annotation of ground truth data. With foundation models like CLIP and SAM, we can accelerate this process by pre-labeling our data.
This plugin provides a unified interface for zero-shot or open-vocabulary prediction, which can be used to greatly streamline annotation workflows. Whether you are interested in classification, object detection, instance segmentation, or semantic segmentation, this plugin has you covered.
Select your task and model, input your desired class names — either directly or by pointing to a labels file — and generate zero-shot predictions. You can run the zero-shot models on selected samples, your current view or the entire dataset, and the plugin can be used in conjunction with FiftyOne’s annotation integrations (CVAT, Labelbox, Label Studio).
Lessons Learned
- Keep up with FiftyOne’s Plugin Features: this plugin utilized two features which did not exist at the beginning of the ten week journey: delegated operators and the file explorer. New features are being added all the time, so stay up to date!
- Reduce Boilerplate Code: if you have multiple operators with very similar information or execution flows, you can extract these as more general functions.
For more details, check out the blog post, Zero-Shot Prediction Plugin for FiftyOne.
🏃Active Learning (Week 7)
GitHub Repo: https://github.com/jacobmarks/active-learning-plugin
Plugin Overview
Active Learning is a set of techniques for getting to a set of ground truth labels faster and more efficiently. Starting from some preliminary labeled/tagged examples, active learning generates hypotheses for the rest of the samples in the dataset, and cleverly determines which samples the human should label next based on the certainty of the learning model’s existing hypotheses.
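The "cleverly determines which samples to label next" step is often plain uncertainty sampling. As a hedged sketch of that query strategy (modAL provides this and more sophisticated strategies out of the box):

```python
import numpy as np

def least_confident(probabilities, k):
    """Uncertainty sampling: given an (n_samples, n_classes) array of
    predicted class probabilities, return the indices of the k samples
    whose top class probability is lowest, i.e. the model's least
    certain hypotheses and hence the most informative to label next."""
    confidence = np.max(probabilities, axis=1)
    return np.argsort(confidence)[:k].tolist()
```

After the human labels those samples, the learner is retrained and the query repeats.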
This plugin wraps the functionality of the modAL library in a simple interface within the FiftyOne App. You can create a “learner” from an initial set of labels, query the learner for new samples to label, and teach the learner with the newly labeled examples.
The plugin can be used in conjunction with the zero-shot prediction plugin, as well as FiftyOne’s annotation integrations (CVAT, Labelbox, Label Studio). Currently it only supports classification tasks, but could readily be extended to support other computer vision labeling tasks.
Lessons Learned
- Caching Is King: sometimes you need to cache more than just user preferences. As in the keyword search plugin, you can cache variables in modules that are imported by the `__init__.py` file.
- Don’t Reinvent the Wheel: for many workflows, there are decent pre-existing solutions. It is often better to wrap the functionality from a purpose-built library as a FiftyOne plugin than to try to recreate all of the functionality from scratch.
For more details, check out the blog post, Supercharge Your Annotation Workflow with Active Learning.
⏪ Reverse Image Search (Week 8)
GitHub Repo: https://github.com/jacobmarks/reverse-image-search-plugin
Plugin Overview
Since March 2023 when we added native vector search functionality into FiftyOne, it has been possible to select an image from your dataset and perform a similarity search to find the `k` most similar images within the dataset. This plugin extends this functionality, allowing you to query your dataset with external images. With reverse image search, you can now drag and drop images from your local filesystem, or enter the URL of an image, and find the most similar images in your dataset to these queries!
Lessons Learned
- Handling Remote Images: you can work with images from the web without storing them to disk. Use Python’s `requests` library to get the data, then create a file-like object using `io.BytesIO`, and treat the resulting object like you would a file on disk.
- Handling Local Files: for security reasons, absolute filepaths for media files are not accessible via JavaScript. Decode the file from its base64 encoding, and then use `BytesIO` to create the file-like object.
For more details, check out the blog post, Reverse Image Search Plugin for FiftyOne.
🌌 Concept Space Traversal (Week 9)
💪💪 Joint effort with Ibrahim Manjra 💪💪
GitHub Repo: https://github.com/jacobmarks/concept-space-traversal-plugin
Plugin Overview
Similar in spirit to Concept Interpolation, but more flexible and interactive, Concept Space Traversal allows you to combine text and image concepts and dynamically move through the space of embeddings. Select a starting image, and iteratively add text concepts in desired amounts. On the backend, the text and image embedding vectors will be combined to create a new “destination” vector, which is then used to query the dataset’s vector similarity index. By adding text prompts, you can move through the space of “concepts” in your dataset.
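The vector combination on the backend can be sketched as follows; this is an illustration of the idea under the stated assumptions (unit-normalized output, additive weighted concepts), not the plugin's actual code:

```python
import numpy as np

def destination_vector(image_embedding, text_concepts):
    """Combine a starting image embedding with weighted text-concept
    embeddings into a single query vector for the similarity index.
    `text_concepts` is a list of (embedding, weight) pairs; adding or
    re-weighting concepts moves the query through embedding space."""
    v = np.asarray(image_embedding, dtype=float)
    for embedding, weight in text_concepts:
        v = v + weight * np.asarray(embedding, dtype=float)
    return v / np.linalg.norm(v)
```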
Lessons Learned
- Displaying Sample Images in Panels: to show the media file associated with a sample in a JavaScript panel, get its unique URL (which can be local, derived from the sample’s filepath) in Python and pass the information to JavaScript.
- Lists Enable Flexible APIs: lists in JavaScript and Python can have arbitrary lengths, so you can leave the number of elements unspecified, making content and actions dynamic.
For more details, check out the blog post, Concept Traversal Plugin for FiftyOne.
🛠️ Plugin Management and Development (Week 10)
💪💪 Joint effort with Brian Moore 💪💪
GitHub Repo: https://github.com/voxel51/fiftyone-plugins/tree/main/plugins/plugins
Plugin Overview
We learned a lot of lessons from building plugins for ten weeks. This plugin packages those lessons into utilities to make plugin management and development easier. On the management side, you can install new plugins and enable/disable installed plugins from within the FiftyOne App. On the development side, this plugin helps you to interactively build input components for Python operators, and walks you through the process of scaffolding the code for Python plugins, even creating the files for you!
Lessons Learned
- Dynamically Updating Code Blocks: to make code blocks that update based on user interactions, make the identifier for the code block component dependent on each of the inputs. This way, the app acts as if the old component was removed and a new component matching the new input specifications was created in its place.
- It’s Plugins All The Way Down: plugins are insanely flexible, giving you the power to build custom computer vision applications for almost any workflow. This same flexibility makes it possible to build a plugin-builder plugin. But by the same token, it is impossible to bottle up the full richness of the plugin system and provide this via a crisp UI.
For more details, check out the blog post, Plugin for Building and Managing Plugins.
🔊 Audio Retrieval with ImageBind (✨ Bonus)
GitHub Repo: https://github.com/jacobmarks/audio-retrieval-plugin
Plugin Overview
If querying your image datasets with other images or natural language isn’t your cup of tea, this plugin allows you to search for images that most closely match input audio clips! Have a .wav file with a train chugging along? Now you can use that to find images of trains. This is made possible by Meta AI’s ImageBind model, which embeds data of six modalities in the same space, including images and audio. The plugin also uses a custom Qdrant collection (not the native integration) to enable fast vector search.
Lessons Learned
- FiftyOne Supports Audio: a custom panel in a FiftyOne plugin is a blank canvas where you can paint with JavaScript, as well as standard HTML elements. This means that FiftyOne plugins inherently support MP3s and other audio files via the `<audio controls>` element.
- Non-Native Vector Database Integrations: while FiftyOne does have native integrations with vector databases (Qdrant, Pinecone, Milvus, LanceDB), sometimes it is easier to create a custom vector db wrapper for a specific use case. In this case, because ImageBind is accessed via Replicate, I found it easier to write the search queries explicitly than try to fit the form factor of FiftyOne’s similarity index.
🔎 Semantic Document Search (✨ Bonus)
GitHub Repo: https://github.com/jacobmarks/semantic-document-search-plugin
Plugin Overview
If you liked the OCR and Keyword Search plugins from week five, you’re going to love this one, which allows you to semantically search through the text blocks generated by the OCR engine. With this plugin, you can embed the text blocks from PyTesseract using the GTE-base embedding model from Hugging Face’s Sentence Transformers library, and generate a vector index out of these embeddings. You can then find all text blocks across all pages in your dataset containing the most similar text to your queries.
Lessons Learned
- Plugging in Your Plugins Redux: with OCR and Keyword Search, we saw how composing plugins could create more value. Semantic Document Search doubles down on this. You can use the PDF loader plugin to read PDF pages in as image samples, run the OCR plugin to extract text from the generated pages, and then use the Semantic Document Search plugin to find similar text blocks throughout the pages of a PDF!
Conclusion
Ten weeks ago, I set out to explore the FiftyOne plugin framework and wrap common computations in Python operators. During these ten weeks, I’ve managed to build (with others) plugins which run local models, and others which call models via APIs; plugins that run in real-time, and plugins that delegate execution for long-running operations; plugins in pure Python, and plugins with heavy JavaScript UIs; single-purpose plugins, and versatile plugins that “plug into” many other workflows.
But this is just the beginning. I barely scratched the surface of FiftyOne’s plugin framework. And these were just the plugins that I wanted to build. Without exception, everyone has their own set of essential workflows, and their own vision for novel computer vision applications. I hope these plugins serve as a helpful starting point as you begin to explore the endless possibilities of FiftyOne plugins!