Add Stable Diffusion and DALL-E2 Images Directly to Your Dataset
Welcome to week one of Ten Weeks of Plugins. For the next ten weeks, we will be building a FiftyOne plugin (or multiple!) each week and sharing the lessons learned!
If you’re new to them, FiftyOne Plugins provide a flexible mechanism for anyone to extend the functionality of their FiftyOne App. You may find the following resources helpful:
Ok, let’s dive into this week’s FiftyOne Plugin!
AI Art Gallery 🤖🎨🖼️
Have you ever generated stunning imagery with DALL-E2 or Stable Diffusion, only to ask yourself: “How do I catalog this?” Do you find yourself manually downloading AI-generated images and moving them into the desired folders? How do you handle metadata like which model and scheduler were used to generate a specific image?
Consider those problems solved!
Plugin Overview & Functionality
For the first week of 10 Weeks of Plugins, I built an AI Art Gallery plugin. This plugin allows you to select your text-to-image model, generate images from text prompts within the FiftyOne App, and add each image directly to your dataset (your “AI art gallery”) in one fell swoop.
Out of the box, the plugin supports three models: DALL-E2, Stable Diffusion, and VQGAN.
Depending on which text-to-image model you choose, you can configure everything from the size of the image to be generated, all the way to the number of inference steps.
After generating an image, the sample grid in the FiftyOne App will refresh and the latest image will appear in the bottom right of the grid. You can then filter your art gallery by text prompt, model name, date and time generated, or any other attributes used in the generation process.
Adding Creator’s Notes
Want to add a note to a specific piece of artwork describing your inspiration? How about jotting down observations about how the wording of your prompt affects the image generation process? Add a tag!
Cleaning House
When you’re making art, you may not love every single piece of artwork. Part of curation is deciding what makes the cut and what doesn’t.
You can delete a piece of artwork by selecting the image (the checkbox in the upper left corner of the image), pressing the backtick key (“`”) to pull up your list of operators (functions which execute Python code from the UI), and choosing delete_selected_samples. Hit Execute and you’re done!
Installing the Plugin
If you haven’t already done so, install FiftyOne:
pip install fiftyone
Then you can download this plugin from the command line with:
fiftyone plugins download https://github.com/jacobmarks/ai-art-gallery
Refresh the FiftyOne App, and you should see the txt2img operator in your operators list when you press the “`” key.
To keep the plugin’s core code simple, this plugin uses text-to-image models that are accessible via API endpoints: DALL-E2 through OpenAI, and Stable Diffusion and VQGAN through Replicate. As such, you will need an account with at least one of these services in order to use the plugin out of the box.
Make sure that you have your API info in environment variables:
export OPENAI_API_KEY=...
export REPLICATE_API_TOKEN=...
You do not need both to use the plugin: the operator checks your environment variables and only offers the models that are accessible via the corresponding APIs.
If you want to use a different text-to-image model, local or via API, it should be easy to extend this code by writing a Text2Image subclass for the model you are interested in.
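As a rough sketch of what such a subclass might look like, the example below defines a stand-in Text2Image base class and one subclass. The base-class interface shown here (a name attribute, a required_env_var attribute, an is_available() check, and a generate_image(prompt, **kwargs) method that returns an image URL) is an assumption made for illustration and may not match the plugin's actual code. EchoModel and its placeholder URL are likewise hypothetical.

```python
import os


class Text2Image:
    """Stand-in base class; the plugin's real interface may differ."""

    name = None  # label shown to the user (assumed attribute)
    required_env_var = None  # env var holding the API credential (assumed)

    @classmethod
    def is_available(cls):
        # Offer a model only if its credential is set, or if it needs none.
        return cls.required_env_var is None or cls.required_env_var in os.environ

    def generate_image(self, prompt, **kwargs):
        raise NotImplementedError


class EchoModel(Text2Image):
    """Hypothetical model that builds a placeholder URL instead of calling an API."""

    name = "echo"

    def generate_image(self, prompt, size=512, **kwargs):
        # A real subclass would call its local model or remote API here.
        slug = "-".join(prompt.lower().split())
        return f"https://example.invalid/{slug}_{size}.png"
```

Keeping the availability check on the class means the input form could decide whether to list a model before ever instantiating it.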
Lessons Learned
I built the AI Art Gallery plugin as a Python Plugin, so I didn’t have to worry about writing TypeScript/React code. The plugin consisted of three files:
- __init__.py: defining the operator
- fiftyone.yml: making the plugin register for download and installation
- README.md: describing the plugin
Creating a Responsive Input Form
I managed to build the plugin with a single operator, txt2img, and decided to lean heavily on the operator’s inputs, which are described in the resolve_input() method (this is the bulk of the code!). Not every FiftyOne Plugin will be so input heavy.
As you can see in the GIF at the top of the blog, the options displayed in the input form change as I change the selected model. I was able to do this as follows:
- Define an input object, radio_choices, as a collection of radio buttons (an enumeration of discrete choices). The user can then select a single one of these.
- Give the user’s choice a “name”, in this case model_choices.
- Retrieve the value selected by the user from the params dictionary in the operator’s context, ctx.params.get("model_choices", False), and perform different blocks of logic depending on the value.
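The branching in the last step can be sketched in plain Python, without FiftyOne installed. Here the operator context is simulated with a SimpleNamespace, and the function only returns the names of the fields that would be displayed. In the real operator, each of these would be a property added to the input object inside resolve_input(). The model keys, the per-model option names, and the always-present prompt field are assumptions for illustration, not the plugin's actual values.

```python
from types import SimpleNamespace

# Hypothetical per-model options; the plugin's real option names may differ.
MODEL_OPTIONS = {
    "dalle2": ["size"],
    "sd": ["width", "height", "num_inference_steps", "scheduler"],
    "vqgan": [],
}


def fields_to_show(ctx):
    """Return the input fields to render for the currently selected model."""
    fields = ["model_choices", "prompt"]  # always shown (prompt is assumed)
    # Same lookup as in the operator: False until the user picks a model.
    model = ctx.params.get("model_choices", False)
    if model:
        fields.extend(MODEL_OPTIONS.get(model, []))
    return fields


# Simulate the operator context before and after the user makes a selection.
empty_ctx = SimpleNamespace(params={})
sd_ctx = SimpleNamespace(params={"model_choices": "sd"})
```

Calling fields_to_show(empty_ctx) yields only the two base fields, while fields_to_show(sd_ctx) adds the Stable Diffusion options, which mirrors how the form grows once a radio button is chosen.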
Another key to building a responsive input form was making extensive use of the FiftyOne Python Operator API docs. There are a ton of operator types to choose from, and within this input form I used Dropdown, RadioGroup, and SliderView.
Custom Component Properties
While I only scratched the surface of component customization in this plugin, I learned that as a plugin creator, you have immense control over the look and feel of your plugin components, even when just working with Python operators!
When building the slider to set the number of inference steps, one of the front end engineers at Voxel51, Ibrahim, let me in on a little secret: you can pass in a dictionary of componentsProps as an argument to any view. This essentially puts the power of Material UI Components in your hands.
I stayed pretty basic in this plugin, using componentsProps to specify the minimum and maximum allowed values for the slider, as well as a step size:
import fiftyone.operators.types as types

inference_steps_slider = types.SliderView(
    label="Num Inference Steps",
    componentsProps={"slider": {"min": 1, "max": 500, "step": 1}},
)
But moving forward, I definitely want to explore this more deeply!
Plugins and Environment Variables
While it isn’t shown in the GIF, what appears in the input form will be different depending on what packages you have installed and API connections you have set up. The key enabler here is that plugins have access to your environment variables.
To make this plugin work in a variety of conditions, I did the following:
- Use the FiftyOne utils lazy_import() method so that a package is only imported when it is needed. Otherwise, it could cause an import error unnecessarily.
- Use importlib’s find_spec to check if a package matching a certain specification has been installed.
- Check whether a certain variable name is in the environment variables with os.environ.
- Execute different logic blocks based on these conditions.
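The last three checks can be sketched with the standard library alone. This is a minimal illustration, not the plugin's code: the model keys and the available_models() helper are hypothetical, and the is_installed parameter exists only so the logic can be exercised without the openai or replicate packages present. The lazy import step is left out because it depends on FiftyOne.

```python
import importlib.util
import os


def package_installed(name):
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def available_models(env=None, is_installed=package_installed):
    """List the models whose package and credential are both present."""
    env = os.environ if env is None else env
    models = []
    if is_installed("openai") and "OPENAI_API_KEY" in env:
        models.append("dalle2")
    if is_installed("replicate") and "REPLICATE_API_TOKEN" in env:
        models.extend(["sd", "vqgan"])
    return models
```

With no arguments, available_models() reads the real environment, so the result depends on which packages and keys are set up on your machine.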
Conclusion
Whether you are building your own AI art gallery, or using text-to-image models to generate a synthetic dataset, this plugin will shorten the feedback cycle and empower you to curate your visual data better.
But this plugin is only the beginning. With FiftyOne Plugins, the sky’s the limit on how you can extend the FiftyOne computer vision toolkit to meet the needs of your data and model workflows. Stay tuned over the next ten weeks while we pump out a killer lineup of plugins! You can track our journey in our ten-weeks-of-plugins repo — and I encourage you to fork the repo and join me on this journey!