Welcome to the final community update for 2023! In this blog we’ll bring you up to speed on recent happenings in the computer vision mindspace and FiftyOne community, plus celebrate noteworthy 2023 milestones. 🙌 🚀
Voxel51’s continuing commitment to open source
Open source, transparency, and giving back to the computer vision community are what we are all about! Whether it’s developing the open source FiftyOne computer vision toolset to help engineers and data scientists build high-quality datasets and models, sponsoring Meetups to help members boost their computer vision knowledge, or giving to charitable causes on behalf of the community, Voxel51 is committed to bringing transparency and clarity to the world’s data.
Voxel51 turned 5 years old!
Voxel51 Inc. was founded on October 18, 2018 to enable developers, scientists, and organizations to build high-quality visual AI datasets and computer vision models. Even after 5 years, no one is quite sure what’s with the “51” in the name!
Top 10 Computer Vision Developments in 2023
Chatbots and LLMs dominated the public discourse in 2023, but computer vision also had a banner year. Here’s our top 10 for the year! Get a deep dive and analysis on these developments in Jacob Marks’ post, “Why 2023 was the most exciting year in computer vision history.”
GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user.
This is a big deal because it takes multimodal large language models mainstream, providing an intuitive and accessible interface for interacting with images along with text, PDFs, and CSV files.
Segment Anything Model
This year Meta AI announced SAM, which can “cut out” any object, in any image, with a single click. In a nutshell, SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training.
This is a big deal because SAM is arguably the first foundation model for segmentation, and it has already enabled countless computer vision applications.
Check out how to use SAM for the prediction of segmentations on the Kaggle Football Player Segmentation dataset using FiftyOne.
DINO and DINOv2 are self-supervised systems from Facebook AI, the latter of which was announced this year. These models provide new methods for training high-performance computer vision models. Because it uses self-supervision, DINOv2 can learn from any collection of images. It can also learn features that are useful for tasks such as depth estimation.
This is a big deal because DINOv2 can be used as a backbone in a variety of computer vision applications, and does not require fine-tuning!
Deci announced their state-of-the-art object detection model, YOLO-NAS, which achieves unparalleled accuracy-speed performance, outperforming competitors such as YOLOv5, YOLOv6, YOLOv7 and YOLOv8.
This is a big deal because real-time object detection keeps getting faster and more accurate, leading to improvements across a slew of downstream applications.
SDXL and SDXL Turbo
With Stable Diffusion XL and SDXL Turbo from Stability AI, you can create descriptive images with shorter prompts and generate words within images. The model delivers enhanced image composition and face generation that results in stunning visuals with realistic aesthetics. SDXL Turbo featured new distillation technology that reduces the required step count from 50 to just one, enabling single-step image generation with unprecedented quality.
This is a big deal because near real-time generation of high-quality images is here, and it’s only going to get better!
Low-Rank Adaptation (LoRA) is a technique for parameter-efficient fine-tuning. The method works by adding low-rank “update matrices” to specific blocks of the model, such as the attention blocks. During fine-tuning, only these matrices are trained, while the original model parameters are left unchanged. At inference time, the update matrices are merged with the original model parameters to produce the final result.
This is a big deal because it makes fine-tuning quick, cheap, and accessible.
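To make the LoRA description above concrete, here is a minimal sketch in plain Python, with no ML framework. It keeps a frozen weight matrix `W`, adds a low-rank update `B @ A`, and checks that merging the update into `W` gives the same output as keeping it separate. The matrix values, the `alpha / r` scaling, and all helper names are illustrative assumptions, not code from any particular library.

```python
# Minimal LoRA sketch in pure Python. All names and values are illustrative.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

# Frozen pretrained weight W (d_out x d_in); it is never updated during fine-tuning.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]

# Trainable low-rank factors with rank r = 1: B is d_out x r, A is r x d_in.
B = [[0.5], [0.0], [-1.0]]
A = [[1.0, 2.0, 0.0]]
alpha, r = 2.0, 1  # assumed scaling hyperparameters

def forward_unmerged(x):
    """Compute W x + (alpha / r) * B (A x), keeping the update separate from W."""
    base = matmul(W, x)
    update = scale(matmul(B, matmul(A, x)), alpha / r)
    return add(base, update)

def merge():
    """Fold the low-rank update into a single weight matrix for inference."""
    return add(W, scale(matmul(B, A), alpha / r))

def lora_param_count(d_out, d_in, rank):
    """Number of trainable parameters in the two low-rank factors."""
    return rank * (d_out + d_in)

x = [[1.0], [1.0], [1.0]]  # input as a column vector
y_unmerged = forward_unmerged(x)
y_merged = matmul(merge(), x)
```

In this sketch `y_unmerged` and `y_merged` are identical, which is why merging adds no extra cost at inference time. The savings show up at realistic sizes: for a 1024 x 1024 layer, a rank-8 update trains 16,384 parameters instead of 1,048,576.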
DALL·E 3 is an image generation model created by OpenAI. The first iteration, DALL·E, was launched in January 2021, and the latest release is its third iteration. Because it is built natively into ChatGPT, you can ask for what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT automatically generates tailored, detailed prompts for DALL·E 3 that will bring the idea to life.
This is a big deal because it brings auto-prompting to image generation, and keeps the DALL·E family competitive with Stable Diffusion and Midjourney.
Check out the “Build Your Own AI Art Gallery” post to learn how to add Stable Diffusion and DALL·E 2 images directly to your FiftyOne dataset.
Runway announced Gen-2, a multimodal AI system that can generate novel videos from text, images, or video clips. It does this either by applying the composition and style of an image or text prompt to the structure of a source video (Video to Video), or by using nothing but words (Text to Video).
This is a big deal because it takes text-to-video generation to a new level, and forms an intriguing foundation for Runway’s suite of AI-powered creativity tools!
Pika Labs is a new startup that is developing an AI-powered platform to edit and generate videos from captions and still images. Pika is competing against generative AI video tools and models from the likes of Runway and Stability AI.
This is a big deal because the race to build text-to-video applications is heating up!
Last but not least, Meta AI announced Emu Video and Emu Edit. Emu Video is a simple factorized method for high-quality video generation, while Emu Edit enables precise image editing via recognition and generation tasks.
This is a big deal because these Emu models respectively streamline the processes of training T2V models, and of precisely editing images via text prompts.
In 2023, customers and community members came forward with some interesting computer vision use cases and workflows that included FiftyOne.
Here are a few highlights:
Global Top 5 Defense and Space Contractor
FiftyOne accelerates the research and development of uncrewed aerial vehicles (drones).
Global Top 5 Heavy Equipment Manufacturer
FiftyOne is used in the development of safety technology, like hazard detection, for tractors and heavy construction equipment.
Global Top 10 Automotive Manufacturer
FiftyOne identifies image quality issues in the multi-modal datasets used to train models for autonomous vehicles.
Global Top 5 Automotive Manufacturer
FiftyOne is part of a centralized data visualization and management platform for the development of driver assistance systems.
Global Top 5 Computer and Electronics Manufacturer
FiftyOne is used to identify product defects in images taken from manufacturing lines.
Global Top 5 Retail and Ecommerce Company
FiftyOne is critical to the development of models and datasets for product identification and self-checkout monitoring systems.
Global Top 5 Automotive Technology Company
FiftyOne is used to manage the visual data for both in-cabin and external driver assistance and autonomous vehicle systems.
Check out these and other case studies on the Voxel51 Success Stories page.
2023 Product Releases and Features
In 2023 the engineering team released 17 FiftyOne open source and 19 FiftyOne Teams releases. Here’s a sample of some of the exciting features and integrations released in 2023!
FiftyOne provides a powerful plugin framework that lets you extend and customize the tool to suit your specific needs. With plugins, you can add new functionality to the FiftyOne App, create integrations with other tools and APIs, render custom panels, and add custom buttons to menus. You can even schedule long-running tasks from within the App that execute on a connected workflow orchestration tool like Apache Airflow.
Get started with plugins by installing some popular plugins, then try your hand at writing your own!
AI Utilities: Operators
Operators are a powerful feature in FiftyOne that allow plugin developers to define custom operations that can be executed from within the App.
Some operators may expose themselves as custom buttons, icons, or menu items throughout the App. Plus, users can launch the Operator Browser with a single click to search through all available operators.
Workflows: Delegated Operators
Delegated operations are a powerful feature of FiftyOne’s plugin framework that allow you to schedule tasks from within the App. These tasks are executed on a connected workflow orchestrator like Apache Airflow, or simply run locally in a separate process.
Dataset Versioning allows you to capture the state of your dataset at a point in time so that it can be referenced in the future. This enables workflows like recalling important events in the dataset’s lifecycle (model trained, annotation added, etc.) and helps prevent accidental data loss.
If you’d like to see data versioning in action, make sure to request a FiftyOne Teams demo for your team!
Vector Search Integrations
With FiftyOne’s native vector search integrations you can search through billions of images with a single line of code and enable capabilities like reverse image search, traversing concept space, and semantic document search. FiftyOne supports the following engines, with more on the way!
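As a rough illustration of what reverse image search does under the hood, the sketch below ranks a handful of toy embeddings by cosine similarity to a query embedding. This is a conceptual, brute-force example in plain Python, not FiftyOne’s API. The file names, vectors, and function names are hypothetical, and dedicated vector search engines typically use approximate nearest-neighbor indexes rather than scanning every item.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(query, index, k=2):
    """Return the ids of the k embeddings most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item_id: cosine_similarity(query, index[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional embeddings keyed by hypothetical file names.
# Real image embeddings usually have hundreds of dimensions.
index = {
    "cat_01.jpg": [0.9, 0.1, 0.0],
    "cat_02.jpg": [0.8, 0.2, 0.1],
    "car_01.jpg": [0.0, 0.1, 0.9],
    "dog_01.jpg": [0.5, 0.8, 0.1],
}

query = [1.0, 0.1, 0.0]  # embedding of the query image (assumed)
results = nearest(query, index, k=2)
```

Here `results` holds the two cat images, since their embeddings point in nearly the same direction as the query.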
VoxelGPT is an open source plugin for FiftyOne that translates your natural language prompts into actions that organize and explore your data. Here are the key features:
- You can ask VoxelGPT to search your image, video, and 3D point cloud data to find what you’re looking for, no matter how complex the request.
- VoxelGPT has access to the entire collection of FiftyOne documentation, including tutorials, user guides, and API docs, making it easy to get instant answers to your questions as you work.
- VoxelGPT can answer questions about computer vision, machine learning, and data science while you’re working in the FiftyOne App.
6k+ Stars on GitHub
A big thanks to everyone we’ve met in person or chatted with online who showed the Voxel51 open source projects some love with a star!
FiftyOne Community Slack Milestones
The FiftyOne Community Slack channel is the place to interact and exchange solutions with more than 2,300 engineers and researchers using FiftyOne in their computer vision workflows.
Late to the party? No problem! We’ve collected the most interesting exchanges over the last year in a weekly blog series. Check out almost 300 questions and their answers covering topics like heatmaps, video labels, 3D detections, polylines, pose skeletons, dynamic groups, embeddings and more, in the Tips and Tricks archive.
Voxel51 sponsors 13 virtual Computer Vision Meetups and 12 AI, Machine Learning and Data Science Meetups around the world. (To join, visit the Meetup links and scroll down to find the location friendliest to your time zone.)
In 2023, we hosted 20 Meetups with almost 60 speakers from companies and institutions like Amazon, Microsoft, Google, MIT, Harvard, Stanford and Carnegie Mellon.
If you’ve attended a Meetup in the past, you know that in lieu of spending money on swag giveaways, we have attendees vote for a charitable cause. In 2023 we donated over $5,000 on behalf of the Meetup members to the following charities: Sierra Club Foundation, Education Development Center, The Coalition for Rainforest Nations, BRAC, Global Empowerment Mission, Drink Local Drink Tap, Children International, Fighting Blindness Foundation, and Direct Relief.