We recently announced the availability of FiftyOne 0.17 and FiftyOne Teams for collaborating securely on datasets. Voxel51 Co-Founder and CTO Brian Moore walked us through all the new features in a live webinar. You can watch the playback on YouTube, take a look at the slides, see the full transcript, and read the recap below for the highlights.
Donating $200 to World Literacy Foundation
In lieu of swag, we gave attendees the opportunity to vote for their favorite charity and help guide our monthly donation to charitable causes. The charity that received the highest number of votes was the World Literacy Foundation. We are pleased to be making a donation of $200 to them on behalf of the FiftyOne community.
Presentation Highlights
To set the scene, Brian took us on a journey starting 10 years ago, when datasets were small enough to work with manually. Fast forward to today: datasets are much bigger and span more modalities, including images, video, 3D, and more. While you might think that working on models takes up most of an ML engineer’s time, it’s really data quality. Poor quality data leads to problems such as model bias, physical danger, and reduced model performance.
Our company, Voxel51, is on a mission to bring transparency and clarity to the world’s data. We focus on building tools to help improve data quality and data-centric workflows. We do that through the open source FiftyOne project.
Introduction to open source FiftyOne
Brian described what you can do with FiftyOne — the open-source tool for building high-quality datasets and computer vision models — including all the workflows it supports, all the computer vision tasks it supports, and all the integrations, too:
FiftyOne helps you with these workflows and dozens more:
- Curate, visualize, and analyze datasets
- Streamline annotation workflows
- Find and fix labeling mistakes
- Identify and correct model failures
FiftyOne supports all popular computer vision tasks:
- Classification
- Detection
- Instance segmentation
- Semantic segmentation
- Polygons and polylines
- Keypoints
- Point clouds and annotations
- Geolocation
- Embeddings
- Multiview datasets
- Image, video, and 3D data
FiftyOne integrates with all your favorite ML tools.
FiftyOne comes with the FiftyOne Brain:
The FiftyOne Brain provides data-centric workflows that go beyond the tool’s visualization and query capabilities to surface specific insights into your datasets. These include automatically finding potential mistakes that your model is making, automatically computing embeddings or providing your own custom embeddings, visualizing low-dimensional representations of those embeddings, and interacting with them to pull out and observe patterns in your data.
What’s new in the latest version (v0.17) of FiftyOne
We recently released FiftyOne 0.17, which added a number of new features based on community input.
FiftyOne Open Source — Live Demo Time!
Brian covered all the awesomeness available in open source FiftyOne from ~14:34 to ~40:20 in the presentation, including these how-tos:
- Install FiftyOne from a terminal with pip install fiftyone, then import the library in Python
- Connect to a database (a MongoDB database automatically spins up when you import the library, although you can configure it to connect to an existing database if you’d like)
- Make a dataset persistent (datasets are ephemeral by default)
- Load a dataset (to show this Brian loads CIFAR-100, then later loads KITTI, COCO, MNIST, and others)
- Visualize a dataset in the FiftyOne App
- Interactively work with data across the App and code
- Filter samples in the App, whether they are images, videos, 3D, location data, etc.
- Use the FiftyOne Brain to search for visually similar images or find annotation mistakes (and much more!)
- Work with FiftyOne in a Jupyter Notebook
- Load in datasets packaged in the Dataset Zoo
- Generate scatter plots on samples
- Use models out of the Model Zoo, like MobileNet
- And so much more!
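A few of the steps above can be sketched in code. This is a minimal sketch rather than a transcript of Brian’s demo: the function name `quickstart_demo` and its parameters are hypothetical, and it assumes FiftyOne is installed (`pip install fiftyone`) along with whatever ML backend the chosen zoo dataset needs for downloading.

```python
def quickstart_demo(zoo_name="cifar100", split="test", persistent=True, sample_size=25):
    """Load a zoo dataset, optionally persist it, and open it in the App.

    Hypothetical helper for illustration; the defaults mirror the
    CIFAR-100 example mentioned in the webinar recap.
    """
    # Imported lazily: importing fiftyone connects to (or starts) its database
    import fiftyone as fo
    import fiftyone.zoo as foz

    # Load a dataset packaged in the Dataset Zoo
    kwargs = {"split": split} if split else {}
    dataset = foz.load_zoo_dataset(zoo_name, **kwargs)

    # Datasets are ephemeral by default; opt in to keeping this one
    dataset.persistent = persistent

    # Visualize the dataset in the FiftyOne App
    session = fo.launch_app(dataset)

    # Work across the App and code: show a random subset in the App
    session.view = dataset.take(sample_size)

    return dataset, session
```

Calling `quickstart_demo()` from a Python shell or notebook returns both the dataset and the App session, so you can keep changing `session.view` from code while exploring the same samples in the App.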
A lot of what Brian demoed comes from the image embeddings tutorial, if you’d like to have a look.
Plus, everything Brian showed up to this point is part of open source FiftyOne, and you can find everything you need to get started on GitHub. If you like what you see, consider giving the project a star.
FiftyOne Teams — Live Demo Time!
Next, Brian dove into FiftyOne Teams, which enables multiple users to securely collaborate on the same datasets and models, either on-premises or in the cloud. It is built on top of open source FiftyOne.
FiftyOne Teams includes:
- Native cloud storage
- Centralized web portal with SSO
- Dataset/user permissions
- Dataset versioning
- Dataset listing and management
- Custom dashboards
Brian covered all the enterprise features in FiftyOne Teams starting at ~45:31 in the presentation, including how to:
- Log into FiftyOne Teams
- Manage access to datasets
- Manage user roles — admins, members, guests
- Pin datasets
- Find datasets based on a variety of filters
- Install, connect to, and work with datasets through Python
- More!
Q&A from the Webinar
There was a lively Q&A all throughout the presentation, covering open source FiftyOne, FiftyOne Teams, and more! Here’s a recap:
Do the FiftyOne features work with video datasets?
Yes! FiftyOne fully supports video datasets just like image datasets, and it also lets you work with clips or individual frames in your video datasets.
Learn more about working with video datasets in the FiftyOne Docs.
How does the session object work with FiftyOne Teams when multiple users have multiple sessions in the app?
The hosted FiftyOne Teams App doesn’t currently support connecting to Python sessions (but stay tuned!). Instead, Teams users can get interactive Python sessions by launching the localhost App via the Python SDK, just as Brian showed in the open source demo, which lets them interact with the App through code.
Learn more about FiftyOne Teams.
Can I work with embeddings directly in the App like I can with geolocation through the new Map tab?
Currently, embeddings work is done through Jupyter notebooks with Plotly plots, but we plan to add support for interactive embeddings workflows directly in the App (like geolocation maps) in the near future!
Check out this tutorial for how to interact with embeddings and the FiftyOne App.
Can you share the notebook used in the image embeddings demo?
The demo is based largely on this tutorial. For background, FiftyOne provides a powerful embeddings visualization capability that you can use to generate low-dimensional representations of the samples and objects in your datasets.
This notebook highlights several applications of visualizing image embeddings, with the goal of motivating some of the many possible workflows that you can perform.
Are we able to use geolocations in the to_patches() view as well?
Yep! If you store fo.GeoLocation labels on your fo.Detection labels, then when you convert to_patches() you’ll be able to interact with the map in the patches view. There is one more step required in between, but you can check out this GitHub gist for an example of how to do this.
To what extent will the FiftyOne project’s features continue to remain open source?
FiftyOne will always be free and open source for individual users to install locally. For teams of users who need to collaborate on their datasets, we offer commercial FiftyOne Teams.
Can the FiftyOne Brain be used for finding incorrect detection annotations?
Yes! We have a mistakenness computation in the brain to do just that. Check out the docs to learn more about label mistakes.
Does FiftyOne currently support 3D mesh data?
In 0.17, we added support for 3D point clouds stored in .pcd format. We’re also discussing adding support for .ply and .obj files.
Alternatively, you can always write a custom plugin that supports visualizing other 3D formats like meshes.
How can we analyze statistical metrics with FiftyOne, like accuracy, IoU, etc?
FiftyOne natively supports evaluation of regressions, classifications, detections, segmentations, and temporal detections. Specifically, for each task, FiftyOne implements best practice protocols like COCO for object detection. (Note that FiftyOne is the recommended way to evaluate COCO datasets on the COCO website!)
Uniquely, FiftyOne’s evaluation methods store individual TP/FP/FN results on your data, as well as aggregate metrics like accuracy/mAP so you can really dig in and see how individual samples/predictions performed.
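To make the idea of per-prediction TP/FP/FN results concrete, here is a minimal pure-Python sketch of IoU-based matching. It is not FiftyOne’s implementation and is simpler than the COCO protocol: it ignores class labels and crowd annotations, assumes boxes are given as (x, y, width, height), and uses hypothetical function names (`box_iou`, `match_detections`).

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlapping region (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_detections(gt_boxes, predictions, iou_thresh=0.5):
    """Greedily match predictions to ground truth boxes.

    predictions is a list of (box, confidence) pairs. Returns a list with
    "tp" or "fp" for each prediction (in input order) and the number of
    unmatched ground truth boxes (false negatives).
    """
    # Consider the most confident predictions first
    order = sorted(range(len(predictions)), key=lambda i: predictions[i][1], reverse=True)
    matched_gt = set()
    results = ["fp"] * len(predictions)

    for i in order:
        box, _ = predictions[i]
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            if j in matched_gt:
                continue  # each ground truth box can be matched at most once
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched_gt.add(best_j)
            results[i] = "tp"

    false_negatives = len(gt_boxes) - len(matched_gt)
    return results, false_negatives
```

Keeping the per-prediction outcome, rather than only the aggregate counts, is what makes it possible to go back and inspect exactly which samples and predictions a model got wrong.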
Can I connect my trained model to a dataset management/version control system?
FiftyOne Teams comes with dataset versioning built in so you can track changes, tag revisions, and roll back to previous dataset versions as needed. You can use this feature in concert with experiment tracking tools like Weights & Biases or MLflow to record the specific dataset revision on which a model/experiment was trained.
Make sure to follow Voxel51 on LinkedIn for an upcoming blog post showing best practices for using FiftyOne together with your favorite experiment tracking tools!
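As a rough illustration of recording a dataset revision alongside an experiment, the sketch below builds a plain run record that could be passed to whichever tracking tool you use. Everything here is hypothetical: the function `make_run_record`, the dataset name, the revision tag, and the parameter and metric values are placeholders, and no FiftyOne Teams or tracker API is called.

```python
import json
from datetime import datetime, timezone


def make_run_record(dataset_name, dataset_revision, params, metrics):
    """Bundle the dataset revision with an experiment's parameters and metrics."""
    return {
        "dataset": dataset_name,
        "dataset_revision": dataset_revision,  # the tagged revision the model was trained on
        "params": dict(params),
        "metrics": dict(metrics),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Placeholder values for illustration only
record = make_run_record(
    dataset_name="my-dataset",
    dataset_revision="v1.2.0",
    params={"learning_rate": 0.001, "epochs": 10},
    metrics={"mAP": 0.41},
)

# A JSON-serializable dict can be logged as a run's config or saved next to the model
print(json.dumps(record, indent=2))
```

Because the revision is stored with the run, you can later tell exactly which version of the data produced a given result.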
What are some interesting non-machine learning use cases for FiftyOne that you’ve seen?
We are constantly amazed by the variety of applications that FiftyOne users showcase in our Slack community. We’ve recently seen datasets spanning agriculture, satellite imagery, robotics, retail, healthcare, and more.
Who knows, maybe somebody is using the new custom plugins feature as we speak to do something truly creative, like say … a photo editing plugin!
How does FiftyOne Teams differentiate itself from products like Scale AI and Labelbox?
FiftyOne Teams is built on top of open source FiftyOne, which offers a number of key benefits to users:
- Lower barrier to entry: FiftyOne is 100% free, forever, with unlimited data volumes
- FiftyOne Teams is backwards-compatible with OSS FiftyOne
- The FiftyOne ecosystem is purpose-built with flexibility and extensibility in mind
- No vendor lock-in for annotation
- Significantly more full-featured Python library for building data-centric workflows in your ML environment
What’s Next
- If you like what you see on GitHub, give the project a star!
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help!