I’m excited to announce the release of FiftyOne 1.1 and FiftyOne Teams 2.2, which include powerful new features designed to bring important tasks in your AI development workflow closer to your data. With these releases, we’re beginning to harness the power of panels (launched in FiftyOne 0.25): containers for rich data applications built directly into the FiftyOne App. Read on to learn more.
More Than a Data Visualizer
At Voxel51, we know FiftyOne plays a central role in the end-to-end AI development lifecycle. Many people first find FiftyOne because of its powerful out-of-the-box visualization capabilities. However, users soon discover they can do much more to carefully curate datasets and efficiently develop ML models. To that end, we’ve heard two instructive themes from customers recently when asking about the role they envision for FiftyOne at their companies. First, while FiftyOne is a world-class visualizer, developers would like to do even more with their data while working inside the App, including tackling key data preparation and model evaluation tasks. And second, data teams have datasets, often large and spread across multiple sources, that they wish to integrate more seamlessly with the FiftyOne App.
With FiftyOne Teams 2.2, we’re starting to address these challenges.
Specifically, we’re launching the following features, described in more detail below:
- Model Evaluation Panel – Expertly analyze and evaluate model performance with just a few clicks (also available in open source FiftyOne 1.1!)
- Data Quality Panel – Proactively catch and fix data issues before they impact your models, with automated detection and remediation workflows
- Builtin Compute – Schedule compute-intensive tasks to run in the background on our infrastructure while you work
- Data Lens – Query 1B+ samples across your data lake with ease; preview and import the exact data you need into FiftyOne datasets
- Query Performance Panel – Add field indexes and summary fields on your dataset to speed up data exploration and visualization, without leaving the App
As these new features are themselves true data applications, we are just scratching the surface of what’s possible. We’ll continue adding functionality to and enhancing ergonomics around the new panels and neighboring workflows, unlocking even more use cases within computer vision and beyond.
Okay, let’s dive into the data curation, model evaluation, and production-grade dataset features included in this release.
Streamlining ML Workflows with Builtin Panels
Model Evaluation Panel
Model performance evaluation is an ongoing, iterative exercise at every critical stage of the AI development workflow. Analyzing evaluation results is not just for late-stage QA and deployment; developers need to understand how a model performs during preparation, tuning, and training.
Crucially, this analysis depends on visibility into the data itself. As with other disciplines requiring advanced data analysis, interrogating results in ML model development typically leads to more questions, and often only sample-level data can unlock the answers. Enter the Model Evaluation Panel, which brings together a UI for performance computation, model iteration comparisons, and your data itself in the FiftyOne App:
With the Model Evaluation Panel, you can kick off a model evaluation directly from the FiftyOne App. Just provide your predictions and ground truth fields, desired evaluation method, and a name for your evaluation, and let FiftyOne calculate industry-standard metrics like IoU, precision, recall, and F1-score.
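If you prefer to script this step, the panel builds on the evaluation API that ships with open source FiftyOne. Here’s a minimal sketch using the zoo’s quickstart dataset, which includes predictions and ground_truth detection fields; the eval_key name is just an example:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Small detection dataset that ships with "predictions" and "ground_truth" fields
dataset = foz.load_zoo_dataset("quickstart")

# Run a COCO-style detection evaluation and store it under the key "eval"
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

# Per-class precision, recall, and F1-score, plus mAP
results.print_report()
print(results.mAP())
```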
When the job finishes, you’ll see a report in the Panel with the results, including visualizations for performance metrics, class accuracy, and confusion matrices. What’s more, you can filter the samples in your dataset by directly interacting with these visualizations; click a button to see your true positives, false positives, true negatives, or false negatives, or select a series in one of the bar charts to isolate a specific class.
Want to see how your evaluation compares to past runs or different model versions? You can compare the results of two evaluations side-by-side, and FiftyOne will let you know which version has better performance scores. And after you’ve generated meaningful insights, leave some thoughts in the markdown notes section to summarize findings for your teammates and future self.
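You can slice the same results in code, too. Continuing the sketch above (an evaluation stored under the key "eval"), something like the following isolates false positives and lists the evaluation runs stored on the dataset:

```python
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("quickstart")  # the dataset evaluated above

# Each prediction was annotated with an "eval" attribute of "tp" or "fp",
# and unmatched ground truth objects were marked "fn"
fp_view = dataset.filter_labels("predictions", F("eval") == "fp")
print(f"{fp_view.count('predictions.detections')} false positive predictions")

# Evaluation runs persist on the dataset, so you can revisit or compare them later
print(dataset.list_evaluations())
results = dataset.load_evaluation_results("eval")
```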
Data Quality Panel
When curating datasets for model training, establishing a baseline of data quality is a must. The ultimate performance of ML models depends in large part on the quality of the data they’re trained on, and unnoticed quality issues can quietly drag it down. And especially when embarking on a new project, issues in data quality might impede downstream model exploration and analysis.
While it has been possible to improve data quality in FiftyOne using plugins or manual, subjective sample curation, we want to make it even easier for you to check for and react to quality issues. That’s why we’ve built the Data Quality Panel:
Now, you can compute, display, and take action against common data quality issues with a few clicks in the FiftyOne App. With the Data Quality Panel, you can filter your samples via controllable thresholds to visualize the results of quality scans at a granular level. Once you’ve identified the specific subset of data relevant to your use case’s quantitative or qualitative standards, add new or existing tags to reflect your conclusion across the dataset. For example, you might tag samples in the top-5% of blurriness results with “blurry – review” and ask for a second opinion from colleagues on whether to remove those samples from your dataset.
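The panel computes these scores for you, but the tag-and-review part of the workflow is plain FiftyOne. As a rough illustration only, not the panel’s implementation, here’s a sketch that approximates blurriness with the variance of the Laplacian and tags the blurriest 5% of a hypothetical dataset named my_dataset:

```python
import cv2
import fiftyone as fo

dataset = fo.load_dataset("my_dataset")  # hypothetical existing image dataset

# Crude blur proxy: variance of the Laplacian (lower variance = blurrier).
# This is only an illustration, not how the Data Quality Panel scores blurriness.
scores = []
for filepath in dataset.values("filepath"):
    img = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
    scores.append(cv2.Laplacian(img, cv2.CV_64F).var())

dataset.set_values("blurriness", scores)

# Tag the blurriest 5% of samples for a second opinion from teammates
n = max(1, int(0.05 * len(dataset)))
dataset.sort_by("blurriness")[:n].tag_samples("blurry - review")

print(dataset.match_tags("blurry - review").count())
```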
FiftyOne Teams 2.2 includes support for the following issue types:
- Aspect ratio: Computes the minimum of width/height and height/width for each image.
- Blurriness: Measures the lack of sharpness (or clarity) in images.
- Brightness: Attempts to quantify the amount of light in an image, as it is perceived by the viewer.
- Entropy: A measure of the information content of the pixels in an image.
- Exact duplicates: Finds exact duplicate images in a dataset using a hash function.
- Near duplicates: Finds near-duplicate images in a dataset using a specified similarity index paired with either a distance threshold or a fraction of samples to mark as duplicates (see the sketch after this list).
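The two duplicate scans map closely to utilities in the FiftyOne Brain, so you can also run them from code. A sketch, assuming an existing dataset named my_dataset; the embedding model and threshold here are just illustrative choices:

```python
import fiftyone as fo
import fiftyone.brain as fob

dataset = fo.load_dataset("my_dataset")  # hypothetical existing image dataset

# Exact duplicates: hash-based; maps sample IDs to the IDs of their duplicates
dup_map = fob.compute_exact_duplicates(dataset)
print(f"{len(dup_map)} samples have exact duplicates")

# Near duplicates: build a similarity index, then threshold on embedding distance
index = fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",  # any zoo embedding model will do
    brain_key="img_sim",
)
index.find_duplicates(thresh=0.1)  # or fraction=0.05 to flag the closest 5%
print(f"{len(index.duplicate_ids)} near-duplicate samples found")

# Inspect the flagged samples in the App
session = fo.launch_app(index.duplicates_view())
```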
Builtin Compute
Scanning a dataset to detect quality issues or calculate performance metrics can be computationally expensive. Depending on the size of your dataset, it might not make sense to wait around for the task to complete before continuing in your workstream. That’s where FiftyOne delegated operations can help. When kicking off a task in the Data Quality or Model Evaluation Panel, you can choose between synchronous execution or scheduling via delegated operation. If your team has established a connection between FiftyOne and an orchestrator like Airflow, you can choose from available orchestrators when electing to delegate the task.
But what if you don’t have Airflow connected to FiftyOne? FiftyOne Teams 2.2 now offers builtin task executors that can handle your computationally expensive tasks. All customers will have the option to delegate operations using Voxel51’s compute, so you can schedule tasks to run in the background while you, for example, learn more about your data’s distribution or fine-tune your model in FiftyOne.
Additional Support for Enterprise-Sized Datasets
Data Lens
Datasets are the backbone of FiftyOne. All data visualization, exploration, and analysis happen within the context of Datasets. Unsurprisingly, therefore, data teams maintain sophisticated pipelines and processes geared toward making Datasets available across FiftyOne Teams deployments. And often these Datasets represent only a subset of the samples available in raw data storage.
In some cases, you may want to shrink the distance between your data lake and your FiftyOne Datasets. For example, you might need access to recent data and can’t wait for your ETL pipeline to run. Or maybe your Datasets are too broad and contain irrelevant, time-draining samples. Either way, you just need quick access to the billions of samples in your data lake.
Now, thanks to Data Lens in FiftyOne Teams 2.2, you can query and import data from your data lake directly, bypassing the bottlenecks:
Through Data Lens, FiftyOne App users can submit queries directly to the data lake. Once your query successfully finishes, you’ll see results returned in a preview grid similar to the Samples panel. From there, you can decide whether to import some or all of the data returned from your query into your active Dataset or another Dataset to which you have access.
This powerful new interface is made possible with custom operators. Developers can write and register Python or JS operators to define the connection to a data lake, along with the query parameters to be exposed in the Data Lens panel in the FiftyOne App. For more specifics and example connectors, check out our docs.
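To give a feel for what a connector involves, here is the general shape of a Python operator that exposes query parameters. The Data Lens-specific base class and response types live in the docs, and the query_my_lake helper below is a placeholder, so treat this as a sketch rather than a working connector:

```python
import fiftyone.operators as foo
import fiftyone.operators.types as types


class MyLakeSearch(foo.Operator):
    """Skeleton of a Python operator that exposes data lake query parameters.

    The Data Lens-specific base class and response handling are covered in
    the FiftyOne Teams docs; this only sketches the general operator shape.
    """

    @property
    def config(self):
        return foo.OperatorConfig(
            name="my_lake_search",
            label="Search my data lake",
        )

    def resolve_input(self, ctx):
        # These parameters are rendered as form inputs in the App
        inputs = types.Object()
        inputs.str("table", label="Table", required=True)
        inputs.str("filter", label="SQL filter", required=False)
        inputs.int("limit", label="Max samples", default=100)
        return types.Property(inputs)

    def execute(self, ctx):
        # query_my_lake() is a placeholder for your data lake client
        rows = query_my_lake(
            table=ctx.params["table"],
            where=ctx.params.get("filter"),
            limit=ctx.params["limit"],
        )
        return {"num_results": len(rows)}


def register(plugin):
    plugin.register(MyLakeSearch)
```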
Query Performance Panel
Fetching metadata or media from a document database can pose a performance challenge for applications, especially when the target is a production-scale dataset in the ML lifecycle. When you interact with fields in the FiftyOne App sidebar, you’re tacitly firing off queries against the connected database, requesting information like summary statistics or value ranges. No matter how complex these queries are, they should be fast! Interacting with filters shouldn’t cost you seconds per click; you have valuable data curation and exploration to tackle.
One way to speed up these queries is by leveraging FiftyOne’s field indexes and summary fields. And this is where the new Query Performance Panel can guide you:
Here, you can create new field indexes and, especially for video datasets, summary fields to speed up filter interactions in the FiftyOne App. If you’re unsure which fields to index, the App will suggest candidates by pre-populating the creation flow after you fire off a particularly long-running query. When enabled in the panel, Query Performance instructs the FiftyOne App’s sidebar to take advantage of these indexed fields.
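The panel works with the same index and summary-field primitives that the Python SDK exposes. A minimal sketch, assuming a dataset named my_dataset with a ground_truth detections field (and, for the summary field, frame-level detections on a video dataset):

```python
import fiftyone as fo

dataset = fo.load_dataset("my_dataset")  # hypothetical existing dataset

# Index a field you filter on frequently so sidebar queries can use it
dataset.create_index("ground_truth.detections.label")

# For video datasets, summary fields roll frame-level values up to the sample
# level so the sidebar doesn't need to scan every frame
dataset.create_summary_field("frames.detections.label")

print(dataset.list_indexes())
print(dataset.list_summary_fields())
```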
Conclusion
FiftyOne Teams 2.2 signifies a material leap forward in streamlining end-to-end ML workflows and working with production-scale data. With the new features outlined in this blog post, data teams across startups and enterprises alike can spend less time on disconnected, one-off tasks and instead realize the power and efficiency of working on ML model development right alongside their data.
If you aren’t yet a customer, learn more about FiftyOne Teams, and let us know you’re interested in a personalized demo. We’re excited to show you how FiftyOne Teams can significantly accelerate your visual AI development workflows!