“Rapidly experiment with your datasets”
If you are looking to boost the performance of your machine learning models, chances are improving the quality of your dataset will provide the highest return on your investment. Enter FiftyOne. FiftyOne is a Python-based tool for the visual data scientist that enables you to create valuable and diverse datasets. Work efficiently with FiftyOne to achieve better models with dependable performance.
“Become one with your data”
FiftyOne does more than improve your dataset; it gets you closer to your data. Rapidly gain insight by visualizing samples overlaid with dynamic and queryable fields such as ground truth and predicted labels, dataset splits, and much more!
FiftyOne provides advanced capabilities that will turbocharge your machine learning workflows.
Finding annotation mistakes
Annotation mistakes create an artificial ceiling on the performance of your models. However, finding these mistakes by hand is at least as arduous as the original annotation work! FiftyOne provides a mistakenness tool that can automatically identify possible label mistakes in your datasets.
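To make the idea concrete, here is a minimal sketch of how a mistakenness-style score might work. It is not FiftyOne's actual algorithm: the function name `mistakenness_score`, the scoring rule, and the sample data are all assumptions made for illustration. The heuristic treats a label as suspicious when a model confidently predicts a different class.

```python
def mistakenness_score(probs, label):
    """Return a score in [0, 1]; higher means the label looks more suspect.

    probs: dict mapping class name -> predicted probability (hypothetical model output)
    label: the ground truth class name assigned by the annotator
    """
    predicted = max(probs, key=probs.get)
    if predicted == label:
        # Model agrees with the annotation: low confidence is mildly suspicious
        return 0.5 * (1.0 - probs[predicted])
    # Model disagrees with the annotation: high confidence is very suspicious
    return 0.5 + 0.5 * probs[predicted]


# Hypothetical samples with annotator labels and model probabilities
samples = [
    {"id": "a", "label": "cat", "probs": {"cat": 0.95, "dog": 0.05}},
    {"id": "b", "label": "cat", "probs": {"cat": 0.10, "dog": 0.90}},
    {"id": "c", "label": "dog", "probs": {"cat": 0.45, "dog": 0.55}},
]

# Review the most suspicious labels first
ranked = sorted(
    samples,
    key=lambda s: mistakenness_score(s["probs"], s["label"]),
    reverse=True,
)
ranked_ids = [s["id"] for s in ranked]
print(ranked_ids)
```

Sorting by such a score lets a reviewer look at the most likely mistakes first instead of auditing every annotation.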
Removing redundant images
Models generally train best on data that is free of redundancy. For example, finding and removing near-identical samples in your dataset can avoid an accidental concept imbalance that biases what your model learns. FiftyOne provides a uniqueness tool that can automatically identify duplicate or near-duplicate images in your datasets.
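One way to picture a uniqueness score is as the distance from each image to its nearest neighbor in some feature space. The sketch below assumes every image has already been turned into an embedding vector; the vectors, the function names, and the choice of cosine distance are illustrative assumptions, not a description of how FiftyOne computes uniqueness.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def uniqueness_scores(embeddings):
    """Score each sample by the distance to its nearest neighbor.

    embeddings: dict mapping sample id -> feature vector (hypothetical values)
    """
    scores = {}
    for sid, vec in embeddings.items():
        scores[sid] = min(
            cosine_distance(vec, other)
            for oid, other in embeddings.items()
            if oid != sid
        )
    return scores


# Hypothetical embeddings; img2 is a near-duplicate of img1
embeddings = {
    "img1": [1.0, 0.0, 0.0],
    "img2": [0.99, 0.01, 0.0],
    "img3": [0.0, 1.0, 0.0],
}

scores = uniqueness_scores(embeddings)
least_unique_first = sorted(scores, key=scores.get)
print(least_unique_first)
```

Samples at the low end of the ranking are candidates for removal, since each has a close neighbor that carries nearly the same information.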
Bootstrapping training datasets from raw images
In the early stages of a machine learning workflow, ML engineers inevitably ask themselves: what data should I select to annotate? This is a critical question, as acquiring high-quality ground truth annotations is an expensive and time-consuming process. FiftyOne provides methods that can automatically recommend unlabeled samples from your dataset to send for annotation, enabling you to bootstrap a training dataset that leads to demonstrably better model performance.
Tutorial coming soon!
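Until the tutorial is available, the following sketch shows one common heuristic for choosing what to annotate under a fixed budget: greedily pick samples that are far apart in an embedding space, so the labeled set covers diverse content. This is a generic farthest-point selection written for illustration; the function name, the embeddings, and the strategy itself are assumptions and are not taken from FiftyOne.

```python
import math


def select_for_annotation(embeddings, budget):
    """Greedily pick up to `budget` sample ids that are far apart from each other.

    embeddings: dict mapping sample id -> feature vector (hypothetical values)
    """
    ids = list(embeddings)
    if not ids or budget <= 0:
        return []

    selected = [ids[0]]  # arbitrary seed: the first sample
    # Distance from every sample to its closest already-selected sample
    nearest = {i: math.dist(embeddings[i], embeddings[ids[0]]) for i in ids}

    while len(selected) < min(budget, len(ids)):
        # Among unselected samples, take the one farthest from the selected set
        candidate = max((i for i in ids if i not in selected), key=nearest.get)
        selected.append(candidate)
        for i in ids:
            d = math.dist(embeddings[i], embeddings[candidate])
            nearest[i] = min(nearest[i], d)
    return selected


# Hypothetical embeddings of unlabeled images; u1 and u2 are nearly identical
unlabeled = {
    "u1": [0.0, 0.0],
    "u2": [0.1, 0.0],
    "u3": [5.0, 5.0],
    "u4": [0.0, 4.0],
}

to_annotate = select_for_annotation(unlabeled, budget=3)
print(to_annotate)
```

With a budget of three, the near-duplicate `u2` is skipped in favor of samples that add new coverage.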
Adding optimal samples to your dataset
While training, ML models understand attributes of certain samples faster than others. The natural question arises: what new samples should I add to my training dataset to provide the largest incremental improvement to the performance of my model? FiftyOne provides methods for mining hard samples from your datasets, a tried and true measure of mature machine learning processes.
Tutorial coming soon!
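As a rough illustration of hard sample mining, the sketch below ranks labeled candidates by the loss a model incurs on them and keeps the top few. The cross-entropy criterion, the function names, and the prediction data are assumptions chosen for the example; they are not FiftyOne's implementation.

```python
import math


def hardness(probs, label):
    """Cross-entropy of the true class: large when the model is wrong or unsure."""
    p = max(probs.get(label, 0.0), 1e-12)  # clamp to avoid log(0)
    return -math.log(p)


def mine_hard_samples(predictions, k):
    """Return the ids of the k samples with the highest loss."""
    ranked = sorted(
        predictions,
        key=lambda s: hardness(s["probs"], s["label"]),
        reverse=True,
    )
    return [s["id"] for s in ranked[:k]]


# Hypothetical model outputs on candidate samples
predictions = [
    {"id": "s1", "label": "cat", "probs": {"cat": 0.98, "dog": 0.02}},
    {"id": "s2", "label": "dog", "probs": {"cat": 0.70, "dog": 0.30}},
    {"id": "s3", "label": "cat", "probs": {"cat": 0.55, "dog": 0.45}},
]

hard_ids = mine_hard_samples(predictions, k=2)
print(hard_ids)
```

Samples the model already handles confidently, like `s1` here, contribute little new signal, so they fall to the bottom of the ranking.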
FiftyOne’s core library provides a structured yet dynamic representation to explore your datasets. You can efficiently query and manipulate your dataset by adding custom tags, model predictions and more.
```python
import fiftyone as fo

# Create a new, empty dataset
dataset = fo.Dataset("my_dataset")

# Build a sample, tag it, and attach a custom field
sample = fo.Sample(filepath="path/to/img.png")
sample.tags.append("train")
sample["custom_field"] = 51

dataset.add_sample(sample)

# Query: training samples, sorted by the custom field, first 10 only
view = dataset.view().match_tag("train").sort_by("custom_field").limit(10)

for sample in view:
    print(sample)
```
FiftyOne is designed to be lightweight and flexible, making it easy to load your datasets. FiftyOne supports loading datasets in a variety of common formats out-of-the-box, and it also provides the extensibility to load datasets in custom formats.
Check out loading datasets to see how to load your data into FiftyOne!
The FiftyOne App is a graphical user interface (GUI) that makes it easy to rapidly gain intuition into your datasets. You can visualize labels, bounding boxes and segmentations overlaid on the samples; sort, query and slice your dataset into any aspect you need; and more.
The FiftyOne Brain is a library of machine learning-powered capabilities that provide insights into your datasets and recommend modifications that will lead to measurably better performance of your models.
```python
import fiftyone.brain as fob

fob.compute_uniqueness(dataset)
rank_view = dataset.view().sort_by("uniqueness")
```
Where should you go from here? You could…