The FiftyOne Brain provides powerful machine learning techniques that are designed to transform how you curate your data from an art into a measurable science.
The FiftyOne Brain is a separate Python package that is bundled with FiftyOne. Although it is closed-source, it is licensed as freeware, and you have permission to use it for commercial or non-commercial purposes. See the license for more details.
The FiftyOne Brain methods are useful across the stages of the machine learning workflow:
Uniqueness: During the training loop for a model, the best results will be seen when training on unique data. The FiftyOne Brain provides a uniqueness measure for images that compares the content of every image in a FiftyOne Dataset with all other images. Uniqueness operates on raw images and does not require any prior annotation on the data. It is hence very useful in the early stages of the machine learning workflow when you are likely asking “What data should I select to annotate?”
Mistakenness: Annotation mistakes create an artificial ceiling on the performance of your models. However, finding these mistakes by hand is at least as arduous as the original annotation was, especially for larger datasets. The FiftyOne Brain provides a quantitative mistakenness measure to identify possible label mistakes. Mistakenness operates on labeled images and works best when the logit output of your model predictions is available. It also works on detection datasets to find missed objects, incorrect annotations, and localization issues.
Hardness: While a model is training, it will learn to understand attributes of certain samples faster than others. The FiftyOne Brain provides a hardness measure that calculates how easy or difficult it is for your model to understand any given sample. Mining hard samples is a tried and true measure of mature machine learning processes. Use your current model instance to compute predictions on unlabeled samples to determine which are the most valuable to have annotated and fed back into the system as training samples, for example.
Each of these functions has a detailed tutorial demonstrating a workflow.
The FiftyOne Brain can compute the uniqueness of an image relative to the other images in a dataset, and it does so without requiring any model from you. One good use of uniqueness is in the early stages of the machine learning workflow, when you are deciding which subset of data to use to bootstrap your models. Unique samples are vital in creating training batches that help your model learn as efficiently and effectively as possible.
import fiftyone.brain as fob

fob.compute_uniqueness(dataset)
Input: An unlabeled (or labeled) image dataset. There are recipes for building datasets from a wide variety of image formats, ranging from a simple directory of images to complicated dataset structures like MS-COCO.
Output: A scalar-valued field on each sample that ranks the uniqueness of that sample (higher value means more unique). The default name of this field is uniqueness, but you can customize its name via an argument of compute_uniqueness(). The uniqueness value is normalized so it is comparable across datasets.
What to expect: Uniqueness uses a tuned algorithm that measures the distribution of each Sample in the Dataset. Using this distribution, it ranks each sample based on its relative similarity to other samples. Those that are close to other samples are not unique, whereas those that are far from most other samples are more unique.
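The Brain's actual uniqueness algorithm is not described here, so the following is only a minimal sketch of the general idea: samples far from their neighbors score higher. It assumes each image has already been reduced to a numeric embedding vector, and it uses a simple mean distance to the k nearest neighbors followed by min-max normalization. The function name toy_uniqueness, the parameter k, and the example points are all hypothetical and are not part of the FiftyOne Brain API.

```python
import math


def toy_uniqueness(embeddings, k=3):
    """Illustrative only: mean distance to the k nearest neighbors,
    min-max normalized to [0, 1]. Not the FiftyOne Brain algorithm."""
    n = len(embeddings)
    if n < 2:
        # With fewer than two samples there is nothing to compare against
        return [0.0] * n

    k = min(k, n - 1)
    raw = []
    for i, a in enumerate(embeddings):
        # Distances from sample i to every other sample, nearest first
        dists = sorted(
            math.dist(a, b) for j, b in enumerate(embeddings) if j != i
        )
        raw.append(sum(dists[:k]) / k)

    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All samples are equally spaced; no sample stands out
        return [0.0] * n
    return [(r - lo) / (hi - lo) for r in raw]


# Hypothetical 2-D embeddings: three near-duplicates and one outlier
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = toy_uniqueness(points, k=2)

# Index of the sample with the highest toy uniqueness score
most_unique = max(range(len(points)), key=scores.__getitem__)
```

In this toy setup the outlier receives the highest score and the clustered points score near zero, which mirrors the "far from most other samples" intuition described above.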
Check out the uniqueness tutorial to see an example use case of the Brain’s uniqueness method.
Label mistakes can be calculated for both classification and detection datasets.
During training, it is useful to identify samples that are more difficult for a model to learn so that training can be more focused around these hard samples. These hard samples are also useful as seeds when considering what other new samples to add to a training dataset.
In order to compute hardness, model predictions must be generated on the samples of a dataset. These predictions can then be loaded into a FiftyOne Dataset, and the FiftyOne Brain can be used to compute hardness via compute_hardness():
import fiftyone.brain as fob

fob.compute_hardness(dataset, label_field="predictions")
Input: The dataset argument has samples on which predictions (logits) have been computed and are stored in the label_field. Annotations and labels are not required for hardness.
Output: A scalar-valued field on each sample that ranks the hardness of the sample. The default name of this field is hardness, but you can customize its name by using the hardness_field argument of compute_hardness().
What to expect: Hardness is computed in the context of a prediction model. The FiftyOne Brain hardness measure defines hard samples as those for which the prediction model is unsure about what label to assign. This measure incorporates prediction confidence and logits in a tuned model that has demonstrated empirical value in many model training exercises.
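The exact model behind the Brain's hardness measure is not given here. As a rough illustration of "the prediction model is unsure about what label to assign", the sketch below scores a sample by the normalized entropy of its softmax probabilities, which is one common way to quantify prediction uncertainty from logits. The function names and example logits are hypothetical, and this is an assumed stand-in, not the FiftyOne Brain implementation.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_hardness(logits):
    """Illustrative only: entropy of the predicted distribution divided by
    the maximum possible entropy, giving a value in [0, 1].
    Requires at least two classes. Not the FiftyOne Brain algorithm."""
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


# Hypothetical logits for two samples in a 3-class problem
confident_logits = [8.0, 0.5, 0.1]  # model strongly prefers class 0
unsure_logits = [1.0, 1.1, 0.9]     # model is nearly undecided

confident_score = toy_hardness(confident_logits)
unsure_score = toy_hardness(unsure_logits)
```

Under this assumed measure, the nearly undecided sample scores close to 1 and the confidently predicted sample scores close to 0, so sorting by the score surfaces the samples the model finds hardest.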
Tutorial coming soon!