fiftyone.brain
Module contents
The brains behind FiftyOne: a powerful package for dataset curation, analysis, and visualization.
See https://github.com/voxel51/fiftyone for more information.
Functions
compute_hardness() – Adds a hardness field to each sample scoring the difficulty that the specified label field observed in classifying the sample.
compute_mistakenness() – Computes the mistakenness of the labels in the specified label_field, scoring the chance that the labels are incorrect.
compute_uniqueness() – Adds a uniqueness field to each sample scoring how unique it is with respect to the rest of the samples.
compute_visualization() – Computes a low-dimensional representation of the samples’ media or their patches that can be interactively visualized and manipulated via the returned VisualizationResults object.
- fiftyone.brain.compute_hardness(samples, label_field, hardness_field='hardness')
Adds a hardness field to each sample scoring the difficulty that the specified label field observed in classifying the sample.
Hardness is computed from the model’s prediction output (its logits) and summarizes how uncertain the model was about the sample. Because the score is quantitative, it can be used to detect things like hard samples, annotation errors during noisy training, and more.
Note
Runs of this method can be referenced later via brain key hardness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
label_field – the fiftyone.core.labels.Classification or fiftyone.core.labels.Classifications field to use from each sample
hardness_field ("hardness") – the field name to use to store the hardness value for each sample
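To make the idea of a logit-based uncertainty score concrete, the sketch below scores a sample by the entropy of its softmax distribution. Using entropy is an assumption made for illustration; the text above does not state which uncertainty measure compute_hardness() applies, and the helper names `softmax` and `entropy_score` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_score(logits):
    """Shannon entropy (in nats) of the softmax distribution.

    Assumption: entropy stands in for "model uncertainty" here. This is an
    illustrative choice, not necessarily what compute_hardness() computes.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


confident = entropy_score([8.0, 0.5, 0.1])  # sharply peaked prediction
uncertain = entropy_score([1.0, 1.0, 1.0])  # uniform prediction
print(confident < uncertain)
```

Under this measure, a prediction that spreads probability evenly across classes scores higher than one that commits to a single class, which matches the intuition that uncertain samples are the hard ones.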
- fiftyone.brain.compute_mistakenness(samples, pred_field, label_field='ground_truth', mistakenness_field='mistakenness', missing_field='possible_missing', spurious_field='possible_spurious', use_logits=False, copy_missing=False)
Computes the mistakenness of the labels in the specified label_field, scoring the chance that the labels are incorrect.
Mistakenness is computed based on the predictions in the pred_field, through either its confidence or logits attributes. This measure can be used to detect things like annotation errors and unusually hard samples.
This method supports both classifications and detections.
For classifications, a mistakenness_field field is populated on each sample that quantifies the likelihood that the label in the label_field of that sample is incorrect.
For detections, the mistakenness of each detection in label_field is computed, using fiftyone.core.collections.SampleCollection.evaluate_detections() to locate corresponding detections in pred_field. Three types of mistakes are identified:
(Mistakes) Detections in label_field with a match in pred_field are assigned a mistakenness value in their mistakenness_field that captures the likelihood that the class label of the detection in label_field is a mistake. A mistakenness_field + "_loc" field is also populated that captures the likelihood that the detection in label_field is a mistake due to its localization (bounding box).
(Missing) Detections in pred_field with no matches in label_field but which are likely to be correct will have their missing_field attribute set to True. In addition, if copy_missing is True, copies of these detections are added to the ground truth detections label_field.
(Spurious) Detections in label_field with no matches in pred_field but which are likely to be incorrect will have their spurious_field attribute set to True.
In addition, for detections only, the following sample-level fields are populated:
(Mistakes) The mistakenness_field of each sample is populated with the maximum mistakenness of the detections in label_field.
(Missing) The missing_field of each sample is populated with the number of detections that were deemed missing from label_field.
(Spurious) The spurious_field of each sample is populated with the number of detections in label_field that were deemed spurious.
Note
Runs of this method can be referenced later via brain key mistakenness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
pred_field – the name of the predicted label field to use from each sample. Can be of type fiftyone.core.labels.Classification, fiftyone.core.labels.Classifications, or fiftyone.core.labels.Detections
label_field ("ground_truth") – the name of the “ground truth” label field that you want to test for mistakes with respect to the predictions in pred_field. Must have the same type as pred_field
mistakenness_field ("mistakenness") – the field name to use to store the mistakenness value for each sample
missing_field ("possible_missing") – the field in which to store per-sample counts of potential missing detections. Only applicable for fiftyone.core.labels.Detections labels
spurious_field ("possible_spurious") – the field in which to store per-sample counts of potential spurious detections. Only applicable for fiftyone.core.labels.Detections labels
use_logits (False) – whether to use logits (True) or confidence (False) to compute mistakenness. Logits typically yield better results when they are available
copy_missing (False) – whether to copy predicted detections that were deemed to be missing into label_field. Only applicable for fiftyone.core.labels.Detections labels
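The sample-level fields described for detections (maximum mistakenness, count of missing detections, count of spurious detections) can be sketched as a small aggregation. This is a simplified illustration, not FiftyOne's implementation: detections are modeled as plain dicts that already carry the per-detection attributes, and `summarize_sample` is a hypothetical helper name.

```python
def summarize_sample(
    gt_detections,
    pred_detections,
    mistakenness_field="mistakenness",
    missing_field="possible_missing",
    spurious_field="possible_spurious",
):
    """Aggregate per-detection attributes into sample-level values.

    Assumption: each detection is a plain dict. Real FiftyOne labels are
    objects, so this only mirrors the aggregation logic described above.
    """
    # Only matched ground truth detections carry a mistakenness value
    scores = [
        d[mistakenness_field]
        for d in gt_detections
        if d.get(mistakenness_field) is not None
    ]
    return {
        # Maximum mistakenness over the ground truth detections
        mistakenness_field: max(scores) if scores else None,
        # Missing detections are flagged on the predictions
        missing_field: sum(1 for d in pred_detections if d.get(missing_field)),
        # Spurious detections are flagged on the ground truth
        spurious_field: sum(1 for d in gt_detections if d.get(spurious_field)),
    }


ground_truth = [
    {"label": "cat", "mistakenness": 0.12},
    {"label": "dog", "mistakenness": 0.87},
    {"label": "bird", "possible_spurious": True},
]
predictions = [
    {"label": "cat"},
    {"label": "dog"},
    {"label": "car", "possible_missing": True},
]
summary = summarize_sample(ground_truth, predictions)
print(summary)
```

Sorting samples by the aggregated mistakenness value, or filtering on nonzero missing or spurious counts, is then a natural way to prioritize which annotations to review first.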
- fiftyone.brain.compute_uniqueness(samples, uniqueness_field='uniqueness', roi_field=None, embeddings=None, model=None, batch_size=None, force_square=False, alpha=None)
Adds a uniqueness field to each sample scoring how unique it is with respect to the rest of the samples.
This function only uses the pixel data and can therefore process labeled or unlabeled samples.
You can provide your own embeddings to seed this method by specifying either the embeddings or model arguments.
Note
Runs of this method can be referenced later via brain key uniqueness_field.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
uniqueness_field ("uniqueness") – the field name to use to store the uniqueness value for each sample
roi_field (None) – an optional fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining a region of interest within each image to use to compute uniqueness
embeddings (None) – pre-computed embeddings to use. Can be any of the following:
    a num_samples x num_dims array of embeddings
    if roi_field is specified, a dict mapping sample IDs to num_patches x num_dims arrays of patch embeddings
    the name of a dataset field containing the embeddings to use
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
batch_size (None) – a batch size to use when computing embeddings. Only applicable when a model is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and roi_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, ∞). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and roi_field are specified
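The effect of alpha can be illustrated with a short sketch that takes the stated rule at face value: each side of the box changes by (100 * alpha)%, so alpha = 0.1 grows a side by 10% and alpha = -0.1 shrinks it by 10%. The box layout ([x, y, width, height] with a top-left corner) and the choice to scale about the box center are assumptions made for illustration, and `expand_box` is a hypothetical helper, not a FiftyOne function.

```python
def expand_box(box, alpha):
    """Expand (alpha > 0) or contract (alpha < 0) a box about its center.

    Assumptions: `box` is [x, y, width, height] with (x, y) the top-left
    corner, and scaling keeps the center fixed. Neither detail is confirmed
    by the parameter description above.
    """
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Shift the corner by half the size change so the center stays put
    return [x - (new_w - w) / 2, y - (new_h - h) / 2, new_w, new_h]


box = [0.25, 0.25, 0.5, 0.5]
print(expand_box(box, 0.5))   # sides grow by 50%
print(expand_box(box, -0.5))  # sides shrink by 50%
```

At the lower bound alpha = -1 the box collapses to zero width and height, which is why smaller values are rejected.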
- fiftyone.brain.compute_visualization(samples, patches_field=None, embeddings=None, brain_key=None, num_dims=2, method='umap', config=None, model=None, batch_size=None, force_square=False, alpha=None, **kwargs)
Computes a low-dimensional representation of the samples’ media or their patches that can be interactively visualized and manipulated via the returned VisualizationResults object.
If no embeddings or model is provided, a default model is used to generate embeddings.
- Parameters
samples – a fiftyone.core.collections.SampleCollection
patches_field (None) – a sample field defining the image patches in each sample that have been/will be embedded
embeddings (None) – pre-computed embeddings to use. Can be any of the following:
    a num_samples x num_dims array of embeddings
    if patches_field is specified, a dict mapping sample IDs to num_patches x num_dims arrays of patch embeddings
    the name of a dataset field containing the embeddings to use
brain_key (None) – a brain key under which to store the results of this visualization
num_dims (2) – the dimension of the visualization space
method ("umap") – the dimensionality-reduction method to use. Supported values are ("umap", "tsne", "pca")
config (None) – a fiftyone.brain.visualization.VisualizationConfig specifying the parameters to use. If provided, takes precedence over other parameters
model (None) – a fiftyone.core.models.Model or the name of a model from the FiftyOne Model Zoo to use to generate embeddings. The model must expose embeddings (model.has_embeddings = True)
batch_size (None) – an optional batch size to use when computing embeddings. Only applicable when a model is provided
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction. Only applicable when a model and patches_field are specified
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, ∞). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%. Only applicable when a model and patches_field are specified
**kwargs – optional keyword arguments for the constructor of the fiftyone.brain.visualization.VisualizationConfig being used
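A call to compute_visualization() using only the parameters listed above might be wrapped as follows. This is a minimal sketch: `visualize_embeddings` and its `compute` argument are hypothetical names introduced here, the latter so that the wrapper can be exercised without FiftyOne installed, and the brain key "img_viz" is an arbitrary example value.

```python
SUPPORTED_METHODS = ("umap", "tsne", "pca")


def visualize_embeddings(samples, brain_key="img_viz", method="umap", compute=None):
    """Run compute_visualization() after validating the method name.

    `compute` is a hypothetical hook for substituting a stand-in callable.
    When it is omitted, fiftyone.brain.compute_visualization is imported
    lazily, which requires FiftyOne to be installed.
    """
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            "Unsupported method %r; expected one of %s"
            % (method, SUPPORTED_METHODS)
        )

    if compute is None:
        import fiftyone.brain as fob

        compute = fob.compute_visualization

    # Embed into 2D and store the run under `brain_key`
    return compute(samples, brain_key=brain_key, num_dims=2, method=method)
```

With FiftyOne installed, passing a dataset or view as `samples` and leaving `compute` unset runs the real method with a default embedding model, since neither embeddings nor model is supplied.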
- Returns