# fiftyone.core.collections

Interface for sample collections.

Classes

 SampleCollection Abstract class representing an ordered collection of fiftyone.core.sample.Sample instances in a fiftyone.core.dataset.Dataset.

Functions

 aggregation(func)
 get_frame_labels_fields(sample_collection[, …]) Gets the frame label field(s) of the sample collection matching the specified arguments.
 get_label_fields(sample_collection[, …]) Gets the label field(s) of the sample collection matching the specified arguments.
 view_stage(func)
fiftyone.core.collections.view_stage(func)
fiftyone.core.collections.aggregation(func)
class fiftyone.core.collections.SampleCollection

Bases: object

Abstract class representing an ordered collection of fiftyone.core.sample.Sample instances in a fiftyone.core.dataset.Dataset.

property name

The name of the collection.

property media_type

The media type of the collection.

property info

The info dict of the underlying dataset.

See fiftyone.core.dataset.Dataset.info() for more information.

property classes

The classes of the underlying dataset.

See fiftyone.core.dataset.Dataset.classes() for more information.

property default_classes

The default classes of the underlying dataset.

See fiftyone.core.dataset.Dataset.default_classes() for more information.

property mask_targets

The mask targets of the underlying dataset.

See fiftyone.core.dataset.Dataset.mask_targets() for more information.

property default_mask_targets

The default mask targets of the underlying dataset.

See fiftyone.core.dataset.Dataset.default_mask_targets() for more information.

summary()

Returns a string summary of the collection.

Returns

a string summary

first()

Returns the first sample in the collection.

Returns
Raises

ValueError – if the collection is empty

last()

Returns the last sample in the collection.

Returns
Raises

ValueError – if the collection is empty

head(num_samples=3)

Returns a list of the first few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects

tail(num_samples=3)

Returns a list of the last few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects
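
The truncation behavior of head() and tail() can be pictured with ordinary list slicing. The sketch below is not FiftyOne's implementation: `head` and `tail` here are hypothetical stand-ins that operate on a plain Python list rather than on a sample collection.

```python
def head(samples, num_samples=3):
    """Return up to the first `num_samples` items (hypothetical stand-in)."""
    if num_samples <= 0:
        return []
    return list(samples[:num_samples])


def tail(samples, num_samples=3):
    """Return up to the last `num_samples` items (hypothetical stand-in)."""
    if num_samples <= 0:
        return []
    return list(samples[-num_samples:])


samples = ["a", "b"]
print(head(samples))  # only two items are available, so both are returned
print(tail(samples))
```

Requesting more items than the collection holds is not an error; the result is simply shorter than num_samples.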

one(expr, exact=False)

Returns a single sample in this collection matching the expression.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Get a sample by filepath
#

# A random filepath in the dataset
filepath = dataset.take(1).first().filepath

# Get sample by filepath
sample = dataset.one(F("filepath") == filepath)

#
# Dealing with multiple matches
#

# Get a sample whose image is JPEG
sample = dataset.one(F("filepath").ends_with(".jpg"))

# Raises an error since there are multiple JPEGs
dataset.one(F("filepath").ends_with(".jpg"), exact=True)

Parameters
Returns
view()

Returns a fiftyone.core.view.DatasetView containing the collection.

Returns
iter_samples()

Returns an iterator over the samples in the collection.

Returns

an iterator over fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView instances

get_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the samples in the collection.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

a dictionary mapping field names to field types

get_frame_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the frames of the samples in the collection.

Only applicable for video collections.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

a dictionary mapping field names to field types, or None if the collection is not a video collection

make_unique_field_name(root='')

Makes a unique field name with the given root name for the collection.

Parameters

root – an optional root for the output field name

Returns

the field name

has_sample_field(field_name)

Determines whether the collection has a sample field with the given name.

Parameters

field_name – the field name

Returns

True/False

has_frame_field(field_name)

Determines whether the collection has a frame-level field with the given name.

Parameters

field_name – the field name

Returns

True/False

validate_fields_exist(field_or_fields)

Validates that the collection has fields with the given names.

If field_or_fields contains an embedded field name such as field_name.document.field, only the root field_name is checked for existence.

Parameters

field_or_fields – a field name or iterable of field names

Raises

ValueError – if one or more of the fields do not exist
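
The root-only check described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `schema` is a hypothetical dict standing in for the collection's field schema, and the function below only mirrors the documented name.

```python
def validate_fields_exist(schema, field_or_fields):
    """Raise ValueError if any root field name is missing from `schema`."""
    if isinstance(field_or_fields, str):
        field_or_fields = [field_or_fields]

    # Only the part before the first "." is checked for existence
    missing = [f for f in field_or_fields if f.split(".", 1)[0] not in schema]
    if missing:
        raise ValueError("Fields %s do not exist" % missing)


schema = {"filepath": str, "ground_truth": dict}  # hypothetical schema

# Passes: "ground_truth" exists, and the embedded part is not inspected
validate_fields_exist(schema, "ground_truth.detections.label")
```

A consequence of this rule is that a typo in the embedded part of a path is not caught by this validation.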

validate_field_type(field_name, ftype, embedded_doc_type=None, subfield=None)

Validates that the collection has a field of the given type.

Parameters
Raises

ValueError – if the field does not exist or does not have the expected type

tag_samples(tags)

Adds the tag(s) to all samples in this collection, if necessary.

Parameters

tags – a tag or iterable of tags

untag_samples(tags)

Removes the tag(s) from all samples in this collection, if necessary.

Parameters

tags – a tag or iterable of tags

count_sample_tags()

Counts the occurrences of sample tags in this collection.

Returns

a dict mapping tags to counts
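
As a rough mental model of the three sample-tag methods above, the sketch below uses plain dicts with a `tags` list as hypothetical stand-ins for samples. It reads "if necessary" as "only where the tag is not already present", which is an assumption about the wording rather than a statement about the implementation.

```python
from collections import Counter


def tag_samples(samples, tags):
    """Add each tag to every sample that does not already have it."""
    tags = [tags] if isinstance(tags, str) else list(tags)
    for sample in samples:
        for tag in tags:
            if tag not in sample["tags"]:
                sample["tags"].append(tag)


def count_sample_tags(samples):
    """Return a dict mapping each tag to the number of samples carrying it."""
    return dict(Counter(tag for sample in samples for tag in sample["tags"]))


samples = [{"tags": ["train"]}, {"tags": []}]  # hypothetical stand-ins
tag_samples(samples, "train")
print(count_sample_tags(samples))
```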

tag_labels(tags, label_fields=None)

Adds the tag(s) to all labels in the specified label field(s) of this collection, if necessary.

Parameters
untag_labels(tags, label_fields=None)

Removes the tag(s) from all labels in the specified label field(s) of this collection, if necessary.

Parameters
count_label_tags(label_fields=None)

Counts the occurrences of all label tags in the specified label field(s) of this collection.

Parameters

label_fields (None) – an optional name or iterable of names of fiftyone.core.labels.Label fields. By default, all label fields are used

Returns

a dict mapping tags to counts

set_values(field_name, values, skip_none=False, _allow_missing=False)

Sets the field or embedded field on each sample or frame in the collection to the given values.

When setting a sample field embedded.field.name, this function is an efficient implementation of the following loop:

for sample, value in zip(sample_collection, values):
    sample.embedded.field.name = value
    sample.save()


When modifying a sample field that contains an array, say embedded.array.field.name, this function is an efficient implementation of the following loop:

for sample, array_values in zip(sample_collection, values):
    for doc, value in zip(sample.embedded.array, array_values):
        doc.field.name = value

    sample.save()


When setting a frame field frames.embedded.field.name, this function is an efficient implementation of the following loop:

for sample, frame_values in zip(sample_collection, values):
    for frame, value in zip(sample.frames.values(), frame_values):
        frame.embedded.field.name = value

    sample.save()


When modifying a frame field that contains an array, say frames.embedded.array.field.name, this function is an efficient implementation of the following loop:

for sample, frame_values in zip(sample_collection, values):
    for frame, array_values in zip(sample.frames.values(), frame_values):
        for doc, value in zip(frame.embedded.array, array_values):
            doc.field.name = value

    sample.save()

Parameters
• field_name – a field or embedded.field.name

• values – an iterable of values, one for each sample in the collection. When setting frame fields, each element should be an iterable of values, one for each frame of the sample. If field_name contains array fields, the corresponding entries of values must be arrays of the same lengths

• skip_none (False) – whether to treat None data in values as missing data that should not be set
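
The required nesting of values for array fields, together with the effect of skip_none, can be sketched without a database. The objects below are hypothetical stand-ins built from `SimpleNamespace`, and this `set_values` is a simplified illustration for a single array field, not the library's implementation.

```python
from types import SimpleNamespace


def set_values(samples, values, skip_none=False):
    """Set `label` on each detection of each sample from nested `values`."""
    for sample, array_values in zip(samples, values):
        for doc, value in zip(sample.detections, array_values):
            if value is None and skip_none:
                continue  # leave the existing value untouched
            doc.label = value


# Hypothetical stand-ins for two samples with 2 and 1 embedded documents
samples = [
    SimpleNamespace(
        detections=[SimpleNamespace(label="a"), SimpleNamespace(label="b")]
    ),
    SimpleNamespace(detections=[SimpleNamespace(label="c")]),
]

# One inner list per sample, with the same length as that sample's array
set_values(samples, [["cat", None], ["dog"]], skip_none=True)
print([[d.label for d in s.detections] for s in samples])
```

With skip_none=False, the second detection of the first sample would instead have its label overwritten with None.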

compute_metadata(overwrite=False, num_workers=None, skip_failures=True)

Populates the metadata field of all samples in the collection.

Any samples with existing metadata are skipped, unless overwrite == True.

Parameters
• overwrite (False) – whether to overwrite existing metadata

• num_workers (None) – the number of processes to use. By default, multiprocessing.cpu_count() is used

• skip_failures (True) – whether to gracefully continue without raising an error if metadata cannot be computed for a sample

apply_model(model, label_field='predictions', confidence_thresh=None, store_logits=False, batch_size=None, num_workers=None)

Applies the fiftyone.core.models.Model to the samples in the collection.

Parameters
• label_field ("predictions") – the name (or prefix) of the field in which to store the model predictions

• confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels generated by the model

• store_logits (False) – whether to store logits for the model predictions. This is only supported when the provided model has logits, model.has_logits == True

• batch_size (None) – an optional batch size to use. Only applicable for image samples

• num_workers (None) – the number of workers to use when loading images. Only applicable for Torch models

compute_embeddings(model, embeddings_field=None, batch_size=None, num_workers=None)

Computes embeddings for the samples in the collection using the given fiftyone.core.models.Model.

The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• embeddings_field (None) – the name of a field in which to store the embeddings

• batch_size (None) – an optional batch size to use. Only applicable for image samples

• num_workers (None) – the number of workers to use when loading images. Only applicable for Torch models

Returns

None, if an embeddings_field is provided; otherwise, a numpy array whose first dimension is len(samples) containing the embeddings

compute_patch_embeddings(model, patches_field, embeddings_field=None, force_square=False, alpha=None, handle_missing='skip', batch_size=None, num_workers=None)

Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given fiftyone.core.models.Model.

The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• patches_field – a fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining the image patches in each sample to embed

• embeddings_field (None) – the name of a field in which to store the embeddings

• force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction

• alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%

• handle_missing ("skip") –

how to handle images with no patches. Supported values are:

• "skip": skip the image and assign its embedding as None

• "image": use the whole image as a single patch

• "error": raise an error

• batch_size (None) – an optional batch size to use

• num_workers (None) – the number of workers to use when loading images. Only applicable for Torch models

Returns

None, if an embeddings_field is provided; otherwise, a dict mapping sample IDs to arrays of patch embeddings
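
The arithmetic behind the alpha parameter can be sketched as follows. The helper `expand_box` is hypothetical, and it assumes that the box is scaled about its center and that boxes use the [top-left-x, top-left-y, width, height] format; the source states only that width and length change by (100 * alpha)%.

```python
def expand_box(box, alpha):
    """Scale a [x, y, width, height] box by (1 + alpha) about its center."""
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Shift the top-left corner so the center stays fixed (an assumption)
    return [x + (w - new_w) / 2, y + (h - new_h) / 2, new_w, new_h]


# alpha = 0.5 grows each side by 50%
print(expand_box([0.25, 0.25, 0.5, 0.5], 0.5))
```

This sketch does not clip the result to the image bounds; whether and how the library clips is not covered in this section.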

evaluate_classifications(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method='simple', config=None, **kwargs)

Evaluates the classification predictions in this collection with respect to the specified ground truth labels.

By default, this method simply compares the ground truth and prediction for each sample, but other strategies such as binary evaluation and top-k matching can be configured via the method and config parameters.

If an eval_key is specified, then this method will record some statistics on each sample:

• When evaluating sample-level fields, an eval_key field will be populated on each sample recording whether that sample’s prediction is correct.

• When evaluating frame-level fields, an eval_key field will be populated on each frame recording whether that frame’s prediction is correct. In addition, an eval_key field will be populated on each sample that records the average accuracy of the frame predictions of the sample.

Parameters
• pred_field – the name of the field containing the predicted fiftyone.core.labels.Classification instances

• gt_field ("ground_truth") – the name of the field containing the ground truth fiftyone.core.labels.Classification instances

• eval_key (None) – an evaluation key to use to refer to this evaluation

• classes (None) – the list of possible classes. If not provided, classes are loaded from fiftyone.core.dataset.Dataset.classes() or fiftyone.core.dataset.Dataset.default_classes() if possible, or else the observed ground truth/predicted labels are used

• missing (None) – a missing label string. Any None-valued labels are given this label for results purposes

• method ("simple") – a string specifying the evaluation method to use. Supported values are ("simple", "binary", "top-k")

• config (None) – a ClassificationEvaluationConfig specifying the evaluation method to use. If a config is provided, the method and kwargs parameters are ignored

• **kwargs – optional keyword arguments for the constructor of the ClassificationEvaluationConfig being used

Returns

a ClassificationResults
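
For frame-level fields, the relationship between the per-frame correctness values and the sample-level average accuracy can be sketched in plain Python. The function `evaluate_frames` is hypothetical and uses simple label equality, which corresponds to the default "simple" comparison only in spirit.

```python
def evaluate_frames(pred_labels, gt_labels):
    """Return per-frame correctness flags and their mean for one sample."""
    frame_correct = [p == g for p, g in zip(pred_labels, gt_labels)]
    if not frame_correct:
        return [], None  # no frames, so no accuracy to report

    sample_accuracy = sum(frame_correct) / len(frame_correct)
    return frame_correct, sample_accuracy


correct, accuracy = evaluate_frames(
    ["cat", "dog", "cat", "cat"], ["cat", "cat", "cat", "cat"]
)
print(correct, accuracy)
```

Here the list of flags plays the role of the eval_key field on each frame, and the mean plays the role of the eval_key field on the sample.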

evaluate_detections(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method='coco', iou=0.5, classwise=True, config=None, **kwargs)

Evaluates the specified predicted detections in this collection with respect to the specified ground truth detections.

By default, this method uses COCO-style evaluation, but this can be configured via the method and config parameters.

If an eval_key is provided, a number of fields are populated at the detection- and sample-level recording the results of the evaluation:

• True positive (TP), false positive (FP), and false negative (FN) counts for each sample are saved in top-level fields of each sample:

TP: sample.<eval_key>_tp
FP: sample.<eval_key>_fp
FN: sample.<eval_key>_fn


In addition, when evaluating frame-level objects, TP/FP/FN counts are recorded for each frame:

TP: frame.<eval_key>_tp
FP: frame.<eval_key>_fp
FN: frame.<eval_key>_fn

• The fields listed below are populated on each individual fiftyone.core.labels.Detection instance; these fields tabulate the TP/FP/FN status of the object, the ID of the matching object (if any), and the matching IoU:

TP/FP/FN: detection.<eval_key>
ID: detection.<eval_key>_id
IoU: detection.<eval_key>_iou
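
The matching IoU recorded on each detection is an intersection-over-union between two boxes. The sketch below computes it for boxes in [top-left-x, top-left-y, width, height] format and also spells out the sample-level field names derived from an evaluation key. Both helpers are hypothetical illustrations; the matching logic of COCO-style evaluation is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def result_fields(eval_key):
    """Sample-level count fields populated for a given evaluation key."""
    return ["%s_%s" % (eval_key, suffix) for suffix in ("tp", "fp", "fn")]


print(iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.5, 0.5]))
print(result_fields("eval"))
```

With the default iou=0.5, a pair of boxes like the one printed above (IoU of one third) would fall below the threshold.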

Parameters
Returns
evaluate_segmentations(pred_field, gt_field='ground_truth', eval_key=None, mask_targets=None, method='simple', config=None, **kwargs)

Evaluates the specified semantic segmentation masks in this collection with respect to the specified ground truth masks.

If the size of a predicted mask does not match the ground truth mask, it is resized to match the ground truth.

If an eval_key is provided, the accuracy, precision, and recall of each sample is recorded in top-level fields of each sample:

 Accuracy: sample.<eval_key>_accuracy
Precision: sample.<eval_key>_precision
Recall: sample.<eval_key>_recall


In addition, when evaluating frame-level masks, the accuracy, precision, and recall of each frame is recorded in the following frame-level fields:

 Accuracy: frame.<eval_key>_accuracy
Precision: frame.<eval_key>_precision
Recall: frame.<eval_key>_recall


Note

The mask value 0 is treated as a background class for the purposes of computing evaluation metrics like precision and recall.
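
One way to picture the role of the background class is the pixel-wise sketch below. The exact formulas used by the library are not given in this section, so the definitions here are assumptions: accuracy counts all agreeing pixels, while precision and recall count only agreement on non-background (non-zero) classes. `mask_metrics` is a hypothetical helper operating on nested lists.

```python
def mask_metrics(pred, gt):
    """Pixel-wise accuracy, precision, and recall with 0 as background."""
    pairs = [
        (p, g)
        for p_row, g_row in zip(pred, gt)
        for p, g in zip(p_row, g_row)
    ]

    accuracy = sum(p == g for p, g in pairs) / len(pairs)

    # A pixel is a true positive when both masks agree on a
    # non-background class
    tp = sum(p == g and g != 0 for p, g in pairs)
    num_pred = sum(p != 0 for p, _ in pairs)
    num_gt = sum(g != 0 for _, g in pairs)

    precision = tp / num_pred if num_pred else 0.0
    recall = tp / num_gt if num_gt else 0.0
    return accuracy, precision, recall


gt = [[0, 1], [1, 2]]
pred = [[0, 1], [2, 2]]
print(mask_metrics(pred, gt))
```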

Parameters
Returns
property has_evaluations

Whether this collection has any evaluation results.

has_evaluation(eval_key)

Whether this collection has an evaluation with the given key.

Parameters

eval_key – an evaluation key

Returns

True/False

list_evaluations()

Returns a list of all evaluation keys on this collection.

Returns

a list of evaluation keys

get_evaluation_info(eval_key)

Returns information about the evaluation with the given key on this collection.

Parameters

eval_key – an evaluation key

Returns
load_evaluation_results(eval_key)

Loads the fiftyone.core.evaluation.EvaluationResults for the evaluation with the given key on this collection.

Parameters

eval_key – an evaluation key

Returns
load_evaluation_view(eval_key, select_fields=False)

Loads the fiftyone.core.view.DatasetView on which the specified evaluation was performed on this collection.

Parameters
• eval_key – an evaluation key

• select_fields (False) – whether to select only the fields involved in the evaluation

Returns
delete_evaluation(eval_key)

Deletes the evaluation results associated with the given evaluation key from this collection.

Parameters

eval_key – an evaluation key

delete_evaluations()

Deletes all evaluation results from this collection.

property has_brain_runs

Whether this collection has any brain runs.

has_brain_run(brain_key)

Whether this collection has a brain method run with the given key.

Parameters

brain_key – a brain key

Returns

True/False

list_brain_runs()

Returns a list of all brain keys on this collection.

Returns

a list of brain keys

get_brain_info(brain_key)

Returns information about the brain method run with the given key on this collection.

Parameters

brain_key – a brain key

Returns
load_brain_results(brain_key)

Loads the fiftyone.core.brain.BrainResults for the run with the given key on this collection.

Parameters

brain_key – a brain key

Returns
load_brain_view(brain_key, select_fields=False)

Loads the fiftyone.core.view.DatasetView on which the specified brain method run was performed on this collection.

Parameters
• brain_key – a brain key

• select_fields (False) – whether to select only the fields involved in the brain method run

Returns
delete_brain_run(brain_key)

Deletes the brain method run with the given key from this collection.

Parameters

brain_key – a brain key

delete_brain_runs()

Deletes all brain method runs from this collection.

classmethod list_view_stages()

Returns a list of all available methods on this collection that apply fiftyone.core.stages.ViewStage operations to this collection.

Returns

a list of SampleCollection method names

add_stage(stage)

Applies the given fiftyone.core.stages.ViewStage to the collection.

Parameters
Returns
Raises

fiftyone.core.stages.ViewStageError – if the stage was not a valid stage for this collection

exclude(sample_ids)

Excludes the samples with the given IDs from the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(filepath="/path/to/image1.png"),
        fo.Sample(filepath="/path/to/image2.png"),
        fo.Sample(filepath="/path/to/image3.png"),
    ]
)

#
# Exclude the first sample from the dataset
#

sample_id = dataset.first().id
view = dataset.exclude(sample_id)

#
# Exclude the first and last samples from the dataset
#

sample_ids = [dataset.first().id, dataset.last().id]
view = dataset.exclude(sample_ids)

Parameters

sample_ids

the samples to exclude. Can be any of the following:

Returns
exclude_fields(field_names)

Excludes the fields with the given names from the samples in the collection.

Note that default fields cannot be excluded.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
            predictions=fo.Classification(label="cat", confidence=0.9),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
            predictions=fo.Classification(label="dog", confidence=0.8),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=None,
            predictions=None,
        ),
    ]
)

#
# Exclude the predictions field from all samples
#

view = dataset.exclude_fields("predictions")

Parameters

field_names – a field name or iterable of field names to exclude

Returns
exclude_labels(labels=None, ids=None, tags=None, fields=None)

Excludes the specified labels from the collection.

The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.

You can perform an exclusion via one of the following methods:

• Provide one or both of the ids and tags arguments, and optionally the fields argument

• Provide the labels argument, which should have the following format:

[
    {
        "field": "ground_truth",
    },
    {
        "field": "ground_truth",
    },
    ...
]


Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Exclude the labels currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.exclude_labels(labels=session.selected_labels)

#
# Exclude labels with the specified IDs
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

view = dataset.exclude_labels(ids=ids)

print(dataset.count("ground_truth.detections"))
print(view.count("ground_truth.detections"))

print(dataset.count("predictions.detections"))
print(view.count("predictions.detections"))

#
# Exclude labels with the specified tags
#

# Grab some label IDs
ids = [
    dataset.first().ground_truth.detections[0].id,
    dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create a copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_values("ground_truth.detections.tags"))
print(dataset.count_values("predictions.detections.tags"))

# Exclude the labels via their tag
view = dataset.exclude_labels(tags=["test"])

print(dataset.count("ground_truth.detections"))
print(view.count("ground_truth.detections"))

print(dataset.count("predictions.detections"))
print(view.count("predictions.detections"))

Parameters
• labels (None) – a list of dicts specifying the labels to exclude

• ids (None) – an ID or iterable of IDs of the labels to exclude

• tags (None) – a tag or iterable of tags of labels to exclude

• fields (None) – a field or iterable of fields from which to exclude

exists(field, bool=True)

Returns a view containing the samples in the collection that have (or do not have) a non-None value for the given field or embedded field.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
            predictions=fo.Classification(label="cat", confidence=0.9),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
            predictions=fo.Classification(label="dog", confidence=0.8),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=fo.Classification(label="dog"),
            predictions=fo.Classification(label="dog"),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            ground_truth=None,
            predictions=None,
        ),
        fo.Sample(filepath="/path/to/image5.png"),
    ]
)

#
# Only include samples that have a value in their predictions
# field
#

view = dataset.exists("predictions")

#
# Only include samples that do NOT have a value in their
# predictions field
#

view = dataset.exists("predictions", False)

#
# Only include samples that have prediction confidences
#

view = dataset.exists("predictions.confidence")

Parameters
• field – the field name or embedded.field.name

• bool (True) – whether to check if the field exists (True) or does not exist (False)
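
The "non-None value" rule, including its behavior on embedded field paths, can be sketched with nested dicts. Both helpers are hypothetical stand-ins that return plain lists rather than views; the parameter name `bool` is kept only to mirror the signature above, even though it shadows the built-in.

```python
def get_path(doc, path):
    """Follow a dotted path through nested dicts, returning None if absent."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def exists(samples, field, bool=True):
    """Keep samples whose `field` is (bool=True) or is not (bool=False) set."""
    return [s for s in samples if (get_path(s, field) is not None) == bool]


# Hypothetical stand-ins for samples
samples = [
    {"id": 1, "predictions": {"label": "cat", "confidence": 0.9}},
    {"id": 2, "predictions": {"label": "dog"}},
    {"id": 3, "predictions": None},
    {"id": 4},
]

print([s["id"] for s in exists(samples, "predictions")])
print([s["id"] for s in exists(samples, "predictions", False)])
print([s["id"] for s in exists(samples, "predictions.confidence")])
```

Note that a field explicitly set to None and a field that was never set are treated the same way.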

Returns
filter_field(field, filter, only_matches=True)

Filters the values of a given sample (or embedded document) field of each sample in the collection.

Values of field for which filter returns False are replaced with None.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            ground_truth=fo.Classification(label="cat"),
            predictions=fo.Classification(label="cat", confidence=0.9),
            numeric_field=1.0,
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            ground_truth=fo.Classification(label="dog"),
            predictions=fo.Classification(label="dog", confidence=0.8),
            numeric_field=-1.0,
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            ground_truth=None,
            predictions=None,
            numeric_field=None,
        ),
    ]
)

#
# Only include classifications in the predictions field
# whose label is "cat"
#

view = dataset.filter_field("predictions", F("label") == "cat")

#
# Only include samples whose numeric_field value is positive
#

view = dataset.filter_field("numeric_field", F() > 0)

Parameters
Returns
filter_labels(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Label field of each sample in the collection.

If the specified field is a single fiftyone.core.labels.Label type, fields for which filter returns False are replaced with None:

If the specified field is a fiftyone.core.labels.Label list type, the label elements for which filter returns False are omitted from the view:
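
The difference between the two cases can be sketched with plain dicts and lists. The function below is a hypothetical illustration of the documented semantics for a single field value, not the library's implementation, and `confident` stands in for a filter expression such as a confidence threshold.

```python
def filter_labels(value, keep):
    """Apply `keep` to a single label dict or to a list of label dicts."""
    if value is None:
        return None
    if isinstance(value, list):
        # List type: drop the elements that fail the filter
        return [label for label in value if keep(label)]
    # Single label: replace the whole field with None on failure
    return value if keep(value) else None


def confident(label):
    """Hypothetical filter: confidence greater than 0.8."""
    return (label.get("confidence") or 0) > 0.8


cat = {"label": "cat", "confidence": 0.9}
dog = {"label": "dog", "confidence": 0.8}

print(filter_labels(cat, confident))         # single label that passes
print(filter_labels(dog, confident))         # single label that fails
print(filter_labels([cat, dog], confident))  # list: failing element omitted
```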

Classifications Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Classification(label="cat", confidence=0.9),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Classification(label="dog", confidence=0.8),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=fo.Classification(label="rabbit"),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            predictions=None,
        ),
    ]
)

#
# Only include classifications in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include classifications in the predictions field whose
# label is "cat" or "dog"
#

view = dataset.filter_labels(
    "predictions", F("label").is_in(["cat", "dog"])
)


Detections Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.1, 0.1, 0.5, 0.5],
                        confidence=0.9,
                    ),
                    fo.Detection(
                        label="dog",
                        bounding_box=[0.2, 0.2, 0.3, 0.3],
                        confidence=0.8,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="cat",
                        bounding_box=[0.5, 0.5, 0.4, 0.4],
                        confidence=0.95,
                    ),
                    fo.Detection(label="rabbit"),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=fo.Detections(
                detections=[
                    fo.Detection(
                        label="squirrel",
                        bounding_box=[0.25, 0.25, 0.5, 0.5],
                        confidence=0.5,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image4.png",
            predictions=None,
        ),
    ]
)

#
# Only include detections in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include detections in the predictions field whose label
# is "cat" or "dog"
#

view = dataset.filter_labels(
    "predictions", F("label").is_in(["cat", "dog"])
)

#
# Only include detections in the predictions field whose bounding
# box area is smaller than 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

view = dataset.filter_labels("predictions", bbox_area < 0.2)


Polylines Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Polylines(
                polylines=[
                    fo.Polyline(
                        label="lane",
                        points=[[(0.1, 0.1), (0.1, 0.6)]],
                        filled=False,
                    ),
                    fo.Polyline(
                        points=[[(0.2, 0.2), (0.5, 0.5), (0.2, 0.5)]],
                        filled=True,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Polylines(
                polylines=[
                    fo.Polyline(
                        label="lane",
                        points=[[(0.4, 0.4), (0.9, 0.4)]],
                        filled=False,
                    ),
                    fo.Polyline(
                        points=[[(0.6, 0.6), (0.9, 0.9), (0.6, 0.9)]],
                        filled=True,
                    ),
                ]
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=None,
        ),
    ]
)

#
# Only include polylines in the predictions field that are filled
#

view = dataset.filter_labels("predictions", F("filled") == True)

#
# Only include polylines in the predictions field whose label
# is "lane"
#

view = dataset.filter_labels("predictions", F("label") == "lane")

#
# Only include polylines in the predictions field with at least
# 3 vertices
#

num_vertices = F("points").map(F().length()).sum()
view = dataset.filter_labels("predictions", num_vertices >= 3)


Keypoints Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            predictions=fo.Keypoint(
                label="house",
                points=[(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)],
            ),
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            predictions=fo.Keypoint(
                label="window",
                points=[(0.4, 0.4), (0.5, 0.5), (0.6, 0.6)],
            ),
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            predictions=None,
        ),
    ]
)

#
# Only include keypoints in the predictions field whose label
# is "house"
#

view = dataset.filter_labels("predictions", F("label") == "house")

#
# Only include keypoints in the predictions field with fewer than
# four points
#

view = dataset.filter_labels("predictions", F("points").length() < 4)

Parameters
Returns
filter_classifications(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Classification elements in the specified fiftyone.core.labels.Classifications field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_detections(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Detection elements in the specified fiftyone.core.labels.Detections field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_polylines(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Polyline elements in the specified fiftyone.core.labels.Polylines field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_keypoints(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Keypoint elements in the specified fiftyone.core.labels.Keypoints field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
geo_near(point, location_field=None, min_distance=None, max_distance=None, query=None)

Sorts the samples in the collection by their proximity to a specified geolocation.

Note

This stage must be the first stage in any fiftyone.core.view.DatasetView in which it appears.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

TIMES_SQUARE = [-73.9855, 40.7580]

dataset = foz.load_zoo_dataset("quickstart-geo")

#
# Sort the samples by their proximity to Times Square
#

view = dataset.geo_near(TIMES_SQUARE)

#
# Sort the samples by their proximity to Times Square, and only
# include samples within 5km
#

view = dataset.geo_near(TIMES_SQUARE, max_distance=5000)

#
# Sort the samples by their proximity to Times Square, and only
# include samples that are in Manhattan
#

import fiftyone.utils.geojson as foug

in_manhattan = foug.geo_within(
"location.point",
[
[
[-73.949701, 40.834487],
[-73.896611, 40.815076],
[-73.998083, 40.696534],
[-74.031751, 40.715273],
[-73.949701, 40.834487],
]
]
)

view = dataset.geo_near(
TIMES_SQUARE, location_field="location", query=in_manhattan
)

Parameters
• point

the reference point to compute distances to. Can be any of the following:

• location_field (None) –

the location data of each sample to use. Can be any of the following:

• The name of a fiftyone.core.labels.GeoLocation field whose point attribute to use as location data

• An embedded.field.name containing GeoJSON data to use as location data

• None, in which case there must be a single fiftyone.core.labels.GeoLocation field on the samples, which is used by default

• min_distance (None) – exclude samples that are less than this distance (in meters) from point

• max_distance (None) – exclude samples that are greater than this distance (in meters) from point

• query (None) – an optional dict defining a MongoDB read query that samples must match in order to be included in this view
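The effect of min_distance and max_distance can be illustrated without a database. The following is a minimal pure-Python sketch, not FiftyOne's implementation (the stage itself is evaluated by MongoDB). The helper names haversine_m and near and the sample coordinates are hypothetical, and points are assumed to be [longitude, latitude] pairs as in TIMES_SQUARE above.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two [longitude, latitude] points."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(points, ref, min_distance=None, max_distance=None):
    """Sort named points by distance to ref, dropping those outside the range."""
    kept = []
    for name, point in points.items():
        d = haversine_m(ref, point)
        if min_distance is not None and d < min_distance:
            continue  # closer than min_distance: excluded
        if max_distance is not None and d > max_distance:
            continue  # farther than max_distance: excluded
        kept.append((d, name))
    return [name for _, name in sorted(kept)]


TIMES_SQUARE = [-73.9855, 40.7580]

# Hypothetical sample locations with approximate coordinates
locations = {
    "empire_state": [-73.9857, 40.7484],
    "central_park": [-73.9654, 40.7829],
    "jfk_airport": [-73.7781, 40.6413],
}

print(near(locations, TIMES_SQUARE))
print(near(locations, TIMES_SQUARE, max_distance=5000))
```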

Returns
geo_within(boundary, location_field=None, strict=True)

Filters the samples in this collection to only include samples whose geolocation is within a specified boundary.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-geo")

MANHATTAN = [
[
[-73.949701, 40.834487],
[-73.896611, 40.815076],
[-73.998083, 40.696534],
[-74.031751, 40.715273],
[-73.949701, 40.834487],
]
]

#
# Create a view that only contains samples in Manhattan
#

view = dataset.geo_within(MANHATTAN)

Parameters
• boundary – a fiftyone.core.labels.GeoLocation, fiftyone.core.labels.GeoLocations, GeoJSON dict, or list of coordinates that define a Polygon or MultiPolygon to search within

• location_field (None) –

the location data of each sample to use. Can be any of the following:

• The name of a fiftyone.core.labels.GeoLocation field whose point attribute to use as location data

• An embedded.field.name that directly contains the GeoJSON location data to use

• None, in which case there must be a single fiftyone.core.labels.GeoLocation field on the samples, which is used by default

• strict (True) – whether a sample’s location data must strictly fall within boundary (True) in order to match, or whether any intersection suffices (False)
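To build intuition for what "within a boundary" means for a single point, here is a rough planar ray-casting sketch. It is only an illustration: the real stage is evaluated by MongoDB on GeoJSON geometry, and the strict option is not modeled. The function point_in_polygon and the JFK coordinates are hypothetical.

```python
def point_in_polygon(point, ring):
    """Planar ray-casting test of a [lon, lat] point against a closed ring."""
    x, y = point
    inside = False
    # Walk each edge of the ring (the last vertex repeats the first)
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


MANHATTAN = [
    [
        [-73.949701, 40.834487],
        [-73.896611, 40.815076],
        [-73.998083, 40.696534],
        [-74.031751, 40.715273],
        [-73.949701, 40.834487],
    ]
]

TIMES_SQUARE = [-73.9855, 40.7580]
JFK_AIRPORT = [-73.7781, 40.6413]  # approximate, for illustration only

print(point_in_polygon(TIMES_SQUARE, MANHATTAN[0]))
print(point_in_polygon(JFK_AIRPORT, MANHATTAN[0]))
```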

Returns
limit(limit)

Returns a view with at most the given number of samples.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Only include the first 2 samples in the view
#

view = dataset.limit(2)

Parameters

limit – the maximum number of samples to return. If a non-positive number is provided, an empty view is returned

Returns
limit_labels(field, limit)

Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample in the collection.

The specified field must be one of the following types:

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Only include the first detection in the predictions field of
# each sample
#

view = dataset.limit_labels("predictions", 1)

Parameters
• field – the labels list field to filter

• limit – the maximum number of labels to include in each labels list. If a non-positive number is provided, all lists will be empty
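The truncation rule, including the non-positive case, can be sketched on plain Python data. This is an illustration only: the function below is hypothetical, samples are modeled as dicts, and a label list is modeled as a list of label strings. Leaving None-valued fields as None is a choice of this sketch.

```python
def limit_labels(samples, field, limit):
    """Return copies of the samples with at most `limit` labels in `field`."""
    limit = max(limit, 0)  # a non-positive limit empties every list
    result = []
    for sample in samples:
        labels = sample.get(field)
        truncated = None if labels is None else labels[:limit]
        result.append({**sample, field: truncated})
    return result


samples = [
    {"filepath": "/path/to/image1.png", "predictions": ["cat", "dog"]},
    {"filepath": "/path/to/image2.png", "predictions": ["cat", "rabbit"]},
    {"filepath": "/path/to/image3.png", "predictions": None},
]

for sample in limit_labels(samples, "predictions", 1):
    print(sample["filepath"], sample["predictions"])
```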

Returns
map_labels(field, map)

Maps the label values of a fiftyone.core.labels.Label field to new values for each sample in the collection.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
weather=fo.Classification(label="sunny"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
weather=fo.Classification(label="cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
weather=fo.Classification(label="partly cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Map the "partly cloudy" weather label to "cloudy"
#

view = dataset.map_labels("weather", {"partly cloudy": "cloudy"})

#
# Map "rabbit" and "squirrel" predictions to "other"
#

view = dataset.map_labels(
"predictions", {"rabbit": "other", "squirrel": "other"}
)

Parameters
• field – the labels field to map

• map – a dict mapping label values to new label values
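Conceptually, the mapping replaces only the label values that appear as keys and leaves every other value untouched. A minimal sketch on a plain list of label strings (the helper name is hypothetical and this is not FiftyOne's implementation):

```python
def map_label_values(labels, mapping):
    """Replace each label found in `mapping`; keep all other labels as they are."""
    return [mapping.get(label, label) for label in labels]


labels = ["cat", "rabbit", "squirrel"]
print(map_label_values(labels, {"rabbit": "other", "squirrel": "other"}))
```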

Returns
set_field(field, expr, _allow_missing=False)

Sets a field or embedded field on each sample in a collection by evaluating the given expression.

This method can process embedded list fields. To do so, simply append [] to any list component(s) of the field path.

Note

There are two cases where FiftyOne will automatically unwind array fields without requiring you to explicitly specify this via the [] syntax:

• Top-level lists: when you specify a field path that refers to a top-level list field of a dataset; i.e., list_field is automatically coerced to list_field[], if necessary.

• List fields: when you specify a field path that refers to the list field of a Label class, such as the Detections.detections attribute; i.e., ground_truth.detections.label is automatically coerced to ground_truth.detections[].label, if necessary.

See the examples below for demonstrations of this behavior.

The provided expr is interpreted relative to the document on which the embedded field is being set. For example, if you are setting a nested field field="embedded.document.field", then the expression expr you provide will be applied to the embedded.document document. Note that you can override this behavior by defining an expression that is bound to the root document by prepending "$" to any field name(s) in the expression.

See the examples below for more information.

Note

Note that you cannot set a non-existing top-level field using this stage, since doing so would violate the dataset’s schema. You can, however, first declare a new field via fiftyone.core.dataset.Dataset.add_sample_field() and then populate it in a view via this stage.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Replace all values of the uniqueness field that are less than
# 0.5 with None
#

view = dataset.set_field(
    "uniqueness",
    (F("uniqueness") >= 0.5).if_else(F("uniqueness"), None)
)
print(view.bounds("uniqueness"))

#
# Lower bound all object confidences in the predictions field at
# 0.5
#

view = dataset.set_field(
    "predictions.detections.confidence", F("confidence").max(0.5)
)
print(view.bounds("predictions.detections.confidence"))

#
# Add a num_predictions property to the predictions field that
# contains the number of objects in the field
#

view = dataset.set_field(
    "predictions.num_predictions",
    F("$predictions.detections").length(),
)
print(view.bounds("predictions.num_predictions"))

#
# Set an is_animal field on each object in the predictions field
# that indicates whether the object is an animal
#

ANIMALS = [
"bear", "bird", "cat", "cow", "dog", "elephant", "giraffe",
"horse", "sheep", "zebra"
]

view = dataset.set_field(
"predictions.detections.is_animal", F("label").is_in(ANIMALS)
)
print(view.count_values("predictions.detections.is_animal"))
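The difference between an expression bound to the embedded document and one bound to the root document via "$" can be pictured on a plain dict. The sketch below is only an analogy under stated assumptions: the dict stands in for one sample, the values are hypothetical, and ordinary Python assignments stand in for what the stage computes.

```python
ANIMALS = {"bear", "bird", "cat", "cow", "dog"}

# Plain dict standing in for one sample (hypothetical values)
sample = {
    "predictions": {
        "detections": [
            {"label": "cat", "confidence": 0.3},
            {"label": "car", "confidence": 0.9},
        ]
    }
}

detections = sample["predictions"]["detections"]

# Like field="predictions.detections.confidence" with F("confidence").max(0.5):
# the expression is evaluated per detection, so "confidence" resolves there
for det in detections:
    det["confidence"] = max(det["confidence"], 0.5)
    det["is_animal"] = det["label"] in ANIMALS

# Like field="predictions.num_predictions" with
# F("$predictions.detections").length(): the "$" prefix resolves the path
# from the root sample rather than from the predictions document
sample["predictions"]["num_predictions"] = len(sample["predictions"]["detections"])

print(sample)
```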

Parameters
Returns
match(filter)

Filters the samples in the collection by the given filter.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
weather=fo.Classification(label="sunny"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.jpg",
weather=fo.Classification(label="cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
weather=fo.Classification(label="partly cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.jpg",
predictions=None,
),
]
)

#
# Only include samples whose filepath ends with ".jpg"
#

view = dataset.match(F("filepath").ends_with(".jpg"))

#
# Only include samples whose weather field is "sunny"
#

view = dataset.match(F("weather").label == "sunny")

#
# Only include samples with at least 2 objects in their
# predictions field
#

view = dataset.match(F("predictions").detections.length() >= 2)

#
# Only include samples whose predictions field contains at least
# one object with area smaller than 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.match(small_boxes.length() > 0)
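For readers who want to check the small-box logic by hand, the same test can be written over plain dicts. This is a sketch with hypothetical data in which every detection has a bounding box; it does not model how the database treats detections without one.

```python
def has_small_box(sample, max_area=0.2):
    """True if any detection's box area (width * height) is below max_area."""
    predictions = sample.get("predictions") or {}
    detections = predictions.get("detections") or []
    # Boxes are [top-left-x, top-left-y, width, height]
    return any(d["bounding_box"][2] * d["bounding_box"][3] < max_area for d in detections)


samples = [
    {"id": "a", "predictions": {"detections": [
        {"label": "cat", "bounding_box": [0.1, 0.1, 0.5, 0.5]},   # area 0.25
        {"label": "dog", "bounding_box": [0.2, 0.2, 0.3, 0.3]},   # area 0.09
    ]}},
    {"id": "b", "predictions": {"detections": [
        {"label": "squirrel", "bounding_box": [0.25, 0.25, 0.5, 0.5]},  # area 0.25
    ]}},
    {"id": "c", "predictions": None},
]

matched = [s["id"] for s in samples if has_small_box(s)]
print(matched)
```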

Parameters

filter

a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

Returns
match_tags(tags)

Returns a view containing the samples in the collection that have any of the given tag(s).

To match only samples that contain all of several tags, chain multiple match_tags() calls together.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["train"],
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["test"],
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Only include samples that have the "test" tag
#

view = dataset.match_tags("test")

#
# Only include samples that have either the "test" or "train" tag
#

view = dataset.match_tags(["test", "train"])

Parameters

tags – the tag or iterable of tags to match
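The "any of the given tags" semantics, and why chaining calls yields "all of the tags", can be sketched with sets. The function and data below are hypothetical stand-ins operating on plain dicts, not the FiftyOne implementation.

```python
def match_tags(samples, tags):
    """Keep samples that have at least one of the given tag(s)."""
    if isinstance(tags, str):
        tags = [tags]
    wanted = set(tags)
    return [s for s in samples if wanted & set(s["tags"])]


samples = [
    {"id": "a", "tags": ["train"]},
    {"id": "b", "tags": ["test"]},
    {"id": "c", "tags": ["train", "test"]},
    {"id": "d", "tags": []},
]

# A single call matches ANY of the tags
any_match = [s["id"] for s in match_tags(samples, ["test", "train"])]

# Chained calls match ALL of the tags
all_match = [s["id"] for s in match_tags(match_tags(samples, "test"), "train")]

print(any_match, all_match)
```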

Returns
mongo(pipeline)

Adds a view stage defined by a raw MongoDB aggregation pipeline.

See MongoDB aggregation pipelines for more details.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Extract a view containing the second and third samples in the
# dataset
#

view = dataset.mongo([{"$skip": 1}, {"$limit": 2}])

#
# Sort by the number of objects in the predictions field
#

view = dataset.mongo([
    {
        "$addFields": {
            "_sort_field": {
                "$size": {"$ifNull": ["$predictions.detections", []]}
            }
        }
    },
    {"$sort": {"_sort_field": -1}},
    {"$unset": "_sort_field"}
])

Parameters

pipeline – a MongoDB aggregation pipeline (list of dicts)
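As a reading aid for the two pipelines in the examples above, the following sketch computes the same results on a plain Python list. It is only an analogy with hypothetical data: it says nothing about how MongoDB executes the stages or how it orders ties.

```python
def num_detections(sample):
    """Mirror of $size over $ifNull: missing or None predictions count as 0."""
    predictions = sample.get("predictions") or {}
    return len(predictions.get("detections") or [])


samples = [
    {"id": "a", "predictions": {"detections": ["squirrel"]}},
    {"id": "b", "predictions": {"detections": ["cat", "dog", "rabbit"]}},
    {"id": "c", "predictions": None},
    {"id": "d", "predictions": {"detections": ["cat", "dog"]}},
]

# [{"$skip": 1}, {"$limit": 2}] keeps the second and third samples
skipped_and_limited = [s["id"] for s in samples[1:][:2]]

# $addFields + $sort (descending) + $unset orders by detection count
by_count = [s["id"] for s in sorted(samples, key=num_detections, reverse=True)]

print(skipped_and_limited, by_count)
```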

Returns
select(sample_ids)

Selects the samples with the given IDs from the collection.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Create a view containing the currently selected samples in the App
#

session = fo.launch_app(dataset)

# Select samples in the App...

view = dataset.select(session.selected)

Parameters

sample_ids

the samples to select. Can be any of the following:

Returns
select_fields(field_names=None)

Selects only the fields with the given names from the samples in the collection. All other fields are excluded.

Note that default sample fields are always selected.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[-1, 0, 1],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=-1.0,
numeric_list_field=[-2, -1, 0, 1],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
),
]
)

#
# Include only the default fields on each sample
#

view = dataset.select_fields()

#
# Include only the numeric_field field (and the default fields)
# on each sample
#

view = dataset.select_fields("numeric_field")

Parameters

field_names (None) – a field name or iterable of field names to select

Returns
select_labels(labels=None, ids=None, tags=None, fields=None)

Selects only the specified labels from the collection.

The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.

You can perform a selection via one of the following methods:

• Provide one or both of the ids and tags arguments, and optionally the fields argument

• Provide the labels argument, which should have the following format:

[
{
"field": "ground_truth",
},
{
"field": "ground_truth",
},
...
]


Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Only include the labels currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.select_labels(labels=session.selected_labels)

#
# Only include labels with the specified IDs
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

view = dataset.select_labels(ids=ids)

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include labels with the specified tags
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create a copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_label_tags())

# Retrieve the labels via their tag
view = dataset.select_labels(tags=["test"])

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

Parameters
• labels (None) – a list of dicts specifying the labels to select

• ids (None) – an ID or iterable of IDs of the labels to select

• tags (None) – a tag or iterable of tags of labels to select

• fields (None) – a field or iterable of fields from which to select

Returns
shuffle(seed=None)

Randomly shuffles the samples in the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Return a view that contains a randomly shuffled version of the
# samples in the dataset
#

view = dataset.shuffle()

#
# Shuffle the samples with a fixed random seed
#

view = dataset.shuffle(seed=51)

Parameters

seed (None) – an optional random seed to use when shuffling the samples

Returns
skip(skip)

Omits the given number of samples from the head of the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=fo.Classification(label="rabbit"),
),
fo.Sample(
filepath="/path/to/image4.png",
ground_truth=None,
),
]
)

#
# Omit the first two samples from the dataset
#

view = dataset.skip(2)

Parameters

skip – the number of samples to skip. If a non-positive number is provided, no samples are omitted

Returns
sort_by(field_or_expr, reverse=False)

Sorts the samples in the collection by the given field or expression.

When sorting by an expression, field_or_expr can either be a fiftyone.core.expressions.ViewExpression or a MongoDB expression that defines the quantity to sort by.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Sort the samples by their uniqueness field in ascending order
#

view = dataset.sort_by("uniqueness", reverse=False)

#
# Sort the samples in descending order by the number of detections
# in their predictions field whose bounding box area is less than
# 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.sort_by(small_boxes.length(), reverse=True)

Parameters
• field_or_expr – the field or expression to sort by

• reverse (False) – whether to return the results in descending order

Returns
take(size, seed=None)

Randomly samples the given number of samples from the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=fo.Classification(label="rabbit"),
),
fo.Sample(
filepath="/path/to/image4.png",
ground_truth=None,
),
]
)

#
# Take two random samples from the dataset
#

view = dataset.take(2)

#
# Take two random samples from the dataset with a fixed seed
#

view = dataset.take(2, seed=51)

Parameters
• size – the number of samples to return. If a non-positive number is provided, an empty view is returned

• seed (None) – an optional random seed to use when selecting the samples
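The role of the seed can be illustrated with Python's own random module. This is a sketch of the idea only: the helper is hypothetical, and a given seed here will not select the same samples that FiftyOne would select with that seed.

```python
import random


def take(samples, size, seed=None):
    """Randomly pick up to `size` samples; a fixed seed makes the pick repeatable."""
    if size <= 0:
        return []  # non-positive size yields an empty result
    rng = random.Random(seed)
    return rng.sample(samples, min(size, len(samples)))


samples = ["image1.png", "image2.png", "image3.png", "image4.png"]

first = take(samples, 2, seed=51)
second = take(samples, 2, seed=51)
print(first == second)  # the same seed reproduces the same selection
```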

Returns
classmethod list_aggregations()

Returns a list of all available methods on this collection that apply fiftyone.core.aggregations.Aggregation operations to this collection.

Returns

a list of SampleCollection method names

bounds(field_name, expr=None)

Computes the bounds of a numeric field of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the bounds of a numeric field
#

bounds = dataset.bounds("numeric_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a numeric list field
#

bounds = dataset.bounds("numeric_list_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a transformation of a numeric field
#

bounds = dataset.bounds("numeric_field", expr=2 * (F() + 1))
print(bounds)  # (min, max)

Parameters
Returns

the (min, max) bounds
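To see what values the examples above should produce, the aggregation can be mimicked on plain Python lists. This is a hypothetical helper, not the library's code; returning (None, None) when no values remain is a choice made for this sketch.

```python
def bounds(values):
    """(min, max) over scalar or list values, ignoring None entries."""
    flat = []
    for value in values:
        if value is None:
            continue  # None-valued fields are ignored
        flat.extend(value if isinstance(value, list) else [value])
    return (min(flat), max(flat)) if flat else (None, None)


numeric_field = [1.0, 4.0, None]
numeric_list_field = [[1, 2, 3], [1, 2], None]

print(bounds(numeric_field))
print(bounds(numeric_list_field))

# Equivalent of expr=2 * (F() + 1) applied before aggregating
print(bounds([None if v is None else 2 * (v + 1) for v in numeric_field]))
```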

count(field_name=None, expr=None)

Counts the number of field values in the collection.

None-valued fields are ignored.

If no field is provided, the samples themselves are counted.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
fo.Detection(label="squirrel"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Count the number of samples in the dataset
#

count = dataset.count()
print(count)  # the count

#
# Count the number of samples with predictions
#

count = dataset.count("predictions")
print(count)  # the count

#
# Count the number of objects in the predictions field
#

count = dataset.count("predictions.detections")
print(count)  # the count

#
# Count the number of samples with more than 2 predictions
#

expr = (F("detections").length() > 2).if_else(F("detections"), None)
count = dataset.count("predictions", expr=expr)
print(count)  # the count

Parameters
Returns

the count

count_values(field_name, expr=None)

Counts the occurrences of field values in the collection.

This aggregation is typically applied to countable field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["sunny"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["cloudy"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Compute the tag counts in the dataset
#

counts = dataset.count_values("tags")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts in the dataset
#

counts = dataset.count_values("predictions.detections.label")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts after some normalization
#

expr = F().map_values({"cat": "pet", "dog": "pet"}).upper()
counts = dataset.count_values("predictions.detections.label", expr=expr)
print(counts)  # dict mapping values to counts

Parameters
Returns

a dict mapping values to counts
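The shape of the result, and the effect of the normalization expression in the last example, can be reproduced with collections.Counter on the label values from the samples above. This is a plain-Python analogy rather than the library's implementation.

```python
from collections import Counter

# Label values of all detections in the example dataset above
labels = ["cat", "dog", "cat", "rabbit"]

counts = dict(Counter(labels))
print(counts)

# Equivalent of F().map_values({"cat": "pet", "dog": "pet"}).upper()
mapping = {"cat": "pet", "dog": "pet"}
normalized = dict(Counter(mapping.get(label, label).upper() for label in labels))
print(normalized)
```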

distinct(field_name, expr=None)

Computes the distinct values of a field in the collection.

None-valued fields are ignored.

This aggregation is typically applied to countable field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["sunny"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["sunny", "cloudy"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Get the distinct tags in a dataset
#

values = dataset.distinct("tags")
print(values)  # list of distinct values

#
# Get the distinct predicted labels in a dataset
#

values = dataset.distinct("predictions.detections.label")
print(values)  # list of distinct values

#
# Get the distinct predicted labels after some normalization
#

expr = F().map_values({"cat": "pet", "dog": "pet"}).upper()
values = dataset.distinct("predictions.detections.label", expr=expr)
print(values)  # list of distinct values

Parameters
Returns

a sorted list of distinct values

histogram_values(field_name, expr=None, bins=None, range=None, auto=False)

Computes a histogram of the field values in the collection.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import numpy as np
import matplotlib.pyplot as plt

import fiftyone as fo
from fiftyone import ViewField as F

samples = []
for idx in range(100):
    samples.append(
        fo.Sample(
            filepath="/path/to/image%d.png" % idx,
            numeric_field=np.random.randn(),
            numeric_list_field=list(np.random.randn(10)),
        )
    )

dataset = fo.Dataset()
dataset.add_samples(samples)

def plot_hist(counts, edges):
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    left_edges = edges[:-1]
    widths = edges[1:] - edges[:-1]
    plt.bar(left_edges, counts, width=widths, align="edge")

#
# Compute a histogram of a numeric field
#

counts, edges, other = dataset.histogram_values(
"numeric_field", bins=50, range=(-4, 4)
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a numeric list field
#

counts, edges, other = dataset.histogram_values(
"numeric_list_field", bins=50
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a transformation of a numeric field
#

counts, edges, other = dataset.histogram_values(
"numeric_field", expr=2 * (F() + 1), bins=50
)

plot_hist(counts, edges)
plt.show(block=False)

Parameters
• field_name – the name of the field to operate on

• expr (None) –

an optional fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to the field before aggregating

• bins (None) – can be either an integer number of bins to generate or a monotonically increasing sequence specifying the bin edges to use. By default, 10 bins are created. If bins is an integer and no range is specified, bin edges are automatically distributed in an attempt to evenly distribute the counts in each bin

• range (None) – a (lower, upper) tuple specifying a range in which to generate equal-width bins. Only applicable when bins is an integer

• auto (False) – whether to automatically choose bin edges in an attempt to evenly distribute the counts in each bin. If this option is chosen, bins will only be used if it is an integer, and the range parameter is ignored

Returns

a tuple of

• counts: a list of counts in each bin

• edges: an increasing list of bin edges of length len(counts) + 1. Note that each bin is treated as having an inclusive lower boundary and exclusive upper boundary, [lower, upper), including the rightmost bin

• other: the number of items outside the bins
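The return convention, in particular the half-open [lower, upper) bins and the other count, can be checked against a small pure-Python sketch. It covers only the equal-width case with an integer bins and an explicit range; the function is hypothetical and is not how the aggregation is computed internally.

```python
def histogram(values, bins, value_range):
    """Equal-width histogram with [lower, upper) bins, including the last one."""
    lower, upper = value_range
    width = (upper - lower) / bins
    edges = [lower + i * width for i in range(bins + 1)]
    counts = [0] * bins
    other = 0
    for value in values:
        if lower <= value < upper:
            # Guard against float rounding pushing the index past the last bin
            idx = min(int((value - lower) / width), bins - 1)
            counts[idx] += 1
        else:
            other += 1  # outside every bin, including value == upper
    return counts, edges, other


values = [-4.0, -0.5, 0.0, 0.5, 3.9, 4.0, 7.0]
counts, edges, other = histogram(values, bins=4, value_range=(-4, 4))
print(counts, edges, other)
```

Note that 4.0 falls outside the rightmost bin here because the upper boundary is exclusive.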

mean(field_name, expr=None)

Computes the arithmetic mean of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the mean of a numeric field
#

mean = dataset.mean("numeric_field")
print(mean)  # the mean

#
# Compute the mean of a numeric list field
#

mean = dataset.mean("numeric_list_field")
print(mean)  # the mean

#
# Compute the mean of a transformation of a numeric field
#

mean = dataset.mean("numeric_field", expr=2 * (F() + 1))
print(mean)  # the mean

Parameters
Returns

the mean

std(field_name, expr=None, sample=False)

Computes the standard deviation of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the standard deviation of a numeric field
#

std = dataset.std("numeric_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a numeric list field
#

std = dataset.std("numeric_list_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a transformation of a numeric field
#

std = dataset.std("numeric_field", expr=2 * (F() + 1))
print(std)  # the standard deviation

Parameters
Returns

the standard deviation
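The signature includes a sample flag whose description is not shown above. Assuming it switches between the population standard deviation (sample=False) and the sample standard deviation (sample=True), the difference on the non-None values of numeric_field from the example looks like this with Python's statistics module:

```python
import statistics

# Non-None values of numeric_field in the example above
values = [1.0, 4.0]

population_std = statistics.pstdev(values)  # divides by N (assumed sample=False)
sample_std = statistics.stdev(values)       # divides by N - 1 (assumed sample=True)

print(population_std, sample_std)
```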

sum(field_name, expr=None)

Computes the sum of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the sum of a numeric field
#

total = dataset.sum("numeric_field")
print(total)  # the sum

#
# Compute the sum of a numeric list field
#

total = dataset.sum("numeric_list_field")
print(total)  # the sum

#
# Compute the sum of a transformation of a numeric field
#

total = dataset.sum("numeric_field", expr=2 * (F() + 1))
print(total)  # the sum

Parameters
Returns

the sum

values(field_name, expr=None, missing_value=None, unwind=False, _allow_missing=False)

Extracts the values of a field from all samples in the collection.

Note

Unlike other aggregations, values() does not automatically unwind list fields, which ensures that the returned values match the potentially-nested structure of the documents.

You can opt-in to unwinding specific list fields using the [] syntax, or you can pass the optional unwind=True parameter to unwind all supported list fields. See Aggregating list fields for more information.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Get all values of a field
#

values = dataset.values("numeric_field")
print(values)  # [1.0, 4.0, None]

#
# Get all values of a list field
#

values = dataset.values("numeric_list_field")
print(values)  # [[1, 2, 3], [1, 2], None]

#
# Get all values of a transformed field
#

values = dataset.values("numeric_field", expr=2 * (F() + 1))
print(values)  # [4.0, 10.0, None]

Parameters
• field_name – the name of the field to operate on

• expr (None) –

an optional fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to the field before aggregating

• missing_value (None) – a value to insert for missing or None-valued fields

• unwind (False) – whether to automatically unwind all recognized list fields

Returns

the list of values
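The interplay of missing_value and unwind can be pictured with a small stand-in over plain dicts. The helper below is hypothetical and simplified: it handles a single top-level field, and how it treats None when unwinding is a choice of this sketch rather than documented behavior.

```python
def values(samples, field, missing_value=None, unwind=False):
    """Collect one value per sample, optionally flattening list values."""
    out = []
    for sample in samples:
        value = sample.get(field)
        if value is None:
            value = missing_value  # substitute for missing or None fields
        if unwind and isinstance(value, list):
            out.extend(value)  # flatten list fields into the result
        else:
            out.append(value)  # preserve the nested structure
    return out


samples = [
    {"numeric_field": 1.0, "numeric_list_field": [1, 2, 3]},
    {"numeric_field": 4.0, "numeric_list_field": [1, 2]},
    {"numeric_field": None, "numeric_list_field": None},
]

print(values(samples, "numeric_field", missing_value=0))
print(values(samples, "numeric_list_field"))
print(values(samples[:2], "numeric_list_field", unwind=True))
```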

draw_labels(anno_dir, label_fields=None, overwrite=False, annotation_config=None)

Renders annotated versions of the samples in the collection with label field(s) overlaid to the given directory.

The filenames of the sample data are maintained, unless a name conflict would occur in anno_dir, in which case an index of the form "-%d" % count is appended to the base filename.

Images are written in format fo.config.default_image_ext.

Parameters
Returns

the list of paths to the labeled images

export(export_dir=None, dataset_type=None, dataset_exporter=None, label_field=None, label_prefix=None, labels_dict=None, frame_labels_field=None, frame_labels_prefix=None, frame_labels_dict=None, overwrite=False, **kwargs)

Exports the samples in the collection to disk.

Provide either export_dir and dataset_type or dataset_exporter to perform an export.

See this guide for more details about exporting datasets in custom formats by defining your own DatasetExporter.

Parameters
• export_dir (None) –

the directory to which to export the samples in format dataset_type. This can also be an archive path with one of the following extensions:

.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz

If an archive path is specified, the export is performed in a directory of the same name (minus extension), which is then automatically archived and deleted

• dataset_type (None) – the fiftyone.types.dataset_types.Dataset type to write. If not specified, the default type for label_field is used

• dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to export the samples

• label_field (None) – the name of the label field to export. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. If none of label_field, label_prefix, and labels_dict are specified and the requested output type is a labeled image dataset or labeled video dataset with sample-level labels, the first field of compatible type for the output format is used

• label_prefix (None) – a label field prefix; all fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the label dicts). Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels

• labels_dict (None) – a dictionary mapping label field names to keys to use when constructing the label dict to pass to the exporter. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels

• frame_labels_field (None) – the name of the frame labels field to export. Only applicable for labeled video datasets. If none of frame_labels_field, frame_labels_prefix, and frame_labels_dict are specified and the requested output type is a labeled video dataset with frame-level labels, the first frame-level field of compatible type for the output format is used

• frame_labels_prefix (None) – a frame labels field prefix; all frame-level fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the frame label dicts). Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels

• frame_labels_dict (None) – a dictionary mapping frame-level label field names to keys to use when constructing the frame labels dicts to pass to the exporter. Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels

• overwrite (False) – when an export_dir is provided, whether to delete the existing directory before performing the export

• **kwargs – optional keyword arguments to pass to the dataset exporter’s constructor via DatasetExporter(export_dir, **kwargs)
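The archive handling described for export_dir amounts to splitting a recognized archive extension off the path to obtain the working directory. A minimal sketch follows; split_archive_path is a hypothetical helper, not a FiftyOne function, and the list of extensions is copied from the parameter description above.

```python
# Extensions listed in the export_dir description
ARCHIVE_EXTS = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz", ".tbz")


def split_archive_path(export_dir):
    """Return (directory, extension) if export_dir is an archive path,
    else (export_dir, None)."""
    for ext in ARCHIVE_EXTS:
        if export_dir.endswith(ext):
            return export_dir[: -len(ext)], ext
    return export_dir, None


print(split_archive_path("/tmp/export.tar.gz"))  # ('/tmp/export', '.tar.gz')
print(split_archive_path("/tmp/export.zip"))     # ('/tmp/export', '.zip')
print(split_archive_path("/tmp/export"))         # ('/tmp/export', None)
```

With an archive path, the export would be written to the returned directory, archived under the original path, and the directory then removed.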

list_indexes(include_private=False)

Returns the fields of the dataset that are indexed.

Parameters

include_private (False) – whether to include private fields that start with _

Returns

a list of field names

create_index(field_name, unique=False, sphere2d=False)

Creates an index on the given field.

If the given field already has a unique index, it will be retained regardless of the unique value you specify.

If the given field already has a non-unique index but you requested a unique index, the existing index will be dropped.

Indexes enable efficient sorting, merging, and other such operations.

Parameters
• field_name – the field name or embedded.field.name

• unique (False) – whether to add a uniqueness constraint to the index

• sphere2d (False) – whether the field is a GeoJSON field that requires a sphere2d index
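The interaction between an existing index and the unique argument can be summarized as a small decision function. This is only a sketch of the documented rules, not the library's code: plan_create_index and its action strings are hypothetical, and two cases are assumptions, namely that a dropped non-unique index is replaced by a unique one and that an existing non-unique index is left alone when a non-unique index is requested.

```python
def plan_create_index(existing, unique=False):
    """Return the list of actions implied by the rules above.

    existing: None (no index), "unique", or "non-unique"
    unique: the value the caller passed to create_index()
    """
    if existing is None:
        return ["create unique" if unique else "create non-unique"]
    if existing == "unique":
        # An existing unique index is retained regardless of `unique`
        return []
    if unique:
        # Existing non-unique index is dropped; assumed to be replaced
        return ["drop", "create unique"]
    # Assumption: an existing non-unique index already satisfies the request
    return []


for existing in (None, "unique", "non-unique"):
    for unique in (False, True):
        print(existing, unique, plan_create_index(existing, unique))
```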

drop_index(field_name)

Drops the index on the given field.

Parameters

field_name – the field name or embedded.field.name

to_dict(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON dictionary representation of the collection.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render frame labels JSON in human readable format with newlines and indentations. Only applicable to video datasets when a frame_labels_dir is provided

Returns

a JSON dict
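The rel_dir behavior can be illustrated with the standard library alone. The helper strip_rel_dir below is hypothetical. It normalizes rel_dir with os.path.abspath(os.path.expanduser(rel_dir)) as stated above, and it assumes that "if possible" means the filepath is left unchanged when it does not live under rel_dir.

```python
import os


def strip_rel_dir(filepath, rel_dir):
    """Remove rel_dir from the front of filepath when filepath is inside it."""
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))
    prefix = rel_dir.rstrip(os.sep) + os.sep
    if filepath.startswith(prefix):
        return filepath[len(prefix):]
    # Assumption: paths outside rel_dir are serialized unchanged
    return filepath


root = os.path.abspath(os.path.join("data", "images"))
inside = os.path.join(root, "train", "cat.png")
outside = os.path.abspath(os.path.join("elsewhere", "dog.png"))

print(strip_rel_dir(inside, root))   # relative path: train/cat.png on POSIX
print(strip_rel_dir(outside, root))  # unchanged absolute path
```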

to_json(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON string representation of the collection.

The samples will be written as a list in a top-level samples field of the returned dictionary.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

Returns

a JSON string

write_json(json_path, rel_dir=None, frame_labels_dir=None, pretty_print=False)

Writes the collection to disk in JSON format.

Parameters
• json_path – the path to write the JSON

• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

aggregate(aggregations, _attach_frames=True)

Aggregates one or more fiftyone.core.aggregations.Aggregation instances.

Note that it is best practice to group aggregations into a single call to aggregate(), as this will be more efficient than performing multiple aggregations in series.

Parameters

aggregations – a fiftyone.core.aggregations.Aggregation or iterable of fiftyone.core.aggregations.Aggregation instances

Returns

an aggregation result or list of aggregation results corresponding to the input aggregation(s)

fiftyone.core.collections.get_label_fields(sample_collection, label_field=None, label_prefix=None, labels_dict=None, dataset_exporter=None, required=False, force_dict=False)

Gets the label field(s) of the sample collection matching the specified arguments.

Provide one of label_field, label_prefix, labels_dict, or dataset_exporter.

Parameters
• sample_collection – a SampleCollection

• label_field (None) – the name of the label field to export

• label_prefix (None) – a label field prefix; the returned labels dict will contain all fields whose name starts with the given prefix

• labels_dict (None) – a dictionary mapping label field names to keys

• dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to choose appropriate label field(s)

• required (False) – whether at least one matching field must be found

• force_dict (False) – whether to always return a labels dict rather than an individual label field

Returns

a label field or dict mapping label fields to keys
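As a rough illustration of the label_prefix case, the sketch below builds a labels dict from a list of field names. It assumes, following the export() description of label_prefix, that each key is the field name with the prefix removed. The helper labels_dict_from_prefix and the field names are hypothetical.

```python
def labels_dict_from_prefix(field_names, label_prefix):
    """Map each field whose name starts with label_prefix to a key formed
    by removing the prefix from the field name."""
    return {
        name: name[len(label_prefix):]
        for name in field_names
        if name.startswith(label_prefix)
    }


# Hypothetical label field names on a collection
fields = ["gt_detections", "gt_classification", "predictions"]

print(labels_dict_from_prefix(fields, "gt_"))
# {'gt_detections': 'detections', 'gt_classification': 'classification'}
```

The same mapping could be passed explicitly as labels_dict when the desired keys do not follow a prefix convention.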

fiftyone.core.collections.get_frame_labels_fields(sample_collection, frame_labels_field=None, frame_labels_prefix=None, frame_labels_dict=None, dataset_exporter=None, required=False, force_dict=False)

Gets the frame label field(s) of the sample collection matching the specified arguments.

Provide one of frame_labels_field, frame_labels_prefix, frame_labels_dict, or dataset_exporter.

Parameters
• sample_collection – a SampleCollection

• frame_labels_field (None) – the name of the frame labels field to export

• frame_labels_prefix (None) – a frame labels field prefix; the returned labels dict will contain all frame-level fields whose name starts with the given prefix

• frame_labels_dict (None) – a dictionary mapping frame-level label field names to keys

• dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to choose appropriate frame label field(s)

• required (False) – whether at least one matching frame field must be found

• force_dict (False) – whether to always return a labels dict rather than an individual label field

Returns

a frame label field or dict mapping frame label fields to keys