# fiftyone.core.video

Video frame views.

Classes:

• FrameView(doc, view[, selected_fields, …]) – A frame in a FramesView.
• FramesView(source_collection, frames_stage, …) – A fiftyone.core.view.DatasetView of frames from a video fiftyone.core.dataset.Dataset.

Functions:

• make_frames_dataset(sample_collection[, …]) – Creates a dataset that contains one sample per video frame in the collection.

class fiftyone.core.video.FrameView(doc, view, selected_fields=None, excluded_fields=None, filtered_fields=None)

A frame in a FramesView.

FrameView instances should not be created manually; they are generated by iterating over FramesView instances.

Parameters
• doc – a fiftyone.core.odm.DatasetSampleDocument

• view – the FramesView that the frame belongs to

• selected_fields (None) – a set of field names that this view is restricted to

• excluded_fields (None) – a set of field names that are excluded from this view

• filtered_fields (None) – a set of field names of list fields that are filtered in this view

Methods:

• add_labels(labels, label_field[, …]) – Adds the given labels to the sample.
• clear_field(field_name)
• compute_metadata([skip_failures]) – Populates the metadata field of the sample.
• copy([fields, omit_fields]) – Returns a deep copy of the document that has not been added to the database.
• get_field(field_name)
• has_field(field_name) – Determines whether the document has the given field.
• iter_fields([include_id]) – Returns an iterator over the (name, value) pairs of the fields of the document.
• merge(sample[, fields, omit_fields, …]) – Merges the fields of the given sample into this sample.
• save() – Saves the frame to the database.
• set_field(field_name, value[, create])
• to_dict([include_frames]) – Serializes the sample view to a JSON dictionary.
• to_json([pretty_print]) – Serializes the document to a JSON string.
• to_mongo_dict([include_id]) – Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.
• update_fields(fields_dict[, expand_schema]) – Sets the dictionary of fields on the document.

Attributes:

• dataset – The dataset to which this document belongs, or None if it has not been added to a dataset.
• excluded_field_names – The set of field names that are excluded on this document view, or None if no fields are explicitly excluded.
• field_names – An ordered tuple of field names of this document view.
• filename – The basename of the media’s filepath.
• filtered_field_names – The set of field names or embedded.field.names that have been filtered on this document view, or None if no fields are filtered.
• id – The ID of the document, or None if it has not been added to the database.
• in_dataset – Whether the document has been added to a dataset.
• ingest_time – The time the document was added to the database, or None if it has not been added to the database.
• media_type – The media type of the sample.
• selected_field_names – The set of field names that are selected on this document view, or None if no fields are explicitly selected.

save()

Saves the frame to the database.

add_labels(labels, label_field, confidence_thresh=None, expand_schema=True)

Adds the given labels to the sample.

The provided labels can be any of the following:

• A fiftyone.core.labels.Label instance, in which case the labels are directly saved in the specified label_field

• A dict mapping keys to fiftyone.core.labels.Label instances. In this case, the labels are added as follows:

for key, value in labels.items():
    sample[label_field + "_" + key] = value

• A dict mapping frame numbers to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

sample.frames.merge(
    {
        frame_number: {label_field: label}
        for frame_number, label in labels.items()
    }
)

• A dict mapping frame numbers to dicts mapping keys to fiftyone.core.labels.Label instances. In this case, the provided labels are interpreted as frame-level labels that should be added as follows:

sample.frames.merge(
    {
        frame_number: {
            label_field + "_" + name: label
            for name, label in frame_dict.items()
        }
        for frame_number, frame_dict in labels.items()
    }
)
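The four input shapes above differ only in how field names and frame numbers are derived. The following pure-Python sketch models that naming logic; it is not FiftyOne's implementation. `plan_label_writes` is a hypothetical helper, labels are represented by plain strings, and the sketch assumes that frame-level dicts are recognized by their integer keys.

```python
def plan_label_writes(labels, label_field):
    """Return (sample_fields, frame_fields) that the described forms would write.

    Hypothetical helper: plain strings stand in for Label instances, and
    integer keys are assumed to denote frame numbers.
    """
    sample_fields, frame_fields = {}, {}

    if not isinstance(labels, dict):
        # A single label is stored directly in `label_field`
        sample_fields[label_field] = labels
        return sample_fields, frame_fields

    for key, value in labels.items():
        if isinstance(key, int):
            if isinstance(value, dict):
                # frame number -> {name: label}
                frame_fields[key] = {
                    label_field + "_" + name: label
                    for name, label in value.items()
                }
            else:
                # frame number -> label
                frame_fields[key] = {label_field: value}
        else:
            # key -> label
            sample_fields[label_field + "_" + key] = value

    return sample_fields, frame_fields


print(plan_label_writes({"a": "cat", "b": "dog"}, "pred"))
# ({'pred_a': 'cat', 'pred_b': 'dog'}, {})
print(plan_label_writes({1: {"a": "cat"}, 2: {"a": "dog"}}, "pred"))
# ({}, {1: {'pred_a': 'cat'}, 2: {'pred_a': 'dog'}})
```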

Parameters
• labels – a fiftyone.core.labels.Label or dict of labels per the description above

• label_field – the sample field or prefix in which to save the labels

• confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels before saving them

• expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

clear_field(field_name)

compute_metadata(skip_failures=False)

Populates the metadata field of the sample.

Parameters

skip_failures (False) – whether to gracefully continue without raising an error if metadata cannot be computed

copy(fields=None, omit_fields=None)

Returns a deep copy of the document that has not been added to the database.

Parameters
• fields (None) – an optional field or iterable of fields to which to restrict the copy. This can also be a dict mapping existing field names to new field names

• omit_fields (None) – an optional field or iterable of fields to exclude from the copy

Returns

a Document
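The interaction of fields and omit_fields can be pictured with a small sketch that uses plain dicts in place of documents. `copy_fields` is a hypothetical stand-in rather than the library method, and applying omit_fields to the source field names is an assumption of the sketch.

```python
from copy import deepcopy


def copy_fields(doc, fields=None, omit_fields=None):
    """Deep-copy selected fields of a dict, optionally renaming them."""
    if fields is None:
        mapping = {name: name for name in doc}
    elif isinstance(fields, str):
        mapping = {fields: fields}
    elif isinstance(fields, dict):
        # dict form: existing field name -> new field name
        mapping = dict(fields)
    else:
        mapping = {name: name for name in fields}

    if omit_fields is not None:
        omit = {omit_fields} if isinstance(omit_fields, str) else set(omit_fields)
        mapping = {old: new for old, new in mapping.items() if old not in omit}

    return {new: deepcopy(doc[old]) for old, new in mapping.items()}


doc = {"filepath": "video.mp4", "tags": ["val"], "score": 0.5}
print(copy_fields(doc, fields={"score": "confidence"}))  # {'confidence': 0.5}
print(copy_fields(doc, omit_fields="tags"))  # {'filepath': 'video.mp4', 'score': 0.5}
```

Because the values are deep-copied, mutating a list in the copy leaves the original untouched.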

property dataset

The dataset to which this document belongs, or None if it has not been added to a dataset.

property excluded_field_names

The set of field names that are excluded on this document view, or None if no fields are explicitly excluded.

property field_names

An ordered tuple of field names of this document view.

This may be a subset of all fields of the document if fields have been selected or excluded.

property filename

The basename of the media’s filepath.

property filtered_field_names

The set of field names or embedded.field.names that have been filtered on this document view, or None if no fields are filtered.

get_field(field_name)

has_field(field_name)

Determines whether the document has the given field.

Parameters

field_name – the field name

Returns

True/False

property id

The ID of the document, or None if it has not been added to the database.

property in_dataset

Whether the document has been added to a dataset.

property ingest_time

The time the document was added to the database, or None if it has not been added to the database.

iter_fields(include_id=False)

Returns an iterator over the (name, value) pairs of the fields of the document.

Parameters

include_id (False) – whether to include the id field

Returns

an iterator that emits (name, value) tuples

property media_type

The media type of the sample.

merge(sample, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True)

Merges the fields of the given sample into this sample.

The behavior of this method is highly customizable. By default, all top-level fields from the provided sample are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both samples are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

• Whether new fields can be added to the dataset schema

• Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements

• Whether to merge only specific fields, or all but certain fields

• Mapping input sample fields to different field names of this sample
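As a rough mental model of the default behavior and of the merge_lists and overwrite switches, consider the sketch below. It is a simplification under stated assumptions, not the library's code: documents are plain dicts, `merge_fields` is a hypothetical helper, and id-based matching of labels in label list fields is not modeled.

```python
def merge_fields(dst, src, merge_lists=True, overwrite=True):
    """Merge `src` into a copy of `dst`, following the rules described above."""
    out = dict(dst)
    for name, value in src.items():
        if value is None:
            # None-valued fields are treated as missing
            continue

        existing = out.get(name)
        if merge_lists and isinstance(value, list) and isinstance(existing, list):
            # Merge the list elements rather than replacing the whole field
            out[name] = existing + [v for v in value if v not in existing]
        elif existing is None or overwrite:
            out[name] = value

    return out


a = {"tags": ["train"], "weather": "sunny", "uniqueness": 0.5}
b = {"tags": ["night"], "weather": "rainy", "uniqueness": None}

print(merge_fields(a, b))
# {'tags': ['train', 'night'], 'weather': 'rainy', 'uniqueness': 0.5}
print(merge_fields(a, b, merge_lists=False))
# {'tags': ['night'], 'weather': 'rainy', 'uniqueness': 0.5}
print(merge_fields(a, b, overwrite=False))
# {'tags': ['train', 'night'], 'weather': 'sunny', 'uniqueness': 0.5}
```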

Parameters
• sample – a fiftyone.core.sample.Sample

• fields (None) – an optional field or iterable of fields to which to restrict the merge. May contain frame fields for video samples. This can also be a dict mapping field names of the input sample to field names of this sample

• omit_fields (None) – an optional field or iterable of fields to exclude from the merge. May contain frame fields for video samples

• merge_lists (True) – whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided sample

• overwrite (True) – whether to overwrite (True) or skip (False) existing fields and label elements

• expand_schema (True) – whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema

property selected_field_names

The set of field names that are selected on this document view, or None if no fields are explicitly selected.

set_field(field_name, value, create=True)

to_dict(include_frames=False)

Serializes the sample view to a JSON dictionary.

The sample ID and private fields are excluded in this representation.

Parameters

include_frames (False) – whether to include the frame labels for video samples

Returns

a JSON dict

to_json(pretty_print=False)

Serializes the document to a JSON string.

The document ID and private fields are excluded in this representation.

Parameters

pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

Returns

a JSON string

to_mongo_dict(include_id=False)

Serializes the document to a BSON dictionary equivalent to the representation that would be stored in the database.

Parameters

include_id (False) – whether to include the document ID

Returns

a BSON dict

update_fields(fields_dict, expand_schema=True)

Sets the dictionary of fields on the document.

Parameters
• fields_dict – a dict mapping field names to values

• expand_schema (True) – whether to dynamically add new fields encountered to the document schema. If False, an error is raised if any fields are not in the document schema

Raises

AttributeError – if expand_schema == False and a field does not exist

class fiftyone.core.video.FramesView(source_collection, frames_stage, frames_dataset, _stages=None)

A fiftyone.core.view.DatasetView of frames from a video fiftyone.core.dataset.Dataset.

Frames views contain an ordered collection of frames, each of which corresponds to a single frame of a video from the source collection.

Parameters

Methods:

Attributes:

• classes – The classes of the underlying dataset.
• dataset_name – The name of the underlying dataset.
• default_classes – The default classes of the underlying dataset.
• default_mask_targets – The default mask targets of the underlying dataset.
• has_annotation_runs – Whether this collection has any annotation runs.
• has_brain_runs – Whether this collection has any brain runs.
• has_evaluations – Whether this collection has any evaluation results.
• info – The info dict of the underlying dataset.
• mask_targets – The mask targets of the underlying dataset.
• media_type – The media type of the underlying dataset.
• name – The name of the view.

property name

The name of the view.

set_values(field_name, *args, **kwargs)

Sets the field or embedded field on each sample or frame in the collection to the given values.

When setting a sample field embedded.field.name, this function is an efficient implementation of the following loop:

for sample, value in zip(sample_collection, values):
    sample.embedded.field.name = value
    sample.save()

When modifying a sample field that contains an array, say embedded.array.field.name, this function is an efficient implementation of the following loop:

for sample, array_values in zip(sample_collection, values):
    for doc, value in zip(sample.embedded.array, array_values):
        doc.field.name = value

    sample.save()

When setting a frame field frames.embedded.field.name, this function is an efficient implementation of the following loop:

for sample, frame_values in zip(sample_collection, values):
    for frame, value in zip(sample.frames.values(), frame_values):
        frame.embedded.field.name = value

    sample.save()

When modifying a frame field that contains an array, say frames.embedded.array.field.name, this function is an efficient implementation of the following loop:

for sample, frame_values in zip(sample_collection, values):
    for frame, array_values in zip(sample.frames.values(), frame_values):
        for doc, value in zip(frame.embedded.array, array_values):
            doc.field.name = value

    sample.save()

The dual function of set_values() is values(), which can be used to efficiently extract the values of a field or embedded field of all samples in a collection as lists of values in the same structure expected by this method.

Note

If the values you are setting can be described by a fiftyone.core.expressions.ViewExpression applied to the existing dataset contents, then consider using set_field() + save() for an even more efficient alternative to explicitly iterating over the dataset or calling values() + set_values() to perform the update in-memory.
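To make the "same structure" relationship between values() and set_values() concrete for frame fields, here is a toy model in which plain dicts stand in for samples and frames. `get_frame_values` and `set_frame_values` are hypothetical helpers that mirror the loops above; they do not call FiftyOne.

```python
# Two "videos": the first has two frames, the second has one
samples = [
    {"frames": {1: {"quality": None}, 2: {"quality": None}}},
    {"frames": {1: {"quality": None}}},
]


def get_frame_values(samples, field):
    """One list per sample, each holding one value per frame."""
    return [
        [frame[field] for frame in sample["frames"].values()]
        for sample in samples
    ]


def set_frame_values(samples, field, values):
    """Inverse of get_frame_values(): expects the same nested structure."""
    for sample, frame_values in zip(samples, values):
        for frame, value in zip(sample["frames"].values(), frame_values):
            frame[field] = value


set_frame_values(samples, "quality", [[0.9, 0.8], [0.7]])
print(get_frame_values(samples, "quality"))  # [[0.9, 0.8], [0.7]]
```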

Examples:

import random

import fiftyone as fo
import fiftyone.zoo as foz
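from fiftyone import ViewField as F

# `dataset` was not defined in the snippet as extracted; the zoo "quickstart"
# dataset, which has a `predictions` detections field, is assumed here
dataset = foz.load_zoo_dataset("quickstart")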

#
# Create a new sample field
#

values = [random.random() for _ in range(len(dataset))]
dataset.set_values("random", values)

print(dataset.bounds("random"))

#
# Add a tag to all low confidence labels
#

view = dataset.filter_labels("predictions", F("confidence") < 0.06)

detections = view.values("predictions.detections")
for sample_detections in detections:
    for detection in sample_detections:
        detection.tags.append("low_confidence")

view.set_values("predictions.detections", detections)

print(dataset.count_label_tags())

Parameters
• field_name – a field or embedded.field.name

• values – an iterable of values, one for each sample in the collection. When setting frame fields, each element should be an iterable of values, one for each frame of the sample. If field_name contains array fields, the corresponding entries of values must be arrays of the same lengths

• skip_none (False) – whether to treat None data in values as missing data that should not be set

• expand_schema (True) – whether to dynamically add new sample/frame fields encountered to the dataset schema. If False, an error is raised if the root field_name does not exist

save(fields=None)

Overwrites the frames in the source dataset with the contents of the view.

Warning

This will permanently delete any omitted or filtered contents from the source dataset.

Parameters

fields (None) – an optional field or list of fields to save. If specified, only these fields are overwritten

reload()

Reloads this view from the frames of the source collection in the database.

Note that FrameView instances are not singletons, so any in-memory frames extracted from this view will not be updated by calling this method.

add_stage(stage)

Applies the given fiftyone.core.stages.ViewStage to the collection.

Parameters

stage – a fiftyone.core.stages.ViewStage

Returns
aggregate(aggregations)

Aggregates one or more fiftyone.core.aggregations.Aggregation instances.

Note that it is best practice to group aggregations into a single call to aggregate(), as this will be more efficient than performing multiple aggregations in series.

Parameters

aggregations – a fiftyone.core.aggregations.Aggregation or iterable of fiftyone.core.aggregations.Aggregation instances

Returns

an aggregation result or list of aggregation results corresponding to the input aggregation(s)

annotate(anno_key, label_schema=None, label_field=None, label_type=None, classes=None, attributes=True, media_field='filepath', backend=None, launch_editor=False, **kwargs)

Exports the samples and optional label field(s) in this collection to the given annotation backend.

The backend parameter controls which annotation backend to use. Depending on the backend you use, you may want/need to provide extra keyword arguments to this function for the constructor of the backend’s fiftyone.utils.annotations.AnnotationBackendConfig class.

The natively provided backends and their associated config classes are:

Parameters
• anno_key – a string key to use to refer to this annotation run

• label_schema (None) – a dictionary defining the label schema to use. If this argument is provided, it takes precedence over the other schema-related arguments

• label_field (None) – a string indicating a new or existing label field to annotate

• label_type (None) –

a string or type indicating the type of labels to annotate. The possible label strings/types are:

You can also specify "scalar" for a primitive scalar field or pass any of the supported scalar field types:

All new label fields must have their type specified via this argument or in label_schema. Note that annotation backends may not support all label types

• classes (None) – a list of strings indicating the class options for label_field or all fields in label_schema without classes specified. All new label fields must have a class list provided via one of the supported methods. For existing label fields, if classes are provided by neither this argument nor label_schema, they are parsed from the classes or default_classes attributes

• attributes (True) –

specifies the label attributes of each label field to include (other than their label, which is always included) in the annotation export. Can be any of the following:

• True: export all label attributes

• False: don’t export any custom label attributes

• a list of label attributes to export

• a dict mapping attribute names to dicts specifying the type, values, and default for each attribute

If provided, this parameter will apply to all label fields in label_schema that do not define their attributes

• media_field ("filepath") – the field containing the paths to the media files to upload

• backend (None) – the annotation backend to use. The supported values are fiftyone.annotation_config.backends.keys() and the default is fiftyone.annotation_config.default_backend

• launch_editor (False) – whether to launch the annotation backend’s editor after uploading the samples

• **kwargs – keyword arguments for the fiftyone.utils.annotations.AnnotationBackendConfig

Returns

a fiftyone.utils.annotations.AnnotationResults

apply_model(model, label_field='predictions', confidence_thresh=None, store_logits=False, batch_size=None, num_workers=None, skip_failures=True, **trainer_kwargs)

Applies the FiftyOne model or Lightning Flash model to the samples in the collection.

This method supports all of the following cases:

Parameters
• label_field ("predictions") – the name of the field in which to store the model predictions. When performing inference on video frames, the “frames.” prefix is optional

• confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels generated by the model

• store_logits (False) – whether to store logits for the model predictions. This is only supported when the provided model has logits, model.has_logits == True

• batch_size (None) – an optional batch size to use, if the model supports batching

• num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models

• skip_failures (True) – whether to gracefully continue without raising an error if predictions cannot be generated for a sample. Only applicable to fiftyone.core.models.Model instances

• **trainer_kwargs – optional keyword arguments used to initialize the Trainer when using Flash models. These can be used to, for example, configure the number of GPUs to use and other distributed inference parameters

bounds(field_or_expr, expr=None)

Computes the bounds of a numeric field of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1, 2, 3],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1, 2],
        ),
        fo.Sample(
            filepath="/path/to/image3.png",
            numeric_field=None,
            numeric_list_field=None,
        ),
    ]
)

#
# Compute the bounds of a numeric field
#

bounds = dataset.bounds("numeric_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a numeric list field
#

bounds = dataset.bounds("numeric_list_field")
print(bounds)  # (min, max)

#
# Compute the bounds of a transformation of a numeric field
#

bounds = dataset.bounds(2 * (F("numeric_field") + 1))
print(bounds)  # (min, max)

Parameters
Returns

the (min, max) bounds
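The rules stated for this aggregation (None values are ignored, and list fields contribute their elements) can be mimicked in plain Python. `bounds_of` is a hypothetical helper, not the aggregation itself, and returning (None, None) for empty input is an assumption of the sketch.

```python
def bounds_of(values, transform=None):
    """Return (min, max) over `values`, skipping None and flattening lists."""
    flat = []
    for value in values:
        if value is None:
            continue
        items = value if isinstance(value, (list, tuple)) else [value]
        flat.extend(v for v in items if v is not None)

    if transform is not None:
        flat = [transform(v) for v in flat]

    if not flat:
        return (None, None)

    return (min(flat), max(flat))


print(bounds_of([1.0, 4.0, None]))  # (1.0, 4.0)
print(bounds_of([[1, 2, 3], [1, 2], None]))  # (1, 3)
print(bounds_of([1.0, 4.0, None], lambda x: 2 * (x + 1)))  # (4.0, 10.0)
```

The three calls use the same field values as the samples in the example above.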

property classes

The classes of the underlying dataset.

clear_frame_field(field_name)

Clears the values of the frame-level field from all samples in the view.

The field will remain in the dataset’s frame schema, and all frames in the view will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded frame fields.

Only applicable to video datasets.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If the field name you specify is an embedded field, be aware that this operation will save the entire top-level field after clearing the field, which may result in data modification/loss if this view modifies the field in any other ways.

Parameters

field_name – the field name or embedded.field.name

clear_frame_fields(field_names)

Clears the values of the frame-level fields from all samples in the view.

The fields will remain in the dataset’s frame schema, and all frames in the view will have the value None for the fields.

You can use dot notation (embedded.field.name) to clear embedded frame fields.

Only applicable to video datasets.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If any of the field names you specify are embedded fields, be aware that this operation will save the entire top-level field after clearing the fields, which may result in data modification/loss if this view modifies these fields in any other ways.

Parameters

field_names – the field name or iterable of field names

clear_sample_field(field_name)

Clears the values of the field from all samples in the view.

The field will remain in the dataset’s schema, and all samples in the view will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded fields.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If the field name you specify is an embedded field, be aware that this operation will save the entire top-level field after clearing the field, which may result in data modification/loss if this view modifies the field in any other ways.

Parameters

field_name – the field name or embedded.field.name

clear_sample_fields(field_names)

Clears the values of the fields from all samples in the view.

The fields will remain in the dataset’s schema, and all samples in the view will have the value None for the fields.

You can use dot notation (embedded.field.name) to clear embedded fields.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If any of the field names you specify are embedded fields, be aware that this operation will save the entire top-level field after clearing the fields, which may result in data modification/loss if this view modifies these fields in any other ways.

Parameters

field_names – the field name or iterable of field names

clone(name=None)

Creates a new dataset containing only the contents of the view.

Parameters

name (None) – a name for the cloned dataset. By default, get_default_dataset_name() is used

Returns

the new Dataset

clone_frame_field(field_name, new_field_name)

Clones the frame-level field of the view into a new field.

You can use dot notation (embedded.field.name) to clone embedded frame fields.

Only applicable to video datasets.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If new_field_name is an embedded field, be aware that this operation will save the entire top-level field of new_field_name after performing the clone, which may result in data modification/loss if this view modifies this field in any other ways.

Parameters
• field_name – the field name or embedded.field.name

• new_field_name – the new field name or embedded.field.name

clone_frame_fields(field_mapping)

Clones the frame-level fields of the view into new frame-level fields of the dataset.

You can use dot notation (embedded.field.name) to clone embedded frame fields.

Only applicable to video datasets.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If any of the new field names you specify are embedded fields, be aware that this operation will save the entire top-level new fields after performing the clone, which may result in data modification/loss if this view modifies these fields in any other ways.

Parameters

field_mapping – a dict mapping field names to new field names into which to clone each field

clone_sample_field(field_name, new_field_name)

Clones the given sample field of the view into a new field of the dataset.

You can use dot notation (embedded.field.name) to clone embedded fields.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If new_field_name is an embedded field, be aware that this operation will save the entire top-level field of new_field_name after performing the clone, which may result in data modification/loss if this view modifies this field in any other ways.

Parameters
• field_name – the field name or embedded.field.name

• new_field_name – the new field name or embedded.field.name

clone_sample_fields(field_mapping)

Clones the given sample fields of the view into new fields of the dataset.

You can use dot notation (embedded.field.name) to clone embedded fields.

Note

This method is not a fiftyone.core.stages.ViewStage; it immediately writes the requested changes to the underlying dataset.

Warning

If any of the new field names you specify are embedded fields, be aware that this operation will save the entire top-level new fields after performing the clone, which may result in data modification/loss if this view modifies these fields in any other ways.

Parameters

field_mapping – a dict mapping field names to new field names into which to clone each field

compute_embeddings(model, embeddings_field=None, batch_size=None, num_workers=None, skip_failures=True, **trainer_kwargs)

Computes embeddings for the samples in the collection using the given FiftyOne model or Lightning Flash model.

This method supports all the following cases:

When using a FiftyOne model, the model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• embeddings_field (None) – the name of a field in which to store the embeddings. When computing video frame embeddings, the “frames.” prefix is optional

• batch_size (None) – an optional batch size to use, if the model supports batching

• num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models

• skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample. Only applicable to fiftyone.core.models.Model instances

• **trainer_kwargs – optional keyword arguments used to initialize the Trainer when using Flash models. These can be used to, for example, configure the number of GPUs to use and other distributed inference parameters

Returns

one of the following:

• None, if an embeddings_field is provided

• a num_samples x num_dim array of embeddings, when computing embeddings for image/video collections with image/video models, respectively, and no embeddings_field is provided. If skip_failures is True and any errors are detected, a list of length num_samples is returned instead containing all successfully computed embedding vectors along with None entries for samples for which embeddings could not be computed

• a dictionary mapping sample IDs to num_frames x num_dim arrays of embeddings, when computing frame embeddings for video collections using an image model. If skip_failures is True and any errors are detected, the values of this dictionary will contain arrays of embeddings for all frames 1, 2, … until the error occurred, or None if no embeddings were computed at all

compute_metadata(overwrite=False, num_workers=None, skip_failures=True)

Populates the metadata field of all samples in the collection.

Any samples with existing metadata are skipped, unless overwrite == True.

Parameters
• overwrite (False) – whether to overwrite existing metadata

• num_workers (None) – the number of processes to use. By default, multiprocessing.cpu_count() is used

• skip_failures (True) – whether to gracefully continue without raising an error if metadata cannot be computed for a sample

compute_patch_embeddings(model, patches_field, embeddings_field=None, force_square=False, alpha=None, handle_missing='skip', batch_size=None, num_workers=None, skip_failures=True)

Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given fiftyone.core.models.Model.

This method supports all the following cases:

• Using an image model to compute patch embeddings for an image collection

• Using an image model to compute frame patch embeddings for a video collection

The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• model – a fiftyone.core.models.Model

• patches_field – the name of the field defining the image patches in each sample to embed. Must be of type fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines. When computing video frame embeddings, the “frames.” prefix is optional

• embeddings_field (None) – the name of a field in which to store the embeddings. When computing video frame embeddings, the “frames.” prefix is optional

• force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction

• alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, inf). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%

• handle_missing ("skip") –

how to handle images with no patches. Supported values are:

• "skip": skip the image and assign its embedding as None

• "image": use the whole image as a single patch

• "error": raise an error

• batch_size (None) – an optional batch size to use, if the model supports batching

• num_workers (None) – the number of workers for the torch.utils.data.DataLoader to use. Only applicable for Torch-based models

• skip_failures (True) – whether to gracefully continue without raising an error if embeddings cannot be generated for a sample
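The effect of alpha on a patch can be worked through numerically. The sketch below scales the width and height of a box by a factor of (1 + alpha), following the "(100 * alpha)%" rule stated for that parameter. `expand_box` is a hypothetical helper, and both the [x, y, width, height] box layout and the choice to keep the box center fixed are assumptions of this sketch, not confirmed details of the library.

```python
def expand_box(box, alpha):
    """Expand (alpha > 0) or contract (alpha < 0) a box about its center.

    `box` is [x, y, width, height] with (x, y) the top-left corner.
    """
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Shift the top-left corner so that the center stays where it was
    new_x = x - 0.5 * (new_w - w)
    new_y = y - 0.5 * (new_h - h)
    return [new_x, new_y, new_w, new_h]


print(expand_box([0.25, 0.25, 0.5, 0.5], 0.5))  # [0.125, 0.125, 0.75, 0.75]
print(expand_box([0.25, 0.25, 0.5, 0.5], -0.5))  # [0.375, 0.375, 0.25, 0.25]
```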

Returns

one of the following:

• None, if an embeddings_field is provided

• a dict mapping sample IDs to num_patches x num_dim arrays of patch embeddings, when computing patch embeddings for image collections and no embeddings_field is provided. If skip_failures is True and any errors are detected, this dictionary will contain None values for any samples for which embeddings could not be computed

• a dict of dicts mapping sample IDs to frame numbers to num_patches x num_dim arrays of patch embeddings, when computing patch embeddings for the frames of video collections and no embeddings_field is provided. If skip_failures is True and any errors are detected, this nested dict will contain missing or None values to indicate uncomputable embeddings
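As a rough illustration of the alpha parameter, the sketch below pads a relative [x, y, width, height] box by (100 * alpha)%. It assumes the padding is applied symmetrically about the box center and that the result is clamped to the unit square; pad_relative_box is a hypothetical helper written for this note, not a FiftyOne function.

```python
def pad_relative_box(box, alpha):
    """Pads a relative [x, y, w, h] box by (100 * alpha)% about its center.

    Hypothetical helper for illustration only. Assumes symmetric padding
    and clamping to [0, 1] x [0, 1].
    """
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Shift the top-left corner so that the box center stays fixed
    new_x = x - (new_w - w) / 2
    new_y = y - (new_h - h) / 2

    # Clamp the padded box to the image
    x0, y0 = max(0.0, new_x), max(0.0, new_y)
    x1, y1 = min(1.0, new_x + new_w), min(1.0, new_y + new_h)
    return [x0, y0, x1 - x0, y1 - y0]


expanded = pad_relative_box([0.25, 0.25, 0.5, 0.5], 0.1)     # 10% larger
contracted = pad_relative_box([0.25, 0.25, 0.5, 0.5], -0.1)  # 10% smaller
```

Under these assumptions, a positive alpha grows the patch and a negative alpha shrinks it, which matches the sign convention described for the parameter.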

count(field_or_expr=None, expr=None)

Counts the number of field values in the collection.

None-valued fields are ignored.

If no field is provided, the samples themselves are counted.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
fo.Detection(label="squirrel"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Count the number of samples in the dataset
#

count = dataset.count()
print(count)  # the count

#
# Count the number of samples with predictions
#

count = dataset.count("predictions")
print(count)  # the count

#
# Count the number of objects in the predictions field
#

count = dataset.count("predictions.detections")
print(count)  # the count

#
# Count the number of objects in samples with > 2 predictions
#

count = dataset.count(
(F("predictions.detections").length() > 2).if_else(
F("predictions.detections"), None
)
)
print(count)  # the count
Parameters
• field_or_expr (None) –

a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. If neither field_or_expr nor expr is provided, the samples themselves are counted. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned

• expr (None) –

a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating

Returns

the count
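The counting rules above (None values are ignored, and list fields are counted element by element) can be modeled in plain Python. This is only a sketch of the semantics using dictionaries in place of samples; the helper names are hypothetical and it does not reflect how the aggregation is actually executed.

```python
# Plain-dict stand-ins for the three samples in the example above
samples = [
    {"predictions": ["cat", "dog"]},
    {"predictions": ["cat", "rabbit", "squirrel"]},
    {"predictions": None},
]


def count_samples(samples):
    """Counts the samples themselves (no field provided)."""
    return len(samples)


def count_non_none(samples, field):
    """Counts samples whose field value is not None."""
    return sum(1 for s in samples if s.get(field) is not None)


def count_elements(samples, field, more_than=0):
    """Counts list elements, only in lists with more than `more_than` items."""
    total = 0
    for s in samples:
        values = s.get(field)
        if values is not None and len(values) > more_than:
            total += len(values)
    return total


num_samples = count_samples(samples)                       # all samples
num_with_preds = count_non_none(samples, "predictions")    # None ignored
num_objects = count_elements(samples, "predictions")       # all objects
num_in_large = count_elements(samples, "predictions", 2)   # > 2 objects only
```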

count_label_tags(label_fields=None)

Counts the occurrences of all label tags in the specified label field(s) of this collection.

Parameters

label_fields (None) – an optional name or iterable of names of fiftyone.core.labels.Label fields. By default, all label fields are used

Returns

a dict mapping tags to counts

count_sample_tags()

Counts the occurrences of sample tags in this collection.

Returns

a dict mapping tags to counts

count_values(field_or_expr, expr=None)

Counts the occurrences of field values in the collection.

This aggregation is typically applied to countable field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["sunny"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["cloudy"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Compute the tag counts in the dataset
#

counts = dataset.count_values("tags")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts in the dataset
#

counts = dataset.count_values("predictions.detections.label")
print(counts)  # dict mapping values to counts

#
# Compute the predicted label counts after some normalization
#

counts = dataset.count_values(
F("predictions.detections.label").map_values(
{"cat": "pet", "dog": "pet"}
).upper()
)
print(counts)  # dict mapping values to counts
Parameters
Returns

a dict mapping values to counts
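For intuition about the returned dict, the label counts and the normalized variant from the example can be reproduced with collections.Counter. This is a plain-Python sketch of the result shape only; it assumes that values missing from the mapping are left unchanged before upper-casing.

```python
from collections import Counter

# The predicted labels from the example, flattened across samples
labels = ["cat", "dog", "cat", "rabbit"]

# Equivalent in spirit to counting "predictions.detections.label"
raw_counts = dict(Counter(labels))

# Equivalent in spirit to map_values({...}).upper(): remap, then upper-case
mapping = {"cat": "pet", "dog": "pet"}
normalized_counts = dict(
    Counter(mapping.get(label, label).upper() for label in labels)
)
```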

create_index(field_or_spec, unique=False, **kwargs)

Creates an index on the given field or with the given specification, if necessary.

Indexes enable efficient sorting, merging, and other such operations.

Frame-level fields can be indexed by prepending "frames." to the field name.

If you are indexing a single field and it already has a unique constraint, it will be retained regardless of the unique value you specify. Conversely, if the given field already has a non-unique index but you requested a unique index, the existing index will be replaced with a unique index. Use drop_index() to drop an existing index first if you wish to modify an existing index in other ways.

Parameters
Returns

the name of the index

property dataset_name

The name of the underlying dataset.

property default_classes

The default classes of the underlying dataset.

property default_mask_targets

The default mask targets of the underlying dataset.

delete_annotation_run(anno_key)

Deletes the annotation run with the given key from this collection.

Calling this method only deletes the record of the annotation run from the collection; it will not delete any annotations loaded onto your dataset via load_annotations(), nor will it delete any associated information from the annotation backend.

Use load_annotation_results() to programmatically manage/delete a run from the annotation backend.

Parameters

anno_key – an annotation key

delete_annotation_runs()

Deletes all annotation runs from this collection.

Calling this method only deletes the records of the annotation runs from this collection; it will not delete any annotations loaded onto your dataset via load_annotations(), nor will it delete any associated information from the annotation backend.

Use load_annotation_results() to programmatically manage/delete runs in the annotation backend.

delete_brain_run(brain_key)

Deletes the brain method run with the given key from this collection.

Parameters

brain_key – a brain key

delete_brain_runs()

Deletes all brain method runs from this collection.

delete_evaluation(eval_key)

Deletes the evaluation results associated with the given evaluation key from this collection.

Parameters

eval_key – an evaluation key

delete_evaluations()

Deletes all evaluation results from this collection.

distinct(field_or_expr, expr=None)

Computes the distinct values of a field in the collection.

None-valued fields are ignored.

This aggregation is typically applied to countable field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["sunny"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="dog"),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["sunny", "cloudy"],
predictions=fo.Detections(
detections=[
fo.Detection(label="cat"),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Get the distinct tags in a dataset
#

values = dataset.distinct("tags")
print(values)  # list of distinct values

#
# Get the distinct predicted labels in a dataset
#

values = dataset.distinct("predictions.detections.label")
print(values)  # list of distinct values

#
# Get the distinct predicted labels after some normalization
#

values = dataset.distinct(
F("predictions.detections.label").map_values(
{"cat": "pet", "dog": "pet"}
).upper()
)
print(values)  # list of distinct values
Parameters
Returns

a sorted list of distinct values

draw_labels(output_dir, label_fields=None, overwrite=False, config=None)

Renders annotated versions of the media in the collection with the specified label data overlaid to the given directory.

The filenames of the sample media are maintained, unless a name conflict would occur in output_dir, in which case an index of the form "-%d" % count is appended to the base filename.

Images are written in format fo.config.default_image_ext, and videos are written in format fo.config.default_video_ext.

Parameters
Returns

the list of paths to the rendered media
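The filename conflict rule can be sketched as follows. UniqueNamer is a hypothetical helper, and it assumes that the first occurrence of a filename is kept as-is and that later duplicates receive "-2", "-3", and so on; the text above only specifies the "-%d" % count suffix format, so the starting index is an assumption.

```python
import os
from collections import defaultdict


class UniqueNamer:
    """Hypothetical helper that maps input paths to unique output paths."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self._counts = defaultdict(int)

    def get_output_path(self, input_path):
        filename = os.path.basename(input_path)
        base, ext = os.path.splitext(filename)

        # Track how many times this filename has been requested
        self._counts[filename] += 1
        count = self._counts[filename]

        # Only append an index when a conflict would occur (assumed: from 2)
        if count > 1:
            base += "-%d" % count

        return os.path.join(self.output_dir, base + ext)


namer = UniqueNamer("/tmp/out")
paths = [
    namer.get_output_path(p)
    for p in ["/a/img.jpg", "/b/img.jpg", "/c/other.jpg"]
]
```

The sketch ignores the change of extension to the configured default image or video format, and it does not guard against an input that is already named like a suffixed duplicate.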

drop_index(field_or_name)

Drops the index for the given field or name.

Parameters

field_or_name – a field name, embedded.field.name, or compound index name. Use list_indexes() to see the available indexes

evaluate_classifications(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method='simple', **kwargs)

Evaluates the classification predictions in this collection with respect to the specified ground truth labels.

By default, this method simply compares the ground truth and prediction for each sample, but other strategies such as binary evaluation and top-k matching can be configured via the method parameter.

You can customize the evaluation method by passing additional parameters for the method’s config class as kwargs.

The supported method values and their associated configs are:

If an eval_key is specified, then this method will record some statistics on each sample:

• When evaluating sample-level fields, an eval_key field will be populated on each sample recording whether that sample’s prediction is correct.

• When evaluating frame-level fields, an eval_key field will be populated on each frame recording whether that frame’s prediction is correct. In addition, an eval_key field will be populated on each sample that records the average accuracy of the frame predictions of the sample.

Parameters
Returns
evaluate_detections(pred_field, gt_field='ground_truth', eval_key=None, classes=None, missing=None, method='coco', iou=0.5, use_masks=False, use_boxes=False, classwise=True, **kwargs)

Evaluates the specified predicted detections in this collection with respect to the specified ground truth detections.

This method supports evaluating the following spatial data types:

By default, this method uses COCO-style evaluation, but you can use the method parameter to select a different method, and you can optionally customize the method by passing additional parameters for the method’s config class as kwargs.

The supported method values and their associated configs are:

If an eval_key is provided, a number of fields are populated at the object- and sample-level recording the results of the evaluation:

• True positive (TP), false positive (FP), and false negative (FN) counts for each sample are saved in top-level fields of each sample:

TP: sample.<eval_key>_tp
FP: sample.<eval_key>_fp
FN: sample.<eval_key>_fn

In addition, when evaluating frame-level objects, TP/FP/FN counts are recorded for each frame:

TP: frame.<eval_key>_tp
FP: frame.<eval_key>_fp
FN: frame.<eval_key>_fn

• The fields listed below are populated on each individual object; these fields tabulate the TP/FP/FN status of the object, the ID of the matching object (if any), and the matching IoU:

TP/FP/FN: object.<eval_key>
ID: object.<eval_key>_id
IoU: object.<eval_key>_iou
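The IoU value recorded for each matched object is the standard intersection-over-union of two boxes. A minimal sketch for boxes in [top-left-x, top-left-y, width, height] format follows; box_iou and is_match are hypothetical helpers, and treating an IoU equal to the threshold as a match is an assumption.

```python
def box_iou(box_a, box_b):
    """Computes the IoU of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Width and height of the overlapping region (zero if disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(gt_box, pred_box, iou_thresh=0.5):
    """Whether a prediction overlaps a ground truth box enough to match."""
    return box_iou(gt_box, pred_box) >= iou_thresh


same = box_iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5])
partial = box_iou([0.0, 0.0, 0.4, 0.4], [0.2, 0.0, 0.4, 0.4])
disjoint = box_iou([0.0, 0.0, 0.1, 0.1], [0.5, 0.5, 0.1, 0.1])
```

This covers only the overlap computation; the actual evaluation methods also decide which prediction is paired with which ground truth object, which is not modeled here.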

Parameters
Returns
evaluate_segmentations(pred_field, gt_field='ground_truth', eval_key=None, mask_targets=None, method='simple', **kwargs)

Evaluates the specified semantic segmentation masks in this collection with respect to the specified ground truth masks.

If the size of a predicted mask does not match the ground truth mask, it is resized to match the ground truth.

By default, this method simply performs pixelwise evaluation of the full masks, but other strategies such as boundary-only evaluation can be configured by passing additional parameters for the method’s config class as kwargs.

The supported method values and their associated configs are:

If an eval_key is provided, the accuracy, precision, and recall of each sample is recorded in top-level fields of each sample:

Accuracy: sample.<eval_key>_accuracy
Precision: sample.<eval_key>_precision
Recall: sample.<eval_key>_recall

In addition, when evaluating frame-level masks, the accuracy, precision, and recall of each frame are recorded in the following frame-level fields:

Accuracy: frame.<eval_key>_accuracy
Precision: frame.<eval_key>_precision
Recall: frame.<eval_key>_recall

Note

The mask value 0 is treated as a background class for the purposes of computing evaluation metrics like precision and recall.

Parameters
Returns
exclude(sample_ids)

Excludes the samples with the given IDs from the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(filepath="/path/to/image1.png"),
fo.Sample(filepath="/path/to/image2.png"),
fo.Sample(filepath="/path/to/image3.png"),
]
)

#
# Exclude the first sample from the dataset
#

sample_id = dataset.first().id
view = dataset.exclude(sample_id)

#
# Exclude the first and last samples from the dataset
#

sample_ids = [dataset.first().id, dataset.last().id]
view = dataset.exclude(sample_ids)
Parameters

sample_ids

the samples to exclude. Can be any of the following:

Returns
exclude_by(field, values)

Excludes the samples with the given field values from the collection.

This stage is typically used to work with categorical fields (strings, ints, and bools). If you want to exclude samples based on floating point fields, use match().

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(filepath="image%d.jpg" % i, int=i, str=str(i))
for i in range(10)
]
)

#
# Create a view excluding samples whose int field has one of the
# given values
#

view = dataset.exclude_by("int", [1, 9, 3, 7, 5])

#
# Create a view excluding samples whose str field has one of the
# given values
#

view = dataset.exclude_by("str", ["1", "9", "3", "7", "5"])
Parameters
• field – a field or embedded.field.name

• values – a value or iterable of values to exclude by

Returns
exclude_fields(field_names, _allow_missing=False)

Excludes the fields with the given names from the samples in the collection.

Note that default fields cannot be excluded.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
predictions=fo.Classification(label="cat", confidence=0.9),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
predictions=fo.Classification(label="dog", confidence=0.8),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
predictions=None,
),
]
)

#
# Exclude the predictions field from all samples
#

view = dataset.exclude_fields("predictions")
Parameters

field_names – a field name or iterable of field names to exclude

Returns
exclude_frames(frame_ids, omit_empty=True)

Excludes the frames with the given IDs from the video collection.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Exclude some specific frames
#

frame_ids = [
dataset.first().frames.first().id,
dataset.last().frames.last().id,
]

view = dataset.exclude_frames(frame_ids)

print(dataset.count("frames"))
print(view.count("frames"))
Parameters
Returns
exclude_labels(labels=None, ids=None, tags=None, fields=None, omit_empty=True)

Excludes the specified labels from the collection.

The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.

You can perform an exclusion via one or more of the following methods:

• Provide the labels argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels(), to exclude specific labels

• Provide the ids argument to exclude labels with specific IDs

• Provide the tags argument to exclude labels with specific tags

If multiple criteria are specified, labels must match all of them in order to be excluded.

By default, the exclusion is applied to all fiftyone.core.labels.Label fields, but you can provide the fields argument to explicitly define the field(s) in which to exclude.
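The rule that a label must match all provided criteria can be sketched in plain Python. The helpers below are hypothetical, and they assume that the tags criterion is satisfied when a label carries at least one of the given tags.

```python
# Plain-dict stand-ins for labels
labels = [
    {"id": "a", "tags": ["test"]},
    {"id": "b", "tags": []},
    {"id": "c", "tags": ["test"]},
]


def is_excluded(label, ids=None, tags=None):
    """A label is excluded only if it matches every criterion provided."""
    criteria = []
    if ids is not None:
        criteria.append(label["id"] in ids)
    if tags is not None:
        criteria.append(any(tag in tags for tag in label["tags"]))

    # With no criteria at all, nothing is excluded
    return bool(criteria) and all(criteria)


def exclude(labels, ids=None, tags=None):
    """Returns the IDs of the labels that remain after exclusion."""
    return [
        label["id"]
        for label in labels
        if not is_excluded(label, ids=ids, tags=tags)
    ]


by_ids = exclude(labels, ids={"a", "b"})
by_tags = exclude(labels, tags={"test"})
by_both = exclude(labels, ids={"a", "b"}, tags={"test"})
```

In the combined case only label "a" satisfies both the ID and the tag criterion, so it is the only one removed.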

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Exclude the labels currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.exclude_labels(labels=session.selected_labels)

#
# Exclude labels with the specified IDs
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

view = dataset.exclude_labels(ids=ids)

print(dataset.count("ground_truth.detections"))
print(view.count("ground_truth.detections"))

print(dataset.count("predictions.detections"))
print(view.count("predictions.detections"))

#
# Exclude labels with the specified tags
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_values("ground_truth.detections.tags"))
print(dataset.count_values("predictions.detections.tags"))

# Exclude the labels via their tag
view = dataset.exclude_labels(tags="test")

print(dataset.count("ground_truth.detections"))
print(view.count("ground_truth.detections"))

print(dataset.count("predictions.detections"))
print(view.count("predictions.detections"))
Parameters
• labels (None) – a list of dicts specifying the labels to exclude in the format returned by fiftyone.core.session.Session.selected_labels()

• ids (None) – an ID or iterable of IDs of the labels to exclude

• tags (None) – a tag or iterable of tags of labels to exclude

• fields (None) – a field or iterable of fields from which to exclude

• omit_empty (True) – whether to omit samples that have no labels after filtering

Returns
exists(field, bool=True)

Returns a view containing the samples in the collection that have (or do not have) a non-None value for the given field or embedded field.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
predictions=fo.Classification(label="cat", confidence=0.9),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
predictions=fo.Classification(label="dog", confidence=0.8),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=fo.Classification(label="dog"),
predictions=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image4.png",
ground_truth=None,
predictions=None,
),
fo.Sample(filepath="/path/to/image5.png"),
]
)

#
# Only include samples that have a value in their predictions
# field
#

view = dataset.exists("predictions")

#
# Only include samples that do NOT have a value in their
# predictions field
#

view = dataset.exists("predictions", False)

#
# Only include samples that have prediction confidences
#

view = dataset.exists("predictions.confidence")
Parameters
• field – the field name or embedded.field.name

• bool (True) – whether to check if the field exists (True) or does not exist (False)
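The "non-None value" rule applies equally to fields that are explicitly None and fields that were never set, and it extends to embedded paths. A plain-Python model of the example above, using dictionaries in place of samples and hypothetical helper names, might look like this:

```python
# Plain-dict stand-ins for the five samples in the example above
samples = [
    {"name": "image1", "predictions": {"label": "cat", "confidence": 0.9}},
    {"name": "image2", "predictions": {"label": "dog", "confidence": 0.8}},
    {"name": "image3", "predictions": {"label": "dog", "confidence": None}},
    {"name": "image4", "predictions": None},
    {"name": "image5"},
]


def get_path(doc, path):
    """Resolves an embedded.field.name path, returning None if missing."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def exists(samples, field, present=True):
    """Keeps samples that have (or lack) a non-None value for the field."""
    return [
        s["name"]
        for s in samples
        if (get_path(s, field) is not None) == present
    ]


with_preds = exists(samples, "predictions")
without_preds = exists(samples, "predictions", False)
with_conf = exists(samples, "predictions.confidence")
```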

Returns
export(export_dir=None, dataset_type=None, data_path=None, labels_path=None, export_media=None, dataset_exporter=None, label_field=None, frame_labels_field=None, overwrite=False, **kwargs)

Exports the samples in the collection to disk.

You can perform exports with this method via the following basic patterns:

1. Provide export_dir and dataset_type to export the content to a directory in the default layout for the specified format, as documented in this page

2. Provide dataset_type along with data_path, labels_path, and/or export_media to directly specify where to export the source media and/or labels (if applicable) in your desired format. This syntax provides the flexibility to, for example, perform workflows like labels-only exports

3. Provide a dataset_exporter to which to feed samples to perform a fully-customized export

In all workflows, the remaining parameters of this method can be provided to further configure the export.

See this guide for more details about exporting datasets in custom formats by defining your own fiftyone.utils.data.exporters.DatasetExporter.

This method will automatically coerce the data to match the requested export in the following cases:

Parameters
• export_dir (None) –

the directory to which to export the samples in format dataset_type. This parameter may be omitted if you have provided appropriate values for the data_path and/or labels_path parameters. Alternatively, this can also be an archive path with one of the following extensions:

.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz

If an archive path is specified, the export is performed in a directory of the same name (minus extension), which is then automatically archived and deleted

• dataset_type (None) – the fiftyone.types.dataset_types.Dataset type to write. If not specified, the default type for label_field is used

• data_path (None) –

an optional parameter that enables explicit control over the location of the exported media for certain export formats. Can be any of the following:

• a folder name like "data" or "data/" specifying a subfolder of export_dir in which to export the media

• an absolute directory path in which to export the media. In this case, the export_dir has no effect on the location of the data

• a filename like "data.json" specifying the filename of a JSON manifest file in export_dir generated when export_media is "manifest"

• an absolute filepath specifying the location to write the JSON manifest file when export_media is "manifest". In this case, export_dir has no effect on the location of the data

If None, a default value of this parameter will be chosen based on the value of the export_media parameter. Note that this parameter is not applicable to certain export formats such as binary types like TF records

• labels_path (None) –

an optional parameter that enables explicit control over the location of the exported labels. Only applicable when exporting in certain labeled dataset formats. Can be any of the following:

• a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in export_dir in which to export the labels

• an absolute directory or filepath in which to export the labels. In this case, the export_dir has no effect on the location of the labels

For labeled datasets, the default value of this parameter will be chosen based on the export format so that the labels will be exported into export_dir

• export_media (None) –

controls how to export the raw media. The supported values are:

• True: copy all media files into the output directory

• False: don’t export media. This option is only useful when exporting labeled datasets whose label format stores sufficient information to locate the associated media

• "move": move all media files into the output directory

• "symlink": create symlinks to the media files in the output directory

• "manifest": create a data.json in the output directory that maps UUIDs used in the labels files to the filepaths of the source media, rather than exporting the actual media

If None, an appropriate default value of this parameter will be chosen based on the value of the data_path parameter. Note that some dataset formats may not support certain values for this parameter (e.g., when exporting in binary formats such as TF records, "symlink" is not an option)

• dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to export the samples. When provided, parameters such as export_dir, dataset_type, data_path, and labels_path have no effect

• label_field (None) –

controls the label field(s) to export. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. Can be any of the following:

• the name of a label field to export

• a glob pattern of label field(s) to export

• a list or tuple of label field(s) to export

• a dictionary mapping label field names to keys to use when constructing the label dictionaries to pass to the exporter

Note that multiple fields can only be specified when the exporter used can handle dictionaries of labels. By default, the first field of compatible type for the exporter is used

• frame_labels_field (None) –

controls the frame label field(s) to export. Only applicable to labeled video datasets. Can be any of the following:

• the name of a frame label field to export

• a glob pattern of frame label field(s) to export

• a list or tuple of frame label field(s) to export

• a dictionary mapping frame label field names to keys to use when constructing the frame label dictionaries to pass to the exporter

Note that multiple fields can only be specified when the exporter used can handle dictionaries of frame labels. By default, the first field of compatible type for the exporter is used

• overwrite (False) – whether to delete existing directories before performing the export (True) or to merge the export with existing files and directories (False). Not applicable when a dataset_exporter was provided

• **kwargs – optional keyword arguments to pass to the dataset exporter’s constructor. If you are exporting image patches, this can also contain keyword arguments for fiftyone.utils.patches.ImagePatchesExtractor

filter_classifications(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Classification elements in the specified fiftyone.core.labels.Classifications field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_detections(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Detection elements in the specified fiftyone.core.labels.Detections field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_field(field, filter, only_matches=True)

Filters the values of a field or embedded field of each sample in the collection.

Values of field for which filter returns False are replaced with None.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
predictions=fo.Classification(label="cat", confidence=0.9),
numeric_field=1.0,
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
predictions=fo.Classification(label="dog", confidence=0.8),
numeric_field=-1.0,
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
predictions=None,
numeric_field=None,
),
]
)

#
# Only include classifications in the predictions field
# whose label is "cat"
#

view = dataset.filter_field("predictions", F("label") == "cat")

#
# Only include samples whose numeric_field value is positive
#

view = dataset.filter_field("numeric_field", F() > 0)
Parameters
• field – the field name or embedded.field.name

• filter

a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (True) – whether to only include samples that match the filter (True) or include all samples (False)

Returns
filter_keypoints(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Keypoint elements in the specified fiftyone.core.labels.Keypoints field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
filter_labels(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Label field of each sample in the collection.

If the specified field is a single fiftyone.core.labels.Label type, fields for which filter returns False are replaced with None:

If the specified field is a fiftyone.core.labels.Label list type, the label elements for which filter returns False are omitted from the view:

Classifications Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Classification(label="cat", confidence=0.9),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Classification(label="dog", confidence=0.8),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=fo.Classification(label="rabbit"),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Only include classifications in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include classifications in the predictions field whose
# label is "cat" or "dog"
#

view = dataset.filter_labels(
"predictions", F("label").is_in(["cat", "dog"])
)

Detections Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Only include detections in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include detections in the predictions field whose label
# is "cat" or "dog"
#

view = dataset.filter_labels(
"predictions", F("label").is_in(["cat", "dog"])
)

#
# Only include detections in the predictions field whose bounding
# box area is smaller than 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

view = dataset.filter_labels("predictions", bbox_area < 0.2)

Polylines Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Polylines(
polylines=[
fo.Polyline(
label="lane",
points=[[(0.1, 0.1), (0.1, 0.6)]],
filled=False,
),
fo.Polyline(
points=[[(0.2, 0.2), (0.5, 0.5), (0.2, 0.5)]],
filled=True,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Polylines(
polylines=[
fo.Polyline(
label="lane",
points=[[(0.4, 0.4), (0.9, 0.4)]],
filled=False,
),
fo.Polyline(
points=[[(0.6, 0.6), (0.9, 0.9), (0.6, 0.9)]],
filled=True,
),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Only include polylines in the predictions field that are filled
#

view = dataset.filter_labels("predictions", F("filled") == True)

#
# Only include polylines in the predictions field whose label
# is "lane"
#

view = dataset.filter_labels("predictions", F("label") == "lane")

#
# Only include polylines in the predictions field with at least
# 3 vertices
#

num_vertices = F("points").map(F().length()).sum()
view = dataset.filter_labels("predictions", num_vertices >= 3)

Keypoints Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Keypoint(
label="house",
points=[(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)],
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Keypoint(
label="window",
points=[(0.4, 0.4), (0.5, 0.5), (0.6, 0.6)],
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=None,
),
]
)

#
# Only include keypoints in the predictions field whose label
# is "house"
#

view = dataset.filter_labels("predictions", F("label") == "house")

#
# Only include keypoints in the predictions field with less than
# four points
#

view = dataset.filter_labels("predictions", F("points").length() < 4)
Parameters
• field – the label field to filter

• filter

a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (True) – whether to only include samples with at least one label after filtering (True) or include all samples (False)

Returns
filter_polylines(field, filter, only_matches=True)

Filters the fiftyone.core.labels.Polyline elements in the specified fiftyone.core.labels.Polylines field of each sample in the collection.

Warning

This method is deprecated and will be removed in a future release. Use the drop-in replacement filter_labels() instead.

Parameters
Returns
first()

Returns the first sample in the collection.

Returns
geo_near(point, location_field=None, min_distance=None, max_distance=None, query=None)

Sorts the samples in the collection by their proximity to a specified geolocation.

Note

This stage must be the first stage in any fiftyone.core.view.DatasetView in which it appears.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-geo")

TIMES_SQUARE = [-73.9855, 40.7580]

#
# Sort the samples by their proximity to Times Square
#

view = dataset.geo_near(TIMES_SQUARE)

#
# Sort the samples by their proximity to Times Square, and only
# include samples within 5km
#

view = dataset.geo_near(TIMES_SQUARE, max_distance=5000)

#
# Sort the samples by their proximity to Times Square, and only
# include samples that are in Manhattan
#

import fiftyone.utils.geojson as foug

in_manhattan = foug.geo_within(
"location.point",
[
[
[-73.949701, 40.834487],
[-73.896611, 40.815076],
[-73.998083, 40.696534],
[-74.031751, 40.715273],
[-73.949701, 40.834487],
]
]
)

view = dataset.geo_near(
TIMES_SQUARE, location_field="location", query=in_manhattan
)
Parameters
• point

the reference point to compute distances to. Can be any of the following:

• location_field (None) –

the location data of each sample to use. Can be any of the following:

• The name of a fiftyone.core.fields.GeoLocation field whose point attribute to use as location data

• An embedded.field.name containing GeoJSON data to use as location data

• None, in which case there must be a single fiftyone.core.fields.GeoLocation field on the samples, which is used by default

• min_distance (None) – exclude samples that are closer than this distance (in meters) to point

• max_distance (None) – exclude samples that are farther than this distance (in meters) from point

• query (None) – an optional dict defining a MongoDB read query that samples must match in order to be included in this view
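To make the roles of min_distance and max_distance concrete, here is a minimal plain-Python sketch of sorting [longitude, latitude] points by distance to a reference point. It assumes a spherical Earth and the haversine formula; the helper names haversine_m and near and the two landmark coordinates are illustrative assumptions, and the actual stage delegates the computation to MongoDB.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an approximation


def haversine_m(p, q):
    """Great-circle distance in meters between two [lon, lat] points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near(points, ref, min_distance=None, max_distance=None):
    """Sort points by distance to ref, dropping those outside the bounds."""
    kept = []
    for p in points:
        d = haversine_m(ref, p)
        if min_distance is not None and d < min_distance:
            continue
        if max_distance is not None and d > max_distance:
            continue
        kept.append((d, p))
    return [p for _, p in sorted(kept)]


TIMES_SQUARE = [-73.9855, 40.7580]
EMPIRE_STATE = [-73.9857, 40.7484]  # approximate; about 1 km away
STATUE_OF_LIBERTY = [-74.0445, 40.6892]  # approximate; about 9 km away

nearby = near(
    [STATUE_OF_LIBERTY, EMPIRE_STATE], TIMES_SQUARE, max_distance=5000
)
```

With max_distance=5000, only the closer landmark remains; without it, both are returned in increasing order of distance.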

Returns
geo_within(boundary, location_field=None, strict=True)

Filters the samples in this collection to only include samples whose geolocation is within a specified boundary.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-geo")

MANHATTAN = [
[
[-73.949701, 40.834487],
[-73.896611, 40.815076],
[-73.998083, 40.696534],
[-74.031751, 40.715273],
[-73.949701, 40.834487],
]
]

#
# Create a view that only contains samples in Manhattan
#

view = dataset.geo_within(MANHATTAN)
Parameters
• boundary – a fiftyone.core.labels.GeoLocation, fiftyone.core.labels.GeoLocations, GeoJSON dict, or list of coordinates that define a Polygon or MultiPolygon to search within

• location_field (None) –

the location data of each sample to use. Can be any of the following:

• The name of a fiftyone.core.fields.GeoLocation field whose point attribute to use as location data

• An embedded.field.name that directly contains the GeoJSON location data to use

• None, in which case there must be a single fiftyone.core.fields.GeoLocation field on the samples, which is used by default

• strict (True) – whether a sample’s location data must strictly fall within boundary (True) in order to match, or whether any intersection suffices (False)
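The containment test for a single point can be sketched in plain Python with ray casting. This is an illustration under simplifying assumptions (planar coordinates, a single outer ring, no holes, no handling of the strict option), not the geospatial query that is actually executed; point_in_ring is a hypothetical helper name.

```python
def point_in_ring(point, ring):
    """Ray-casting test: is the [lon, lat] point inside the closed ring?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point
        # can be crossed by a ray cast in the +x direction
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


MANHATTAN = [
    [
        [-73.949701, 40.834487],
        [-73.896611, 40.815076],
        [-73.998083, 40.696534],
        [-74.031751, 40.715273],
        [-73.949701, 40.834487],
    ]
]

TIMES_SQUARE = [-73.9855, 40.7580]

in_manhattan = point_in_ring(TIMES_SQUARE, MANHATTAN[0])
```

An odd number of edge crossings means the point lies inside the ring, which is the case for Times Square here.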

Returns
get_annotation_info(anno_key)

Returns information about the annotation run with the given key on this collection.

Parameters

anno_key – an annotation key

Returns
get_brain_info(brain_key)

Returns information about the brain method run with the given key on this collection.

Parameters

brain_key – a brain key

Returns
get_evaluation_info(eval_key)

Returns information about the evaluation with the given key on this collection.

Parameters

eval_key – an evaluation key

Returns
get_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the samples in the view.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

an OrderedDict mapping field names to field types

get_frame_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the frames of the samples in the view.

Only applicable for video datasets.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

a dictionary mapping field names to field types, or None if the dataset is not a video dataset

get_index_information()

Returns a dictionary of information about the indexes on this collection.

See pymongo.collection.Collection.index_information() for details on the structure of this dictionary.

Returns

a dict mapping index names to info dicts

group_by(field_or_expr, sort_expr=None, reverse=False)

Creates a view that reorganizes the samples in the collection so that they are grouped by a specified field or expression.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("cifar10", split="test")
from fiftyone import ViewField as F

# Take a random sample of 1000 samples and organize them by ground
# truth label with groups arranged in decreasing order of size
view = dataset.take(1000).group_by(
"ground_truth.label",
sort_expr=F().length(),
reverse=True,
)

print(view.values("ground_truth.label"))
print(
sorted(
view.count_values("ground_truth.label").items(),
key=lambda kv: kv[1],
reverse=True,
)
)
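The reordering can be pictured with a small plain-Python sketch: items with equal keys are made contiguous, and the groups themselves are optionally sorted, here by size. The group_by function below is a hypothetical stand-in operating on a list, not the view stage itself.

```python
from collections import defaultdict


def group_by(items, key, sort_key=None, reverse=False):
    """Return items reordered so that equal keys are contiguous."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)

    ordered = list(groups.values())
    if sort_key is not None:
        # Sort the groups themselves, e.g. by their length
        ordered.sort(key=sort_key, reverse=reverse)

    return [item for group in ordered for item in group]


labels = ["cat", "dog", "cat", "bird", "cat", "dog"]
grouped = group_by(labels, key=lambda label: label, sort_key=len, reverse=True)
```

Here sort_key=len plays the role of F().length() in the example above, so the largest group comes first.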
Parameters
Returns
has_annotation_run(anno_key)

Whether this collection has an annotation run with the given key.

Parameters

anno_key – an annotation key

Returns

True/False

property has_annotation_runs

Whether this collection has any annotation runs.

has_brain_run(brain_key)

Whether this collection has a brain method run with the given key.

Parameters

brain_key – a brain key

Returns

True/False

property has_brain_runs

Whether this collection has any brain runs.

has_evaluation(eval_key)

Whether this collection has an evaluation with the given key.

Parameters

eval_key – an evaluation key

Returns

True/False

property has_evaluations

Whether this collection has any evaluation results.

has_frame_field(field_name)

Determines whether the collection has a frame-level field with the given name.

Parameters

field_name – the field name

Returns

True/False

has_sample_field(field_name)

Determines whether the collection has a sample field with the given name.

Parameters

field_name – the field name

Returns

True/False

head(num_samples=3)

Returns a list of the first few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects

histogram_values(field_or_expr, expr=None, bins=None, range=None, auto=False)

Computes a histogram of the field values in the collection.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import numpy as np
import matplotlib.pyplot as plt

import fiftyone as fo
from fiftyone import ViewField as F

samples = []
for idx in range(100):
    samples.append(
        fo.Sample(
            filepath="/path/to/image%d.png" % idx,
            numeric_field=np.random.randn(),
            numeric_list_field=list(np.random.randn(10)),
        )
    )

dataset = fo.Dataset()
dataset.add_samples(samples)

def plot_hist(counts, edges):
    counts = np.asarray(counts)
    edges = np.asarray(edges)
    left_edges = edges[:-1]
    widths = edges[1:] - edges[:-1]
    plt.bar(left_edges, counts, width=widths, align="edge")

#
# Compute a histogram of a numeric field
#

counts, edges, other = dataset.histogram_values(
"numeric_field", bins=50, range=(-4, 4)
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a numeric list field
#

counts, edges, other = dataset.histogram_values(
"numeric_list_field", bins=50
)

plot_hist(counts, edges)
plt.show(block=False)

#
# Compute the histogram of a transformation of a numeric field
#

counts, edges, other = dataset.histogram_values(
2 * (F("numeric_field") + 1), bins=50
)

plot_hist(counts, edges)
plt.show(block=False)
Parameters
• field_or_expr

a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned

• expr (None) –

a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating

• bins (None) – can be either an integer number of bins to generate or a monotonically increasing sequence specifying the bin edges to use. By default, 10 bins are created. If bins is an integer and no range is specified, bin edges are automatically distributed in an attempt to evenly distribute the counts in each bin

• range (None) – a (lower, upper) tuple specifying a range in which to generate equal-width bins. Only applicable when bins is an integer

• auto (False) – whether to automatically choose bin edges in an attempt to evenly distribute the counts in each bin. If this option is chosen, bins will only be used if it is an integer, and the range parameter is ignored

Returns

a tuple of

• counts: a list of counts in each bin

• edges: an increasing list of bin edges of length len(counts) + 1. Note that each bin is treated as having an inclusive lower boundary and exclusive upper boundary, [lower, upper), including the rightmost bin

• other: the number of items outside the bins
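The [lower, upper) convention described above can be illustrated with a short plain-Python sketch. It assumes the bin edges are given explicitly and counts every value that falls outside them in other; the histogram helper is a hypothetical name, and the real aggregation is computed in the database.

```python
import bisect


def histogram(values, edges):
    """Count values into [lower, upper) bins defined by increasing edges."""
    counts = [0] * (len(edges) - 1)
    other = 0
    for v in values:
        if v < edges[0] or v >= edges[-1]:
            # Outside all bins; note the rightmost edge is exclusive
            other += 1
            continue
        counts[bisect.bisect_right(edges, v) - 1] += 1
    return counts, edges, other


counts, edges, other = histogram(
    [0.0, 0.5, 1.0, 1.5, 2.0, -3.0], [0.0, 1.0, 2.0]
)
```

The value 1.0 lands in the second bin because lower edges are inclusive, while 2.0 is counted in other because the upper edge of the rightmost bin is exclusive.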

property info

The info dict of the underlying dataset.

iter_samples(progress=False)

Returns an iterator over the samples in the view.

Parameters

progress (False) – whether to render a progress bar tracking the iterator’s progress

Returns

an iterator over fiftyone.core.sample.SampleView instances

last()

Returns the last sample in the collection.

Returns
limit(limit)

Returns a view with at most the given number of samples.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Only include the first 2 samples in the view
#

view = dataset.limit(2)
Parameters

limit – the maximum number of samples to return. If a non-positive number is provided, an empty view is returned

Returns
limit_labels(field, limit)

Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample in the collection.

The specified field must be one of the following types:

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Only include the first detection in the predictions field of
# each sample
#

view = dataset.limit_labels("predictions", 1)
Parameters
• field – the labels list field to filter

• limit – the maximum number of labels to include in each labels list. If a non-positive number is provided, all lists will be empty

Returns
classmethod list_aggregations()

Returns a list of all available methods on this collection that apply fiftyone.core.aggregations.Aggregation operations to this collection.

Returns

a list of SampleCollection method names

list_annotation_runs()

Returns a list of all annotation keys on this collection.

Returns

a list of annotation keys

list_brain_runs()

Returns a list of all brain keys on this collection.

Returns

a list of brain keys

list_evaluations()

Returns a list of all evaluation keys on this collection.

Returns

a list of evaluation keys

list_indexes()

Returns the list of index names on this collection.

Single-field indexes are referenced by their field name, while compound indexes are referenced by more complicated strings. See pymongo.collection.Collection.index_information() for details on the compound format.

Returns

the list of index names

classmethod list_view_stages()

Returns a list of all available methods on this collection that apply fiftyone.core.stages.ViewStage operations to this collection.

Returns

a list of SampleCollection method names

load_annotation_results(anno_key, **kwargs)

Loads the results for the annotation run with the given key on this collection.

The fiftyone.utils.annotations.AnnotationResults object returned by this method will provide a variety of backend-specific methods allowing you to perform actions such as checking the status and deleting this run from the annotation backend.

Parameters
Returns

load_annotation_view(anno_key, select_fields=False)

Loads the fiftyone.core.view.DatasetView on which the specified annotation run was performed on this collection.

Parameters
• anno_key – an annotation key

• select_fields (False) – whether to select only the fields involved in the annotation run

Returns

load_annotations(anno_key, skip_unexpected=False, cleanup=False, **kwargs)

Downloads the labels from the given annotation run from the annotation backend and merges them into this collection.

Parameters
• anno_key – an annotation key

• skip_unexpected (False) – whether to skip any unexpected labels that don’t match the run’s label schema when merging. If False and unexpected labels are encountered, you will be presented an interactive prompt to deal with them

• cleanup (False) – whether to delete any information regarding this run from the annotation backend after loading the annotations

• **kwargs – optional keyword arguments for fiftyone.utils.annotations.AnnotationResults.load_credentials()

load_brain_results(brain_key)

Loads the results for the brain method run with the given key on this collection.

Parameters

brain_key – a brain key

Returns

load_brain_view(brain_key, select_fields=False)

Loads the fiftyone.core.view.DatasetView on which the specified brain method run was performed on this collection.

Parameters
• brain_key – a brain key

• select_fields (False) – whether to select only the fields involved in the brain method run

Returns

load_evaluation_results(eval_key)

Loads the results for the evaluation with the given key on this collection.

Parameters

eval_key – an evaluation key

Returns

load_evaluation_view(eval_key, select_fields=False)

Loads the fiftyone.core.view.DatasetView on which the specified evaluation was performed on this collection.

Parameters
• eval_key – an evaluation key

• select_fields (False) – whether to select only the fields involved in the evaluation

Returns
make_unique_field_name(root='')

Makes a unique field name with the given root name for the collection.

Parameters

root – an optional root for the output field name

Returns

the field name

map_labels(field, map)

Maps the label values of a fiftyone.core.labels.Label field to new values for each sample in the collection.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
weather=fo.Classification(label="sunny"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
weather=fo.Classification(label="cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
weather=fo.Classification(label="partly cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Map the "partly cloudy" weather label to "cloudy"
#

view = dataset.map_labels("weather", {"partly cloudy": "cloudy"})

#
# Map "rabbit" and "squirrel" predictions to "other"
#

view = dataset.map_labels(
"predictions", {"rabbit": "other", "squirrel": "other"}
)
Parameters
• field – the labels field to map

• map – a dict mapping label values to new label values
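Conceptually, the mapping is a dictionary lookup applied to each label value. The sketch below is plain Python and assumes that labels absent from the dict are left unchanged; map_label is a hypothetical helper name, not part of the library.

```python
def map_label(label, mapping):
    """Return the mapped label, or the original if it has no mapping."""
    return mapping.get(label, label)


labels = ["cat", "rabbit", "squirrel", "dog"]
mapping = {"rabbit": "other", "squirrel": "other"}

mapped = [map_label(label, mapping) for label in labels]
```

Several source labels can share one target, which is how "rabbit" and "squirrel" are merged into "other".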

Returns

property mask_targets

The mask targets of the underlying dataset.

match(filter)

Filters the samples in the collection by the given filter.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
weather=fo.Classification(label="sunny"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.jpg",
weather=fo.Classification(label="cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
weather=fo.Classification(label="partly cloudy"),
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.jpg",
predictions=None,
),
]
)

#
# Only include samples whose filepath ends with ".jpg"
#

view = dataset.match(F("filepath").ends_with(".jpg"))

#
# Only include samples whose weather field is "sunny"
#

view = dataset.match(F("weather").label == "sunny")

#
# Only include samples with at least 2 objects in their
# predictions field
#

view = dataset.match(F("predictions").detections.length() >= 2)

#
# Only include samples whose predictions field contains at least
# one object with area smaller than 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.match(small_boxes.length() > 0)
Parameters

filter

a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
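The last example above combines a per-object condition with a per-sample condition. A plain-Python sketch of the same logic, using dicts in place of detections, may make the two levels easier to see. It assumes that objects without a bounding box never count as small; bbox_area and has_small_box are hypothetical helper names.

```python
def bbox_area(bbox):
    """Area of a [top-left-x, top-left-y, width, height] box."""
    return bbox[2] * bbox[3]


def has_small_box(detections, max_area=0.2):
    """Whether any detection has a bounding box smaller than max_area."""
    return any(
        d.get("bounding_box") is not None
        and bbox_area(d["bounding_box"]) < max_area
        for d in detections or []
    )


# Detections keyed by sample, mirroring the dataset built above
samples = {
    "image1.png": [
        {"label": "cat", "bounding_box": [0.1, 0.1, 0.5, 0.5]},
        {"label": "dog", "bounding_box": [0.2, 0.2, 0.3, 0.3]},
    ],
    "image2.jpg": [
        {"label": "cat", "bounding_box": [0.5, 0.5, 0.4, 0.4]},
        {"label": "rabbit"},
    ],
    "image3.png": [
        {"label": "squirrel", "bounding_box": [0.25, 0.25, 0.5, 0.5]},
    ],
    "image4.jpg": None,
}

matches = [name for name, dets in samples.items() if has_small_box(dets)]
```

The first two samples match because each contains at least one box with area below 0.2; the matched samples keep all of their objects, since match() filters samples rather than labels.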

Returns
match_frames(filter, omit_empty=True)

Filters the frames in the video collection by the given filter.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")
from fiftyone import ViewField as F

#
# Match frames with at least 10 detections
#

num_objects = F("detections.detections").length()
view = dataset.match_frames(num_objects > 10)

print(dataset.count())
print(view.count())

print(dataset.count("frames"))
print(view.count("frames"))
Parameters

filter

a fiftyone.core.expressions.ViewExpression or MongoDB aggregation expression that returns a boolean describing the filter to apply

Returns
match_labels(labels=None, ids=None, tags=None, filter=None, fields=None)

Selects the samples from the collection that contain the specified labels.

The returned view will only contain samples that have at least one label that matches the specified selection criteria.

Note that, unlike select_labels() and filter_labels(), this stage will not filter the labels themselves; it only selects the corresponding samples.

You can perform a selection via one or more of the following methods:

If multiple criteria are specified, labels must match all of them in order to trigger a sample match.

By default, the selection is applied to all fiftyone.core.labels.Label fields, but you can provide the fields argument to explicitly define the field(s) in which to search.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
from fiftyone import ViewField as F

#
# Only show samples whose labels are currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.match_labels(labels=session.selected_labels)

#
# Only include samples that contain labels with the specified IDs
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

view = dataset.match_labels(ids=ids)

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include samples that contain labels with the specified tags
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_values("ground_truth.detections.tags"))
print(dataset.count_values("predictions.detections.tags"))

# Retrieve the labels via their tag
view = dataset.match_labels(tags="test")

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include samples that contain labels matching a filter
#

filter = F("confidence") > 0.99
view = dataset.match_labels(filter=filter, fields="predictions")

print(len(view))
print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))
Parameters
Returns
match_tags(tags, bool=True)

Returns a view containing the samples in the collection that have (or do not have) any of the given tag(s).

To match samples that must contain multiple tags, chain multiple match_tags() calls together.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
tags=["train"],
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
tags=["test"],
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Only include samples that have the "test" tag
#

view = dataset.match_tags("test")

#
# Only include samples that have either the "test" or "train" tag
#

view = dataset.match_tags(["test", "train"])

#
# Only include samples that do not have the "train" tag
#

view = dataset.match_tags("train", bool=False)
Parameters
• tags – the tag or iterable of tags to match

• bool (True) – whether to match samples that have (True) or do not have (False) the given tags
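The any-of semantics and the effect of the bool flag can be sketched in plain Python as follows. The matches_tags helper and its bool_ argument are hypothetical names chosen for illustration (the trailing underscore avoids shadowing the Python builtin).

```python
def matches_tags(sample_tags, tags, bool_=True):
    """Whether a sample has (or does not have) any of the given tags."""
    if isinstance(tags, str):
        tags = [tags]
    has_any = any(tag in sample_tags for tag in tags)
    return has_any if bool_ else not has_any


# Tags keyed by sample, mirroring the dataset built above
samples = {
    "image1.png": ["train"],
    "image2.png": ["test"],
    "image3.png": [],
}

test_only = [n for n, t in samples.items() if matches_tags(t, "test")]
either = [n for n, t in samples.items() if matches_tags(t, ["test", "train"])]
not_train = [
    n for n, t in samples.items() if matches_tags(t, "train", bool_=False)
]
```

Because a list of tags is matched with any-of semantics, requiring several tags at once means applying the check once per tag, which corresponds to chaining match_tags() calls.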

Returns
mean(field_or_expr, expr=None)

Computes the arithmetic mean of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types):

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the mean of a numeric field
#

mean = dataset.mean("numeric_field")
print(mean)  # the mean

#
# Compute the mean of a numeric list field
#

mean = dataset.mean("numeric_list_field")
print(mean)  # the mean

#
# Compute the mean of a transformation of a numeric field
#

mean = dataset.mean(2 * (F("numeric_field") + 1))
print(mean)  # the mean
Parameters
Returns

the mean
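For reference, the values that the three examples above would be expected to produce can be worked out by hand with a plain-Python sketch. It assumes that None values are skipped and that list fields contribute each of their elements individually; the mean helper below is a hypothetical stand-in for the aggregation.

```python
def mean(values):
    """Arithmetic mean that skips None and flattens one level of lists."""
    flat = []
    for v in values:
        if v is None:
            continue
        if isinstance(v, (list, tuple)):
            flat.extend(x for x in v if x is not None)
        else:
            flat.append(v)
    return sum(flat) / len(flat) if flat else None


numeric_field = [1.0, 4.0, None]
numeric_list_field = [[1, 2, 3], [1, 2], None]

field_mean = mean(numeric_field)
list_mean = mean(numeric_list_field)
transformed_mean = mean(
    [2 * (v + 1) for v in numeric_field if v is not None]
)
```

Under these assumptions the third sample, whose fields are None, does not affect any of the results.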

property media_type

The media type of the underlying dataset.

mongo(pipeline)

Adds a view stage defined by a raw MongoDB aggregation pipeline.

See MongoDB aggregation pipelines for more details.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.1, 0.1, 0.5, 0.5],
confidence=0.9,
),
fo.Detection(
label="dog",
bounding_box=[0.2, 0.2, 0.3, 0.3],
confidence=0.8,
),
]
),
),
fo.Sample(
filepath="/path/to/image2.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="cat",
bounding_box=[0.5, 0.5, 0.4, 0.4],
confidence=0.95,
),
fo.Detection(label="rabbit"),
]
),
),
fo.Sample(
filepath="/path/to/image3.png",
predictions=fo.Detections(
detections=[
fo.Detection(
label="squirrel",
bounding_box=[0.25, 0.25, 0.5, 0.5],
confidence=0.5,
),
]
),
),
fo.Sample(
filepath="/path/to/image4.png",
predictions=None,
),
]
)

#
# Extract a view containing the second and third samples in the
# dataset
#

view = dataset.mongo([{"$skip": 1}, {"$limit": 2}])

#
# Sort by the number of objects in the predictions field
#

view = dataset.mongo([
{
"$addFields": {
"_sort_field": {
"$size": {"$ifNull": ["$predictions.detections", []]}
}
}
},
{"$sort": {"_sort_field": -1}},
{"$unset": "_sort_field"}
])
Parameters

pipeline – a MongoDB aggregation pipeline (list of dicts)
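For readers less familiar with aggregation pipelines, the two pipelines in the examples above correspond roughly to the following plain-Python operations on a list of dicts. This is an analogy only: the data below is a simplified stand-in for the dataset, num_detections is a hypothetical helper, and the order of ties after a MongoDB $sort is not guaranteed to match Python's stable sort.

```python
samples = [
    {"filepath": "image1.png", "predictions": {"detections": ["cat", "dog"]}},
    {"filepath": "image2.png", "predictions": {"detections": ["cat", "rabbit"]}},
    {"filepath": "image3.png", "predictions": {"detections": ["squirrel"]}},
    {"filepath": "image4.png", "predictions": None},
]

# [{"$skip": 1}, {"$limit": 2}]: drop the first sample, then keep two
second_and_third = samples[1:][:2]


def num_detections(sample):
    """Plays the role of $size over $ifNull: missing values count as []."""
    predictions = sample.get("predictions") or {}
    return len(predictions.get("detections") or [])


# $addFields + $sort (descending) + $unset: sort by a temporary key
by_count = sorted(samples, key=num_detections, reverse=True)
```

The $ifNull guard matters for the fourth sample, whose predictions field is None and would otherwise have no size.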

Returns
one(expr, exact=False)

Returns a single sample in this collection matching the expression.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")
from fiftyone import ViewField as F

#
# Get a sample by filepath
#

# A random filepath in the dataset
filepath = dataset.take(1).first().filepath

# Get sample by filepath
sample = dataset.one(F("filepath") == filepath)

#
# Dealing with multiple matches
#

# Get a sample whose image is JPEG
sample = dataset.one(F("filepath").ends_with(".jpg"))

# Raises an error since there are multiple JPEGs
dataset.one(F("filepath").ends_with(".jpg"), exact=True)
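The effect of exact can be sketched on an ordinary list. The one function below is a hypothetical plain-Python analogue; in particular, raising ValueError when nothing matches is an assumption about the behavior, not something stated in this section.

```python
def one(items, predicate, exact=False):
    """Return a single item matching predicate.

    If exact is True, more than one match is treated as an error.
    """
    matches = [item for item in items if predicate(item)]
    if not matches:
        raise ValueError("No items match the given expression")
    if exact and len(matches) > 1:
        raise ValueError("Expected one match, but found %d" % len(matches))
    return matches[0]


filepaths = ["/data/a.jpg", "/data/b.png", "/data/c.jpg"]

first_jpeg = one(filepaths, lambda p: p.endswith(".jpg"))

try:
    one(filepaths, lambda p: p.endswith(".jpg"), exact=True)
    raised = False
except ValueError:
    raised = True
```

With exact=False the first match is returned even though two files are JPEGs; with exact=True the ambiguity is reported instead.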
Parameters
Returns
select(sample_ids, ordered=False)

Selects the samples with the given IDs from the collection.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Create a view containing the currently selected samples in the App
#

session = fo.launch_app(dataset)

# Select samples in the App...

view = dataset.select(session.selected)
Parameters

• sample_ids –

the samples to select. Can be any of the following:

• ordered (False) – whether to sort the samples in the returned view to match the order of the provided IDs

Returns
select_by(field, values, ordered=False)

Selects the samples with the given field values from the collection.

This stage is typically used to work with categorical fields (strings, ints, and bools). If you want to select samples based on floating point fields, use match().

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(filepath="image%d.jpg" % i, int=i, str=str(i))
for i in range(100)
]
)

#
# Create a view containing samples whose int field has one of the
# given values
#

view = dataset.select_by("int", [1, 51, 11, 41, 21, 31])

#
# Create a view containing samples whose str field has one of the
# given values, in order
#

view = dataset.select_by(
"str", ["1", "51", "11", "41", "21", "31"], ordered=True
)
Parameters
• field – a field or embedded.field.name

• values – a value or iterable of values to select by

• ordered (False) – whether to sort the samples in the returned view to match the order of the provided values
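The difference between the default and ordered=True can be seen in a plain-Python sketch over a list of dicts. The select_by function below is a hypothetical analogue written for illustration; it assumes that, without ordered, matching samples keep their original relative order.

```python
def select_by(samples, field, values, ordered=False):
    """Keep samples whose field is in values, optionally in values order."""
    wanted = set(values)
    selected = [s for s in samples if s[field] in wanted]
    if ordered:
        # Rank each sample by the position of its value in `values`
        rank = {v: i for i, v in enumerate(values)}
        selected.sort(key=lambda s: rank[s[field]])
    return selected


samples = [{"int": i, "str": str(i)} for i in range(100)]

unordered = select_by(samples, "int", [1, 51, 11, 41, 21, 31])
in_order = select_by(
    samples, "str", ["1", "51", "11", "41", "21", "31"], ordered=True
)
```

The first result follows the order of the underlying samples, while the second follows the order in which the values were provided.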

Returns
select_fields(field_names=None, _allow_missing=False)

Selects only the fields with the given names from the samples in the collection. All other fields are excluded.

Note that default sample fields are always selected.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[-1, 0, 1],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=-1.0,
numeric_list_field=[-2, -1, 0, 1],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
),
]
)

#
# Include only the default fields on each sample
#

view = dataset.select_fields()

#
# Include only the numeric_field field (and the default fields)
# on each sample
#

view = dataset.select_fields("numeric_field")
Parameters

field_names (None) – a field name or iterable of field names to select

Returns
select_frames(frame_ids, omit_empty=True)

Selects the frames with the given IDs from the video collection.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-video")

#
# Select some specific frames
#

frame_ids = [
dataset.first().frames.first().id,
dataset.last().frames.last().id,
]

view = dataset.select_frames(frame_ids)

print(dataset.count())
print(view.count())

print(dataset.count("frames"))
print(view.count("frames"))
Parameters
Returns
select_labels(labels=None, ids=None, tags=None, fields=None, omit_empty=True)

Selects only the specified labels from the collection.

The returned view will omit samples, sample fields, and individual labels that do not match the specified selection criteria.

You can perform a selection via one or more of the following methods:

• Provide the labels argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels(), to select specific labels

• Provide the ids argument to select labels with specific IDs

• Provide the tags argument to select labels with specific tags

If multiple criteria are specified, labels must match all of them in order to be selected.

By default, the selection is applied to all fiftyone.core.labels.Label fields, but you can provide the fields argument to explicitly define the field(s) in which to select.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

#
# Only include the labels currently selected in the App
#

session = fo.launch_app(dataset)

# Select some labels in the App...

view = dataset.select_labels(labels=session.selected_labels)

#
# Only include labels with the specified IDs
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

view = dataset.select_labels(ids=ids)

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))

#
# Only include labels with the specified tags
#

# Grab some label IDs
ids = [
dataset.first().ground_truth.detections[0].id,
dataset.last().predictions.detections[0].id,
]

# Give the labels a "test" tag
dataset = dataset.clone()  # create copy since we're modifying data
dataset.select_labels(ids=ids).tag_labels("test")

print(dataset.count_label_tags())

# Retrieve the labels via their tag
view = dataset.select_labels(tags="test")

print(view.count("ground_truth.detections"))
print(view.count("predictions.detections"))
Parameters
• labels (None) – a list of dicts specifying the labels to select in the format returned by fiftyone.core.session.Session.selected_labels()

• ids (None) – an ID or iterable of IDs of the labels to select

• tags (None) – a tag or iterable of tags of labels to select

• fields (None) – a field or iterable of fields from which to select

• omit_empty (True) – whether to omit samples that have no labels after filtering
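The rule that labels must match all provided criteria can be sketched in plain Python. The is_selected helper and the dict-based labels are hypothetical, and treating tags as "has any of the given tags" is an assumption made for this illustration.

```python
def is_selected(label, ids=None, tags=None):
    """A label is selected only if it satisfies every provided criterion."""
    if ids is not None and label["id"] not in ids:
        return False
    if tags is not None and not set(tags) & set(label["tags"]):
        return False
    return True


labels = [
    {"id": "a", "tags": ["test"]},
    {"id": "b", "tags": []},
    {"id": "c", "tags": ["test"]},
]

# Label "b" fails the tag criterion and label "c" fails the ID criterion
selected = [
    label["id"]
    for label in labels
    if is_selected(label, ids=["a", "b"], tags=["test"])
]
```

Criteria that are left as None impose no restriction, so passing only ids or only tags selects on that criterion alone.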

Returns
set_field(field, expr, _allow_missing=False)

Sets a field or embedded field on each sample in a collection by evaluating the given expression.

This method can process embedded list fields. To do so, simply append [] to any list component(s) of the field path.

Note

There are two cases where FiftyOne will automatically unwind array fields without requiring you to explicitly specify this via the [] syntax:

• Top-level lists: when you specify a field path that refers to a top-level list field of a dataset; i.e., list_field is automatically coerced to list_field[], if necessary.

• List fields: when you specify a field path that refers to the list field of a Label class, such as the Detections.detections attribute; i.e., ground_truth.detections.label is automatically coerced to ground_truth.detections[].label, if necessary.

See the examples below for demonstrations of this behavior.

The provided expr is interpreted relative to the document on which the embedded field is being set. For example, if you are setting a nested field field="embedded.document.field", then the expression expr you provide will be applied to the embedded.document document. Note that you can override this behavior by defining an expression that is bound to the root document by prepending "$" to any field name(s) in the expression.

See the examples below for more information.

Note

Note that you cannot set a non-existing top-level field using this stage, since doing so would violate the dataset’s schema. You can, however, first declare a new field via fiftyone.core.dataset.Dataset.add_sample_field() and then populate it in a view via this stage.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Replace all values of the uniqueness field that are less than
# 0.5 with None
#

view = dataset.set_field(
"uniqueness",
(F("uniqueness") >= 0.5).if_else(F("uniqueness"), None)
)
print(view.bounds("uniqueness"))

#
# Lower bound all object confidences in the predictions field at
# 0.5
#

view = dataset.set_field(
"predictions.detections.confidence", F("confidence").max(0.5)
)
print(view.bounds("predictions.detections.confidence"))

#
# Add a num_predictions property to the predictions field that
# contains the number of objects in the field
#

view = dataset.set_field(
"predictions.num_predictions",
F("$predictions.detections").length(),
)
print(view.bounds("predictions.num_predictions"))

#
# Set an is_animal field on each object in the predictions field
# that indicates whether the object is an animal
#

ANIMALS = [
"bear", "bird", "cat", "cow", "dog", "elephant", "giraffe",
"horse", "sheep", "zebra"
]

view = dataset.set_field(
"predictions.detections.is_animal", F("label").is_in(ANIMALS)
)
print(view.count_values("predictions.detections.is_animal"))
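Parameters
• field – the field or embedded.field.name to set

• expr – a fiftyone.core.expressions.ViewExpression or MongoDB expression that defines the field value to set

Returns

a fiftyone.core.view.DatasetView
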
shuffle(seed=None)

Randomly shuffles the samples in the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=None,
),
]
)

#
# Return a view that contains a randomly shuffled version of the
# samples in the dataset
#

view = dataset.shuffle()

#
# Shuffle the samples with a fixed random seed
#

view = dataset.shuffle(seed=51)
Parameters

seed (None) – an optional random seed to use when shuffling the samples

Returns

a fiftyone.core.view.DatasetView

skip(skip)

Omits the given number of samples from the head of the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=fo.Classification(label="rabbit"),
),
fo.Sample(
filepath="/path/to/image4.png",
ground_truth=None,
),
]
)

#
# Omit the first two samples from the dataset
#

view = dataset.skip(2)
Parameters

skip – the number of samples to skip. If a non-positive number is provided, no samples are omitted
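As a rough analogy for the behavior described above, skipping works like slicing from the head of a list. The sketch below is plain Python on a list of paths, not FiftyOne code, and `skip_head` is a hypothetical name.

```python
def skip_head(samples, skip):
    """Drop the first `skip` items; non-positive values drop nothing."""
    return list(samples)[max(skip, 0):]


paths = ["image1.png", "image2.png", "image3.png", "image4.png"]

print(skip_head(paths, 2))   # ['image3.png', 'image4.png']
print(skip_head(paths, -1))  # all four paths, since nothing is omitted
```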

Returns

a fiftyone.core.view.DatasetView

sort_by(field_or_expr, reverse=False)

Sorts the samples in the collection by the given field(s) or expression(s).

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Sort the samples by their uniqueness field in ascending order
#

view = dataset.sort_by("uniqueness", reverse=False)

#
# Sorts the samples in descending order by the number of detections
# in their predictions field whose bounding box area is less than
# 0.2
#

# Bboxes are in [top-left-x, top-left-y, width, height] format
bbox = F("bounding_box")
bbox_area = bbox[2] * bbox[3]

small_boxes = F("predictions.detections").filter(bbox_area < 0.2)
view = dataset.sort_by(small_boxes.length(), reverse=True)

#
# Performs a compound sort where samples are first sorted in
# descending order by number of detections and then in ascending order
# of uniqueness for samples with the same number of predictions
#

view = dataset.sort_by(
[
(F("predictions.detections").length(), -1),
("uniqueness", 1),
]
)

num_objects, uniqueness = view[:5].values(
[F("predictions.detections").length(), "uniqueness"]
)
print(list(zip(num_objects, uniqueness)))
Parameters
• field_or_expr – the field(s) or expression(s) to sort by. This can be any of the following:

• a field to sort by

• an embedded.field.name to sort by

• a fiftyone.core.expressions.ViewExpression or a MongoDB aggregation expression that defines the quantity to sort by

• a list of (field_or_expr, order) tuples defining compound sort criteria, where field_or_expr is a field or expression as defined above, and order can be 1 or any string starting with “a” for ascending order, or -1 or any string starting with “d” for descending order

• reverse (False) – whether to return the results in descending order
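The order conventions for compound sorts can be sketched in plain Python. This is only an illustration of the rules stated above, not how FiftyOne implements sorting; `normalize_order` and `compound_sort` are hypothetical helpers that operate on dicts.

```python
def normalize_order(order):
    """Map an order spec to 1 (ascending) or -1 (descending)."""
    if isinstance(order, str):
        if order.startswith("a"):
            return 1
        if order.startswith("d"):
            return -1
    elif order in (1, -1):
        return order

    raise ValueError("Unsupported order: %r" % (order,))


def compound_sort(records, criteria):
    """Sort dicts by a list of (key, order) pairs, first pair most significant."""
    result = list(records)

    # Python's sort is stable, so applying the criteria from last to first
    # leaves the first criterion as the most significant one
    for key, order in reversed(criteria):
        descending = normalize_order(order) == -1
        result.sort(key=lambda rec, k=key: rec[k], reverse=descending)

    return result


records = [
    {"num_objects": 3, "uniqueness": 0.2},
    {"num_objects": 5, "uniqueness": 0.9},
    {"num_objects": 5, "uniqueness": 0.1},
]

ordered = compound_sort(records, [("num_objects", "desc"), ("uniqueness", 1)])
print([(r["num_objects"], r["uniqueness"]) for r in ordered])
# [(5, 0.1), (5, 0.9), (3, 0.2)]
```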

Returns

a fiftyone.core.view.DatasetView

sort_by_similarity(query_ids, k=None, reverse=False, brain_key=None)

Sorts the samples in the collection by visual similarity to a specified set of query ID(s).

In order to use this stage, you must first use fiftyone.brain.compute_similarity() to index your dataset by visual similarity.

Examples:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

fob.compute_similarity(dataset, brain_key="similarity")

#
# Sort the samples by their visual similarity to the first sample
# in the dataset
#

query_id = dataset.first().id
view = dataset.sort_by_similarity(query_id)
Parameters
• query_ids – an ID or iterable of query IDs. These may be sample IDs or label IDs depending on brain_key

• k (None) – the number of matches to return. By default, the entire collection is sorted

• reverse (False) – whether to sort by least similarity

• brain_key (None) – the brain key of an existing fiftyone.brain.compute_similarity() run on the dataset. If not specified, the dataset must have an applicable run, which will be used by default

Returns

a fiftyone.core.view.DatasetView

std(field_or_expr, expr=None, sample=False)

Computes the standard deviation of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types).

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the standard deviation of a numeric field
#

std = dataset.std("numeric_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a numeric list field
#

std = dataset.std("numeric_list_field")
print(std)  # the standard deviation

#
# Compute the standard deviation of a transformation of a numeric field
#

std = dataset.std(2 * (F("numeric_field") + 1))
print(std)  # the standard deviation
Parameters
• field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned

• expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating

• sample (False) – whether to compute the sample standard deviation rather than the population standard deviation

Returns

the standard deviation
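The difference between the population and sample forms selected by the sample parameter can be shown with a short plain-Python sketch. It is not FiftyOne's implementation: the local `std` function is hypothetical, and returning None when too few values are available is an assumption.

```python
import math


def std(values, sample=False):
    """Standard deviation of `values`, ignoring None and flattening lists."""
    flat = []
    for value in values:
        if value is None:
            continue  # None-valued fields are ignored
        if isinstance(value, (list, tuple)):
            flat.extend(v for v in value if v is not None)
        else:
            flat.append(value)

    # sample=True divides by n - 1; the population form divides by n
    denom = len(flat) - 1 if sample else len(flat)
    if denom <= 0:
        return None  # assumption: too few values to define a result

    mean = sum(flat) / len(flat)
    return math.sqrt(sum((v - mean) ** 2 for v in flat) / denom)


print(std([1.0, 4.0, None]))               # 1.5
print(std([1.0, 4.0, None], sample=True))  # about 2.1213
```

With only two values the two forms differ noticeably; the gap shrinks as the number of values grows.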

sum(field_or_expr, expr=None)

Computes the sum of the field values of the collection.

None-valued fields are ignored.

This aggregation is typically applied to numeric field types (or lists of such types).

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Compute the sum of a numeric field
#

total = dataset.sum("numeric_field")
print(total)  # the sum

#
# Compute the sum of a numeric list field
#

total = dataset.sum("numeric_list_field")
print(total)  # the sum

#
# Compute the sum of a transformation of a numeric field
#

total = dataset.sum(2 * (F("numeric_field") + 1))
print(total)  # the sum
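Parameters
• field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned

• expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating
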
Returns

the sum

summary()

Returns a string summary of the view.

Returns

a string summary

tag_labels(tags, label_fields=None)

Adds the tag(s) to all labels in the specified label field(s) of this collection, if necessary.

Parameters
• tags – a tag or iterable of tags

• label_fields (None) – an optional name or iterable of names of fiftyone.core.labels.Label fields. By default, all label fields are used

tag_samples(tags)

Adds the tag(s) to all samples in this collection, if necessary.

Parameters

tags – a tag or iterable of tags

tail(num_samples=3)

Returns a list of the last few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects

take(size, seed=None)

Randomly samples the given number of samples from the collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
ground_truth=fo.Classification(label="cat"),
),
fo.Sample(
filepath="/path/to/image2.png",
ground_truth=fo.Classification(label="dog"),
),
fo.Sample(
filepath="/path/to/image3.png",
ground_truth=fo.Classification(label="rabbit"),
),
fo.Sample(
filepath="/path/to/image4.png",
ground_truth=None,
),
]
)

#
# Take two random samples from the dataset
#

view = dataset.take(2)

#
# Take two random samples from the dataset with a fixed seed
#

view = dataset.take(2, seed=51)
Parameters
• size – the number of samples to return. If a non-positive number is provided, an empty view is returned

• seed (None) – an optional random seed to use when selecting the samples
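The roles of size and seed can be illustrated with Python's standard random module. This is an analogy on a plain list, not FiftyOne code: `take_random` is a hypothetical helper, the items it picks will not match what FiftyOne picks for the same seed, and returning everything when size exceeds the collection length is an assumption.

```python
import random


def take_random(samples, size, seed=None):
    """Pick `size` distinct items at random; non-positive sizes give []."""
    samples = list(samples)
    if size <= 0:
        return []

    rng = random.Random(seed)  # a fixed seed makes the choice repeatable

    # Assumption: asking for more items than exist returns all of them
    return rng.sample(samples, min(size, len(samples)))


paths = ["image1.png", "image2.png", "image3.png", "image4.png"]

first = take_random(paths, 2, seed=51)
second = take_random(paths, 2, seed=51)
print(first == second)        # True
print(take_random(paths, 0))  # []
```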

Returns

a fiftyone.core.view.DatasetView

to_dict(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON dictionary representation of the view.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render frame labels JSON in human readable format with newlines and indentations. Only applicable to video datasets when a frame_labels_dir is provided

Returns

a JSON dict
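The effect of rel_dir on serialized filepaths can be sketched as follows. The `strip_rel_dir` helper is hypothetical and only approximates the described behavior: it normalizes rel_dir with the same `os.path` calls named above and removes it when it is a prefix of the filepath. Whether the real serializer keeps a leading separator is not stated here, so this sketch assumes it does not.

```python
import os


def strip_rel_dir(filepath, rel_dir):
    """Return `filepath` relative to `rel_dir` when it lives inside it."""
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))
    prefix = rel_dir.rstrip(os.sep) + os.sep

    if filepath.startswith(prefix):
        return filepath[len(prefix):]

    return filepath  # not under rel_dir, so the path is left unchanged


print(strip_rel_dir("/data/images/a.png", "/data"))  # images/a.png on POSIX systems
```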

to_evaluation_patches(eval_key)

Creates a view based on the results of the evaluation with the given key that contains one sample for each true positive, false positive, and false negative example in the collection, respectively.

True positive examples will result in samples with both their ground truth and predicted fields populated, while false positive/negative examples will only have one of their corresponding predicted/ground truth fields populated, respectively.

If multiple predictions are matched to a ground truth object (e.g., if the evaluation protocol includes a crowd attribute), then all matched predictions will be stored in the single sample along with the ground truth object.

The returned view will also have top-level type and iou fields populated based on the evaluation results for that example, as well as a sample_id field recording the sample ID of the example, and a crowd field if the evaluation protocol defines a crowd attribute.

Note

The returned view will contain patches for the contents of this collection, which may differ from the view on which the eval_key evaluation was performed. This may exclude some labels that were evaluated and/or include labels that were not evaluated.

If you would like to see patches for the exact view on which an evaluation was performed, first call load_evaluation_view() to load the view and then convert to patches.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

dataset.evaluate_detections("predictions", eval_key="eval")

session = fo.launch_app(dataset)

#
# Create a patches view for the evaluation results
#

view = dataset.to_evaluation_patches("eval")
print(view)

session.view = view
Parameters

eval_key – an evaluation key that corresponds to the evaluation of ground truth/predicted fields that are of type fiftyone.core.labels.Detections or fiftyone.core.labels.Polylines

Returns

a fiftyone.core.patches.EvaluationPatchesView

to_frames(**kwargs)

Creates a view that contains one sample per frame in the video collection.

By default, samples will be generated for every frame of each video, based on the total frame count of the video files, but this method is highly customizable. Refer to fiftyone.core.video.make_frames_dataset() to see the available configuration options.

Note

Unless you have configured otherwise, creating frame views will sample the necessary frames from the input video collection into directories of per-frame images. For large video datasets, this may take some time and require substantial disk space.

Frames that have previously been sampled will not be resampled, so creating frame views into the same dataset will become faster after the frames have been sampled.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")

session = fo.launch_app(dataset)

#
# Create a frames view for an entire video dataset
#

frames = dataset.to_frames()
print(frames)

session.view = frames

#
# Create a frames view that only contains frames with at least 10
# objects, sampled at a maximum frame rate of 1fps
#

num_objects = F("detections.detections").length()
view = dataset.match_frames(num_objects > 10)

frames = view.to_frames(max_fps=1, sparse=True)
print(frames)

session.view = frames
Parameters

**kwargs – optional keyword arguments for fiftyone.core.video.make_frames_dataset() specifying how to perform the conversion

Returns

a fiftyone.core.video.FramesView

to_json(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON string representation of the collection.

The samples will be written as a list in a top-level samples field of the returned dictionary.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

Returns

a JSON string
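The overall shape of the output can be sketched with the standard json module. This is a simplified illustration, not FiftyOne's serializer: `to_json_sketch` is a hypothetical function, any other top-level keys the real output may contain are omitted, and the indent width used for pretty printing is an assumption.

```python
import json


def to_json_sketch(sample_dicts, pretty_print=False):
    """Serialize sample dicts under a top-level "samples" key."""
    doc = {"samples": list(sample_dicts)}

    if pretty_print:
        return json.dumps(doc, indent=4)  # newlines and indentation

    return json.dumps(doc)


samples = [
    {"filepath": "image1.png", "tags": []},
    {"filepath": "image2.png", "tags": ["test"]},
]

compact = to_json_sketch(samples)
pretty = to_json_sketch(samples, pretty_print=True)

print(json.loads(compact)["samples"][1]["filepath"])  # image2.png
```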

to_patches(field)

Creates a view that contains one sample per object patch in the specified field of the collection.

Fields other than field and the default sample fields will not be included in the returned view. A sample_id field will be added that records the sample ID from which each patch was taken.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

session = fo.launch_app(dataset)

#
# Create a view containing the ground truth patches
#

view = dataset.to_patches("ground_truth")
print(view)

session.view = view
Parameters

field – the patches field, which must be of type fiftyone.core.labels.Detections or fiftyone.core.labels.Polylines

Returns

a fiftyone.core.patches.PatchesView

untag_labels(tags, label_fields=None)

Removes the tag(s) from all labels in the specified label field(s) of this collection, if necessary.

Parameters
• tags – a tag or iterable of tags

• label_fields (None) – an optional name or iterable of names of fiftyone.core.labels.Label fields. By default, all label fields are used

untag_samples(tags)

Removes the tag(s) from all samples in this collection, if necessary.

Parameters

tags – a tag or iterable of tags

validate_field_type(field_name, ftype, embedded_doc_type=None, subfield=None)

Validates that the collection has a field of the given type.

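Parameters
• field_name – the field name or embedded.field.name

• ftype – the expected field type. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – the fiftyone.core.odm.BaseEmbeddedDocument type of the field. Used only when ftype is an embedded fiftyone.core.fields.EmbeddedDocumentField

• subfield (None) – the type of the contained field. Used only when ftype is a fiftyone.core.fields.ListField or fiftyone.core.fields.DictField
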
Raises

ValueError – if the field does not exist or does not have the expected type

validate_fields_exist(fields, include_private=False)

Validates that the collection has field(s) with the given name(s).

If embedded field names are provided, only the root field is checked.

Parameters
• fields – a field name or iterable of field names

• include_private (False) – whether to include private fields when checking for existence

Raises

ValueError – if one or more of the fields do not exist

values(field_or_expr, expr=None, missing_value=None, unwind=False, _allow_missing=False, _big_result=True, _raw=False)

Extracts the values of a field from all samples in the collection.

Values aggregations are useful for efficiently extracting a slice of field or embedded field values across all samples in a collection. See the examples below for more details.

The dual function of values() is set_values(), which can be used to efficiently set a field or embedded field of all samples in a collection by providing lists of values with the same structure as those returned by this aggregation.

Note

Unlike other aggregations, values() does not automatically unwind list fields, which ensures that the returned values match the potentially-nested structure of the documents.

You can opt-in to unwinding specific list fields using the [] syntax, or you can pass the optional unwind=True parameter to unwind all supported list fields. See Aggregating list fields for more information.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = fo.Dataset()
dataset.add_samples(
[
fo.Sample(
filepath="/path/to/image1.png",
numeric_field=1.0,
numeric_list_field=[1, 2, 3],
),
fo.Sample(
filepath="/path/to/image2.png",
numeric_field=4.0,
numeric_list_field=[1, 2],
),
fo.Sample(
filepath="/path/to/image3.png",
numeric_field=None,
numeric_list_field=None,
),
]
)

#
# Get all values of a field
#

values = dataset.values("numeric_field")
print(values)  # [1.0, 4.0, None]

#
# Get all values of a list field
#

values = dataset.values("numeric_list_field")
print(values)  # [[1, 2, 3], [1, 2], None]

#
# Get all values of a transformed field
#

values = dataset.values(2 * (F("numeric_field") + 1))
print(values)  # [4.0, 10.0, None]

#
# Get values from a label list field
#

dataset = foz.load_zoo_dataset("quickstart")

# list of Detections
detections = dataset.values("ground_truth")

# list of lists of Detection instances
detections = dataset.values("ground_truth.detections")

# list of lists of detection labels
labels = dataset.values("ground_truth.detections.label")
Parameters
• field_or_expr – a field name, embedded.field.name, fiftyone.core.expressions.ViewExpression, or MongoDB expression defining the field or expression to aggregate. This can also be a list or tuple of such arguments, in which case a tuple of corresponding aggregation results (each receiving the same additional keyword arguments, if any) will be returned

• expr (None) – a fiftyone.core.expressions.ViewExpression or MongoDB expression to apply to field_or_expr (which must be a field) before aggregating

• missing_value (None) – a value to insert for missing or None-valued fields

• unwind (False) – whether to automatically unwind all recognized list fields

Returns

the list of values
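The effect of missing_value and of unwinding can be pictured on plain Python lists. Both helpers below are hypothetical and are not part of FiftyOne; in particular, dropping None entries while unwinding is an assumption about the resulting shape, not a documented guarantee.

```python
def fill_missing(values, missing_value=None):
    """Replace None entries at the top level with `missing_value`."""
    return [missing_value if v is None else v for v in values]


def unwind(values):
    """Flatten nested lists into one flat list, skipping None entries."""
    flat = []
    for value in values:
        if value is None:
            continue  # assumption: missing values vanish when unwinding
        if isinstance(value, list):
            flat.extend(unwind(value))
        else:
            flat.append(value)

    return flat


print(fill_missing([1.0, 4.0, None], missing_value=0.0))  # [1.0, 4.0, 0.0]
print(unwind([[1, 2, 3], [1, 2], None]))                  # [1, 2, 3, 1, 2]
```

Without unwinding, the per-sample nesting is preserved, which is what makes the result suitable for passing back to set_values().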

view()

Returns a copy of this view.

Returns

a DatasetView

write_json(json_path, rel_dir=None, frame_labels_dir=None, pretty_print=False)

Writes the collection to disk in JSON format.

Parameters
• json_path – the path to write the JSON

• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the written JSON (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

fiftyone.core.video.make_frames_dataset(sample_collection, sample_frames=True, frames_patt=None, fps=None, max_fps=None, size=None, min_size=None, max_size=None, force_sample=False, sparse=False, verbose=False)

Creates a dataset that contains one sample per video frame in the collection.

By default, samples will be generated for every video frame at full resolution, but this method provides a variety of parameters that can be used to customize the sampling behavior.

The returned dataset will contain all frame-level fields and the tags of each video as sample-level fields, as well as a sample_id field that records the IDs of the parent sample for each frame.

When sample_frames is True (the default), this method samples each video in the collection into a directory of per-frame images with the same basename as the input video with frame numbers/format specified by frames_patt. If this method is run multiple times, existing frames will not be resampled unless force_sample is set to True.

For example, if frames_patt = "%06d.jpg", then videos with the following paths:

/path/to/video1.mp4
/path/to/video2.mp4
...

would be sampled as follows:

/path/to/video1/
000001.jpg
000002.jpg
...
/path/to/video2/
000001.jpg
000002.jpg
...
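The mapping from a video path and frame number to a sampled frame path can be sketched with printf-style formatting. This assumes a pattern such as "%06d.jpg" that is filled in with the frame number; `frame_path` is a hypothetical helper, not a FiftyOne function.

```python
import os


def frame_path(video_path, frame_number, frames_patt="%06d.jpg"):
    """Build the image path for one frame of `video_path`."""
    # The output directory shares the video's basename, minus its extension
    out_dir = os.path.splitext(video_path)[0]

    # The pattern is filled in with the frame number
    return os.path.join(out_dir, frames_patt % frame_number)


print(frame_path("/path/to/video1.mp4", 1))
# /path/to/video1/000001.jpg on POSIX systems
```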

Note

The returned dataset is independent from the source collection; modifying it will not affect the source collection.

Parameters
• sample_collection – a fiftyone.core.collections.SampleCollection

• sample_frames (True) – whether to sample the video frames (True) or set the filepath of each sample to the source video (False). Note that datasets generated with this parameter set to False cannot currently be viewed in the App

• frames_patt (None) – a pattern specifying the filename/format to use to store the sampled frames, e.g., "%06d.jpg". The default value is fiftyone.config.default_sequence_idx + fiftyone.config.default_image_ext

• fps (None) – an optional frame rate at which to sample each video’s frames

• max_fps (None) – an optional maximum frame rate at which to sample. Videos with frame rate exceeding this value are downsampled

• size (None) – an optional (width, height) for each frame. One dimension can be -1, in which case the aspect ratio is preserved

• min_size (None) – an optional minimum (width, height) for each frame. A dimension can be -1 if no constraint should be applied. The frames are resized (aspect-preserving) if necessary to meet this constraint

• max_size (None) – an optional maximum (width, height) for each frame. A dimension can be -1 if no constraint should be applied. The frames are resized (aspect-preserving) if necessary to meet this constraint

• sparse (False) – whether to only generate samples for non-empty frames, i.e., frame numbers for which fiftyone.core.frame.Frame instances have explicitly been created

• force_sample (False) – whether to resample videos whose sampled frames already exist

• verbose (False) – whether to log information about the frames that will be sampled, if any
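Two of the parameters above involve simple arithmetic that a sketch may make concrete: resolving a size whose missing dimension is -1, and capping the frame rate with max_fps. The helpers below are hypothetical and only illustrate the stated rules; rounding to the nearest pixel is an assumption, and min_size/max_size are not covered.

```python
def resolve_size(frame_size, size):
    """Fill in a -1 dimension of `size` so the aspect ratio is preserved."""
    width, height = frame_size
    out_width, out_height = size

    if out_width < 0:
        out_width = round(out_height * width / height)
    elif out_height < 0:
        out_height = round(out_width * height / width)

    return out_width, out_height


def effective_fps(video_fps, max_fps=None):
    """Frame rate used for sampling when only `max_fps` is given."""
    if max_fps is not None and video_fps > max_fps:
        return max_fps  # faster videos are downsampled

    return video_fps


print(resolve_size((1920, 1080), (640, -1)))  # (640, 360)
print(effective_fps(30, max_fps=1))           # 1
```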

Returns

a fiftyone.core.dataset.Dataset