# fiftyone.core.view

Dataset views.

Classes

 DatasetView(dataset) A view into a fiftyone.core.dataset.Dataset.
class fiftyone.core.view.DatasetView(dataset)

A view into a fiftyone.core.dataset.Dataset.

Dataset views represent ordered collections of subsets of samples in a dataset.

Operations on dataset views are designed to be chained together to yield the desired subset of the dataset, which is then iterated over to directly access the sample views. Each stage in the pipeline defining a dataset view is represented by a fiftyone.core.stages.ViewStage instance.

The stages of a dataset view specify:

• what subset of samples (and their order) should be included

• what “parts” (fields and their elements) of the sample should be included

Samples retrieved from dataset views are returned as fiftyone.core.sample.SampleView objects, as opposed to fiftyone.core.sample.Sample objects, since they may contain only a subset of the sample’s content.

Example use:

# Print paths for 5 random samples from the test split of a dataset
view = dataset.match_tag("test").take(5)
for sample in view:
    print(sample.filepath)
Parameters

dataset – a fiftyone.core.dataset.Dataset
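
The chaining behaviour described above can be pictured with a small plain-Python model. This is only an illustrative sketch: `ToyView`, its dict-based samples, and its two stage methods are hypothetical stand-ins and do not call FiftyOne. It shows the two properties the text relies on: each stage returns a new view, and nothing is evaluated until the view is iterated.

```python
import itertools


class ToyView:
    """Plain-Python stand-in for a chainable view (not FiftyOne's class)."""

    def __init__(self, samples, stages=()):
        self._samples = samples
        self._stages = tuple(stages)

    def _add(self, stage):
        # Each call returns a new view; the receiver is left unchanged
        return ToyView(self._samples, self._stages + (stage,))

    def match_tag(self, tag):
        return self._add(lambda rows: (r for r in rows if tag in r["tags"]))

    def limit(self, num):
        return self._add(lambda rows: itertools.islice(rows, max(num, 0)))

    def __iter__(self):
        # Stages are applied lazily, in order, only when iterating
        rows = iter(self._samples)
        for stage in self._stages:
            rows = stage(rows)
        return rows


samples = [
    {"filepath": "/data/a.jpg", "tags": ["train"]},
    {"filepath": "/data/b.jpg", "tags": ["test"]},
    {"filepath": "/data/c.jpg", "tags": ["test"]},
]

base = ToyView(samples)
view = base.match_tag("test").limit(1)
paths = [s["filepath"] for s in view]
```

Because the stages are stored rather than applied eagerly, `base` still yields all three samples after `view` has been built.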

property media_type

The media type of the underlying dataset.

property name

The name of the view.

property dataset_name

The name of the underlying dataset.

property info

The fiftyone.core.dataset.Dataset.info() dict of the underlying dataset.

property stages

The list of fiftyone.core.stages.ViewStage instances in this view’s pipeline.

summary()

Returns a string summary of the view.

Returns

a string summary

iter_samples()

Returns an iterator over the samples in the view.

Returns

an iterator over fiftyone.core.sample.SampleView instances

get_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the samples in the view.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

an OrderedDict mapping field names to field types

get_frame_field_schema(ftype=None, embedded_doc_type=None, include_private=False)

Returns a schema dictionary describing the fields of the frames of the samples in the view.

Only applicable for video datasets.

Parameters
• ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of fiftyone.core.fields.Field

• embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of fiftyone.core.odm.BaseEmbeddedDocument

• include_private (False) – whether to include fields that start with _ in the returned schema

Returns

a dictionary mapping field names to field types, or None if the dataset is not a video dataset

list_indexes()

Returns the fields of the dataset that are indexed.

Returns

a list of field names

create_index(field, unique=False)

Creates an index on the given field.

If the given field already has a unique index, it will be retained regardless of the unique value you specify.

If the given field already has a non-unique index but you requested a unique index, the existing index will be dropped.

Indexes enable efficient sorting, merging, and other such operations.

Parameters
• field – the field name or embedded.field.name

• unique (False) – whether to add a uniqueness constraint to the index
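
The two rules about existing indexes can be summarised as a small decision function. This is a plain-Python sketch of the documented behaviour, not FiftyOne code: `resolve_index` and its string labels are hypothetical. It also assumes that a dropped non-unique index is replaced by a unique one, and that requesting a non-unique index on a field that already has one changes nothing.

```python
def resolve_index(existing, unique=False):
    """Models the documented create_index() rules.

    existing: None (no index yet), "unique", or "non-unique"
    Returns (resulting index kind, whether the old index was dropped).
    """
    if existing == "unique":
        # An existing unique index is retained regardless of `unique`
        return "unique", False

    if existing == "non-unique" and unique:
        # The existing non-unique index is dropped; a unique one is
        # assumed to take its place
        return "unique", True

    if existing == "non-unique":
        # Assumption: nothing changes in this case
        return "non-unique", False

    return ("unique" if unique else "non-unique"), False
```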

drop_index(field)

Drops the index on the given field.

Parameters

field – the field name or embedded.field.name

clone_sample_field(field_name, new_field_name)

Clones the given sample field of the view into a new field of the dataset.

You can use dot notation (embedded.field.name) to clone embedded fields.

Parameters
• field_name – the field name to clone

• new_field_name – the new field name to populate

clone_frame_field(field_name, new_field_name)

Clones the frame-level field of the view into a new field.

You can use dot notation (embedded.field.name) to clone embedded frame fields.

Only applicable to video datasets.

Parameters
• field_name – the field name

• new_field_name – the new field name

clear_sample_field(field_name)

Clears the values of the field from all samples in the view.

The field will remain in the dataset’s schema, and all samples in the view will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded fields.

Parameters

field_name – the field name

clear_frame_field(field_name)

Clears the values of the frame field from all samples in the view.

The field will remain in the dataset’s frame schema, and all frames in the view will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded frame fields.

Only applicable to video datasets.

Parameters

field_name – the field name

save()

Overwrites the underlying dataset with the contents of the view.

WARNING: this will permanently delete any omitted, filtered, or otherwise modified contents of the dataset.

clone(name=None)

Creates a new dataset containing only the contents of the view.

Parameters

name (None) – a name for the cloned dataset. By default, get_default_dataset_name() is used

Returns

the new Dataset

to_dict(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON dictionary representation of the view.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render frame labels JSON in human readable format with newlines and indentations. Only applicable to video datasets when a frame_labels_dir is provided

Returns

a JSON dict
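
The effect of rel_dir can be sketched with the standard library alone. The helper below is a hypothetical illustration, not FiftyOne's implementation: it normalises rel_dir with the same os.path.abspath(os.path.expanduser(...)) call quoted above and strips it only when the filepath actually lives under that directory.

```python
import os


def strip_rel_dir(filepath, rel_dir):
    """Returns filepath relative to rel_dir when possible, else unchanged."""
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))
    prefix = rel_dir.rstrip(os.sep) + os.sep

    if filepath.startswith(prefix):
        return filepath[len(prefix):]

    # "If possible": paths outside rel_dir are left as-is
    return filepath


root = os.path.abspath(os.path.join(os.sep, "data"))
inside = os.path.join(root, "images", "001.jpg")
outside = os.path.abspath(os.path.join(os.sep, "other", "002.jpg"))

relative = strip_rel_dir(inside, root)
untouched = strip_rel_dir(outside, root)
```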

add_stage(stage)

Applies the given fiftyone.core.stages.ViewStage to the collection.

Parameters

stage – a fiftyone.core.stages.ViewStage

Returns

a fiftyone.core.view.DatasetView

Raises

fiftyone.core.stages.ViewStageError – if the stage was not a valid stage for this collection

aggregate(aggregations, _attach_frames=True)

Aggregates one or more fiftyone.core.aggregations.Aggregation instances.

Note that it is best practice to group aggregations into a single call to aggregate(), as this will be more efficient than performing multiple aggregations in series.

Parameters

aggregations – a fiftyone.core.aggregations.Aggregation or iterable of fiftyone.core.aggregations.Aggregation instances

Returns

a fiftyone.core.aggregations.AggregationResult or list of fiftyone.core.aggregations.AggregationResult instances corresponding to the input aggregations

apply_model(model, label_field='predictions', confidence_thresh=None, batch_size=None)

Applies the fiftyone.core.models.Model to the samples in the collection.

Parameters
• model – a fiftyone.core.models.Model

• label_field ("predictions") – the name (or prefix) of the field in which to store the model predictions

• confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels generated by the model

• batch_size (None) – an optional batch size to use. Only applicable for image samples

bounds(field_name)

Computes the bounds of a numeric field or numeric list field of a collection.

Examples:

import fiftyone as fo

dataset = fo.Dataset()
dataset.add_samples(
    [
        fo.Sample(
            filepath="/path/to/image1.png",
            numeric_field=1.0,
            numeric_list_field=[1.0, 2.0, 3.0],
        ),
        fo.Sample(
            filepath="/path/to/image2.png",
            numeric_field=4.0,
            numeric_list_field=[1.5, 2.5],
        ),
    ]
)

# Add a generic list field
dataset.add_sample_field("list_field", fo.ListField)

#
# Compute the bounds of a numeric field
#

r = dataset.bounds("numeric_field")
r.bounds  # (min, max)

#
# Compute the bounds of a numeric list field
#

r = dataset.bounds("numeric_list_field")
r.bounds  # (min, max)

#
# Cannot compute bounds of a generic list field
#

dataset.bounds("list_field")  # error
Parameters

field_name – the name of the field to compute bounds for

Returns

fiftyone.core.aggregations.BoundsResult
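
To make the semantics concrete, here is a plain-Python sketch of what "bounds of a numeric field or numeric list field" means for the two samples in the example. `toy_bounds` and the dict-based samples are hypothetical and do not use FiftyOne. Returning `(None, None)` for a field with no values is an assumption.

```python
def toy_bounds(samples, field_name):
    """Returns (min, max) over scalar values and flattened list values."""
    values = []
    for sample in samples:
        value = sample.get(field_name)
        if value is None:
            continue

        if isinstance(value, (list, tuple)):
            # List fields contribute every element
            values.extend(value)
        else:
            values.append(value)

    if not values:
        return (None, None)

    return (min(values), max(values))


samples = [
    {"numeric_field": 1.0, "numeric_list_field": [1.0, 2.0, 3.0]},
    {"numeric_field": 4.0, "numeric_list_field": [1.5, 2.5]},
]

scalar_bounds = toy_bounds(samples, "numeric_field")
list_bounds = toy_bounds(samples, "numeric_list_field")
```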

compute_embeddings(model, embeddings_field=None, batch_size=None)

Computes embeddings for the samples in the collection using the given fiftyone.core.models.Model.

The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• model – a fiftyone.core.models.Model

• embeddings_field (None) – the name of a field in which to store the embeddings

• batch_size (None) – an optional batch size to use. Only applicable for image samples

Returns

None, if an embeddings_field is provided; otherwise, a numpy array whose first dimension is len(samples) containing the embeddings

compute_metadata(overwrite=False, num_workers=None)

Populates the metadata field of all samples in the collection.

Any samples with existing metadata are skipped, unless overwrite == True.

Parameters
• overwrite (False) – whether to overwrite existing metadata

• num_workers (None) – the number of processes to use. By default, multiprocessing.cpu_count() is used

compute_patch_embeddings(model, patches_field, embeddings_field=None, batch_size=None, force_square=False, alpha=None)

Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given fiftyone.core.models.Model.

The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.

If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.

Parameters
• model – a fiftyone.core.models.Model

• patches_field – a fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining the image patches in each sample to embed

• embeddings_field (None) – the name of a field in which to store the embeddings

• batch_size (None) – an optional batch size to use

• force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction

• alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, ∞). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%

Returns

None, if an embeddings_field is provided; otherwise, a dict mapping sample IDs to arrays of patch embeddings
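
The arithmetic behind alpha can be worked through in a few lines. Under the (100 * alpha)% rule stated above, alpha = 0.1 grows each side by 10% and alpha = -0.1 shrinks it by 10%. The helper below is a hypothetical sketch, not FiftyOne's code, and it assumes the box is resized about its center, which the text does not specify.

```python
def expand_box(box, alpha):
    """Resizes an [x, y, width, height] box by (100 * alpha)% per side."""
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Assumption: the center of the box stays fixed
    new_x = x - (new_w - w) / 2
    new_y = y - (new_h - h) / 2
    return [new_x, new_y, new_w, new_h]


box = [0.25, 0.25, 0.5, 0.5]
grown = expand_box(box, 0.1)    # sides become about 0.55
shrunk = expand_box(box, -0.1)  # sides become about 0.45
```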

confidence_bounds(field_name)

Computes the bounds of the confidence of a fiftyone.core.labels.Label field of a collection.

Examples:

import fiftyone as fo

#
# Compute the confidence bounds of a Classification field
#

r = dataset.confidence_bounds("predictions")
r.bounds  # (min, max)

#
# Compute the confidence bounds of a Detections field
#

r = dataset.confidence_bounds("detections")
r.bounds  # (min, max)
Parameters

field_name – the name of the label field to compute confidence bounds for

Returns

fiftyone.core.aggregations.ConfidenceBoundsResult

count(field_name=None)

Counts the number of samples or number of items with respect to a field of a collection.

If a field is provided, it can be a fiftyone.core.fields.ListField or a fiftyone.core.labels.Label list field.

Examples:

import fiftyone as fo

#
# Compute the number of samples in a dataset
#

r = dataset.count()
r.count

#
# Compute the number of objects in a Detections field
#

r = dataset.count("detections")
r.count
Parameters

field_name (None) – the field whose items to count. If no field name is provided, the samples themselves are counted

Returns

fiftyone.core.aggregations.CountResult

count_labels(field_name)

Counts the label values in a fiftyone.core.labels.Label field of a collection.

Examples:

import fiftyone as fo

#
# Compute label counts for a Classification field called "class"
#

r = dataset.count_labels("class")
r.labels  # dict mapping labels to counts

#
# Compute label counts for a Detections field called "objects"
#

r = dataset.count_labels("objects")
r.labels  # dict mapping labels to counts
Parameters

field_name – the name of the label field

Returns

fiftyone.core.aggregations.CountLabelsResult

count_values(field_name)

Counts the occurrences of values in a countable field or list of countable fields of a collection.

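Countable fields are:

• fiftyone.core.fields.BooleanField

• fiftyone.core.fields.IntField

• fiftyone.core.fields.StringField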

Examples:

import fiftyone as fo

#
# Compute the tag counts in the dataset
#

r = dataset.count_values("tags")
r.values  # dict mapping tags to counts
Parameters

field_name – the name of the countable field

Returns

fiftyone.core.aggregations.CountValuesResult
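
The "dict mapping tags to counts" result can be illustrated with collections.Counter. This is a plain-Python sketch over hypothetical dict-based samples rather than a call into FiftyOne. It handles both a single countable value and a list of them, as the description above allows.

```python
from collections import Counter


def toy_count_values(samples, field_name):
    """Counts occurrences of each value of a scalar or list field."""
    counts = Counter()
    for sample in samples:
        value = sample.get(field_name)
        if value is None:
            continue

        if isinstance(value, (list, tuple)):
            counts.update(value)
        else:
            counts[value] += 1

    return dict(counts)


samples = [
    {"kind": "indoor", "tags": ["test"]},
    {"kind": "outdoor", "tags": ["test", "hard"]},
    {"kind": "indoor", "tags": ["train"]},
]

tag_counts = toy_count_values(samples, "tags")
kind_counts = toy_count_values(samples, "kind")
```

The keys of either dict are the distinct values of the field, which is the information distinct() reports.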

distinct(field_name)

Computes the distinct values of a countable field or a list of countable fields of a collection.

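Countable fields are:

• fiftyone.core.fields.BooleanField

• fiftyone.core.fields.IntField

• fiftyone.core.fields.StringField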

Examples:

import fiftyone as fo

#
# Compute the distinct values of a StringField named kind
#

r = dataset.distinct("kind")
r.values  # list of distinct values

#
# Compute the distinct values of the tags field
#

r = dataset.distinct("tags")
r.values  # list of distinct values
Parameters

field_name – the name of the field to compute distinct values for

Returns

fiftyone.core.aggregations.DistinctResult

distinct_labels(field_name)

Computes the distinct label values of a fiftyone.core.labels.Label field of a collection.

Examples:

import fiftyone as fo

#
# Compute the distinct labels of a Classification field
#

r = dataset.distinct_labels("predictions")
r.labels  # list of distinct labels

#
# Compute the distinct labels of a Detections field
#

r = dataset.distinct_labels("detections")
r.labels  # list of distinct labels
Parameters

field_name – the name of the label field

Returns

fiftyone.core.aggregations.DistinctLabelsResult

draw_labels(anno_dir, label_fields=None, overwrite=False, annotation_config=None)

Renders annotated versions of the samples in the collection with label field(s) overlaid to the given directory.

The filenames of the sample data are maintained, unless a name conflict would occur in anno_dir, in which case an index of the form "-%d" % count is appended to the base filename.

Images are written in format fo.config.default_image_ext.

Parameters
Returns

the list of paths to the labeled images
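
The filename rule described above (keep the name unless it collides, otherwise append "-%d" % count to the base filename) can be sketched as follows. `unique_output_name` is a hypothetical helper, not part of FiftyOne. Starting the count at 1 and preserving the original extension are both assumptions.

```python
import os


def unique_output_name(filename, used):
    """Returns filename, or base-N.ext if filename is already taken."""
    base, ext = os.path.splitext(filename)
    name = filename
    count = 0

    while name in used:
        count += 1
        name = base + "-%d" % count + ext

    used.add(name)
    return name


used = set()
names = [
    unique_output_name(n, used)
    for n in ["a.jpg", "a.jpg", "b.jpg", "a.jpg"]
]
```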

exclude(sample_ids)

Excludes the samples with the given IDs from the collection.

Examples:

import fiftyone as fo

#
# Exclude a single sample from a dataset
#

view = dataset.exclude("5f3c298768fd4d3baf422d2f")

#
# Exclude a list of samples from a dataset
#

view = dataset.exclude([
    "5f3c298768fd4d3baf422d2f",
    "5f3c298768fd4d3baf422d30"
])
Parameters

sample_ids – a sample ID or iterable of sample IDs

Returns

a fiftyone.core.view.DatasetView

exclude_fields(field_names)

Excludes the fields with the given names from the returned fiftyone.core.sample.SampleView instances.

Note that default fields cannot be excluded.

Examples:

import fiftyone as fo

#
# Exclude a field from all samples in a dataset
#

view = dataset.exclude_fields("predictions")

#
# Exclude a list of fields from all samples in a dataset
#

view = dataset.exclude_fields(["ground_truth", "predictions"])
Parameters

field_names – a field name or iterable of field names to exclude

Returns

a fiftyone.core.view.DatasetView

exclude_objects(objects)

Excludes the specified objects from the view.

The returned view will omit the objects specified in the provided objects argument, which should have the following format:

[
    {
        "field": "ground_truth",
    },
    {
        "field": "ground_truth",
    },
    ...
]

Examples:

import fiftyone as fo

#
# Exclude the objects currently selected in the App
#

session = fo.launch_app(dataset)

# Select some objects in the App...

view = dataset.exclude_objects(session.selected_objects)
Parameters

objects – a list of dicts specifying the objects to exclude

Returns

a fiftyone.core.view.DatasetView

exists(field, bool=True)

Returns a view containing the samples that have (or do not have) a non-None value for the given field.

Examples:

import fiftyone as fo

#
# Only include samples that have a value in their predictions
# field
#

view = dataset.exists("predictions")

#
# Only include samples that do NOT have a value in their
# predictions field
#

view = dataset.exists("predictions", False)
Parameters
• field – the field

• bool (True) – whether to check if the field exists (True) or does not exist (False)

Returns

a fiftyone.core.view.DatasetView

export(export_dir=None, dataset_type=None, dataset_exporter=None, label_field=None, label_prefix=None, labels_dict=None, frame_labels_field=None, frame_labels_prefix=None, frame_labels_dict=None, overwrite=False, **kwargs)

Exports the samples in the collection to disk.

Provide either export_dir and dataset_type or dataset_exporter to perform an export.

See this guide for more details about exporting datasets in custom formats by defining your own DatasetExporter.

Parameters
• export_dir (None) – the directory to which to export the samples in format dataset_type

• dataset_type (None) – the fiftyone.types.dataset_types.Dataset type to write. If not specified, the default type for label_field is used

• dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to export the samples

• label_field (None) – the name of the label field to export. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. If none of label_field, label_prefix, and labels_dict are specified and the requested output type is a labeled image dataset or labeled video dataset with sample-level labels, the first field of compatible type for the output format is used

• label_prefix (None) – a label field prefix; all fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the label dicts). Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels

• labels_dict (None) – a dictionary mapping label field names to keys to use when constructing the label dict to pass to the exporter. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels

• frame_labels_field (None) – the name of the frame labels field to export. Only applicable for labeled video datasets. If none of frame_labels_field, frame_labels_prefix, and frame_labels_dict are specified and the requested output type is a labeled video dataset with frame-level labels, the first frame-level field of compatible type for the output format is used

• frame_labels_prefix (None) – a frame labels field prefix; all frame-level fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the frame label dicts). Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels

• frame_labels_dict (None) – a dictionary mapping frame-level label field names to keys to use when constructing the frame labels dicts to pass to the exporter. Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels

• overwrite (False) – when an export_dir is provided, whether to delete the existing directory before performing the export

• **kwargs – optional keyword arguments to pass to dataset_type.get_dataset_exporter_cls(export_dir, **kwargs)

filter_classifications(field, filter, only_matches=False)

Filters the classifications of the given fiftyone.core.labels.Classifications field.

Elements of <field>.classifications for which filter returns False are omitted from the field.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include classifications in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_classifications(
    "predictions", F("confidence") > 0.8
)

#
# Only include classifications in the predictions field whose
# label is "cat" or "dog", and only show samples with at least
# one classification after filtering
#

view = dataset.filter_classifications(
    "predictions", F("label").is_in(["cat", "dog"]), only_matches=True
)
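Parameters
• field – the fiftyone.core.labels.Classifications field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples with at least one classification after filtering

Returns

a fiftyone.core.view.DatasetView
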
filter_detections(field, filter, only_matches=False)

Filters the detections of the given fiftyone.core.labels.Detections field.

Elements of <field>.detections for which filter returns False are omitted from the field.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include detections in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_detections(
    "predictions", F("confidence") > 0.8
)

#
# Only include detections in the predictions field whose label
# is "cat" or "dog", and only show samples with at least one
# detection after filtering
#

view = dataset.filter_detections(
    "predictions", F("label").is_in(["cat", "dog"]), only_matches=True
)

#
# Only include detections in the predictions field whose bounding
# box area is smaller than 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

view = dataset.filter_detections("predictions", bbox_area < 0.2)
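Parameters
• field – the fiftyone.core.labels.Detections field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples with at least one detection after filtering

Returns

a fiftyone.core.view.DatasetView
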
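The filtering semantics for detections (non-matching elements are omitted, and only_matches additionally drops samples left with no detections) can be modelled in plain Python. The sketch below is hypothetical and does not use FiftyOne. Samples are dicts, the detections list is stored directly under the field name, and bounding boxes follow the [top-left-x, top-left-y, width, height] format from the example.

```python
def bbox_area(detection):
    """Area of a [top-left-x, top-left-y, width, height] box."""
    box = detection["bounding_box"]
    return box[2] * box[3]


def toy_filter_detections(samples, field, keep, only_matches=False):
    """Returns copies of samples with non-matching detections removed."""
    result = []
    for sample in samples:
        kept = [d for d in sample[field] if keep(d)]
        if only_matches and not kept:
            continue

        # Copy so the underlying samples are not modified
        filtered = dict(sample)
        filtered[field] = kept
        result.append(filtered)

    return result


samples = [
    {
        "id": "a",
        "predictions": [
            {"label": "cat", "bounding_box": [0.1, 0.1, 0.3, 0.4]},
            {"label": "dog", "bounding_box": [0.0, 0.0, 0.8, 0.5]},
        ],
    },
    {
        "id": "b",
        "predictions": [
            {"label": "dog", "bounding_box": [0.2, 0.2, 0.6, 0.6]},
        ],
    },
]


def is_small(detection):
    return bbox_area(detection) < 0.2


small = toy_filter_detections(samples, "predictions", is_small)
small_only = toy_filter_detections(
    samples, "predictions", is_small, only_matches=True
)
```

Without only_matches, sample "b" stays in the result with an empty detections list; with it, only sample "a" remains.
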
filter_field(field, filter, only_matches=False)

Filters the values of a given sample (or embedded document) field.

Values of field for which filter returns False are replaced with None.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include classifications in the predictions field (assume
# it is a Classification field) whose label is "cat"
#

view = dataset.filter_field("predictions", F("label") == "cat")

#
# Only include classifications in the predictions field (assume
# it is a Classification field) whose confidence is greater
# than 0.8
#

view = dataset.filter_field("predictions", F("confidence") > 0.8)
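Parameters
• field – the field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples that match the filter

Returns

a fiftyone.core.view.DatasetView
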
filter_keypoints(field, filter, only_matches=False)

Filters the keypoints of the given fiftyone.core.labels.Keypoints field.

Elements of <field>.keypoints for which filter returns False are omitted from the field.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F
from fiftyone.core.stages import FilterKeypoints

#
# Only include keypoints in the predictions field whose label
# is "face", and only show samples with at least one keypoint after
# filtering
#

stage = FilterKeypoints(
    "predictions", F("label") == "face", only_matches=True
)

#
# Only include keypoints in the predictions field with at least
# 10 points
#

stage = FilterKeypoints("predictions", F("points").length() >= 10)
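Parameters
• field – the fiftyone.core.labels.Keypoints field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples with at least one keypoint after filtering

Returns

a fiftyone.core.view.DatasetView
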
filter_labels(field, filter, only_matches=False)

Filters the fiftyone.core.labels.Label elements in a labels list field of each sample.

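The specified field must be one of the following types:

• fiftyone.core.labels.Classifications

• fiftyone.core.labels.Detections

• fiftyone.core.labels.Polylines

• fiftyone.core.labels.Keypoints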

Classifications Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include classifications in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include classifications in the predictions field whose
# label is "cat" or "dog", and only show samples with at least
# one classification after filtering
#

view = dataset.filter_labels(
    "predictions",
    F("label").is_in(["cat", "dog"]),
    only_matches=True,
)

Detections Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include detections in the predictions field whose
# confidence is greater than 0.8
#

view = dataset.filter_labels("predictions", F("confidence") > 0.8)

#
# Only include detections in the predictions field whose label
# is "cat" or "dog", and only show samples with at least one
# detection after filtering
#

view = dataset.filter_labels(
    "predictions",
    F("label").is_in(["cat", "dog"]),
    only_matches=True,
)

#
# Only include detections in the predictions field whose bounding
# box area is smaller than 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

view = dataset.filter_labels("predictions", bbox_area < 0.2)

Polylines Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include polylines in the predictions field that are filled
#

view = dataset.filter_labels("predictions", F("filled"))

#
# Only include polylines in the predictions field whose label
# is "lane", and only show samples with at least one polyline after
# filtering
#

view = dataset.filter_labels(
    "predictions", F("label") == "lane", only_matches=True
)

#
# Only include polylines in the predictions field with at least
# 10 vertices
#

num_vertices = F("points").map(F().length()).sum()
view = dataset.filter_labels("predictions", num_vertices >= 10)

Keypoints Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include keypoints in the predictions field whose label
# is "face", and only show samples with at least one keypoint after
# filtering
#

view = dataset.filter_labels(
    "predictions", F("label") == "face", only_matches=True
)

#
# Only include keypoints in the predictions field with at least
# 10 points
#

view = dataset.filter_labels(
    "predictions", F("points").length() >= 10
)
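Parameters
• field – the labels list field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples with at least one label after filtering

Returns

a fiftyone.core.view.DatasetView
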
filter_polylines(field, filter, only_matches=False)

Filters the polylines of the given fiftyone.core.labels.Polylines field.

Elements of <field>.polylines for which filter returns False are omitted from the field.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F
from fiftyone.core.stages import FilterPolylines

#
# Only include polylines in the predictions field that are filled
#

stage = FilterPolylines("predictions", F("filled"))

#
# Only include polylines in the predictions field whose label
# is "lane", and only show samples with at least one polyline after
# filtering
#

stage = FilterPolylines(
    "predictions", F("label") == "lane", only_matches=True
)

#
# Only include polylines in the predictions field with at least
# 10 vertices
#

num_vertices = F("points").map(F().length()).sum()
stage = FilterPolylines("predictions", num_vertices >= 10)
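Parameters
• field – the fiftyone.core.labels.Polylines field to filter

• filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

• only_matches (False) – whether to only include samples with at least one polyline after filtering

Returns

a fiftyone.core.view.DatasetView
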
first()

Returns the first sample in the collection.

Returns

a fiftyone.core.sample.SampleView

Raises

ValueError – if the collection is empty

head(num_samples=3)

Returns a list of the first few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects

histogram_values(field_name, bins=None, range=None)

Computes a histogram of the numeric values in a field or list field of a collection.

Examples:

import fiftyone as fo

#
# Compute a histogram of values in the float field "uniqueness"
#

r = dataset.histogram_values("uniqueness", bins=50, range=(0, 1))
r.counts  # list of counts
r.edges  # list of bin edges
Parameters
• field_name – the name of the field to histogram

• bins (None) – can be either an integer number of bins to generate or a monotonically increasing sequence specifying the bin edges to use. By default, 10 bins are created. If bins is an integer and no range is specified, bin edges are automatically distributed in an attempt to evenly distribute the counts in each bin

• range (None) – a (lower, upper) tuple specifying a range in which to generate equal-width bins. Only applicable when bins is an integer
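
The case where bins is an integer and range is given (equal-width bins) can be sketched in plain Python. `toy_histogram` is a hypothetical illustration, not FiftyOne's implementation. Treating the upper edge as inclusive and ignoring values outside the range are assumptions borrowed from common histogram conventions.

```python
def toy_histogram(values, bins, value_range):
    """Equal-width histogram over [lower, upper]; returns (counts, edges)."""
    lower, upper = value_range
    width = (upper - lower) / bins
    edges = [lower + i * width for i in range(bins + 1)]
    counts = [0] * bins

    for value in values:
        if value < lower or value > upper:
            continue  # assumption: out-of-range values are ignored

        # Assumption: a value equal to `upper` falls in the last bin
        index = min(int((value - lower) / width), bins - 1)
        counts[index] += 1

    return counts, edges


counts, edges = toy_histogram([0.05, 0.2, 0.6, 1.0], bins=2, value_range=(0, 1))
```

As in the example above, the result has one more edge than it has counts.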

last()

Returns the last sample in the collection.

Returns

a fiftyone.core.sample.SampleView

Raises

ValueError – if the collection is empty

limit(limit)

Returns a view with at most the given number of samples.

Examples:

import fiftyone as fo

#
# Only include the first 10 samples in the view
#

view = dataset.limit(10)
Parameters

limit – the maximum number of samples to return. If a non-positive number is provided, an empty view is returned

Returns

a fiftyone.core.view.DatasetView

limit_labels(field, limit)

Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample.

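The specified field must be one of the following types:

• fiftyone.core.labels.Classifications

• fiftyone.core.labels.Detections

• fiftyone.core.labels.Polylines

• fiftyone.core.labels.Keypoints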

Examples:

import fiftyone as fo

#
# Only include the first 5 detections in the ground_truth field of
# the view
#

view = dataset.limit_labels("ground_truth", 5)
Parameters
• field – the labels list field to filter

• limit – the maximum number of labels to include in each labels list. If a non-positive number is provided, all lists will be empty

Returns

a fiftyone.core.view.DatasetView

classmethod list_view_stages()

Returns a list of all available methods on this collection that apply fiftyone.core.stages.ViewStage operations that return fiftyone.core.view.DatasetView instances.

Returns

a list of SampleCollection method names

make_unique_field_name(root='')

Makes a unique field name with the given root name for the collection.

Parameters

root – an optional root for the output field name

Returns

the field name

map_labels(field, map)

Maps the label values of fiftyone.core.labels.Label fields to new values.

Examples:

import fiftyone as fo

#
# Map "cat" and "dog" label values to "pet"
#

mapping = {"cat": "pet", "dog": "pet"}
view = dataset.map_labels("ground_truth", mapping)
Parameters
• field – the labels field to map

• map – a dict mapping label values to new label values

Returns

a fiftyone.core.view.DatasetView

match(filter)

Filters the samples in the collection by the given filter.

Samples for which filter returns False are omitted.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Only include samples whose filepath ends with ".jpg"
#

view = dataset.match(F("filepath").ends_with(".jpg"))

#
# Only include samples whose predictions field (assume it is a
# Classification field) has label of "cat"
#

view = dataset.match(F("predictions").label == "cat")

#
# Only include samples whose predictions field (assume it is a
# Detections field) has at least 5 detections
#

view = dataset.match(F("predictions").detections.length() >= 5)

#
# Only include samples whose predictions field (assume it is a
# Detections field) has at least one detection with area smaller
# than 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
pred_bbox = F("predictions.detections.bounding_box")
pred_bbox_area = pred_bbox[2] * pred_bbox[3]

view = dataset.match((pred_bbox_area < 0.2).length() > 0)
Parameters

filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply

Returns

a fiftyone.core.view.DatasetView

match_tag(tag)

Returns a view containing the samples that have the given tag.

Examples:

import fiftyone as fo

#
# Only include samples that have the "test" tag
#

view = dataset.match_tag("test")
Parameters

tag – a tag

Returns

a fiftyone.core.view.DatasetView

match_tags(tags)

Returns a view containing the samples that have any of the given tags.

To match samples that must contain multiple tags, chain multiple match_tag() or match_tags() calls together.

Examples:

import fiftyone as fo

#
# Only include samples that have either the "test" or "validation"
# tag
#

view = dataset.match_tags(["test", "validation"])
Parameters

tags – an iterable of tags

Returns

a fiftyone.core.view.DatasetView

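The difference between one match_tags() call (any of the tags) and chained calls (all of the tags) is easy to see in a plain-Python model. `toy_match_tags` and the dict-based samples are hypothetical and do not call FiftyOne.

```python
def toy_match_tags(samples, tags):
    """Keeps samples that have at least one of the given tags."""
    wanted = set(tags)
    return [s for s in samples if wanted & set(s["tags"])]


samples = [
    {"id": "a", "tags": ["test"]},
    {"id": "b", "tags": ["validation", "hard"]},
    {"id": "c", "tags": ["test", "hard"]},
    {"id": "d", "tags": ["train"]},
]

# One call: "test" OR "validation"
any_of = toy_match_tags(samples, ["test", "validation"])

# Chained calls: "test" AND "hard"
all_of = toy_match_tags(toy_match_tags(samples, ["test"]), ["hard"])
```

Each chained call narrows the previous result, so chaining acts as a logical AND across the calls.
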
mongo(pipeline)

Adds a view stage defined by a raw MongoDB aggregation pipeline.

See MongoDB aggregation pipelines for more details.

Examples:

import fiftyone as fo

#
# Extract a view containing the 6th through 15th samples in the
# dataset
#

view = dataset.mongo([{"$skip": 5}, {"$limit": 10}])

#
# Sort by the number of detections in the predictions field of
# the samples (assume it is a Detections field)
#

view = dataset.mongo([
    {
        "$addFields": {
            "_sort_field": {
                "$size": {"$ifNull": ["$predictions.detections", []]}
            }
        }
    },
    {"$sort": {"_sort_field": -1}},
    {"$unset": "_sort_field"},
])
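
As a sanity check on the first pipeline, skipping 5 documents and then limiting to 10 selects the 6th through 15th. The plain-Python sketch below reproduces that arithmetic with list slicing; `skip_then_limit` and the sample names are hypothetical and no MongoDB connection is involved.

```python
def skip_then_limit(samples, skip, limit):
    """Mimic a $skip stage followed by a $limit stage on an ordered list."""
    return samples[skip:skip + limit]


# Hypothetical ordered collection of 20 samples, numbered from 1
samples = [f"sample_{i}" for i in range(1, 21)]

window = skip_then_limit(samples, skip=5, limit=10)
print(window[0], window[-1], len(window))
```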
Parameters

pipeline – a MongoDB aggregation pipeline (list of dicts)

Returns

a fiftyone.core.view.DatasetView

select(sample_ids)

Returns a view containing only the samples with the given IDs.

Examples:

import fiftyone as fo

#
# Select the samples with the given IDs from the dataset
#

view = dataset.select([
"5f3c298768fd4d3baf422d34",
"5f3c298768fd4d3baf422d35",
"5f3c298768fd4d3baf422d36",
])

#
# Create a view containing the currently selected samples in the
# App
#

session = fo.launch_app(dataset=dataset)

# Select samples in the App...

view = dataset.select(session.selected)
Parameters

sample_ids – a sample ID or iterable of sample IDs

Returns

a fiftyone.core.view.DatasetView

select_fields(field_names=None)

Selects the fields with the given names as the only fields present in the returned fiftyone.core.sample.SampleView instances. All other fields are excluded.

Note that default sample fields are always selected and will be added if not included in field_names.

Examples:

import fiftyone as fo

#
# Include only the default fields on each sample
#

view = dataset.select_fields()

#
# Include only the ground_truth field (and the default fields) on
# each sample
#

view = dataset.select_fields("ground_truth")
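
The rule that default fields are always selected can be sketched as follows. This is a plain-Python illustration only: the contents of `DEFAULT_FIELDS` and the helper `resolve_selected_fields` are assumptions, not the library's actual definitions.

```python
# Assumed set of default sample fields, for illustration only
DEFAULT_FIELDS = ("filepath", "tags", "metadata")


def resolve_selected_fields(field_names=None):
    """Return the default fields plus any requested fields, without duplicates."""
    if field_names is None:
        requested = []
    elif isinstance(field_names, str):
        requested = [field_names]  # a single field name
    else:
        requested = list(field_names)  # an iterable of field names

    selected = list(DEFAULT_FIELDS)
    for name in requested:
        if name not in selected:
            selected.append(name)
    return selected


print(resolve_selected_fields())
print(resolve_selected_fields("ground_truth"))
```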
Parameters

field_names (None) – a field name or iterable of field names to select. If not specified, just the default fields will be selected

Returns

a fiftyone.core.view.DatasetView

select_objects(objects)

Selects only the specified objects from the view.

The returned view will omit samples, sample fields, and individual objects that do not appear in the provided objects argument, which should have the following format:

[
{
"field": "ground_truth",
},
{
"field": "ground_truth",
},
...
]

Examples:

import fiftyone as fo

#
# Only include the objects currently selected in the App
#

session = fo.launch_app(dataset)

# Select some objects in the App...

view = dataset.select_objects(session.selected_objects)
Parameters

objects – a list of dicts specifying the objects to select

Returns

a fiftyone.core.view.DatasetView

shuffle(seed=None)

Randomly shuffles the samples in the collection.

Examples:

import fiftyone as fo

#
# Return a view that contains a randomly shuffled version of the
# samples in the dataset
#

view = dataset.shuffle()

#
# Shuffle the samples with a set random seed
#

view = dataset.shuffle(seed=51)
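
The effect of a fixed seed can be illustrated with Python's standard `random` module. This is a sketch of the general idea (same seed, same order), not of how FiftyOne shuffles internally; the helper `shuffled` is hypothetical.

```python
import random


def shuffled(samples, seed=None):
    """Return a shuffled copy of samples; a fixed seed gives a repeatable order."""
    result = list(samples)
    random.Random(seed).shuffle(result)
    return result


samples = list(range(10))

first = shuffled(samples, seed=51)
second = shuffled(samples, seed=51)

# The same seed reproduces the same ordering
print(first == second)
```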
Parameters

seed (None) – an optional random seed to use when shuffling the samples

Returns

a fiftyone.core.view.DatasetView

skip(skip)

Omits the given number of samples from the head of the collection.

Examples:

import fiftyone as fo

#
# Omit the first 10 samples from the dataset
#

view = dataset.skip(10)
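
The handling of non-positive values described under Parameters can be sketched with list slicing. The helper `skip_samples` is hypothetical and operates on a plain list rather than a dataset.

```python
def skip_samples(samples, skip):
    """Drop the first `skip` items; a non-positive value drops nothing."""
    return list(samples[max(skip, 0):])


samples = list(range(15))

print(len(skip_samples(samples, 10)))
print(len(skip_samples(samples, -3)))
```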
Parameters

skip – the number of samples to skip. If a non-positive number is provided, no samples are omitted

Returns

a fiftyone.core.view.DatasetView

sort_by(field_or_expr, reverse=False)

Sorts the samples in the collection by the given field or expression.

When sorting by an expression, field_or_expr can either be a fiftyone.core.expressions.ViewExpression or a MongoDB expression that defines the quantity to sort by.

Examples:

import fiftyone as fo
from fiftyone import ViewField as F

#
# Sorts the samples in descending order by the confidence of
# their predictions field (assume it is a Classification field)
#

view = dataset.sort_by("predictions.confidence", reverse=True)

#
# Sorts the samples in ascending order by the number of detections
# in their predictions field (assume it is a Detections field)
# whose bounding box area is less than 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
pred_bbox = F("predictions.detections.bounding_box")
pred_bbox_area = pred_bbox[2] * pred_bbox[3]

view = dataset.sort_by((pred_bbox_area < 0.2).length())
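
The quantity being sorted on in the last example, the number of small boxes per sample, can be spelled out in plain Python. This sketch does not use FiftyOne; `count_small_boxes` and the sample dicts are hypothetical.

```python
def count_small_boxes(boxes, max_area=0.2):
    """Count boxes ([x, y, width, height]) whose area is below max_area."""
    return sum(1 for _x, _y, w, h in boxes if w * h < max_area)


# Hypothetical samples
samples = [
    {"filepath": "a.jpg", "boxes": [[0.1, 0.1, 0.2, 0.2], [0.5, 0.5, 0.1, 0.3]]},
    {"filepath": "b.jpg", "boxes": [[0.0, 0.0, 0.9, 0.9]]},
    {"filepath": "c.jpg", "boxes": [[0.0, 0.0, 0.3, 0.3]]},
]


def sort_key(sample):
    return count_small_boxes(sample["boxes"])


ascending = sorted(samples, key=sort_key)
descending = sorted(samples, key=sort_key, reverse=True)  # like reverse=True

print([s["filepath"] for s in ascending])
print([s["filepath"] for s in descending])
```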
Parameters
• field_or_expr – the field or expression to sort by

• reverse (False) – whether to return the results in descending order

Returns

a fiftyone.core.view.DatasetView

tail(num_samples=3)

Returns a list of the last few samples in the collection.

If fewer than num_samples samples are in the collection, only the available samples are returned.

Parameters

num_samples (3) – the number of samples

Returns

a list of fiftyone.core.sample.Sample objects

take(size, seed=None)

Randomly samples the given number of samples from the collection.

Examples:

import fiftyone as fo

#
# Take 10 random samples from the dataset
#

view = dataset.take(10)

#
# Take 10 random samples from the dataset with a set seed
#

view = dataset.take(10, seed=51)
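
A rough plain-Python analogue of seeded sampling is shown below. The helper `take` here is a hypothetical stand-in working on a list; clamping `size` to the collection length is an assumption of this sketch and is not stated in the documentation above.

```python
import random


def take(samples, size, seed=None):
    """Return `size` randomly chosen items; non-positive sizes give an empty list."""
    if size <= 0:
        return []
    size = min(size, len(samples))  # assumption: never ask for more than exist
    return random.Random(seed).sample(list(samples), size)


samples = list(range(100))

picked = take(samples, 10, seed=51)
picked_again = take(samples, 10, seed=51)

# The same seed selects the same samples
print(picked == picked_again)
```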
Parameters
• size – the number of samples to return. If a non-positive number is provided, an empty view is returned

• seed (None) – an optional random seed to use when selecting the samples

Returns

a fiftyone.core.view.DatasetView

to_json(rel_dir=None, frame_labels_dir=None, pretty_print=False)

Returns a JSON string representation of the collection.

The samples will be written as a list in a top-level samples field of the returned JSON.

Parameters
• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations

Returns

a JSON string
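
The effect of rel_dir and pretty_print can be sketched with the standard library alone. The helpers `strip_rel_dir` and `to_json_sketch` are hypothetical and only mimic the behavior described above (path normalization via os.path.abspath(os.path.expanduser(rel_dir)) and a top-level samples list); they are not the library's implementation.

```python
import json
import os


def strip_rel_dir(filepath, rel_dir=None):
    """Remove rel_dir from the front of filepath, if possible."""
    if rel_dir is None:
        return filepath
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))
    prefix = os.path.join(rel_dir, "")  # ensure a trailing separator
    if filepath.startswith(prefix):
        return filepath[len(prefix):]
    return filepath  # not under rel_dir: leave the path untouched


def to_json_sketch(filepaths, rel_dir=None, pretty_print=False):
    """Serialize file paths as a list under a top-level "samples" key."""
    samples = [{"filepath": strip_rel_dir(p, rel_dir)} for p in filepaths]
    return json.dumps({"samples": samples}, indent=4 if pretty_print else None)


# Hypothetical data directory and file
root = os.path.abspath(os.path.join("data", "images"))
path = os.path.join(root, "cat.jpg")

print(to_json_sketch([path], rel_dir=root))
```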

validate_field_type(field_name, ftype, embedded_doc_type=None, subfield=None)

Validates that the collection has a field of the given type.

Parameters
Raises

ValueError – if the field does not exist or does not have the expected type

validate_fields_exist(field_or_fields)

Validates that the collection has fields with the given names.

If field_or_fields contains an embedded field name such as field_name.document.field, only the root field_name is checked for existence.

Parameters

field_or_fields – a field name or iterable of field names

Raises

ValueError – if one or more of the fields do not exist
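
The root-field rule for embedded names can be sketched as below. `check_fields_exist` is a hypothetical helper that takes the set of known field names explicitly; it only illustrates the described behavior and is not the library's code.

```python
def check_fields_exist(schema, field_or_fields):
    """Raise ValueError if the root of any given field name is not in schema."""
    if isinstance(field_or_fields, str):
        field_or_fields = [field_or_fields]

    missing = []
    for name in field_or_fields:
        root = name.split(".", 1)[0]  # only the root field name is checked
        if root not in schema and root not in missing:
            missing.append(root)

    if missing:
        raise ValueError("Field(s) %s do not exist" % missing)


# Hypothetical set of top-level field names
schema = {"filepath", "tags", "predictions"}

# Passes: only the root "predictions" is checked
check_fields_exist(schema, "predictions.detections.label")

try:
    check_fields_exist(schema, ["filepath", "ground_truth.label"])
except ValueError as e:
    print(e)
```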

write_json(json_path, rel_dir=None, frame_labels_dir=None, pretty_print=False)

Writes the collection to disk in JSON format.

Parameters
• json_path – the path to write the JSON

• rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory

• frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the written JSON (which can be quite large for video datasets containing many frames). Only applicable to video datasets

• pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations