fiftyone.core.view¶
Dataset views.
-
class
fiftyone.core.view.
DatasetView
(dataset)¶ Bases:
fiftyone.core.collections.SampleCollection
A view into a fiftyone.core.dataset.Dataset.
Dataset views represent ordered collections of subsets of samples in a dataset.
Operations on dataset views are designed to be chained together to yield the desired subset of the dataset, which is then iterated over to directly access the sample views. Each stage in the pipeline defining a dataset view is represented by a fiftyone.core.stages.ViewStage instance.
The stages of a dataset view specify:
what subset of samples (and their order) should be included
what “parts” (fields and their elements) of the sample should be included
Samples retrieved from dataset views are returned as fiftyone.core.sample.SampleView objects, as opposed to fiftyone.core.sample.Sample objects, since they may contain a subset of the sample’s content.
Example use:
    # Print paths for 5 random samples from the test split of a dataset
    view = dataset.match_tag("test").take(5)
    for sample in view:
        print(sample.filepath)
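The chaining behaviour described above can be illustrated without FiftyOne installed. The following is only a minimal sketch: `MiniView` and its methods are hypothetical stand-ins, not the library's implementation. It shows the general idea that each chained call returns a new view holding a longer pipeline of stages, and that the stages run only when the view is iterated.

```python
class MiniView:
    """Hypothetical stand-in for a dataset view: base samples plus a pipeline."""

    def __init__(self, samples, stages=()):
        self._samples = samples
        self._stages = list(stages)

    def add_stage(self, stage):
        # Chaining returns a new view; the underlying samples are not modified
        return MiniView(self._samples, self._stages + [stage])

    def match_tag(self, tag):
        return self.add_stage(
            lambda samples: [s for s in samples if tag in s["tags"]]
        )

    def limit(self, n):
        return self.add_stage(lambda samples: samples[: max(n, 0)])

    def __iter__(self):
        # Stages are applied in order, and only when the view is iterated
        samples = list(self._samples)
        for stage in self._stages:
            samples = stage(samples)
        return iter(samples)


samples = [
    {"filepath": "/data/a.jpg", "tags": ["test"]},
    {"filepath": "/data/b.jpg", "tags": ["train"]},
    {"filepath": "/data/c.jpg", "tags": ["test"]},
]

view = MiniView(samples).match_tag("test").limit(1)
paths = [s["filepath"] for s in view]
print(paths)
```

Because every stage is stored rather than executed immediately, the same base collection can back many different views at once.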
- Parameters
dataset – a
fiftyone.core.dataset.Dataset
-
property
media_type
¶ The media type of the underlying dataset.
-
property
name
¶ The name of the view.
-
property
dataset_name
¶ The name of the underlying dataset.
-
property
info
¶ The
fiftyone.core.dataset.Dataset.info()
dict of the underlying dataset.
-
property
stages
¶ The list of
fiftyone.core.stages.ViewStage
instances in this view’s pipeline.
-
summary
()¶ Returns a string summary of the view.
- Returns
a string summary
-
iter_samples
()¶ Returns an iterator over the samples in the view.
- Returns
an iterator over
fiftyone.core.sample.SampleView
instances
-
get_field_schema
(ftype=None, embedded_doc_type=None, include_private=False)¶ Returns a schema dictionary describing the fields of the samples in the view.
- Parameters
ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of
fiftyone.core.odm.BaseEmbeddedDocument
include_private (False) – whether to include fields that start with _ in the returned schema
- Returns
an
OrderedDict
mapping field names to field types
-
get_frame_field_schema
(ftype=None, embedded_doc_type=None, include_private=False)¶ Returns a schema dictionary describing the fields of the frames of the samples in the view.
Only applicable for video datasets.
- Parameters
ftype (None) – an optional field type to which to restrict the returned schema. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – an optional embedded document type to which to restrict the returned schema. Must be a subclass of
fiftyone.core.odm.BaseEmbeddedDocument
include_private (False) – whether to include fields that start with _ in the returned schema
- Returns
a dictionary mapping field names to field types, or
None
if the dataset is not a video dataset
-
list_indexes
()¶ Returns the fields of the dataset that are indexed.
- Returns
a list of field names
-
create_index
(field, unique=False)¶ Creates an index on the given field.
If the given field already has a unique index, it will be retained regardless of the unique value you specify.
If the given field already has a non-unique index but you requested a unique index, the existing index will be dropped.
Indexes enable efficient sorting, merging, and other such operations.
- Parameters
field – the field name or
embedded.field.name
unique (False) – whether to add a uniqueness constraint to the index
-
drop_index
(field)¶ Drops the index on the given field.
- Parameters
field – the field name or
embedded.field.name
-
clone_sample_field
(field_name, new_field_name)¶ Clones the given sample field of the view into a new field of the dataset.
You can use dot notation (embedded.field.name) to clone embedded fields.
- Parameters
field_name – the field name to clone
new_field_name – the new field name to populate
-
clone_frame_field
(field_name, new_field_name)¶ Clones the frame-level field of the view into a new field.
You can use dot notation (embedded.field.name) to clone embedded frame fields.
Only applicable to video datasets.
- Parameters
field_name – the field name
new_field_name – the new field name
-
clear_sample_field
(field_name)¶ Clears the values of the field from all samples in the view.
The field will remain in the dataset’s schema, and all samples in the view will have the value None for the field.
You can use dot notation (embedded.field.name) to clear embedded fields.
- Parameters
field_name – the field name
-
clear_frame_field
(field_name)¶ Clears the values of the frame field from all samples in the view.
The field will remain in the dataset’s frame schema, and all frames in the view will have the value None for the field.
You can use dot notation (embedded.field.name) to clear embedded frame fields.
Only applicable to video datasets.
- Parameters
field_name – the field name
-
save
()¶ Overwrites the underlying dataset with the contents of the view.
WARNING: this will permanently delete any omitted, filtered, or otherwise modified contents of the dataset.
-
clone
(name=None)¶ Creates a new dataset containing only the contents of the view.
- Parameters
name (None) – a name for the cloned dataset. By default, get_default_dataset_name() is used
- Returns
the new
Dataset
-
to_dict
(rel_dir=None, frame_labels_dir=None, pretty_print=False)¶ Returns a JSON dictionary representation of the view.
- Parameters
rel_dir (None) – a relative directory to remove from the filepath of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON dict (which can be quite large for video datasets containing many frames). Only applicable to video datasets
pretty_print (False) – whether to render frame labels JSON in human readable format with newlines and indentations. Only applicable to video datasets when a frame_labels_dir is provided
- Returns
a JSON dict
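As an illustration of the rel_dir handling described for to_dict(), here is a small standard-library sketch. The helper name `strip_rel_dir` is hypothetical and this is not FiftyOne's code; it only mirrors the documented normalization via os.path.abspath(os.path.expanduser(rel_dir)) and the "if possible" behaviour, under the assumption that paths outside rel_dir are left unchanged.

```python
import os


def strip_rel_dir(filepath, rel_dir):
    """Returns `filepath` relative to `rel_dir` when it lies inside it."""
    # Normalize rel_dir as the parameter description states
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))
    filepath = os.path.abspath(filepath)

    prefix = rel_dir.rstrip(os.sep) + os.sep
    if filepath.startswith(prefix):
        return filepath[len(prefix):]

    # "If possible": paths outside rel_dir are returned unchanged (assumed)
    return filepath


base = os.path.abspath(os.path.join(os.sep, "data", "images"))
inside = strip_rel_dir(os.path.join(base, "train", "001.jpg"), base)
outside_path = os.path.abspath(os.path.join(os.sep, "other", "x.jpg"))
outside = strip_rel_dir(outside_path, base)
print(inside, outside)
```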
-
add_stage
(stage)¶ Applies the given fiftyone.core.stages.ViewStage to the collection.
- Parameters
stage – a
fiftyone.core.stages.ViewStage
- Returns
- Raises
fiftyone.core.stages.ViewStageError – if the stage was not a valid stage for this collection
-
aggregate
(aggregations, _attach_frames=True)¶ Aggregates one or more fiftyone.core.aggregations.Aggregation instances.
Note that it is best practice to group aggregations into a single call to aggregate(), as this will be more efficient than performing multiple aggregations in series.
- Parameters
aggregations – a fiftyone.core.aggregations.Aggregation or iterable of fiftyone.core.aggregations.Aggregation instances
- Returns
a fiftyone.core.aggregations.AggregationResult or list of fiftyone.core.aggregations.AggregationResult instances corresponding to the input aggregations
-
apply_model
(model, label_field='predictions', confidence_thresh=None, batch_size=None)¶ Applies the fiftyone.core.models.Model to the samples in the collection.
- Parameters
model – a
fiftyone.core.models.Model
label_field ("predictions") – the name (or prefix) of the field in which to store the model predictions
confidence_thresh (None) – an optional confidence threshold to apply to any applicable labels generated by the model
batch_size (None) – an optional batch size to use. Only applicable for image samples
-
bounds
(field_name)¶ Computes the bounds of a numeric field or numeric list field of a collection.
Examples:
    import fiftyone as fo

    dataset = fo.Dataset()
    dataset.add_samples(
        [
            fo.Sample(
                filepath="/path/to/image1.png",
                numeric_field=1.0,
                numeric_list_field=[1.0, 2.0, 3.0],
            ),
            fo.Sample(
                filepath="/path/to/image2.png",
                numeric_field=4.0,
                numeric_list_field=[1.5, 2.5],
            ),
        ]
    )

    # Add a generic list field
    dataset.add_sample_field("list_field", fo.ListField)

    #
    # Compute the bounds of a numeric field
    #

    r = dataset.bounds("numeric_field")
    r.bounds  # (min, max)

    #
    # Compute the bounds of a numeric list field
    #

    r = dataset.bounds("numeric_list_field")
    r.bounds  # (min, max)

    #
    # Cannot compute bounds of a generic list field
    #

    dataset.bounds("list_field")  # error
- Parameters
field_name – the name of the field to compute bounds for
- Returns
-
compute_embeddings
(model, embeddings_field=None, batch_size=None)¶ Computes embeddings for the samples in the collection using the given fiftyone.core.models.Model.
The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.
If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.
- Parameters
model – a
fiftyone.core.models.Model
embeddings_field (None) – the name of a field in which to store the embeddings
batch_size (None) – an optional batch size to use. Only applicable for image samples
- Returns
None, if an embeddings_field is provided; otherwise, a numpy array whose first dimension is len(samples) containing the embeddings
-
compute_metadata
(overwrite=False, num_workers=None)¶ Populates the metadata field of all samples in the collection.
Any samples with existing metadata are skipped, unless overwrite == True.
- Parameters
overwrite (False) – whether to overwrite existing metadata
num_workers (None) – the number of processes to use. By default,
multiprocessing.cpu_count()
is used
-
compute_patch_embeddings
(model, patches_field, embeddings_field=None, batch_size=None, force_square=False, alpha=None)¶ Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given fiftyone.core.models.Model.
The model must expose embeddings, i.e., fiftyone.core.models.Model.has_embeddings() must return True.
If an embeddings_field is provided, the embeddings are saved to the samples; otherwise, the embeddings are returned in-memory.
- Parameters
model – a
fiftyone.core.models.Model
patches_field – a fiftyone.core.labels.Detection, fiftyone.core.labels.Detections, fiftyone.core.labels.Polyline, or fiftyone.core.labels.Polylines field defining the image patches in each sample to embed
embeddings_field (None) – the name of a field in which to store the embeddings
batch_size (None) – an optional batch size to use
force_square (False) – whether to minimally manipulate the patch bounding boxes into squares prior to extraction
alpha (None) – an optional expansion/contraction to apply to the patches before extracting them, in [-1, ∞). If provided, the length and width of the box are expanded (or contracted, when alpha < 0) by (100 * alpha)%. For example, set alpha = 0.1 to expand the boxes by 10%, and set alpha = -0.1 to contract the boxes by 10%
- Returns
None, if an embeddings_field is provided; otherwise, a dict mapping sample IDs to arrays of patch embeddings
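To make the alpha rule concrete, the sketch below applies the stated "(100 * alpha)%" expansion to a box in [top-left-x, top-left-y, width, height] format. The function `expand_box` is hypothetical, and keeping the box center fixed is an assumption of this sketch rather than something the documentation states.

```python
def expand_box(box, alpha):
    """Expands (alpha > 0) or contracts (alpha < 0) a box by (100 * alpha)%."""
    if alpha < -1:
        raise ValueError("alpha must be in [-1, inf)")

    x, y, w, h = box
    new_w = w * (1 + alpha)
    new_h = h * (1 + alpha)

    # Assumption of this sketch: the box center stays fixed
    return [x - (new_w - w) / 2, y - (new_h - h) / 2, new_w, new_h]


grown = expand_box([0.25, 0.25, 0.5, 0.5], 0.5)    # sides grow by 50%
shrunk = expand_box([0.25, 0.25, 0.5, 0.5], -0.5)  # sides shrink by 50%
print(grown, shrunk)
```

Read this way, alpha = -1 collapses the box to zero size, which is consistent with -1 being the lower limit of the allowed range.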
-
confidence_bounds
(field_name)¶ Computes the bounds of the confidence of a fiftyone.core.labels.Label field of a collection.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute the confidence bounds of a `Classification` field
    #

    r = dataset.confidence_bounds("predictions")
    r.bounds  # (min, max)

    #
    # Compute the confidence bounds of a `Detections` field
    #

    r = dataset.confidence_bounds("detections")
    r.bounds  # (min, max)
- Parameters
field_name – the name of the label field to compute confidence bounds for
- Returns
-
count
(field_name=None)¶ Counts the number of samples or number of items with respect to a field of a collection.
If a field is provided, it can be a fiftyone.core.fields.ListField or a fiftyone.core.labels.Label list field.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute the number of samples in a dataset
    #

    r = dataset.count()
    r.count

    #
    # Compute the number of objects in a `Detections` field
    #

    r = dataset.count("detections")
    r.count
- Parameters
field_name (None) – the field whose items to count. If no field name is provided, the samples themselves are counted
- Returns
-
count_labels
(field_name)¶ Counts the label values in a fiftyone.core.labels.Label field of a collection.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute label counts for a `Classification` field called "class"
    #

    r = dataset.count_labels("class")
    r.labels  # dict mapping labels to counts

    #
    # Compute label counts for a `Detections` field called "objects"
    #

    r = dataset.count_labels("objects")
    r.labels  # dict mapping labels to counts
- Parameters
field_name – the name of the label field
- Returns
-
count_values
(field_name)¶ Counts the occurrences of values in a countable field or list of countable fields of a collection.
Countable fields are:
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute the tag counts in the dataset
    #

    r = dataset.count_values("tags")
    r.values  # dict mapping tags to counts
- Parameters
field_name – the name of the countable field
- Returns
-
distinct
(field_name)¶ Computes the distinct values of a countable field or a list of countable fields of a collection.
Countable fields are:
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute the distinct values of a StringField named `kind`
    #

    r = dataset.distinct("kind")
    r.values  # list of distinct values

    #
    # Compute the distinct values of the `tags` field
    #

    r = dataset.distinct("tags")
    r.values  # list of distinct values
- Parameters
field_name – the name of the field to compute distinct values for
- Returns
-
distinct_labels
(field_name)¶ Computes the distinct label values of a fiftyone.core.labels.Label field of a collection.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute the distinct labels of a `Classification` field
    #

    r = dataset.distinct_labels("predictions")
    r.labels  # list of distinct labels

    #
    # Compute the distinct labels of a `Detections` field
    #

    r = dataset.distinct_labels("detections")
    r.labels  # list of distinct labels
- Parameters
field_name – the name of the label field
- Returns
-
draw_labels
(anno_dir, label_fields=None, overwrite=False, annotation_config=None)¶ Renders annotated versions of the samples in the collection with label field(s) overlaid to the given directory.
The filenames of the sample data are maintained, unless a name conflict would occur in anno_dir, in which case an index of the form "-%d" % count is appended to the base filename.
Images are written in format fo.config.default_image_ext.
- Parameters
anno_dir – the directory to write the annotated files
label_fields (None) – a list of fiftyone.core.labels.Label fields to render. By default, all fiftyone.core.labels.Label fields are drawn
overwrite (False) – whether to delete anno_dir if it exists before rendering the labels
annotation_config (None) – a fiftyone.utils.annotations.AnnotationConfig specifying how to render the annotations
- Returns
the list of paths to the labeled images
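The filename rule for draw_labels() (keep the original name, append "-%d" % count on a conflict) can be sketched as follows. `unique_output_path` is a hypothetical helper, and where the counter starts is an assumption of this sketch, not a documented detail.

```python
import os


def unique_output_path(anno_dir, filepath, used_names):
    """Picks an output path in `anno_dir`, appending "-%d" on name conflicts."""
    base, ext = os.path.splitext(os.path.basename(filepath))
    name = base + ext

    count = 0
    while name in used_names:
        # Assumed convention: the first conflicting file gets "-1"
        count += 1
        name = base + ("-%d" % count) + ext

    used_names.add(name)
    return os.path.join(anno_dir, name)


used = set()
first = unique_output_path("out", "/a/img.jpg", used)
second = unique_output_path("out", "/b/img.jpg", used)
third = unique_output_path("out", "/c/img.jpg", used)
print(first, second, third)
```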
-
exclude
(sample_ids)¶ Excludes the samples with the given IDs from the collection.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Exclude a single sample from a dataset
    #

    view = dataset.exclude("5f3c298768fd4d3baf422d2f")

    #
    # Exclude a list of samples from a dataset
    #

    view = dataset.exclude([
        "5f3c298768fd4d3baf422d2f",
        "5f3c298768fd4d3baf422d30"
    ])
- Parameters
sample_ids – a sample ID or iterable of sample IDs
- Returns
-
exclude_fields
(field_names)¶ Excludes the fields with the given names from the returned fiftyone.core.sample.SampleView instances.
Note that default fields cannot be excluded.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Exclude a field from all samples in a dataset
    #

    view = dataset.exclude_fields("predictions")

    #
    # Exclude a list of fields from all samples in a dataset
    #

    view = dataset.exclude_fields(["ground_truth", "predictions"])
- Parameters
field_names – a field name or iterable of field names to exclude
- Returns
-
exclude_objects
(objects)¶ Excludes the specified objects from the view.
The returned view will omit the objects specified in the provided objects argument, which should have the following format:
    [
        {
            "sample_id": "5f8d254a27ad06815ab89df4",
            "field": "ground_truth",
            "object_id": "5f8d254a27ad06815ab89df3",
        },
        {
            "sample_id": "5f8d255e27ad06815ab93bf8",
            "field": "ground_truth",
            "object_id": "5f8d255e27ad06815ab93bf6",
        },
        ...
    ]
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Exclude the objects currently selected in the App
    #

    session = fo.launch_app(dataset)

    # Select some objects in the App...

    view = dataset.exclude_objects(session.selected_objects)
- Parameters
objects – a list of dicts specifying the objects to exclude
- Returns
-
exists
(field, bool=True)¶ Returns a view containing the samples that have (or do not have) a non-None value for the given field.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Only include samples that have a value in their `predictions`
    # field
    #

    view = dataset.exists("predictions")

    #
    # Only include samples that do NOT have a value in their
    # `predictions` field
    #

    view = dataset.exists("predictions", False)
- Parameters
field – the field
bool (True) – whether to check if the field exists (True) or does not exist (False)
- Returns
-
export
(export_dir=None, dataset_type=None, dataset_exporter=None, label_field=None, label_prefix=None, labels_dict=None, frame_labels_field=None, frame_labels_prefix=None, frame_labels_dict=None, overwrite=False, **kwargs)¶ Exports the samples in the collection to disk.
Provide either export_dir and dataset_type or dataset_exporter to perform an export.
See this guide for more details about exporting datasets in custom formats by defining your own DatasetExporter.
- Parameters
export_dir (None) – the directory to which to export the samples in format dataset_type
dataset_type (None) – the fiftyone.types.dataset_types.Dataset type to write. If not specified, the default type for label_field is used
dataset_exporter (None) – a fiftyone.utils.data.exporters.DatasetExporter to use to export the samples
label_field (None) – the name of the label field to export. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. If none of label_field, label_prefix, and labels_dict are specified and the requested output type is a labeled image dataset or labeled video dataset with sample-level labels, the first field of compatible type for the output format is used
label_prefix (None) – a label field prefix; all fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the label dicts). Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels
labels_dict (None) – a dictionary mapping label field names to keys to use when constructing the label dict to pass to the exporter. Only applicable to labeled image datasets or labeled video datasets with sample-level labels. This parameter can only be used when the exporter can handle dictionaries of labels
frame_labels_field (None) – the name of the frame labels field to export. Only applicable for labeled video datasets. If none of frame_labels_field, frame_labels_prefix, and frame_labels_dict are specified and the requested output type is a labeled video dataset with frame-level labels, the first frame-level field of compatible type for the output format is used
frame_labels_prefix (None) – a frame labels field prefix; all frame-level fields whose name starts with the given prefix will be exported (with the prefix removed when constructing the frame label dicts). Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels
frame_labels_dict (None) – a dictionary mapping frame-level label field names to keys to use when constructing the frame labels dicts to pass to the exporter. Only applicable for labeled video datasets. This parameter can only be used when the exporter can handle dictionaries of frame-level labels
overwrite (False) – when an export_dir is provided, whether to delete the existing directory before performing the export
**kwargs – optional keyword arguments to pass to dataset_type.get_dataset_exporter_cls(export_dir, **kwargs)
-
filter_classifications
(field, filter, only_matches=False)¶ Filters the classifications of the given fiftyone.core.labels.Classifications field.
Elements of <field>.classifications for which filter returns False are omitted from the field.
Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include classifications in the `predictions` field whose
    # `confidence` is greater than 0.8
    #

    view = dataset.filter_classifications(
        "predictions", F("confidence") > 0.8
    )

    #
    # Only include classifications in the `predictions` field whose
    # `label` is "cat" or "dog", and only show samples with at least
    # one classification after filtering
    #

    view = dataset.filter_classifications(
        "predictions",
        F("label").is_in(["cat", "dog"]),
        only_matches=True
    )
- Parameters
field – the fiftyone.core.labels.Classifications field
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples with at least one classification after filtering
- Returns
-
filter_detections
(field, filter, only_matches=False)¶ Filters the detections of the given fiftyone.core.labels.Detections field.
Elements of <field>.detections for which filter returns False are omitted from the field.
Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include detections in the `predictions` field whose
    # `confidence` is greater than 0.8
    #

    view = dataset.filter_detections(
        "predictions", F("confidence") > 0.8
    )

    #
    # Only include detections in the `predictions` field whose `label`
    # is "cat" or "dog", and only show samples with at least one
    # detection after filtering
    #

    view = dataset.filter_detections(
        "predictions",
        F("label").is_in(["cat", "dog"]),
        only_matches=True
    )

    #
    # Only include detections in the `predictions` field whose bounding
    # box area is smaller than 0.2
    #

    # bbox is in [top-left-x, top-left-y, width, height] format
    bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

    view = dataset.filter_detections("predictions", bbox_area < 0.2)
- Parameters
field – the fiftyone.core.labels.Detections field
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples with at least one detection after filtering
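For readers who want to see the filtering semantics without a database, the following plain-Python sketch mimics the documented behaviour on lists of dicts. The function here is a hypothetical stand-in that takes a Python callable instead of a ViewExpression; it shows how detections failing the filter are omitted, and how only_matches drops samples left with no detections.

```python
def filter_detections(samples, field, keep, only_matches=False):
    """Returns copies of `samples` whose `field` detections satisfy `keep`."""
    result = []
    for sample in samples:
        kept = [d for d in sample[field] if keep(d)]
        if only_matches and not kept:
            continue  # drop samples with no detections left after filtering

        filtered = dict(sample)  # shallow copy; the input is not modified
        filtered[field] = kept
        result.append(filtered)

    return result


def bbox_area(detection):
    # bbox is in [top-left-x, top-left-y, width, height] format
    _, _, width, height = detection["bounding_box"]
    return width * height


samples = [
    {"id": 1, "predictions": [
        {"label": "cat", "bounding_box": [0.1, 0.1, 0.2, 0.5]},
        {"label": "dog", "bounding_box": [0.0, 0.0, 0.9, 0.9]},
    ]},
    {"id": 2, "predictions": [
        {"label": "dog", "bounding_box": [0.0, 0.0, 0.8, 0.5]},
    ]},
]

small = filter_detections(samples, "predictions", lambda d: bbox_area(d) < 0.2)
small_only = filter_detections(
    samples, "predictions", lambda d: bbox_area(d) < 0.2, only_matches=True
)
print(len(small), len(small_only))
```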
- Returns
-
filter_field
(field, filter, only_matches=False)¶ Filters the values of a given sample (or embedded document) field.
Values of field for which filter returns False are replaced with None.
Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include classifications in the `predictions` field (assume
    # it is a `Classification` field) whose `label` is "cat"
    #

    view = dataset.filter_field("predictions", F("label") == "cat")

    #
    # Only include classifications in the `predictions` field (assume
    # it is a `Classification` field) whose `confidence` is greater
    # than 0.8
    #

    view = dataset.filter_field("predictions", F("confidence") > 0.8)
- Parameters
field – the field to filter
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples that match the filter
- Returns
-
filter_keypoints
(field, filter, only_matches=False)¶ Filters the keypoints of the given fiftyone.core.labels.Keypoints field.
Elements of <field>.keypoints for which filter returns False are omitted from the field.
Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F
    from fiftyone.core.stages import FilterKeypoints

    dataset = fo.load_dataset(...)

    #
    # Only include keypoints in the `predictions` field whose `label`
    # is "face", and only show samples with at least one keypoint after
    # filtering
    #

    stage = FilterKeypoints(
        "predictions", F("label") == "face", only_matches=True
    )
    view = dataset.add_stage(stage)

    #
    # Only include keypoints in the `predictions` field with at least
    # 10 points
    #

    stage = FilterKeypoints("predictions", F("points").length() >= 10)
    view = dataset.add_stage(stage)
- Parameters
field – the fiftyone.core.labels.Keypoints field
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples with at least one keypoint after filtering
- Returns
-
filter_labels
(field, filter, only_matches=False)¶ Filters the fiftyone.core.labels.Label elements in a labels list field of each sample.
The specified field must be one of the following types:
Classifications Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include classifications in the `predictions` field whose
    # `confidence` is greater than 0.8
    #

    view = dataset.filter_labels("predictions", F("confidence") > 0.8)

    #
    # Only include classifications in the `predictions` field whose
    # `label` is "cat" or "dog", and only show samples with at least
    # one classification after filtering
    #

    view = dataset.filter_labels(
        "predictions",
        F("label").is_in(["cat", "dog"]),
        only_matches=True,
    )
Detections Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include detections in the `predictions` field whose
    # `confidence` is greater than 0.8
    #

    view = dataset.filter_labels("predictions", F("confidence") > 0.8)

    #
    # Only include detections in the `predictions` field whose `label`
    # is "cat" or "dog", and only show samples with at least one
    # detection after filtering
    #

    view = dataset.filter_labels(
        "predictions",
        F("label").is_in(["cat", "dog"]),
        only_matches=True,
    )

    #
    # Only include detections in the `predictions` field whose bounding
    # box area is smaller than 0.2
    #

    # bbox is in [top-left-x, top-left-y, width, height] format
    bbox_area = F("bounding_box")[2] * F("bounding_box")[3]

    view = dataset.filter_labels("predictions", bbox_area < 0.2)
Polylines Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include polylines in the `predictions` field that are filled
    #

    view = dataset.filter_labels("predictions", F("filled"))

    #
    # Only include polylines in the `predictions` field whose `label`
    # is "lane", and only show samples with at least one polyline after
    # filtering
    #

    view = dataset.filter_labels(
        "predictions", F("label") == "lane", only_matches=True
    )

    #
    # Only include polylines in the `predictions` field with at least
    # 10 vertices
    #

    num_vertices = F("points").map(F().length()).sum()
    view = dataset.filter_labels("predictions", num_vertices >= 10)
Keypoints Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset(...)

    #
    # Only include keypoints in the `predictions` field whose `label`
    # is "face", and only show samples with at least one keypoint after
    # filtering
    #

    view = dataset.filter_labels(
        "predictions", F("label") == "face", only_matches=True
    )

    #
    # Only include keypoints in the `predictions` field with at least
    # 10 points
    #

    view = dataset.filter_labels(
        "predictions", F("points").length() >= 10
    )
- Parameters
field – the labels list field to filter
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples with at least one label after filtering
- Returns
-
filter_polylines
(field, filter, only_matches=False)¶ Filters the polylines of the given fiftyone.core.labels.Polylines field.
Elements of <field>.polylines for which filter returns False are omitted from the field.
Examples:
    import fiftyone as fo
    from fiftyone import ViewField as F
    from fiftyone.core.stages import FilterPolylines

    dataset = fo.load_dataset(...)

    #
    # Only include polylines in the `predictions` field that are filled
    #

    stage = FilterPolylines("predictions", F("filled"))
    view = dataset.add_stage(stage)

    #
    # Only include polylines in the `predictions` field whose `label`
    # is "lane", and only show samples with at least one polyline after
    # filtering
    #

    stage = FilterPolylines(
        "predictions", F("label") == "lane", only_matches=True
    )
    view = dataset.add_stage(stage)

    #
    # Only include polylines in the `predictions` field with at least
    # 10 vertices
    #

    num_vertices = F("points").map(F().length()).sum()
    stage = FilterPolylines("predictions", num_vertices >= 10)
    view = dataset.add_stage(stage)
- Parameters
field – the fiftyone.core.labels.Polylines field
filter – a fiftyone.core.expressions.ViewExpression or MongoDB expression that returns a boolean describing the filter to apply
only_matches (False) – whether to only include samples with at least one polyline after filtering
- Returns
-
first
()¶ Returns the first sample in the collection.
- Returns
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
- Raises
ValueError – if the collection is empty
-
head
(num_samples=3)¶ Returns a list of the first few samples in the collection.
If fewer than num_samples samples are in the collection, only the available samples are returned.
- Parameters
num_samples (3) – the number of samples
- Returns
a list of
fiftyone.core.sample.Sample
objects
-
histogram_values
(field_name, bins=None, range=None)¶ Computes a histogram of the numeric values in a field or list field of a collection.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Compute a histogram of values in the float field "uniqueness"
    #

    r = dataset.histogram_values("uniqueness", bins=50, range=(0, 1))
    r.counts  # list of counts
    r.edges  # list of bin edges
- Parameters
field_name – the name of the field to histogram
bins (None) – can be either an integer number of bins to generate or a monotonically increasing sequence specifying the bin edges to use. By default, 10 bins are created. If bins is an integer and no range is specified, bin edges are automatically distributed in an attempt to evenly distribute the counts in each bin
range (None) – a (lower, upper) tuple specifying a range in which to generate equal-width bins. Only applicable when bins is an integer
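The "equal-width bins over a (lower, upper) range" case can be sketched in plain Python. This is an illustration of the binning idea only: `equal_width_histogram` is a hypothetical helper, its third parameter is named `value_range` to avoid shadowing the built-in `range`, and closing the last bin on the right is an assumption borrowed from common histogram conventions.

```python
def equal_width_histogram(values, bins, value_range):
    """Counts `values` into `bins` equal-width bins spanning `value_range`."""
    lower, upper = value_range
    width = (upper - lower) / bins
    edges = [lower + i * width for i in range(bins)] + [upper]

    counts = [0] * bins
    for v in values:
        if v < lower or v > upper:
            continue  # values outside the range are ignored

        # Assumed convention: the last bin also includes `upper`
        idx = min(int((v - lower) / width), bins - 1)
        counts[idx] += 1

    return counts, edges


counts, edges = equal_width_histogram(
    [0.05, 0.1, 0.5, 0.95, 1.0], bins=4, value_range=(0, 1)
)
print(counts, edges)
```

A result with `bins` counts always comes with `bins + 1` edges, which matches the separate counts and edges lists returned in the example above.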
-
last
()¶ Returns the last sample in the collection.
- Returns
a fiftyone.core.sample.Sample or fiftyone.core.sample.SampleView
- Raises
ValueError – if the collection is empty
-
limit
(limit)¶ Returns a view with at most the given number of samples.
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Only include the first 10 samples in the view
    #

    view = dataset.limit(10)
- Parameters
limit – the maximum number of samples to return. If a non-positive number is provided, an empty view is returned
- Returns
-
limit_labels
(field, limit)¶ Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample.
The specified field must be one of the following types:
Examples:
    import fiftyone as fo

    dataset = fo.load_dataset(...)

    #
    # Only include the first 5 detections in the `ground_truth` field of
    # the view
    #

    view = dataset.limit_labels("ground_truth", 5)
- Parameters
field – the labels list field to filter
limit – the maximum number of labels to include in each labels list. If a non-positive number is provided, all lists will be empty
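The limit semantics, including the non-positive case, amount to clamped slicing. A minimal sketch, using a hypothetical helper that operates on plain Python lists rather than label fields:

```python
def limit_labels(label_lists, limit):
    """Keeps at most `limit` labels per list; non-positive limits empty them."""
    limit = max(limit, 0)  # clamp so that negative limits do not slice from the end
    return [labels[:limit] for labels in label_lists]


limited = limit_labels([["a", "b", "c"], ["d"]], 2)
emptied = limit_labels([["a", "b", "c"], ["d"]], -1)
print(limited, emptied)
```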
- Returns
a fiftyone.core.view.DatasetView
-
classmethod
list_view_stages
()¶ Returns a list of all available methods on this collection that apply
fiftyone.core.stages.ViewStage
operations that return fiftyone.core.view.DatasetView
instances.
- Returns
a list of
SampleCollection
method names
-
make_unique_field_name
(root='')¶ Makes a unique field name with the given root name for the collection.
- Parameters
root – an optional root for the output field name
- Returns
the field name
-
map_labels
(field, map)¶ Maps the
label
values of fiftyone.core.labels.Label
fields to new values.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Map "cat" and "dog" label values to "pet"
#

mapping = {"cat": "pet", "dog": "pet"}

view = dataset.map_labels("ground_truth", mapping)
- Parameters
field – the labels field to map
map – a
dict
mapping label values to new label values
- Returns
a fiftyone.core.view.DatasetView
-
match
(filter)¶ Filters the samples in the collection by the given filter.
Samples for which
filter
returns False
are omitted.
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset(...)

#
# Only include samples whose `filepath` ends with ".jpg"
#

view = dataset.match(F("filepath").ends_with(".jpg"))

#
# Only include samples whose `predictions` field (assume it is a
# `Classification` field) has `label` of "cat"
#

view = dataset.match(F("predictions").label == "cat")

#
# Only include samples whose `predictions` field (assume it is a
# `Detections` field) has at least 5 detections
#

view = dataset.match(F("predictions").detections.length() >= 5)

#
# Only include samples whose `predictions` field (assume it is a
# `Detections` field) has at least one detection with area smaller
# than 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
pred_bbox = F("predictions.detections.bounding_box")
pred_bbox_area = pred_bbox[2] * pred_bbox[3]

view = dataset.match((pred_bbox_area < 0.2).length() > 0)
- Parameters
filter –
a
fiftyone.core.expressions.ViewExpression
or MongoDB expression that returns a boolean describing the filter to apply
- Returns
a fiftyone.core.view.DatasetView
-
match_tag
(tag)¶ Returns a view containing the samples that have the given tag.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Only include samples that have the "test" tag
#

view = dataset.match_tag("test")
- Parameters
tag – a tag
- Returns
a fiftyone.core.view.DatasetView
-
match_tags
(tags)¶ Returns a view containing the samples that have any of the given tags.
To match samples that must contain multiple tags, chain multiple
match_tag()
or match_tags()
calls together.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Only include samples that have either the "test" or "validation"
# tag
#

view = dataset.match_tags(["test", "validation"])
- Parameters
tags – an iterable of tags
- Returns
a fiftyone.core.view.DatasetView
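The difference between matching any tag in one call and requiring several tags through chained calls can be modeled with plain dictionaries. The sketch below is illustrative only: `match_any` is a hypothetical stand-in for the documented behavior, and the sample records are made up.

```python
# Made-up sample records; real samples come from a FiftyOne dataset
samples = [
    {"filepath": "a.jpg", "tags": ["test"]},
    {"filepath": "b.jpg", "tags": ["validation", "hard"]},
    {"filepath": "c.jpg", "tags": ["test", "hard"]},
    {"filepath": "d.jpg", "tags": ["train"]},
]


def match_any(records, tags):
    """Keep records that carry at least one of `tags` (hypothetical model)."""
    wanted = set(tags)
    return [r for r in records if wanted & set(r["tags"])]


# One call with several tags: a sample needs only one of them
any_match = match_any(samples, ["test", "hard"])

# Chained calls: each stage filters the previous result, so a sample
# must carry every tag
all_match = match_any(match_any(samples, ["test"]), ["hard"])

any_paths = [r["filepath"] for r in any_match]
all_paths = [r["filepath"] for r in all_match]
```

With a real dataset, the corresponding calls would be `dataset.match_tags(["test", "hard"])` for the first case and `dataset.match_tag("test").match_tag("hard")` for the second.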
-
mongo
(pipeline)¶ Adds a view stage defined by a raw MongoDB aggregation pipeline.
See MongoDB aggregation pipelines for more details.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Extract a view containing the 6th through 15th samples in the
# dataset
#

view = dataset.mongo([{"$skip": 5}, {"$limit": 10}])

#
# Sort by the number of detections in the `predictions` field of
# the samples (assume it is a `Detections` field)
#

view = dataset.mongo([
    {
        "$addFields": {
            "_sort_field": {
                "$size": {
                    "$ifNull": ["$predictions.detections", []]
                }
            }
        }
    },
    {"$sort": {"_sort_field": -1}},
    {"$unset": "_sort_field"}
])
- Parameters
pipeline – a MongoDB aggregation pipeline (list of dicts)
- Returns
a fiftyone.core.view.DatasetView
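Because a pipeline is just a list of dicts, it can be assembled and inspected before being handed to `mongo()`. The following sketch builds the skip-and-limit pipeline from the first example above; `slice_pipeline` is a hypothetical helper name, not a FiftyOne function.

```python
def slice_pipeline(start, count):
    """Build a MongoDB aggregation pipeline selecting `count` documents
    after skipping the first `start`.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    if start < 0 or count <= 0:
        raise ValueError("start must be >= 0 and count must be > 0")

    # $skip and $limit are standard MongoDB aggregation stages
    return [{"$skip": start}, {"$limit": count}]


# The 6th through 15th samples: skip 5, then keep 10
pipeline = slice_pipeline(5, 10)

# With a loaded dataset, this would be passed as:
#   view = dataset.mongo(pipeline)
```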
-
select
(sample_ids)¶ Returns a view containing only the samples with the given IDs.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Select the samples with the given IDs from the dataset
#

view = dataset.select([
    "5f3c298768fd4d3baf422d34",
    "5f3c298768fd4d3baf422d35",
    "5f3c298768fd4d3baf422d36",
])

#
# Create a view containing the currently selected samples in the
# App
#

session = fo.launch_app(dataset=dataset)

# Select samples in the App...

view = dataset.select(session.selected)
- Parameters
sample_ids – a sample ID or iterable of sample IDs
- Returns
a fiftyone.core.view.DatasetView
-
select_fields
(field_names=None)¶ Selects the fields with the given names as the only fields present in the returned
fiftyone.core.sample.SampleView
instances. All other fields are excluded.
Note that default sample fields are always selected and will be added if not included in
field_names.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Include only the default fields on each sample
#

view = dataset.select_fields()

#
# Include only the `ground_truth` field (and the default fields) on
# each sample
#

view = dataset.select_fields("ground_truth")
- Parameters
field_names (None) – a field name or iterable of field names to select. If not specified, just the default fields will be selected
- Returns
a fiftyone.core.view.DatasetView
-
select_objects
(objects)¶ Selects only the specified objects from the view.
The returned view will omit samples, sample fields, and individual objects that do not appear in the provided
objects
argument, which should have the following format:
[
    {
        "sample_id": "5f8d254a27ad06815ab89df4",
        "field": "ground_truth",
        "object_id": "5f8d254a27ad06815ab89df3",
    },
    {
        "sample_id": "5f8d255e27ad06815ab93bf8",
        "field": "ground_truth",
        "object_id": "5f8d255e27ad06815ab93bf6",
    },
    ...
]
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Only include the objects currently selected in the App
#

session = fo.launch_app(dataset)

# Select some objects in the App...

view = dataset.select_objects(session.selected_objects)
- Parameters
objects – a list of dicts specifying the objects to select
- Returns
a fiftyone.core.view.DatasetView
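When the selection comes from somewhere other than the App, the list of dicts can be assembled by hand. This sketch builds it from `(sample_id, field, object_id)` tuples using the three keys shown in the format above; `build_object_selection` is a hypothetical helper, and the IDs are the ones from the format example.

```python
def build_object_selection(triples):
    """Convert (sample_id, field, object_id) tuples into the list-of-dicts
    format described above.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    return [
        {"sample_id": sample_id, "field": field, "object_id": object_id}
        for sample_id, field, object_id in triples
    ]


objects = build_object_selection([
    ("5f8d254a27ad06815ab89df4", "ground_truth", "5f8d254a27ad06815ab89df3"),
    ("5f8d255e27ad06815ab93bf8", "ground_truth", "5f8d255e27ad06815ab93bf6"),
])

# With a loaded dataset, this would be passed as:
#   view = dataset.select_objects(objects)
```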
-
shuffle
(seed=None)¶ Randomly shuffles the samples in the collection.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Return a view that contains a randomly shuffled version of the
# samples in the dataset
#

view = dataset.shuffle()

#
# Shuffle the samples with a set random seed
#

view = dataset.shuffle(seed=51)
- Parameters
seed (None) – an optional random seed to use when shuffling the samples
- Returns
a fiftyone.core.view.DatasetView
-
skip
(skip)¶ Omits the given number of samples from the head of the collection.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Omit the first 10 samples from the dataset
#

view = dataset.skip(10)
- Parameters
skip – the number of samples to skip. If a non-positive number is provided, no samples are omitted
- Returns
a fiftyone.core.view.DatasetView
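The documented edge cases of `skip()` and `limit()` differ: a non-positive skip omits nothing, while a non-positive limit yields an empty view. The sketch below models both on a plain list so the behavior is easy to check. `skip_then_limit` is a hypothetical function operating on a list, not a FiftyOne API.

```python
def skip_then_limit(items, skip, limit):
    """Model `skip(skip)` followed by `limit(limit)` on a plain list.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    # A non-positive skip omits no items
    remaining = items[skip:] if skip > 0 else list(items)

    # A non-positive limit produces an empty result
    return remaining[:limit] if limit > 0 else []


items = list(range(20))

page = skip_then_limit(items, 5, 10)     # 6th through 15th items
no_skip = skip_then_limit(items, -1, 3)  # nothing omitted from the head
empty = skip_then_limit(items, 5, 0)     # empty result
```

With a loaded dataset, the first case corresponds to chaining the two stages as `dataset.skip(5).limit(10)`.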
-
sort_by
(field_or_expr, reverse=False)¶ Sorts the samples in the collection by the given field or expression.
When sorting by an expression,
field_or_expr
can either be a fiftyone.core.expressions.ViewExpression
or a MongoDB expression that defines the quantity to sort by.
Examples:
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset(...)

#
# Sorts the samples in descending order by the `confidence` of
# their `predictions` field (assume it is a `Classification` field)
#

view = dataset.sort_by("predictions.confidence", reverse=True)

#
# Sorts the samples in ascending order by the number of detections
# in their `predictions` field (assume it is a `Detections` field)
# whose bounding box area is at most 0.2
#

# bbox is in [top-left-x, top-left-y, width, height] format
pred_bbox = F("predictions.detections.bounding_box")
pred_bbox_area = pred_bbox[2] * pred_bbox[3]

view = dataset.sort_by((pred_bbox_area < 0.2).length())
- Parameters
field_or_expr – the field or expression to sort by
reverse (False) – whether to return the results in descending order
- Returns
a fiftyone.core.view.DatasetView
-
tail
(num_samples=3)¶ Returns a list of the last few samples in the collection.
If fewer than
num_samples
samples are in the collection, only the available samples are returned.
- Parameters
num_samples (3) – the number of samples
- Returns
a list of
fiftyone.core.sample.Sample
objects
-
take
(size, seed=None)¶ Randomly samples the given number of samples from the collection.
Examples:
import fiftyone as fo

dataset = fo.load_dataset(...)

#
# Take 10 random samples from the dataset
#

view = dataset.take(10)

#
# Take 10 random samples from the dataset with a set seed
#

view = dataset.take(10, seed=51)
- Parameters
size – the number of samples to return. If a non-positive number is provided, an empty view is returned
seed (None) – an optional random seed to use when selecting the samples
- Returns
a fiftyone.core.view.DatasetView
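The role of `seed` can be illustrated with the standard library: a fixed seed makes the random selection repeatable. The sketch below models this on a plain list. `take_model` is a hypothetical function, and it says nothing about how FiftyOne performs the sampling internally. Capping the size at the list length is also an assumption.

```python
import random


def take_model(items, size, seed=None):
    """Randomly pick `size` items, repeatably when `seed` is set.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    # Mirrors the documented behavior: non-positive size gives an empty result
    if size <= 0:
        return []

    rng = random.Random(seed)
    # Assumption: never ask for more items than are available
    return rng.sample(items, min(size, len(items)))


items = list(range(100))

first = take_model(items, 10, seed=51)
second = take_model(items, 10, seed=51)
nothing = take_model(items, 0)
```

Two calls with the same seed return the same selection, which is the property a fixed seed is meant to provide.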
-
to_json
(rel_dir=None, frame_labels_dir=None, pretty_print=False)¶ Returns a JSON string representation of the collection.
The samples will be written as a list in a top-level
samples
field of the returned JSON.
- Parameters
rel_dir (None) – a relative directory to remove from the
filepath
of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the returned JSON (which can be quite large for video datasets containing many frames). Only applicable to video datasets
pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations
- Returns
a JSON string
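The effect of `rel_dir` on each `filepath` can be sketched with the standard library. The normalization step uses the expression quoted in the parameter description above. The prefix-stripping logic and the name `strip_rel_dir` are assumptions made for illustration, not FiftyOne's implementation.

```python
import os


def strip_rel_dir(filepath, rel_dir):
    """Remove `rel_dir` from the front of `filepath`, if possible.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    # Normalization as described in the parameter documentation
    rel_dir = os.path.abspath(os.path.expanduser(rel_dir))

    # Assumption: only strip when `filepath` really lives under `rel_dir`
    prefix = rel_dir.rstrip(os.sep) + os.sep
    if filepath.startswith(prefix):
        return filepath[len(prefix):]

    return filepath


# Paths are built with os.path so the sketch runs on any platform
root = os.path.abspath(os.path.join("data", "images"))
inside = os.path.join(root, "train", "001.jpg")
outside = os.path.abspath(os.path.join("elsewhere", "002.jpg"))

relative = strip_rel_dir(inside, os.path.join("data", "images"))
unchanged = strip_rel_dir(outside, os.path.join("data", "images"))
```

Paths outside `rel_dir` are left absolute in this sketch, which is one reading of "if possible" in the description.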
-
validate_field_type
(field_name, ftype, embedded_doc_type=None, subfield=None)¶ Validates that the collection has a field of the given type.
- Parameters
field_name – the field name
ftype – the expected field type. Must be a subclass of
fiftyone.core.fields.Field
embedded_doc_type (None) – the
fiftyone.core.odm.BaseEmbeddedDocument
type of the field. Used only when ftype
is an embedded fiftyone.core.fields.EmbeddedDocumentField
subfield (None) – the type of the contained field. Used only when
ftype
is a fiftyone.core.fields.ListField
or fiftyone.core.fields.DictField
- Raises
ValueError – if the field does not exist or does not have the expected type
-
validate_fields_exist
(field_or_fields)¶ Validates that the collection has fields with the given names.
If
field_or_fields
contains an embedded field name such as field_name.document.field, only the root field_name is checked for existence.
- Parameters
field_or_fields – a field name or iterable of field names
- Raises
ValueError – if one or more of the fields do not exist
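The root-only check for embedded field names can be sketched as follows. Both helpers are hypothetical and operate on a plain set of field names standing in for a schema; they only model the documented behavior.

```python
def root_field_names(field_or_fields):
    """Return the root name of each (possibly embedded) field name.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    if isinstance(field_or_fields, str):
        field_or_fields = [field_or_fields]

    # "field_name.document.field" -> "field_name"
    return [name.split(".", 1)[0] for name in field_or_fields]


def check_fields_exist(schema_names, field_or_fields):
    """Raise ValueError if any root field is missing from `schema_names`."""
    missing = [
        root
        for root in root_field_names(field_or_fields)
        if root not in schema_names
    ]
    if missing:
        raise ValueError("Fields %s do not exist" % missing)


# Made-up schema for the example
schema_names = {"filepath", "tags", "ground_truth"}

roots = root_field_names(["ground_truth.detections.label", "tags"])

# Passes: only the root `ground_truth` is checked
check_fields_exist(schema_names, "ground_truth.detections.label")
```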
-
write_json
(json_path, rel_dir=None, frame_labels_dir=None, pretty_print=False)¶ Writes the collection to disk in JSON format.
- Parameters
json_path – the path to write the JSON
rel_dir (None) – a relative directory to remove from the
filepath
of each sample, if possible. The path is converted to an absolute path (if necessary) via os.path.abspath(os.path.expanduser(rel_dir)). The typical use case for this argument is that your source data lives in a single directory and you wish to serialize relative, rather than absolute, paths to the data within that directory
frame_labels_dir (None) – a directory in which to write per-sample JSON files containing the frame labels for video samples. If omitted, frame labels will be included directly in the written JSON (which can be quite large for video datasets containing many frames). Only applicable to video datasets
pretty_print (False) – whether to render the JSON in human readable format with newlines and indentations