Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
The FiftyOne community is open source and open to all. This means everyone is welcome to ask questions, and everyone is welcome to answer them. Continue reading to see the latest questions asked and answers provided!
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.
Ok, let’s dive into this week’s tips and tricks!
Using dataset.export to specify arbitrary file paths
Community Slack member Dimitrios asked:
Is there a way to add an arbitrary file’s path as a sample field, and have the file copied over to a new location when running dataset.export?
One approach is to edit one of the existing exporters for media files, or to write your own custom exporter that copies the file for you. Check out the “Exporting FiftyOne Datasets” section in the Docs as well as the “Custom Formats” section.
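If writing a custom exporter feels like more than you need, a simpler alternative (not the exporter-based approach mentioned above) is to run a regular export and then copy the extra files yourself. Here’s a minimal sketch, assuming the extra file’s path is stored in a sample field. The helper name copy_aux_files and the field name "aux_path" are hypothetical and not part of FiftyOne.

```python
import shutil
from pathlib import Path


def copy_aux_files(paths, export_dir):
    """Copy each existing file in `paths` into `export_dir`.

    Returns a dict mapping each copied source path to its new location.
    Entries that are None or missing on disk are skipped.
    """
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)

    copied = {}
    for src in paths:
        if src is None or not Path(src).is_file():
            continue
        dst = export_dir / Path(src).name
        shutil.copy2(src, dst)
        copied[str(src)] = str(dst)
    return copied


# Hypothetical usage next to a regular export, assuming the sample field
# holding the extra file's path is called "aux_path":
#
#   dataset.export(export_dir="/tmp/export", dataset_type=fo.types.FiftyOneDataset)
#   copy_aux_files(dataset.values("aux_path"), "/tmp/export/aux")
```

Files that share a basename overwrite each other in this sketch, so you may want to prefix the destination name with something unique, such as the sample ID.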
Clearing selected samples in the FiftyOne App programmatically
Community Slack member Nadav asked:
Is there SDK code that allows me to programmatically clear all the selected samples in the FiftyOne App?
Depending on your use case, you have two possible options:
- If you have a session object, you can use the session.clear_selected() method.
- Within a plugin, where the session object isn’t initialized, you can use ctx.trigger('clear_selected_samples').
Learn more in the Docs.
Adding custom attributes
Community Slack member Villus asked:
I would like to extend FiftyOne’s metadata fields. Is there a way to do this?
Yes, you can make use of FiftyOne’s dataset.add_dynamic_sample_fields() method. You can learn more about working with arbitrary custom attributes in the “Dynamic Attributes” section of the Docs.
Working with very long video samples
Community Slack member Daniel asked:
I’m using the FiftyOne app to retrieve a view of video samples that have a certain tag. Unfortunately, I get an error when I click on the tag. I suspect I am getting this error because the size of the frames field is likely too big (around an hour.) Is there a way to solve this or should I just split the big videos into smaller chunks?
There are a few workarounds to consider here:
- You can increase the maximum BSON document size limit in MongoDB, but this is generally not recommended as it can lead to performance issues and other unintended consequences. MongoDB’s limits are set for good reasons related to performance and stability.
- You can optimize your aggregation pipeline so that it is more efficient. This might involve filtering data earlier in the pipeline to reduce the amount of data being processed in the $lookup stage.
- As you suggest, you can split up the videos. Here’s some code to get you started:
    import os
    import subprocess

    import fiftyone as fo
    from fiftyone import ViewField as F


    def split_video(video_path, segment_duration):
        # Ensure the output directory exists
        output_dir = "split_videos"
        os.makedirs(output_dir, exist_ok=True)

        # Construct the ffmpeg command. With "-c copy" the streams are not
        # re-encoded, so segments are cut at keyframes
        base_name = os.path.splitext(os.path.basename(video_path))[0]
        command = [
            "ffmpeg",
            "-i", video_path,
            "-c", "copy",
            "-map", "0",
            "-segment_time", str(segment_duration),  # segment duration in seconds
            "-f", "segment",
            "-reset_timestamps", "1",
            os.path.join(output_dir, f"{base_name}_%03d.mp4"),
        ]

        # Run the command and raise an error if ffmpeg fails
        subprocess.run(command, check=True)


    # Load your dataset
    dataset = fo.load_dataset("your_dataset_name")


    # Function to create subsets of frames
    def create_frame_subsets(sample, frame_step=1000):
        # Determine the range of frames (frame numbers start at 1)
        frame_numbers = sorted(sample.frames.keys())
        if not frame_numbers:
            return
        max_frame = frame_numbers[-1]

        # Create subsets
        for start_frame in range(1, max_frame + 1, frame_step):
            end_frame = min(start_frame + frame_step - 1, max_frame)

            # Create a view of this sample restricted to the frame range
            frame_view = dataset.select(sample.id).match_frames(
                (F("frame_number") >= start_frame)
                & (F("frame_number") <= end_frame)
            )

            # Process this frame_view or add it to a new dataset
            # ...


    # Apply to each sample in the dataset
    for sample in dataset:
        create_frame_subsets(sample)
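The chunking arithmetic is easy to get off by one, so it can help to check it on its own before touching a dataset. Here’s a small standalone sketch; the helper name frame_ranges is hypothetical, and it assumes frame numbers start at 1, as they do in FiftyOne. Note that the range has to run to max_frame + 1, otherwise a final chunk that starts exactly on the last frame would be dropped.

```python
def frame_ranges(max_frame, frame_step=1000):
    """Yield inclusive (start, end) frame ranges covering 1..max_frame."""
    if max_frame < 1 or frame_step < 1:
        return
    for start in range(1, max_frame + 1, frame_step):
        yield start, min(start + frame_step - 1, max_frame)


print(list(frame_ranges(2500)))
# [(1, 1000), (1001, 2000), (2001, 2500)]

print(list(frame_ranges(1001)))
# [(1, 1000), (1001, 1001)]
```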
How to filter all unlabeled samples programmatically
Community Slack member Nadav asked:
I have a classification field in my data. What is the best way to filter all unlabeled samples? Currently I’m using ctx.dataset.match(F(source_field) != None) but PyCharm raises a warning about that.
Three possible solutions:
- PyCharm is probably saying you should use is not instead of != when comparing with None.
- You could also use something like dataset.match(F(source_field).exists(False)).
- And finally, you could try dataset.exists(source_field, bool=False).
Check out the “exists” section on the “FiftyOne Core Collections” Docs.
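One caveat on the PyCharm suggestion, as far as we understand it: expression objects like ViewField overload comparison operators such as != to build a query, whereas Python’s is not always compares object identity and cannot be overloaded. So swapping one for the other silences the warning but changes what gets passed to match(). The toy class below illustrates the difference; Expr is a hypothetical stand-in and not FiftyOne’s implementation.

```python
class Expr:
    """Toy stand-in for an expression object; NOT FiftyOne's ViewField."""

    def __init__(self, field):
        self.field = field

    def __ne__(self, other):
        # Return a query description instead of a plain boolean
        return {"$ne": [f"${self.field}", other]}


expr = Expr("ground_truth")

# The overloaded operator builds a query-like object
print(expr != None)
# {'$ne': ['$ground_truth', None]}

# Identity comparison bypasses the overload and yields a plain bool
print(expr is not None)
# True
```

This is one reason the exists()-based options are the more comfortable choice here: they avoid the comparison with None altogether.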