Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, What’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
- If you like what you see on GitHub, give the project a star
- Get started! We’ve made it easy to get up and running in a few minutes
- Join the FiftyOne Slack community, we’re always happy to help
Ok, let’s dive into this week’s tips and tricks!
Viewing the FiftyOne App with Interactive Plots
Community Slack member Patrick Rowsome asked,
“Is it possible to open the FiftyOne App alongside an interactive plot in JupyterLab?”
Yes! There are multiple ways of accomplishing this.
By right-clicking the output of a cell in a JupyterLab notebook and selecting ‘Create New View for Output’, you can view the output (namely the FiftyOne App) in another JupyterLab tab. You can then drag the tab horizontally or vertically to create a split view.
Alternatively, you can view the FiftyOne App in a separate browser window or tab. To do this, pass the option auto=False when you create a session:
session = fo.launch_app(…, auto=False)
You can find the URL for the session by running session.url, which you can then copy and paste into your browser.
Learn more about InteractivePlot and interactive plotting in Jupyter notebooks in the FiftyOne Docs.
Importing Data from CVAT
Community Slack member Joy Timmermans asked,
“I currently already have some data loaded into CVAT. is it possible to connect that directly to FiftyOne?”
FiftyOne has a tight integration with CVAT, which makes it easy both to create a FiftyOne dataset from a list of CVAT tasks and to create CVAT annotation tasks directly from FiftyOne.
If the media files are not downloaded to disk, then you can import annotations from CVAT using the provided import_annotations method, as in the example below.
import fiftyone as fo
import fiftyone.utils.cvat as fouc

dataset = fo.Dataset("my-dataset")

fouc.import_annotations(
    dataset,
    task_ids=[...],
    download_media=True,
)
Alternatively, if the media is downloaded to disk, you can provide the data_path argument instead of download_media.
Learn more about FiftyOne’s CVAT integration in the FiftyOne Docs.
Selecting Samples with Attribute Set
Community Slack member Geoffrey Keating asked,
“I want to pull only the samples that have an attribute set on a Detections field. The field is a bool and will only belong to detections of a certain label. What is the best way to do this?”
While this can also be accomplished using filter_field, it is best to use the match stage. This is the case any time you want to match samples based on a condition. The matching operation can be performed with the following syntax:
from fiftyone import ViewField as F

view = dataset.match(
    F("ground_truth.detections")
    .filter(F("condition") == True)
    .length() > 0
)
Learn more about DatasetView and matching conditions in the FiftyOne Docs.
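To make the logic of that expression concrete, here is a minimal plain-Python sketch of what the match condition checks. It is a model only, not FiftyOne API code: the samples are ordinary dicts invented for illustration, and has_flagged_detection is a hypothetical helper name.

```python
# Plain-Python model of the match logic above; these are NOT FiftyOne calls.
# The sample dicts and the helper name are invented for illustration.

def has_flagged_detection(sample, field="ground_truth", attr="condition"):
    """Return True if any detection in `field` has `attr` set to True."""
    detections = sample.get(field, {}).get("detections", [])
    # Mirrors .filter(F("condition") == True): keep only flagged detections
    flagged = [d for d in detections if d.get(attr) is True]
    # Mirrors .length() > 0: the sample matches if anything survived
    return len(flagged) > 0


samples = [
    {"id": "a", "ground_truth": {"detections": [{"label": "car", "condition": True}]}},
    {"id": "b", "ground_truth": {"detections": [{"label": "car"}, {"label": "dog", "condition": False}]}},
    {"id": "c", "ground_truth": {"detections": []}},
]

matched_ids = [s["id"] for s in samples if has_flagged_detection(s)]
print(matched_ids)
```

Note that a match keeps or drops whole samples. The detections inside a matched sample are left untouched, including those without the attribute.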
Using FiftyOne with Images from the Internet
An anonymous user on Stack Overflow asked,
“Is it possible to use FiftyOne with images available at external URLs (in Google Images for example) without downloading the images first?”
This functionality is available in FiftyOne Teams, with which you can point samples directly to an https:// image (as well as to media stored in Amazon S3, Google Cloud Storage, Azure, MinIO, etc.).
If you’re using the open source FiftyOne library, then you should download the images first. Learn more about loading a dataset from disk in the FiftyOne Docs.
Sending Large Annotation Jobs to Label Studio
Community Slack member Brett Israelsen asked,
“I’m trying to send an annotation job out to Label Studio. My dataset is not currently labeled, so I’m sending the whole thing. Is there a limit to how many files can be sent in a single batch?”
The FiftyOne Label Studio integration uploads all media files at once. If you have a large dataset (either large images, many images, or both), then it is probably best to batch the images you send for annotation, for example:
# range() already steps by batch_size, so each value is a start offset
for start in range(0, len(dataset), batch_size):
    view = dataset[start : start + batch_size]
    results = view.annotate(..., backend="labelstudio", project_name="my_project")
It is important that the annotation key passed in to view.annotate() is unique for each batch. Learn more about FiftyOne’s Label Studio integration in the FiftyOne Docs.
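One way to keep the keys unique is to derive each one from the batch index. The sketch below is plain Python and runs without FiftyOne. The helper name iter_batches and the key prefix are assumptions made for illustration, not part of the FiftyOne API.

```python
# Hypothetical helper (not a FiftyOne function): computes slice bounds and a
# unique annotation key for each batch.

def iter_batches(num_samples, batch_size, key_prefix="labelstudio_batch"):
    """Yield (start, stop, anno_key) for consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for index, start in enumerate(range(0, num_samples, batch_size)):
        stop = min(start + batch_size, num_samples)
        # The zero-padded index keeps keys unique and sortable
        yield start, stop, f"{key_prefix}_{index:04d}"


batches = list(iter_batches(num_samples=10, batch_size=4))
print(batches)

# Assumed usage with a FiftyOne dataset (not executed here):
# for start, stop, anno_key in iter_batches(len(dataset), batch_size):
#     dataset[start:stop].annotate(anno_key, backend="labelstudio", project_name="my_project")
```

Deriving the key from the batch index also makes it easy to tell later which annotation run covers which slice of the dataset.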
Alternatively, FiftyOne Teams supports cloud-backed datasets, which allows for the creation of tasks with minimal moving around of media files.
What’s Next?
- If you like what you see on GitHub, give the project a star
- Get started! We’ve made it easy to get up and running in a few minutes
- Join the FiftyOne Slack community, we’re always happy to help