Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high-quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
Ok, let’s dive into this week’s tips and tricks!
Sorting samples in a collection by fields or expressions
Community Slack member Sybil Lyu asked,
“Can I use SortBy with an expression in FiftyOne via the add stage button in the FiftyOne App?”
Today, the best way to sort by an expression is in Python. If you run one of the examples from the Docs and open the resulting view in the App, the view bar shows the equivalent JSON that you’d need to type into the SortBy stage to create the same view:
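import fiftyone as fo
from fiftyone import ViewField as F

# `dataset` is assumed to be an existing dataset with a "ground_truth" detections field
# Sort by number of GT objects
view = dataset.sort_by(F("ground_truth.detections").length(), reverse=True)
# Click into view bar to see equivalent JSON
session = fo.launch_app(view)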
Connecting the FiftyOne client to MongoDB
Community Slack member Naman Gupta asked,
“Is it possible to connect to an already running local FiftyOne instance from a Jupyter notebook without having to spin up another FiftyOne server? I want to connect to the locally running instance and either load, filter or export datasets on the same machine.”
You can run MongoDB in a separate container and then configure your FiftyOne client in the Jupyter container to connect to it. Other than MongoDB, there’s no FiftyOne “server” in the open source package. FiftyOne Teams, on the other hand, provides a centralized MongoDB database and a FiftyOne App server allowing everyone on your team to easily load the same datasets in Python and the App.
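As a rough sketch of the client-side configuration described above: FiftyOne reads its database location from the `FIFTYONE_DATABASE_URI` environment variable. The helper name `mongo_uri`, and the host and port values, are illustrative assumptions; substitute the address of your own MongoDB container.

```python
import os
from urllib.parse import quote_plus


def mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a MongoDB connection string (hypothetical helper, not part of FiftyOne)."""
    credentials = ""
    if username and password:
        # Percent-encode credentials so characters like "@" do not break the URI
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    return f"mongodb://{credentials}{host}:{port}"


# Set this before importing fiftyone so the client picks it up from the environment.
# Host and port here are assumptions; point them at your own MongoDB instance.
os.environ["FIFTYONE_DATABASE_URI"] = mongo_uri(host="localhost", port=27017)

# After this point, `import fiftyone as fo` in the notebook should connect to that
# database, and calls such as fo.list_datasets() operate on the shared data.
```

Every notebook or script that sets the same URI works against the same database, so datasets created in one are available by name in the others.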
Learn more about working with data and notebooks in the FiftyOne Docs.
Mapping ground truth labels in a bounding box problem
Community Slack member Raghav Mecheri asked,
“I’m trying to map a set of ground truth labels to a broader set of categories for a bounding box problem. For example turning bounding boxes that have labels for “audi”, “bmw”, “mercedes” all into “car”. I could iterate through each image as I load it, but I feel that there’s probably a “right” way to do this in FiftyOne — any good starting points?”
You can use map_labels() for this!
view = dataset.map_labels(...)
This will give you a view that dynamically renames the labels when you iterate over it or visualize it in the App. If you want to save the changes to the underlying dataset, just add:
view.save()
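For the car example in the question, a minimal sketch might look like the following. It assumes the labels live in a field named "ground_truth"; the helper `to_broad_categories` is a hypothetical wrapper, not part of FiftyOne.

```python
# Fine-grained labels from the question, collapsed into one broader category
CAR_BRANDS = ["audi", "bmw", "mercedes"]
label_map = {brand: "car" for brand in CAR_BRANDS}


def to_broad_categories(dataset, field="ground_truth", save=False):
    """Return a view of `dataset` whose labels in `field` are renamed via label_map.

    `field` is an assumption; use the name of your own label field.
    """
    view = dataset.map_labels(field, label_map)
    if save:
        # Permanently write the renamed labels back to the underlying dataset
        view.save()
    return view
```

Labels that do not appear in the mapping are left unchanged, so only the brands you list are renamed.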
Merging samples and updating labels
Community Slack member Jason Barbee asked,
“I cloned a dataset, changed the ground_truth labels on samples, but running main_dataset.merge(working_dataset) doesn’t seem to overwrite my existing labels. Is there a replace_sample type API?”
By default, merge_samples() is called with merge_lists=True, meaning that for label list fields like detections, the two lists are merged by label ID rather than the working dataset’s labels replacing all of the main dataset’s labels. If you set merge_lists=False, the field is overwritten as a whole: the existing labels are discarded and only the labels from the dataset being merged in are kept.
Learn more about merge_samples() in the FiftyOne Docs.
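A sketch of the overwrite behavior, using the dataset names from the question. The wrapper `replace_list_labels` is hypothetical, and matching samples on "filepath" is an assumption that holds when the working dataset is a clone with unchanged media paths.

```python
def replace_list_labels(main_dataset, working_dataset, key_field="filepath"):
    """Merge `working_dataset` into `main_dataset`, replacing label lists outright.

    Samples are matched on `key_field`. With merge_lists=False, list fields such
    as detections are overwritten rather than merged element by element.
    """
    main_dataset.merge_samples(
        working_dataset,
        key_field=key_field,
        merge_lists=False,
        overwrite=True,
    )
    return main_dataset
```

Since this discards the existing labels in the main dataset, it may be worth trying it on a clone first.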
Using FiftyOne datasets with the PyTorch dataloader
Community Slack member Sidney Guaro asked,
“Is it possible to use a FiftyOne dataset in PyTorch dataloader?”
We do have some integrations with PyTorch Lightning Flash, as well as a Detectron2 tutorial. But you can also always integrate FiftyOne datasets right into PyTorch dataloaders (check out this blog). Here is an example from the blog that sets up a torch dataset from FiftyOne:
import torch
import fiftyone.utils.coco as fouc
from PIL import Image


class FiftyOneTorchDataset(torch.utils.data.Dataset):
    """A class to construct a PyTorch dataset from a FiftyOne dataset.

    Args:
        fiftyone_dataset: a FiftyOne dataset or view that will be used for training or testing
        transforms (None): a list of PyTorch transforms to apply to images and targets when loading
        gt_field ("ground_truth"): the name of the field in fiftyone_dataset that contains the
            desired labels to load
        classes (None): a list of class strings that are used to define the mapping between
            class names and indices. If None, it will use all classes present in the given fiftyone_dataset.
    """

    def __init__(
        self,
        fiftyone_dataset,
        transforms=None,
        gt_field="ground_truth",
        classes=None,
    ):
        self.samples = fiftyone_dataset
        self.transforms = transforms
        self.gt_field = gt_field

        self.img_paths = self.samples.values("filepath")

        self.classes = classes
        if not self.classes:
            # Get list of distinct labels that exist in the view
            self.classes = self.samples.distinct(
                "%s.detections.label" % gt_field
            )

        if self.classes[0] != "background":
            self.classes = ["background"] + self.classes

        self.labels_map_rev = {c: i for i, c in enumerate(self.classes)}

    def __getitem__(self, idx):
        img_path = self.img_paths[idx]
        sample = self.samples[img_path]
        metadata = sample.metadata
        img = Image.open(img_path).convert("RGB")

        boxes = []
        labels = []
        area = []
        iscrowd = []
        detections = sample[self.gt_field].detections
        for det in detections:
            category_id = self.labels_map_rev[det.label]
            coco_obj = fouc.COCOObject.from_label(
                det, metadata, category_id=category_id,
            )
            x, y, w, h = coco_obj.bbox
            boxes.append([x, y, x + w, y + h])
            labels.append(coco_obj.category_id)
            area.append(coco_obj.area)
            iscrowd.append(coco_obj.iscrowd)

        target = {}
        target["boxes"] = torch.as_tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.as_tensor(labels, dtype=torch.int64)
        target["image_id"] = torch.as_tensor([idx])
        target["area"] = torch.as_tensor(area, dtype=torch.float32)
        target["iscrowd"] = torch.as_tensor(iscrowd, dtype=torch.int64)

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.img_paths)

    def get_classes(self):
        return self.classes
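To feed a dataset like this into a PyTorch dataloader, one possible sketch is shown below. Each image can have a different number of boxes, so the default batching, which stacks tensors, does not fit detection targets; a custom collate function that keeps each sample separate is a common workaround. The names `collate_fn` and `make_loader` and the default batch size are illustrative assumptions, not taken from the blog.

```python
def collate_fn(batch):
    """Regroup [(img, target), ...] into ((img, ...), (target, ...)) without stacking."""
    return tuple(zip(*batch))


def make_loader(torch_dataset, batch_size=2, shuffle=True):
    """Wrap a torch-style dataset (e.g. FiftyOneTorchDataset) in a DataLoader."""
    # Imported lazily so collate_fn can be reused without PyTorch installed
    from torch.utils.data import DataLoader

    return DataLoader(
        torch_dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        collate_fn=collate_fn,
    )
```

With this in place, each batch yields a tuple of images and a tuple of target dictionaries that a detection model can consume as lists.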
What’s next?
- If you like what you see on GitHub, give the project a star
- Get started! We’ve made it easy to get up and running in a few minutes
- Join the FiftyOne Slack community; we’re always happy to help