Welcome to the latest installment of our ongoing blog series where we highlight notebooks and datasets from the FiftyOne Examples GitHub repository. The fiftyone-examples repository contains over 30 notebooks that make it easy to try various computer vision workflows and explore their associated datasets. In this post, we take a look at how to use the Segment Anything Model (SAM) for predictions and segmentations on Kaggle’s Football Player Segmentation dataset.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
About the Football Player Segmentation Dataset
The Football Player Segmentation dataset is intended for use on computer vision tasks related to player detection and segmentation in football matches. The dataset contains images of players in different playing positions, such as goalkeepers, defenders, midfielders, and forwards, captured from various angles and distances. The images are annotated with pixel-level masks that indicate the player’s location and segmentation boundaries, making it ideal for training deep learning models for player segmentation.
- Maintainer: Yaroslav Isaienkov
- Download: Kaggle
- License: CC0 – Public Domain
- Size: 333 MB (527 images)
Can’t wait? You can preview the dataset in your browser at try.fiftyone.ai.
What is SAM?
In the realm of computer vision, segmentation helps us identify which image pixels belong to an object. This is a common task in computer vision applicable to a broad range of applications, from analyzing scientific imagery to editing photos. Previously, creating an accurate segmentation model for specific tasks required deep technical expertise, plus access to computing infrastructure and large volumes of carefully annotated data.
Earlier this year, Meta announced the Apache 2.0 licensed Segment Anything Model project, which aims to democratize segmentation by making it more accessible to data scientists. The project introduces a new task, dataset, and model for image segmentation.
About the Football Player Segmentation notebook
The Football Player Segmentation notebook was authored by Kishan Savant and can be cloned, downloaded (or viewed in nbviewer or Google Colab) here.
Step 1: Install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! For example, on macOS:
- Verify your version of Python
- Create and activate a virtual environment
- Install IPython (optional)
- Upgrade your setuptools
- Install FiftyOne
Learn more about how to get up and running with FiftyOne in the Docs.
Step 2: Install the required Python libraries
Now that you have FiftyOne installed, let’s install the Python libraries (including SAM) that we’ll need to successfully work with the notebook.
pip install pycocotools
pip install fiftyone umap-learn
pip install kaggle
pip install torchvision
pip install wget
pip install opencv-python
pip install shapely
pip install git+https://github.com/facebookresearch/segment-anything.git
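Before moving on, it can help to confirm that everything installed correctly. The following is a minimal sketch that uses only the standard library. The list of module names is an assumption about how the pip packages above are imported (for example, opencv-python is imported as cv2 and umap-learn as umap), and find_missing is a hypothetical helper, not part of FiftyOne.

```python
import importlib.util

# Import names assumed to correspond to the pip packages installed above
REQUIRED_MODULES = [
    "pycocotools", "fiftyone", "umap", "kaggle", "torchvision",
    "wget", "cv2", "shapely", "segment_anything",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```

Any name that is reported as missing can be installed again with the matching pip command above.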
Step 3: Imports
We’ll need to import the FiftyOne Brain, which is a separate Python package for FiftyOne that gives you the ability to visualize embeddings, find similar images, and compute uniqueness and mistakenness. We’ll also need to import a variety of utilities like os, cv2, wget, matplotlib, torchvision, PIL, and numpy.
import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F

import os
import cv2
import wget
import matplotlib.pyplot as plt
from zipfile import ZipFile

import torch
import torchvision
import numpy as np
from segment_anything import SamPredictor, sam_model_registry, SamAutomaticMaskGenerator

# Import the submodule explicitly so that PIL.Image is always available
import PIL.Image
Step 4: Get the current working directory
Next, let’s get the current working directory.
cwd = os.path.abspath(os.getcwd())
Step 5: Download and extract dataset from Kaggle
If you are not already a Kaggle user, you’ll need to create a Kaggle account. Once the account is created, go to your Kaggle Account page and scroll down to the API section. Click “Create New API Token” to generate a new API token in the form of a kaggle.json file, which contains your Kaggle username and key.
The final steps are to place this kaggle.json file in your current working directory, then download and extract the dataset.
os.environ['KAGGLE_CONFIG_DIR'] = cwd

# Download the dataset
!kaggle datasets download -d ihelon/football-player-segmentation

# Extract the dataset to the current working directory
!unzip football-player-segmentation.zip
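If the download fails with an authentication error, a quick check of the token file can save some guessing. This is a minimal sketch, assuming the token is a JSON object with "username" and "key" entries as described above. check_kaggle_credentials is a hypothetical helper, not part of the Kaggle CLI.

```python
import json
import os


def check_kaggle_credentials(config_dir):
    """Return True if config_dir/kaggle.json exists and holds both expected keys."""
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        try:
            creds = json.load(f)
        except json.JSONDecodeError:
            return False
    # Both entries must be present and non-empty
    return isinstance(creds, dict) and all(creds.get(k) for k in ("username", "key"))


# Check the directory that KAGGLE_CONFIG_DIR points to in this walkthrough
print(check_kaggle_credentials(os.getcwd()))
```

A result of False means the file is missing, is not valid JSON, or lacks one of the two entries.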
Step 6: Load the dataset
The Football Player Segmentation dataset is already formatted in the COCO dataset format. This means we can import the dataset into FiftyOne using the fo.types.COCODetectionDataset common format.
# The directory containing the source images
data_path = "./images"

# The path to the COCO labels JSON file
labels_path = "./annotations/instances_default.json"

# Name of the dataset
name = "football-player-segmentation"

# Import the dataset
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=data_path,
    labels_path=labels_path,
    name=name,
)

dataset.compute_metadata()
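If you are curious about what the COCO labels file contains before importing it, a small summary can help. The sketch below relies only on the standard COCO top-level keys ("images", "annotations", "categories"). summarize_coco is a hypothetical helper, and the example dictionary is a tiny hand-made stand-in, not data taken from the dataset.

```python
import json
from collections import Counter


def summarize_coco(coco):
    """Count images, annotations, and annotations per category in a COCO dict."""
    id_to_name = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_category = Counter(
        id_to_name.get(a["category_id"], "unknown")
        for a in coco.get("annotations", [])
    )
    return {
        "num_images": len(coco.get("images", [])),
        "num_annotations": len(coco.get("annotations", [])),
        "per_category": dict(per_category),
    }


# Hand-made stand-in for a COCO labels file (illustrative values only)
example = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "person"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 1},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
}
print(summarize_coco(example))

# To inspect the real labels file, load it first:
# with open("./annotations/instances_default.json") as f:
#     print(summarize_coco(json.load(f)))
```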
Step 7: Add embeddings
Next, let’s use the FiftyOne Brain’s embedding similarity capability to visualize several scenarios in a football game and launch the results inside the FiftyOne App.
fob.compute_visualization(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
)

# Load the existing dataset by name; fo.Dataset(name) would try to create a new one
dataset = fo.load_dataset("football-player-segmentation")
session = fo.launch_app(dataset)
Overview: Football Player Segmentation dataset in the FiftyOne App
The sample detail view of the Football Player Segmentation dataset in the FiftyOne App
Filtering data
Filter by detection field
With the dataset loaded in the FiftyOne App and the embeddings computed, let’s start exploring some interesting views, filters and segmentations. First, let’s create a simple filter by detection field.
detection_view = dataset.select_fields('detections')
session.view = detection_view
Overview: Filter by detection field in the FiftyOne App
Filter by detection field in the sample detail view of the FiftyOne App
Filter by segmentations field
Let’s create another simple filter, this time by the segmentations field.
segmentation_view = dataset.select_fields('segmentations')
session.view = segmentation_view
Overview: Filter by segmentations field in the FiftyOne App
Filter by segmentations field in the sample detail view of the FiftyOne App
Filter by id
Finally, let’s filter by the different positions of the people on the field. In this case, we’ll filter by an ID that is associated with one of the referees.
Filter by Id in the sample detail view of the FiftyOne App
Selecting clusters of samples in the embeddings view
Up next, let’s use the lasso tool in the FiftyOne App’s embeddings view to investigate some interesting clusters of samples.
In the GIF above, we lasso a selection of 13 samples that appear to show the positions of footballers during a corner kick from one side of the field. This set of similar images helps to track the positions of the players before and after the kick. Similar clusters can be used to analyze player tracking for other corner kicks taken from the same and opposite sides during the game.
In the second GIF we lasso a cluster that contains similar images showing the player positions during a throw-in.
Working with SAM
Cool! Now, let’s add segmentation predictions to a subset of the dataset using SAM and evaluate them against the ground truth segmentations. The first thing to do is download the Segment Anything model.
# Create a new directory named 'model' (no error if it already exists)
os.makedirs(os.path.join(cwd, 'model'), exist_ok=True)

# The "default" registry key corresponds to the ViT-H checkpoint
checkpoint = "sam_vit_h_4b8939.pth"
model_url = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
model_type = "default"

# Set the path to the checkpoint
checkpoint_path = os.path.join(cwd, 'model', checkpoint)

# Download the checkpoint to that path
wget.download(model_url, out=checkpoint_path)
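The checkpoint file and the registry key passed to sam_model_registry need to describe the same backbone, so it can be convenient to derive both from a single setting. The sketch below is one way to do that. The file names and base URL are the ones listed in the segment-anything README as far as we know, so treat them as assumptions and verify them before relying on this. sam_download_info is a hypothetical helper, not part of the segment_anything package.

```python
import os

# Registry keys mapped to checkpoint file names (assumed from the segment-anything README)
SAM_CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}
BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything/"


def sam_download_info(model_type, model_dir):
    """Return (url, local_path) for the checkpoint matching `model_type`."""
    # "default" is treated as an alias for the ViT-H model
    key = "vit_h" if model_type == "default" else model_type
    if key not in SAM_CHECKPOINTS:
        raise ValueError(f"Unknown SAM model type: {model_type!r}")
    filename = SAM_CHECKPOINTS[key]
    return BASE_URL + filename, os.path.join(model_dir, filename)


url, path = sam_download_info("default", "model")
print(url)
print(path)
```

With this in place, switching to a smaller backbone such as "vit_b" only requires changing the model type in one spot.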
Next, let’s create a predictions view from a subset of the dataset.
predictions_view = dataset.take(30)
A predictions view comprised of 30 samples
Now, let’s load the SAM model and predictor.
# Load the SAM model from the checkpoint
sam = sam_model_registry[model_type](checkpoint=checkpoint_path)

# Run on the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
sam.to(device)

# Instantiate the SAM predictor
predictor = SamPredictor(sam)
Since bounding boxes are already available in the detections field, we can use them as prompts to generate segmentation masks. For an in-depth explanation of the code and how instance segmentation with SAM works, check out this article written by Jacob Marks.
# Convert a FiftyOne relative [x, y, width, height] box to absolute [x0, y0, x1, y1] pixels
def fo_to_sam(box, img_width, img_height):
    new_box = np.copy(np.array(box))
    new_box[0] *= img_width
    new_box[2] *= img_width
    new_box[1] *= img_height
    new_box[3] *= img_height
    new_box[2] += new_box[0]
    new_box[3] += new_box[1]
    return np.round(new_box).astype(int)

# Crop the full-image mask to the detection's box and attach it to the detection
def add_SAM_mask_to_detection(detection, mask, img_width, img_height):
    x0, y0, x1, y1 = fo_to_sam(detection.bounding_box, img_width, img_height)
    # Masks are indexed [row, column], i.e. [y, x]
    mask_trimmed = mask[y0:y1+1, x0:x1+1]
    # Move the tensor to the CPU before converting, so this also works on a GPU
    detection["mask"] = mask_trimmed.cpu().numpy()
    return detection

def add_SAM_instance_segmentation(sample):
    # Skip samples without detections before computing the image embedding
    if sample.detections is None:
        return

    w, h = sample.metadata.width, sample.metadata.height
    image = np.array(PIL.Image.open(sample.filepath).convert("RGB"))

    # Process the image to produce an image embedding
    predictor.set_image(image)

    dets = sample.detections.detections
    boxes = [d.bounding_box for d in dets]
    sam_boxes = np.array([fo_to_sam(box, w, h) for box in boxes])

    input_boxes = torch.tensor(sam_boxes, device=predictor.device)
    transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])

    # Predict one mask per box prompt
    masks, _, _ = predictor.predict_torch(
        point_coords=None,
        point_labels=None,
        boxes=transformed_boxes,
        multimask_output=False,
    )

    new_dets = []
    for i, det in enumerate(dets):
        mask = masks[i, 0]
        new_dets.append(add_SAM_mask_to_detection(det, mask, w, h))

    sample['predictions'] = fo.Detections(detections=new_dets)
    sample.save()

def add_SAM_instance_segmentations(dataset):
    for sample in dataset.iter_samples(autosave=True, progress=True):
        add_SAM_instance_segmentation(sample)

add_SAM_instance_segmentations(predictions_view)
session.view = predictions_view
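To make the coordinate conversion concrete, here is a small worked example in plain Python. FiftyOne stores bounding boxes as relative [top-left-x, top-left-y, width, height] values in [0, 1], while SAM expects absolute [x0, y0, x1, y1] pixel coordinates. relative_to_absolute is a hypothetical stand-in that mirrors the arithmetic of fo_to_sam without NumPy, and the box and image size are made-up values.

```python
def relative_to_absolute(box, img_width, img_height):
    """Convert a relative [x, y, w, h] box to absolute [x0, y0, x1, y1] pixels."""
    x, y, w, h = box
    x0 = x * img_width
    y0 = y * img_height
    x1 = x0 + w * img_width   # right edge = left edge + absolute width
    y1 = y0 + h * img_height  # bottom edge = top edge + absolute height
    return [round(v) for v in (x0, y0, x1, y1)]


# A box starting a quarter of the way across and halfway down a 1920x1080 frame
print(relative_to_absolute([0.25, 0.5, 0.5, 0.25], 1920, 1080))
# -> [480, 540, 1440, 810]
```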
Instance segmentations predictions view in the FiftyOne App
Next, let’s evaluate the predictions.
results = predictions_view.evaluate_detections(
    "predictions",
    gt_field="segmentations",
    eval_key="eval",
    use_masks=True,
)

# Convert to evaluation patches
eval_patches = predictions_view.to_evaluation_patches("eval")
print(eval_patches)
Example output:
Dataset:     football-player-segmentation
Media type:  image
Num patches: 445
Patch fields:
    id:            fiftyone.core.fields.ObjectIdField
    sample_id:     fiftyone.core.fields.ObjectIdField
    filepath:      fiftyone.core.fields.StringField
    tags:          fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    predictions:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    crowd:         fiftyone.core.fields.BooleanField
    type:          fiftyone.core.fields.StringField
    iou:           fiftyone.core.fields.FloatField
View stages:
    1. Take(size=30, seed=None)
    2. ToEvaluationPatches(eval_key='eval', config=None)

print(eval_patches.count_values("type"))

{'tp': 425, 'fp': 10, 'fn': 10}
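The true positive, false positive, and false negative counts can be turned into the usual summary metrics with a few lines of arithmetic. The sketch below plugs in the counts from the example output above. detection_metrics is a hypothetical helper, and your own counts will differ because the 30 samples are drawn at random.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, and F1 from detection match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts taken from the example output above
precision, recall, f1 = detection_metrics(tp=425, fp=10, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# -> precision=0.977 recall=0.977 f1=0.977
```

Because the numbers of false positives and false negatives happen to be equal here, precision and recall come out the same.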
View the patches in the FiftyOne App.
# View patches in the App session.view = eval_patches
Viewing the patches in the FiftyOne App
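Each evaluation patch carries an iou value. When masks are used, this measures how much the predicted mask and the ground truth mask overlap. The following dependency-free sketch shows the idea on two tiny binary masks. It assumes both masks are already aligned in the same pixel frame, and FiftyOne's own implementation may differ in its details. mask_iou is a hypothetical helper.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-shaped binary masks (nested lists)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no overlap to measure; report 0.0
    return intersection / union if union else 0.0


# Made-up 3x3 masks: the prediction is shifted one column left of the ground truth
pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]

# 2 shared pixels out of 6 covered by either mask
print(mask_iou(pred, gt))
```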