
Using SAM for the Prediction of Segmentations on the Kaggle Football Player Segmentation Dataset

Welcome to the latest installment of our ongoing blog series where we highlight notebooks and datasets from the FiftyOne Examples GitHub repository. The fiftyone-examples repository contains over 30 notebooks that make it easy to try various computer vision workflows and explore their associated datasets. In this post, we take a look at how to use the Segment Anything Model (SAM) for predictions and segmentations on Kaggle’s Football Player Segmentation dataset.

Wait, what’s FiftyOne?

FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.

About the Football Player Segmentation Dataset

The Football Player Segmentation dataset is intended for use on computer vision tasks related to player detection and segmentation in football matches. The dataset contains images of players in different playing positions, such as goalkeepers, defenders, midfielders, and forwards, captured from various angles and distances. The images are annotated with pixel-level masks that indicate the player’s location and segmentation boundaries, making it ideal for training deep learning models for player segmentation.

Can’t wait? You can preview the dataset in your browser at try.fiftyone.ai.

What is SAM?

In the realm of computer vision, segmentation helps us identify which image pixels belong to an object. This is a common task in computer vision applicable to a broad range of applications, from analyzing scientific imagery to editing photos. Previously, creating an accurate segmentation model for specific tasks required deep technical expertise, plus access to computing infrastructure and large volumes of carefully annotated data.

Earlier this year, Meta announced the Segment Anything project, which aims to democratize segmentation by making it more accessible to data scientists. The project includes a new task, a dataset, and the Apache 2.0 licensed Segment Anything Model (SAM) for image segmentation.

About the Football Player Segmentation notebook

The Football Player Segmentation notebook was authored by Kishan Savant and can be cloned, downloaded (or viewed in nbviewer or Google Colab) here.

Step 1: Install FiftyOne

If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! For example, on macOS you can run pip install fiftyone from a terminal, ideally inside a Python virtual environment.

Learn more about how to get up and running with FiftyOne in the Docs.

Step 2: Install the required Python libraries

Now that you have FiftyOne installed, let’s install the Python libraries (including SAM) that we’ll need to successfully work with the notebook.

pip install pycocotools
pip install fiftyone umap-learn
pip install kaggle
pip install torchvision 
pip install wget
pip install opencv-python
pip install shapely
pip install git+https://github.com/facebookresearch/segment-anything.git

Step 3: Imports

We’ll need to import the FiftyOne Brain, a separate Python package for FiftyOne that lets you visualize embeddings, find similar images, and compute uniqueness and mistakenness scores. We’ll also need to import a variety of utilities like os, cv2, wget, matplotlib, torch, torchvision, PIL and numpy.

import fiftyone as fo
import fiftyone.brain as fob
from fiftyone import ViewField as F
import os
import cv2
import wget
import matplotlib.pyplot as plt
from zipfile import ZipFile
import torch
import torchvision
import numpy as np
from segment_anything import SamPredictor, sam_model_registry, SamAutomaticMaskGenerator
import PIL.Image  # import the submodule explicitly so PIL.Image.open is always available

Step 4: Get the current working directory

Next, let’s get the current working directory.

cwd = os.path.abspath(os.getcwd())

Step 5: Download and extract dataset from Kaggle

If you are not already a Kaggle user, you’ll need to create a Kaggle account. After the creation of the account, go to your Kaggle Account page and scroll down to the API section. Here you’ll need to click on “Create New API Token.” A new API token will be created in the form of a kaggle.json file. This kaggle.json file contains your Kaggle username and key.

The final steps here are to place this kaggle.json file in your current working directory, then download and extract the dataset.

os.environ['KAGGLE_CONFIG_DIR']=cwd

# Download the dataset
!kaggle datasets download -d ihelon/football-player-segmentation

# Extract the dataset to the current working directory
!unzip football-player-segmentation.zip
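The `!unzip` line above relies on notebook shell syntax. If you are running the steps as a plain Python script instead, a minimal sketch using the `ZipFile` class imported in Step 3 could look like this. The helper name `extract_archive` is hypothetical and not part of the original notebook, and the archive name is assumed to match the one downloaded above.

```python
import os
from zipfile import ZipFile


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Assumed archive name, matching the Kaggle download above
zip_name = "football-player-segmentation.zip"
if os.path.exists(zip_name):
    names = extract_archive(zip_name, os.getcwd())
    print(f"Extracted {len(names)} files")
```

The existence check simply skips extraction when the archive has not been downloaded yet.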

Step 6: Load the dataset

The Football Player Segmentation dataset is already in COCO format, so we can import it into FiftyOne using the fo.types.COCODetectionDataset type.

# The directory containing the source images
data_path = "./images"

# The path to the COCO labels JSON file
labels_path = "./annotations/instances_default.json"

# name of dataset
name = "football-player-segmentation"

# Import the dataset
dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path=data_path,
    labels_path=labels_path,
    name=name
)
dataset.compute_metadata()

Step 7: Add embeddings

Next, let’s use the FiftyOne Brain to compute a low-dimensional visualization of CLIP image embeddings, which places similar football game scenarios close together, and then launch the results inside the FiftyOne App.

fob.compute_visualization(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_sim",
)

# The dataset already exists, so load it by name
# (calling fo.Dataset() with an existing name raises an error)
dataset = fo.load_dataset('football-player-segmentation')
session = fo.launch_app(dataset)

Overview: Football Player Segmentation dataset in the FiftyOne App

The sample detail view of the Football Player Segmentation dataset in the FiftyOne App

Filtering data

Filter by detection field

With the dataset loaded in the FiftyOne App and the embeddings computed, let’s start exploring some interesting views, filters and segmentations. First, let’s create a simple filter by detection field.

detection_view = dataset.select_fields('detections')
session.view=detection_view

Overview: Filter by detection field in the FiftyOne App

Filter by detection field in the sample detail view of the FiftyOne App

Filter by segmentations field

Let’s create another simple filter. In this case let’s do so by segmentations.

segmentation_view = dataset.select_fields('segmentations')
session.view=segmentation_view

Overview: Filter by segmentations field in the FiftyOne App

Filter by segmentations field in the sample detail view of the FiftyOne App

Filter by id

Finally, let’s filter by the different positions of the people on the field. In this case, we’ll filter by an ID that is associated with one of the referees.

Filter by Id in the sample detail view of the FiftyOne App

Selecting clusters of samples in the embeddings view

Up next, let’s use the lasso tool in the FiftyOne App’s embeddings view to investigate some interesting clusters of samples.

In the GIF above, we lasso a selection of 13 samples that appear to show the positions of footballers during a corner kick taken from one side of the field. This set of similar images helps to track the positions of the players before and after the kick. Similar clusters can be used to analyze player positions for other corner kicks taken from the same and opposite sides during the game.

In the second GIF we lasso a cluster that contains similar images showing the player positions during a throw-in.

Working with SAM

Cool! Now, let’s use SAM to add segmentation predictions to a subset of the dataset and evaluate them against the ground truth. The first thing to do is download the Segment Anything model checkpoint.

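# Create a directory named 'model' (no error if it already exists)
os.makedirs(os.path.join(cwd, 'model'), exist_ok=True)

# The "default" model type is the ViT-H variant of SAM, which matches this checkpoint
checkpoint = "sam_vit_h_4b8939.pth"
model_url = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
model_type = "default"

# Set the path to the checkpoint
checkpoint_path = os.path.join(cwd, 'model', checkpoint)

# Download the checkpoint only if it is not already on disk
if not os.path.exists(checkpoint_path):
    wget.download(model_url, out=checkpoint_path)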

Next, let’s create a predictions view from a random subset of 30 samples.

predictions_view = dataset.take(30)

A predictions view comprised of 30 samples

Now, let’s load the SAM model and predictor. 

# Build SAM from the checkpoint and move it to the GPU if one is available
sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
sam.to(device)
# Instantiate the SAM predictor
predictor = SamPredictor(sam)

Since each detection already has a bounding box, we can use those boxes as prompts for SAM to generate segmentation masks. For an in-depth explanation of the code and how instance segmentation with SAM works, check out this article written by Jacob Marks.

# Converts a FiftyOne relative [x, y, width, height] box to an absolute [x0, y0, x1, y1] box for SAM
def fo_to_sam(box, img_width, img_height):
    new_box = np.copy(np.array(box))
    new_box[0] *= img_width
    new_box[2] *= img_width
    new_box[1] *= img_height
    new_box[3] *= img_height
    new_box[2] += new_box[0]
    new_box[3] += new_box[1]
    return np.round(new_box).astype(int)

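def add_SAM_mask_to_detection(detection, mask, img_width, img_height):
    x0, y0, x1, y1 = fo_to_sam(detection.bounding_box, img_width, img_height)
    # The mask is indexed [row, column], i.e. [y, x]; crop it to the bounding box
    mask_trimmed = mask[y0:y1+1, x0:x1+1]
    # Move the tensor to the CPU before converting (required when SAM runs on a GPU)
    detection["mask"] = mask_trimmed.cpu().numpy()
    return detection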

def add_SAM_instance_segmentation(sample):
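    # Skip samples without detections before doing any expensive work
    if sample.detections is None or not sample.detections.detections:
        return
    dets = sample.detections.detections

    w, h = sample.metadata.width, sample.metadata.height
    # SAM expects an RGB image as an HxWx3 array
    image = np.array(PIL.Image.open(sample.filepath).convert("RGB"))
    # Process the image to produce an image embedding
    predictor.set_image(image)
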
    boxes = [d.bounding_box for d in dets]
    sam_boxes = np.array([fo_to_sam(box, w, h) for box in boxes])
    
    input_boxes = torch.tensor(sam_boxes, device=predictor.device)
    transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
    
    masks, _, _ = predictor.predict_torch(
            point_coords=None,
            point_labels=None,
            boxes=transformed_boxes,
            multimask_output=False,
        )
    
    new_dets = []
    for i, det in enumerate(dets):
        mask = masks[i, 0]
        new_dets.append(add_SAM_mask_to_detection(det, mask, w, h))

    sample['predictions'] = fo.Detections(detections = new_dets)
    sample.save()    

def add_SAM_instance_segmentations(dataset):
    for sample in dataset.iter_samples(autosave=True, progress=True):
        add_SAM_instance_segmentation(sample)

add_SAM_instance_segmentations(predictions_view) 

session.view = predictions_view

Instance segmentations predictions view in the FiftyOne App
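To make the coordinate conversion in fo_to_sam concrete, here is a small worked example in plain Python. FiftyOne stores boxes as relative [x, y, width, height] values in [0, 1], while SAM's box prompt expects absolute [x0, y0, x1, y1] pixel coordinates. The function name `fo_box_to_abs_xyxy`, the box values, and the 1920x1080 frame size are assumptions chosen for illustration only.

```python
def fo_box_to_abs_xyxy(box, img_width, img_height):
    """Convert a relative [x, y, w, h] box to absolute [x0, y0, x1, y1] pixels."""
    x, y, w, h = box
    x0 = x * img_width
    y0 = y * img_height
    x1 = (x + w) * img_width   # right edge = left edge + width
    y1 = (y + h) * img_height  # bottom edge = top edge + height
    return [round(v) for v in (x0, y0, x1, y1)]


# Hypothetical box on a hypothetical 1920x1080 frame
rel_box = [0.25, 0.5, 0.125, 0.25]
abs_box = fo_box_to_abs_xyxy(rel_box, 1920, 1080)
print(abs_box)  # [480, 540, 720, 810]
```

Because a mask array is indexed by row and then column, cropping it to this box means slicing the y range (540 to 810) first and the x range (480 to 720) second.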

Next, let’s evaluate the predictions.

results = predictions_view.evaluate_detections(
    "predictions", gt_field="segmentations", eval_key="eval", use_masks = True
)

# Convert to evaluation patches
eval_patches = predictions_view.to_evaluation_patches("eval")
print(eval_patches)

Example output:

Dataset:     football-player-segmentation
Media type:  image
Num patches: 445
Patch fields:
    id:            fiftyone.core.fields.ObjectIdField
    sample_id:     fiftyone.core.fields.ObjectIdField
    filepath:      fiftyone.core.fields.StringField
    tags:          fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:      fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    predictions:   fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    crowd:         fiftyone.core.fields.BooleanField
    type:          fiftyone.core.fields.StringField
    iou:           fiftyone.core.fields.FloatField
View stages:
    1. Take(size=30, seed=None)
    2. ToEvaluationPatches(eval_key='eval', config=None)
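Each patch in the output above carries an iou field. With use_masks=True, predictions are matched to ground truth by the overlap of their instance masks rather than their boxes. The sketch below illustrates that ratio on two tiny binary masks. It is a conceptual illustration with made-up masks and a hypothetical `mask_iou` helper, not FiftyOne's implementation.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two same-sized binary masks (lists of rows)."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(bool(a) and bool(b))
            union += int(bool(a) or bool(b))
    # Two empty masks have no overlap to measure; report 0.0 by convention here
    return intersection / union if union else 0.0


# Made-up 3x4 masks: 4 foreground pixels each, 3 of them shared
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
prediction = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(mask_iou(ground_truth, prediction))  # 0.6
```

A prediction counts as a true positive only when its IoU with a ground truth object reaches the evaluation threshold, which defaults to 0.5 in evaluate_detections.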

print(eval_patches.count_values("type"))

{'tp': 425, 'fp': 10, 'fn': 10}
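These counts can be turned into the usual summary metrics with a few lines of arithmetic. The sketch below uses the counts printed above. The helper `precision_recall_f1` is a hypothetical name introduced for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Counts taken from the count_values("type") output above
counts = {"tp": 425, "fp": 10, "fn": 10}
precision, recall, f1 = precision_recall_f1(counts["tp"], counts["fp"], counts["fn"])
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.977 recall=0.977 f1=0.977
```

In other words, about 98% of SAM's masks matched a ground truth object and about 98% of ground truth objects were found. The results object returned by evaluate_detections also has a print_report() method if you prefer a per-class breakdown.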

View the patches in the FiftyOne App.

# View patches in the App
session.view = eval_patches

Viewing the patches in the FiftyOne App