
SAM 2 Is Now Available in FiftyOne!

Segment Anything 2 (SAM 2), released on July 29, 2024, represents a major leap forward in segmentation algorithms. Offering cutting-edge performance on both images and videos, SAM 2 not only enhances image segmentation but also introduces advanced video capabilities. With SAM 2, users can achieve precise segmentation and tracking in video sequences using simple prompts, like bounding boxes or points, on a single frame. This enhanced functionality opens up exciting new possibilities for a wide array of video applications.

In this post you will learn about SAM 2 and how to load and apply SAM 2 models on both images and videos in FiftyOne.

What is SAM 2?

SAM 2 builds on the foundation of the original Segment Anything Model (SAM), which Meta released in April 2023 for zero-shot segmentation of still images. Where the original SAM operates on one image at a time, SAM 2 extends promptable segmentation to video: a bounding box or point prompt on a single frame is enough to segment an object and track it through the rest of the sequence.

Using SAM 2 in FiftyOne for Images

FiftyOne makes it easy for AI builders to work with visual data. With SAM 2 in FiftyOne you can now seamlessly generate segmentation labels for images and videos and visualize them on your datasets. With just a few simple commands you can download SAM 2 models directly from the FiftyOne Model Zoo, a collection of pre-trained models, and run inference on your FiftyOne datasets.

To get started, ensure that you have FiftyOne installed:

pip install fiftyone

You also need to install SAM 2 by following the instructions in the segment-anything-2 GitHub repository.

The following code snippet demonstrates how to load a dataset in FiftyOne and provide bounding box prompts to a SAM 2 model to generate segmentations.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=25, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="ground_truth",
)

We can now look at our data with the segmentation labels created by SAM 2.

session = fo.launch_app(dataset)

We can see that the predictions of the SAM 2 model, prompted with the bounding box detections, are stored under the segmentations field.
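One detail worth knowing: FiftyOne stores bounding boxes in relative [top-left-x, top-left-y, width, height] coordinates in [0, 1], which is the format box prompts must be in. If your boxes come from another source in pixel corner coordinates, a small helper like the following can convert them (the helper name is ours, not a FiftyOne API):

```python
def to_relative_box(x1, y1, x2, y2, img_width, img_height):
    """Convert absolute pixel corners (x1, y1, x2, y2) to FiftyOne's
    relative [top-left-x, top-left-y, width, height] format."""
    return [
        x1 / img_width,
        y1 / img_height,
        (x2 - x1) / img_width,
        (y2 - y1) / img_height,
    ]

# A 100x100 pixel box at (50, 25) in a 640x480 image
print(to_relative_box(50, 25, 150, 125, 640, 480))
```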

You can also prompt with keypoints instead of bounding boxes. To do this, we first filter the quickstart dataset down to the images that contain the label person:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
dataset = dataset.filter_labels("ground_truth", F("label") == "person")
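The filter_labels() call keeps, for each sample, only the ground_truth detections whose label is person, and drops samples left with no matches. As a simplified illustration of the per-sample effect in plain Python (real FiftyOne views are lazy and non-destructive, so this is a sketch, not FiftyOne's implementation):

```python
def filter_person_labels(detections):
    """Keep only the detections whose label is 'person', mimicking the
    per-sample effect of filter_labels("ground_truth", F("label") == "person")."""
    return [d for d in detections if d["label"] == "person"]

sample = [
    {"label": "person", "bounding_box": [0.1, 0.1, 0.2, 0.5]},
    {"label": "dog", "bounding_box": [0.5, 0.6, 0.2, 0.2]},
]
print(filter_person_labels(sample))  # keeps only the "person" detection
```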

Next we need to generate keypoints on this dataset. We can use another FiftyOne Zoo model to generate these keypoints.

# Generate some keypoints
model = foz.load_zoo_model("keypoint-rcnn-resnet50-fpn-coco-torch")
dataset.default_skeleton = model.skeleton
dataset.apply_model(model, label_field="gt")

Let us look at this dataset and the keypoints that were generated.

session = fo.launch_app(dataset)

Now we can run a SAM 2 model on this dataset using the keypoints field gt_keypoints to prompt the model.

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Prompt with keypoints
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="gt_keypoints",
)

session = fo.launch_app(dataset)

You can also use SAM 2 to automatically generate masks for the whole image without any prompts!

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "quickstart", max_samples=5, shuffle=True, seed=51
)

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-image-torch")

# Automatic segmentation
dataset.apply_model(model, label_field="auto")

session = fo.launch_app(dataset)
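For context, prompt-free segmentation follows the same recipe as the original SAM's automatic mask generator: the model is prompted with a regular grid of points covering the image, and the high-quality masks those prompts produce are kept. A minimal sketch of building such a grid of relative point prompts (the function and its parameter are illustrative, not part of the FiftyOne API):

```python
def point_grid(points_per_side):
    """Build a regular grid of (x, y) point prompts in relative [0, 1]
    coordinates, with each point centered in its grid cell."""
    offset = 1.0 / (2 * points_per_side)
    coords = [offset + i / points_per_side for i in range(points_per_side)]
    return [(x, y) for y in coords for x in coords]

grid = point_grid(4)
print(len(grid))  # 16 evenly spaced point prompts
```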

Using SAM 2 in FiftyOne for Video

SAM 2’s video segmentation and tracking capabilities make the process of propagating masks from one frame to another seamless. Let’s load a video dataset and only retain the bounding boxes for the first frame. 

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video", max_samples=2)

# Only retain detections on the first frame of each video
(
    dataset
    .match_frames(F("frame_number") > 1)
    .set_field("frames.detections", None)
    .save()
)

session = fo.launch_app(dataset)

We see that only the first frame retains its annotations. We can now use these boxes as prompts to generate segmentations with SAM 2 on the first frame and propagate them to every frame of the video. It is as simple as calling apply_model on the dataset.

model = foz.load_zoo_model("segment-anything-2-hiera-tiny-video-torch")

# Prompt with boxes
dataset.apply_model(
    model,
    label_field="segmentations",
    prompt_field="frames.detections", # Can be a detections or a keypoint field
)

session = fo.launch_app(dataset)
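SAM 2 itself carries object identity across frames with a learned memory mechanism, but the bookkeeping it produces, one consistent index per object across frames, can be illustrated with a much simpler stand-in: greedy IoU matching between consecutive frames. The sketch below is purely illustrative and is not how SAM 2 actually tracks:

```python
def iou(a, b):
    """IoU of two relative [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def propagate_ids(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily give each current box the id of the best-overlapping
    previous box; unmatched boxes get fresh ids."""
    next_id = max(prev_boxes, default=-1) + 1
    assigned, used = {}, set()
    for box in curr_boxes:
        best_id, best_iou = None, threshold
        for pid, pbox in prev_boxes.items():
            score = iou(box, pbox)
            if pid not in used and score > best_iou:
                best_id, best_iou = pid, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        used.add(best_id)
        assigned[best_id] = box
    return assigned

frame1 = {0: [0.10, 0.10, 0.20, 0.20], 1: [0.60, 0.60, 0.20, 0.20]}
frame2 = [[0.12, 0.11, 0.20, 0.20], [0.61, 0.60, 0.20, 0.20]]
print(sorted(propagate_ids(frame1, frame2)))  # -> [0, 1]
```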

SAM 2’s segmentation and tracking capabilities in videos are very powerful. In this tutorial we used the sam2_hiera_tiny model, but you can use any of the following models now available in the FiftyOne Model Zoo:

Image models:

  1. segment-anything-2-hiera-tiny-image-torch
  2. segment-anything-2-hiera-small-image-torch
  3. segment-anything-2-hiera-base-plus-image-torch
  4. segment-anything-2-hiera-large-image-torch

Video models:

  1. segment-anything-2-hiera-tiny-video-torch
  2. segment-anything-2-hiera-small-video-torch
  3. segment-anything-2-hiera-base-plus-video-torch
  4. segment-anything-2-hiera-large-video-torch

Conclusion & Next Steps

In this tutorial, we showed you how to download SAM 2 models and run inference on your FiftyOne image or video datasets with just a few commands. If you’d like to learn more, here are a few ways to get started:

  • Join the 3000+ AI builders in the FiftyOne Community Slack. This is the place to ask questions and get answers from fellow developers and scientists working on Visual AI in production.
  • Attend one of our Getting Started Workshops that cover all the topics you need to get up and running with FiftyOne and your datasets and models.
  • Hit up the FiftyOne GitHub repo to find everything you need to use FiftyOne for your Visual AI projects.