How to Use Segment Anything Model - SAM 3 for FiftyOne
Dec 12, 2025
7 min read

Introduction

Meta's Segment Anything Model 3 (SAM 3) is a foundation model for detecting, segmenting, and tracking objects in images and videos based on concept prompts. Released on November 19th, 2025, SAM 3 represents a major advancement in computer vision, transforming segmentation from a manual, click-based process into an intelligent system that understands text-based prompts.
This comprehensive guide covers everything you need to know about how to use Segment Anything Model 3:
  • What is SAM 3?
  • How does Segment Anything work?
  • How to use Segment Anything Model 3
  • Segment anything examples: Real-world applications
  • Frequently asked questions

What is Segment Anything Model 3?

SAM 3 is Meta's third-generation foundation model for visual segmentation. Unlike its predecessors SAM and SAM 2, which required manual single-instance visual prompts, SAM 3 understands open-vocabulary concepts and can segment all matching instances simultaneously. With Promptable Concept Segmentation (PCS), you can find and segment every instance of a concept using simple text descriptions like "yellow school bus."
The model consists of 848 million parameters and combines three tightly-coupled components: a high-capacity Meta Perception Encoder for text and image encoding, a DETR-based promptable detector, and a dense-memory video tracker inherited from SAM 2.

Why SAM 3 is more powerful

SAM 3 introduces several breakthrough capabilities:
  • Open-vocabulary understanding: Rather than being limited to predefined classes, SAM 3 can segment any concept described in natural language.
  • Exhaustive instance detection: When you prompt with "car," SAM 3 finds and segments every car in the image, not just one.
  • Unified architecture: A single model handles images, videos, text prompts, visual prompts (boxes/points), and exemplar-based queries.
  • Presence detection: A specialized "presence head" determines whether a concept exists in the scene before attempting localization, dramatically improving accuracy on hard negatives.
  • Video tracking: SAM 3 maintains object identities across video frames with temporal consistency, handling occlusions and re-appearances seamlessly.

How does Segment Anything work?

Architecture Overview

Understanding how Segment Anything works requires examining its multi-stage pipeline, which combines detection, segmentation, and tracking:
  1. Vision and text encoder: A vision transformer based on the perception encoder architecture extracts spatial features from input images. The model combines a vision encoder (450M parameters) and text encoder (300M parameters) into a unified representation, allowing it to comprehend both the visual characteristics of objects and their textual descriptions.
  2. Image-level detector: The detector builds on DETR, the first object-detection architecture based on transformers. A dedicated presence head determines whether the queried concept exists in the scene at all; decoupling recognition from localization in this way significantly improves detection accuracy (see the sketch after this list).
  3. Memory-based video tracker: Inheriting SAM 2's transformer-based memory architecture, the tracker propagates "masklets" (spatial-temporal masks) across frames using self-attention and cross-attention mechanisms. Strategies like track confirmation delays and periodic re-prompting prevent drift and maintain identity integrity.
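The role of the presence head can be illustrated with a small, schematic sketch. This is only an illustration of the idea of presence-gated scoring, not Meta's implementation: the image-level presence probability multiplies every per-query recognition score, so an absent concept suppresses all candidate detections at once.

```python
import torch

def gate_query_scores(query_logits: torch.Tensor, presence_logit: torch.Tensor) -> torch.Tensor:
    """Schematic presence-gated scoring (illustration only, not Meta's code).

    query_logits:   (num_queries,) per-query "is this box the concept?" logits
    presence_logit: scalar image-level "does the concept appear at all?" logit
    """
    # Recognition ("is the concept present?") is decoupled from localization
    # ("where is it?"): the global presence probability scales every per-query
    # score, which sharply reduces false positives on hard negatives.
    return torch.sigmoid(query_logits) * torch.sigmoid(presence_logit)

# Example: when the concept is likely absent, all query scores are pulled toward zero
scores = gate_query_scores(torch.tensor([2.0, -1.0, 0.5]), torch.tensor(-3.0))
print(scores)
```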

SAM 3 task types

  1. Promptable concept segmentation (PCS): Find all matching instances of a concept using a text prompt (a single string or a list of phrases).
  2. Promptable visual segmentation (PVS): Segment specific instances using spatial prompts, such as points (positive/negative clicks), bounding boxes, or mask refinement.
  3. Automatic segmentation: Generate all possible masks without prompts, with quality filtering and deduplication. While no prompts are needed, it is memory intensive.

How to use Segment Anything Model 3

SAM 3 is open source and available now. Follow the step-by-step guide below to get started.

How to use SAM 3 with FiftyOne

While SAM 3 can be used standalone, engineers often need additional tools to manage segmentation results at scale. FiftyOne is an open-source data-centric AI platform that provides native integration with SAM 3, addressing common challenges in production segmentation workflows:
  • Batching support: Process your dataset efficiently with automatic batching, essential for large-scale applications.
  • Visual embeddings: Visualize embeddings to analyze data patterns, perform similarity search, and identify dataset biases.
  • Data curation: Get insights into your dataset's distribution, diversity, and coverage to identify your best-performing samples and weed out low-quality data.
  • Streamlined visual data discovery: Query your data lake and retrieve relevant samples in seconds using Data Lens.
  • Labeling quality analysis: Prioritize data for manual review or re-annotation.
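
A minimal sketch of this workflow is shown below. The general pattern (loading a zoo model and calling `apply_model()`) is standard FiftyOne; however, the SAM 3 zoo model name and the text-prompt keyword shown here are assumptions for illustration, so check FiftyOne's model zoo documentation for the exact identifiers.

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset to experiment with
dataset = foz.load_zoo_dataset("quickstart")

# Hypothetical zoo model name for SAM 3 -- confirm the exact name in the
# FiftyOne model zoo before running
model = foz.load_zoo_model("segment-anything-3-image-torch")

# Promptable concept segmentation: the `prompt` keyword is an assumption here;
# the integration may instead expect the concept via `classes` or at load time
dataset.apply_model(model, label_field="sam3_pcs", prompt="yellow school bus")

# Promptable visual segmentation: reuse existing boxes as spatial prompts
# (`prompt_field` is how FiftyOne's SAM 1/2 integrations accept box prompts)
dataset.apply_model(model, label_field="sam3_pvs", prompt_field="ground_truth")

# Inspect the results in the FiftyOne App
session = fo.launch_app(dataset)
```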

Segment Anything examples: Real-world applications

SAM 3's open-vocabulary capabilities enable rapid deployment in production environments. Below are Segment Anything examples across industries.

Medical imaging example

SAM 3 concept segmentation on medical imaging, detecting anomalies.

Retail and inventory management example

SAM 3 concept segmentation on retail database.

Autonomous vehicles: Road scene understanding example

SAM 3 with the prompt “cars” on a driving dataset.

Manufacturing & quality control example

SAM 3 successfully detected defects: scratches on metal bolts.

How is SAM 3 different from previous models?

Understanding the evolution from SAM to SAM 2 to SAM 3 clarifies when to use each model:

Key architectural differences

  • Presence token: SAM 3's detector includes a specialized head that determines whether a concept exists before localizing it. This innovation dramatically reduces false positives when prompting with similar concepts like "player in white jersey" vs "player in red jersey."
  • Perception Encoder (PE): SAM 3 introduces joint vision-language encoding through Meta’s PE, which was pre-trained on 5.4 billion image-text pairs. PE provides both strong semantic understanding and robustness across diverse visual domains. The aligned encoders enable zero-shot open-vocabulary capabilities that earlier SAM models lacked.
  • Decoupled detector and tracker: Unlike SAM 2's monolithic architecture, SAM 3 separates the detection and tracking components, reducing task interference and scaling more efficiently with data.

When to use each SAM model

Use SAM 3 when:
  • You need text-based prompting for any concept
  • Finding all instances of an object class
  • Working with rare or long-tail concepts
  • Tracking multiple objects across video
  • Building zero-shot applications
Use SAM 2 when:
  • Interactive single-object segmentation suffices
  • You have strong spatial priors (know exact locations)
  • Lower model size is critical
  • Working with well-defined visual tasks
Use SAM when:
  • Working with still images only
  • Basic point-and-click segmentation
  • Compatibility with existing SAM pipelines required

SAM 3 frequently asked questions

How fast is SAM 3?

SAM 3 processes a single image with 100+ detected objects in approximately 30 milliseconds on an NVIDIA H200 GPU.

Is Segment Anything Model 3 open source?

Yes, Segment Anything Model 3 is open source. Meta released the model under a custom SAM License that allows both research and commercial applications with certain restrictions. Check out the SAM License for full license terms.

When was Meta SAM 3 released?

Meta released SAM 3 on November 19, 2025, alongside a benchmark dataset called SA-Co (Segment Anything with Concepts), which contains 120K images and 1.7K videos annotated with exhaustive concept-level masks spanning over 200K unique concepts.

What are the installation requirements for SAM 3?
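
Meta's release notes list the exact requirements; in broad terms, you need a recent Python and PyTorch installation, and a CUDA-capable GPU is strongly recommended for reasonable throughput. The environment check below is a minimal sketch; the package name for the SAM 3 code itself is omitted because it depends on Meta's distribution, so consult the official repository for the install command.

```python
# Minimal environment sanity check before installing SAM 3.
# Assumptions: a recent PyTorch build and a CUDA GPU; install the SAM 3 package
# itself per Meta's official instructions (not shown here).
import sys
import torch

print("Python:", sys.version.split()[0])              # a recent Python 3.x release
print("PyTorch:", torch.__version__)                   # recent PyTorch recommended
print("CUDA available:", torch.cuda.is_available())   # GPU strongly recommended
```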

Can I fine-tune SAM 3 on my dataset?

Absolutely. Fine-tuning improves performance on domain-specific tasks where zero-shot results don't meet requirements.
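Meta's release covers the fine-tuning details; the sketch below only illustrates the data-preparation side with FiftyOne, i.e., curating a human-verified subset and exporting it in a common detection format. The dataset name, tag, label field, and export format are illustrative assumptions.

```python
import fiftyone as fo

# Curate a domain-specific, human-verified subset for fine-tuning.
# "my-domain-dataset", the "reviewed" tag, and "ground_truth" are placeholders.
dataset = fo.load_dataset("my-domain-dataset")
view = dataset.match_tags("reviewed")

# Export in COCO format (an illustrative choice; use whatever format your
# fine-tuning recipe expects)
view.export(
    export_dir="/tmp/sam3_finetune_data",
    dataset_type=fo.types.COCODetectionDataset,
    label_field="ground_truth",
)
```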

Deploying SAM 3: Why understanding your data matters more than ever

SAM 3 represents a fundamental shift in how we approach segmentation. Enabling natural language interaction with visual data removes the barrier between human intent and machine understanding. For machine learning engineers, SAM 3 offers:
  • Rapid prototyping: Auto-label datasets 10x faster than manual annotation
  • Zero-shot deployment: Handle edge cases and rare classes out of the box
  • Unified pipeline: Detect, segment, and track with a single model
As foundation models continue to evolve, SAM 3 demonstrates that "segment anything" can truly mean anything you can describe. Yet with these increasingly powerful models, data quality matters more than ever. The bottleneck has shifted from model capabilities to understanding your data's strengths and weaknesses—identifying edge cases, discovering annotation errors, and curating the right training examples.
This is why integrating with tools like FiftyOne becomes crucial for leveraging SAM 3's full potential. Whether you're just learning how to use Segment Anything Model or scaling to production, FiftyOne's batching support and visual embeddings provide the foundation for efficient workflows.

Talk to a computer vision expert
