SAM vs SAM 3 on Aerial Datasets: Zero-Shot Segmentation Benchmark Comparison
What Is SAM?
SAM (Segment Anything Model) is Meta's segmentation model that generates pixel-level masks from simple prompts, such as a point or a box, with no task-specific training needed.
What Is SAM 3?
SAM 3 takes it further with open-vocabulary segmentation. You pass a text prompt like "building" or "water." The model then detects and segments all matching objects across the image in one inference pass.
Aerial Datasets
Aerial datasets are collections of images captured from satellites, drones, or aircraft, used to train and evaluate segmentation models. They cover real-world tasks like building detection, road mapping, and flood segmentation.
Compared to regular images, they are significantly harder to segment due to top-down perspective, small and densely packed objects, and complex terrain.
Benchmark Results on Aerial Datasets with SAM
The table below shows how SAM performed across different aerial and satellite datasets in zero-shot mode. Source: SegEarth-OV3, arXiv:2512.08730
| Dataset | Task | # Files | IoU | Dataset Source |
|---|---|---|---|---|
| WBS-SI | Flood / Water Body | ~1,000+ | 39.8% | Kaggle – Water Body Segmentation |
| WHU Aerial | Building | ~8,189 | 29.8% | Ji et al., IEEE TGRS 2018 |
| Inria | Building | ~360 | 33.4% | Maggiori et al., IGARSS 2017 |
| DeepGlobe | Road | ~6,226 | 13.2% | Demir et al., CVPR Workshops 2018 |
| UAVid | Semantic Seg. | ~300 | 28.6% mIoU | Kaggle UAVid Dataset |
| iSAID | Instance Seg. | ~2,806 | 14.5% mIoU | Zamir et al., CVPR Workshops 2019 |
mIoU: Mean Intersection over Union.
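The IoU scores in the tables are the intersection of the predicted and ground-truth masks divided by their union; mIoU averages those scores across classes or masks. A minimal sketch with NumPy, using toy binary masks:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union for two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union > 0 else 0.0

def miou(preds, gts) -> float:
    """Mean IoU: average the per-mask (or per-class) IoU scores."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

# Toy 4x4 masks: the prediction overlaps the ground truth on 2 of 3 pixels
gt   = np.zeros((4, 4), dtype=bool); gt[0, :3]   = True
pred = np.zeros((4, 4), dtype=bool); pred[0, 1:4] = True
print(iou(pred, gt))  # 2 shared pixels, 4 in the union -> 0.5
```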
Benchmark Results on Aerial Datasets with SAM 3
SAM 3 results below are all zero-shot. SAM 3 consistently outperforms SAM across every task. Source: SegEarth-OV3, arXiv:2512.08730
| Dataset | Task | # Files | IoU | Dataset Source |
|---|---|---|---|---|
| WBS-SI | Flood / Water Body | ~1,000+ | 75.6% | Kaggle – Water Body Segmentation |
| WHU Aerial | Building | ~8,189 | 86.9% | Ji et al., IEEE TGRS 2018 |
| Inria | Building | ~360 | 72.4% | Maggiori et al., IGARSS 2017 |
| DeepGlobe | Road | ~6,226 | 39.3% | Demir et al., CVPR Workshops 2018 |
| UAVid | Semantic Seg. | ~300 | 54.7% mIoU | Kaggle UAVid Dataset |
| iSAID | Instance Seg. | ~2,806 | 27.6% mIoU | Zamir et al., CVPR Workshops 2019 |
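The per-dataset gains are easy to compute from the two tables; the values below are copied directly from the benchmark results above:

```python
# IoU / mIoU values (%) from the SAM and SAM 3 benchmark tables above
sam  = {"WBS-SI": 39.8, "WHU Aerial": 29.8, "Inria": 33.4,
        "DeepGlobe": 13.2, "UAVid": 28.6, "iSAID": 14.5}
sam3 = {"WBS-SI": 75.6, "WHU Aerial": 86.9, "Inria": 72.4,
        "DeepGlobe": 39.3, "UAVid": 54.7, "iSAID": 27.6}

# Absolute improvement in percentage points, largest first
gains = {d: round(sam3[d] - sam[d], 1) for d in sam}
for dataset, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{dataset:>10}: +{gain} points")
# Largest jump: WHU Aerial (+57.1); smallest: iSAID (+13.1)
```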
SAM vs. SAM 3 Quick Comparison
SAM: Strengths
- Generates accurate pixel-level masks with box or click prompts on high-resolution aerial imagery.
- On high-contrast images, the segmented masks have clear boundaries.
SAM: Limitations
- Requires a separate prompt for every individual object, which is inefficient at scale across large datasets.
- Performance drops significantly on low-resolution satellite images, dense urban scenes, small objects, and objects with weak visual boundaries.
SAM 3: Strengths
- A single text or box prompt segments all matching objects across the entire image.
- Compared to SAM, SAM 3 produces fewer fragmented or overlapping masks when multiple objects appear close together.
SAM 3: Limitations
- Limited performance on small or low-contrast targets, which are common in satellite imagery, where blurred boundaries can lead to missed or incomplete detections.
- Difficulty separating closely packed instances, such as dense urban buildings, where adjacent objects may be merged into a single segmentation mask.
Using SAM & SAM 3 in Labellerr
Here's what the annotation workflow looks like in Labellerr using SAM and SAM 3:
Step 1: Upload Your Dataset or use Public Datasets
Labellerr offers public datasets you can use directly. For this demo, select the "Water bodies" dataset from Labellerr's public datasets; no upload needed.
Step 2: Build the Annotation Template
Once the dataset is loaded, create a new annotation template. Set the label class to Pond and the annotation type to Polygon.
Step 3: Annotate with SAM (One Click per Object)
Enable SAM in Labellerr and click once on a water body; the model instantly generates a segmentation mask around it. If your image contains multiple object types (water, land, vegetation), you need to click on each object individually to annotate them one by one.
Step 4: Annotate with SAM 3 (One Click for the Whole Image)
Switch to SAM 3 in Labellerr, click once on any object, and press I to run the prediction. SAM 3 automatically detects and segments all similar objects across the entire image in one go, avoiding the need to click on and annotate each object individually.
Final output
Watch the Full Demo
See SAM and SAM 3 side by side in Labellerr:
Why Use Labellerr for This?
- Switch between SAM and SAM 3 without leaving the platform.
- Export annotations as COCO JSON, JSON, and more.
- Team collaboration with role-based access and review workflows.
- Built-in quality checks to validate your segmentation masks.
In cases where zero-shot models fall short, our human-in-the-loop support can be leveraged to consistently reach >99% accuracy.
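Exported annotations follow the standard COCO JSON layout mentioned above. A minimal sketch of that structure for a single polygon mask; the field names follow the COCO format, while the file name, coordinates, and "Pond" category here are toy values for illustration:

```python
import json

# Toy COCO-style export: one polygon annotation for a "Pond" mask.
coco = {
    "images": [{"id": 1, "file_name": "tile_001.png",
                "width": 512, "height": 512}],
    "categories": [{"id": 1, "name": "Pond"}],
    "annotations": [{
        "id": 1, "image_id": 1, "category_id": 1,
        # Each polygon is a flat [x1, y1, x2, y2, ...] coordinate list
        "segmentation": [[100, 100, 200, 100, 200, 180, 100, 180]],
        "bbox": [100, 100, 100, 80],  # [x, y, width, height]
        "area": 8000,                 # mask area in pixels
        "iscrowd": 0,
    }],
}
print(json.dumps(coco, indent=2))
```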
Q1: What is the main difference between SAM and SAM 3?
SAM requires manual prompts like clicks or bounding boxes for each object, while SAM 3 supports open-vocabulary segmentation using a text prompt to segment all matching objects in a single inference pass.
Q2: Why is segmentation harder on aerial and satellite imagery?
Aerial imagery includes small, densely packed objects, top-down perspectives, and complex terrain, making object boundaries harder to distinguish compared to natural images.
Q3: Is SAM 3 better than SAM for zero-shot geospatial segmentation?
Yes. According to SegEarth-OV3, SAM 3 significantly improves mean IoU across multiple remote sensing benchmarks without retraining.