A segmentation map is a visual depiction created by dividing an image into many parts, or segments, to gain a pixel-level understanding. Segmenting is crucial in our daily lives, from helping children recognize sounds to helping businesses understand their customers' preferences. It helps us make sense of chaos, whether in sounds, markets, or pixels.
The evolution of image segmentation began with early milestones in the 1950s-1970s and accelerated in 2012 with the rise of deep neural networks. It has significantly advanced medical imaging (see the ISBI'12 Challenge) and forms the backbone of modern computer vision. Today, segmentation provides a detailed lens for visual analysis.
In computer vision, traditional object detection methods can struggle with nuanced boundaries. Segmentation maps address this by identifying "what" an object is and "where" it begins and ends, providing pixel-level granularity for better scene interpretation. A segmentation map is a visual representation of an image in which each pixel is labeled with a class.
In this article, we'll explore the power of segmentation maps in autonomous driving with the help of FiftyOne. We will walk through segmentation tools (using FiftyOne and the KITTI dataset) as well as provide practical code snippets inline here and in the linked notebook. Here's what you'll learn:
Image segmentation has progressed steadily from its early deterministic approaches. Classical techniques included Support Vector Machines (SVMs), k-means clustering, edge detection, and graph-based methods. These foundations paved the way for today's practices.
The year 2015 marked a turning point with the rise of deep learning-based techniques like Fully Convolutional Networks (FCNs). The success of these models relies on their ability to learn intricate feature representations through feature maps. Feature maps are essentially sets of arrays generated by layers within these models. Each array highlights specific features detected in the input image, such as edges, textures, or shapes. These feature maps collectively form a rich feature representation, which is a transformed and more abstract encoding of the original image, optimized for the segmentation task.
Later, the U-Net architecture revolutionized the field with its encoder-decoder design and efficient upsampling. This made segmentation more robust and suitable for real-time applications. U-Net's architecture is particularly effective in segmentation because its encoder path progressively extracts feature maps at different scales. These feature maps are then upsampled and combined in the decoder path to generate precise segmentation masks.
Fast-forward to 2023, and the Segment Anything Model (SAM) has emerged as a landmark segmentation model. SAM uses vision transformers to enable zero-shot segmentation across diverse datasets, and it sets a new standard for segmentation capabilities. By learning highly generalizable feature maps, SAM can effectively segment objects in diverse and unseen images.
Let's now briefly discuss segmentation maps before exploring SAM on the KITTI dataset.
Segmentation maps label each pixel in an image based on objects or categories, capturing more precise boundaries and offering clarity beyond simpler object detection techniques.
Segmentation techniques vary in complexity, and the choice depends on the level of detail a task requires. Here are the key approaches:
To create and visualize such segmentation maps, we need to prepare a high-quality annotated dataset and then apply and evaluate a model for our custom use case. Here, we apply SAM to the KITTI dataset using FiftyOne.
FiftyOne is a tool that enables you to visualize datasets and interpret model performance faster and more effectively (for example, curating, visualizing, and analyzing segmentation maps). The key value proposition is that better data ultimately leads to better-performing models. For image segmentation specifically, FiftyOne offers a few key capabilities:
In our example, we can start by loading the KITTI dataset from the FiftyOne Dataset Zoo. The KITTI dataset provides left-camera images and 2D object detections, ideal for autonomous driving tasks.
import fiftyone.zoo as foz

# Load the KITTI training split from the FiftyOne Dataset Zoo
kitti_data = foz.load_zoo_dataset("kitti", split="train")
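With the dataset loaded, we can open it in the FiftyOne App to browse samples interactively; a minimal sketch:

import fiftyone as fo

# Launch the FiftyOne App to inspect the images and their labels
session = fo.launch_app(kitti_data)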
We can isolate specific classes (e.g., “van”), creating subsets of data for targeted analysis.
from fiftyone import ViewField as F

# Keep only the ground-truth labels whose class is "Van"
van_frames = kitti_data.filter_labels("ground_truth", F("label") == "Van")
van_frames
Then we can isolate images that contain more than ten segmented objects detected with confidence above 80%.
# Keep segmentations above 80% confidence, then match samples
# that still contain more than ten segmented objects
view = sub_data.filter_labels(
    "segmentations", F("confidence") > 0.8
).match(
    F("segmentations.detections").length() > 10
)
Of course, this filtering is only possible once we have applied a segmentation model and can view its predictions. Let's do that now.
Note that applying SAM to generate segmentation masks requires the transformers and segment-anything libraries.
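As a rough sketch (the zoo model name below is one of the SAM variants available in recent FiftyOne versions and may differ in yours), the dependencies can be installed and a SAM checkpoint loaded from the FiftyOne Model Zoo as follows:

# Shell: pip install torch transformers segment-anything

import fiftyone.zoo as foz

# Load a SAM checkpoint from the FiftyOne Model Zoo
model = foz.load_zoo_model("segment-anything-vitb-torch")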
In the following code, we’ll select 500 random images on which to run predictions, apply the model, and save the segmentation results in a label_field for further analysis.
# Segment inside boxes
sub_data = kitti_data.take(500)
sub_data.apply_model(model, label_field="segmentations", prompt_field="ground_truth")
Unlike many other models, SAM supports location-specific (prompted) segmentation. This approach avoids clutter by running segmentation only within the specified bounding box areas.

Semantic Segmentation classifies each pixel in an image into a category. Here, all cars are labeled as "cars" and people as "pedestrians."
The following Patches View allows close inspection of bounding box areas in the image.
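As a small sketch (reusing the filtered view and the App session from above), a patches view can be created like this:

# One sample per ground-truth bounding box, cropped to the box
patches_view = view.to_patches("ground_truth")

# Inspect each box crop individually in the App
session.view = patches_view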
Instance Segmentation goes beyond classification to differentiate between individual objects. Here, every car in a scene is labeled as a unique entity.
And then we can create an equivalent Patches View of the same instance segmentation task.
By harnessing FiftyOne and SAM, image segmentation becomes more targeted and efficient, and yields actionable insights (such as evaluating model performance vs. the ground truth).
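For instance, here is a minimal sketch of such an evaluation using FiftyOne's built-in detection evaluation (the eval_key name is arbitrary; since the KITTI ground truth provides boxes rather than masks, the comparison here is at the box level):

# Compare the SAM predictions against the KITTI ground-truth boxes
results = sub_data.evaluate_detections(
    "segmentations",
    gt_field="ground_truth",
    eval_key="sam_eval",
)

# Per-class precision, recall, and F1
results.print_report()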
Producing high-quality segmentation maps requires well-annotated data, diverse perspectives, and balanced classes. Here’s how to ensure your segmentation maps are accurate, robust, and ready for real-world applications.
High-quality annotations are the foundation of any segmentation model. Even though annotating every object is expensive, it is essential: poor labeling can mislead the model's predictions and skew the results.
Real-world images often show variability due to lighting, perspective, and spatial resolution changes. Introducing diversity through data augmentation enhances model robustness and generalization across varied conditions. Key techniques include:
Tools like FiftyOne’s integration with Albumentations can simplify these processes.
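As an illustrative sketch (independent of the FiftyOne plugin itself; the image path is a placeholder), a basic Albumentations pipeline covering flips, rotations, and photometric changes might look like:

import albumentations as A
import cv2

# Geometric and photometric augmentations, each applied with 50% probability
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=10, p=0.5),
    A.RandomBrightnessContrast(p=0.5),
])

image = cv2.imread("example.jpg")  # placeholder path
augmented = transform(image=image)["image"]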
Class imbalance can introduce bias into the model's decision boundary. For example, training solely on daytime urban images may harm performance in other settings. It's best to ensure that your dataset has diverse representation with roughly equal proportions of classes. If not, the underrepresented classes can be augmented to achieve balance.
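A quick way to check the class distribution in FiftyOne is to count label occurrences, for example:

# Number of ground-truth objects per class across the dataset
print(kitti_data.count_values("ground_truth.detections.label"))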
Segmentation maps provide a significant amount of information, but their true value lies in how we act on them. Let’s explore a few practical workflows.
A raw segmentation mask may appear abstract, especially to non-technical audiences, since it is just a grid of class labels or binary values rendered as colors. By overlaying segmentation masks onto the original image, we combine that information with its visual context, which enhances clarity and makes the relationships between objects and their surroundings more visible.
For example, an image of a van with its segmentation mask overlaid (in the second image) shows both its position and its precise boundaries, revealing its exact size and shape.
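As a rough illustration (assuming a full-image binary mask and an RGB image as NumPy arrays), a mask can be blended onto the original image in a few lines:

import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend a binary mask onto an RGB image with the given color."""
    overlay = image.copy()
    overlay[mask.astype(bool)] = color
    # A weighted blend keeps the underlying pixels visible through the mask
    return (alpha * overlay + (1 - alpha) * image).astype(np.uint8)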
It doesn't stop there: because each pixel is associated with a class label, these masks allow accurate measurement of object areas. This is helpful in various applications, such as estimating the proportion of road space occupied by vehicles in a scene to monitor traffic flow.
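For example, here is a small sketch (again assuming full-image binary masks) of estimating what fraction of the scene a class occupies:

def occupied_fraction(mask):
    """Fraction of image pixels covered by a binary mask."""
    return float(mask.sum()) / mask.size

# e.g., share of the frame covered by vehicles (car_mask is a hypothetical mask)
# road_usage = occupied_fraction(car_mask)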
By computing the Manhattan distance from the image origin to each bounding box's top-left corner and filtering on a threshold (e.g., keeping only boxes whose distance exceeds 1), we can distinguish occupied regions from free space, which is beneficial for urban planning and autonomous driving.
# Bboxes are in [top-left-x, top-left-y, width, height] format
manhattan_dist = F("bounding_box")[0] + F("bounding_box")[1]

# Only contains predictions whose bounding boxes' upper left corner
# is a Manhattan distance of at least 1 from the origin
view = sub_data.filter_labels(
    "segmentations", manhattan_dist > 1
)
Instance segmentation identifies and localizes individual objects, aiding analysis in areas such as:
Here is an example of the dataset’s segmented masks of different “pedestrian” objects.
Now, cutting to the chase: why is segmentation superior to bounding boxes? While bounding boxes estimate object locations relatively precisely by enclosing them in a rectangle, segmentation maps offer additional pixel-level precision. Segmentation maps provide clearer shapes than bounding boxes, eliminating irrelevant background elements. Minor imperfections (e.g., a missing part of a foot in the image below) can have a cascading effect on model accuracy. Overall, segmentation maps deliver the granularity needed for detailed analyses.
Another example shows segmented objects from the cyclist class with their masks overlaid. Bounding boxes can be a poor geometric fit for this sort of shape. Segmentation mapping solves this with pixel-level labeling.
Segmentation maps are revolutionizing how we analyze satellite data for environmental monitoring and urban planning. Planet Labs is leading this space with key use cases, including regular updates on building infrastructure changes, insights into new and existing road developments, and early identification of construction activities.
Segmentation maps also empower advanced robotic capabilities across various industries. For example, in cooking, vision-guided segmentation helps identify food items and assists in food preparation. Imagine experiencing Michelin-star dining with robotic chefs delivering precision and artistry. Chef Robotics is pioneering this vision with its ChefOS platform, which features pick-and-place robots and other innovative tools.
Building on the transformer architecture introduced in the paper Attention Is All You Need, deep learning models for segmentation increasingly rely on attention mechanisms. Vision Transformers (ViTs) and Segmenter use self-attention for better global feature understanding, often outperforming traditional convolutional methods. Hybrid architectures such as Swin Transformers combine convolutional-style hierarchies with transformers, enhancing accuracy through hierarchical representations and finer spatial details.
Other approaches, such as weakly supervised learning, reduce dependency on extensive labeled datasets. Key methods such as Class Activation Maps (CAMs) and Grad-CAM++ help identify object regions with minimal supervision. The CLIMS model integrates text and images, achieving robust segmentation without pixel-level annotations.
Segmentation models are increasingly tailored to specific industries:
These advancements promise to make segmentation more accessible and impactful in real-world applications.
In this article, we explored segmentation as a computer vision task using the KITTI dataset for autonomous navigation. Segmentation allows for a deeper analysis of images, while object detection provides a high-level overview of the objects present. We covered:
With tools like FiftyOne, you can truly harness the power of segmentation maps in your projects! We invite you to dive into FiftyOne and visualize your datasets—discover how engaging and insightful this can be.
Don’t forget to check out our related blogs, where you can find advanced features and inspiring real-world case studies!
We're eager to hear your thoughts: how do you envision segmentation maps influencing the future of AI? Please share your insights or any challenges you face.
Keep in touch for more on using FiftyOne in advanced AI workflows. Whether you’re aiming to boost model performance or learn new techniques, we’re here to support your journey!