Welcome to the latest installment of our ongoing blog series where we explore computer vision related datasets. In this post we’ll use the open source FiftyOne toolset to visualize Amazon’s recently released dataset for training “pick and place” robots, plus we’ll create embeddings with the OpenAI CLIP model to explore defects.
Wait, what’s FiftyOne?
FiftyOne is an open source computer vision toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
What are “pick and place” robots?
These days, pick and place robots are becoming more and more common in manufacturing and logistics environments. Leveraging robots that don’t require downtime (besides regularly scheduled maintenance) can boost overall production efficiency and free up humans to work on safer, less repetitive tasks. In some cases, humans and robots work side-by-side, leveraging the strengths of each.
Pick and place robots come in a variety of forms, with the most common type likely being the 5- or 6-axis articulated arm. There are also more specialized robots that can pick groups of items and place them in specific positions, robots designed to work at very high speeds, and the aforementioned collaborative robots or “cobots” that work in tandem with humans.
What are the practical applications of pick and place robots? Although manufacturing is the most obvious use case, you can also find these types of robots performing packaging, sorting, or inspection tasks that need to be done with speed and accuracy.
About the dataset
Earlier this month, Amazon publicly released the largest computer vision dataset ever captured in an industrial product-sorting setting. In contrast to previous datasets for robotic manipulation that might be limited either in the number of object types or in scene heterogeneity and realism, this new dataset, called ARMBench (Amazon Robotic Manipulation Benchmark), features more than 235,000 pick and place activities on 190,000 objects taking place in the context of an operating Amazon warehouse. This massive dataset can be used to train pick and place robots that are better able to generalize to new products and contexts.
The basic scenario for ARMBench is one in which a robotic arm must retrieve a single item from a bin full of items and transfer it to a tray on a conveyor belt. According to the authors, “The variety of objects and their configurations and interactions in the context of the robotic system made for a uniquely challenging task.”
The ARMBench dataset is broken down as follows:
Defect detection
- Image defect detection (66 GB): This dataset comprises 13,303 images of objects with defects taken from multiple viewpoints (Transfer-images). For image defect detection, multi-pick and package-defect are the two defect classes. 100,000 images of objects with no defects are also available in the dataset. Multi-pick is used to describe activities where multiple objects were picked and transferred from the source container to the destination container. Package-defect is used to describe activities where the object packaging opened and/or the object separated into multiple parts. Two subclasses, open and deconstruction, are defined for package-defect.
- Video defect detection (255 GB): This dataset comprises 4,075 videos of objects with defects. Multi-picks are not as observable in videos and are excluded from this dataset. At the same time, open and deconstruction defects are observable in videos and are annotated. 100,000 videos of activities that did not result in a defect are available in the dataset.
Object identification
In object identification, the task is to identify an image segment as one of the objects in a database. In the pre-pick stage, identifying an object segment within the tote allows the system to access any stored models or attributes of the object from past experience, which can be used for manipulation planning. In the post-pick stage, the identification task has access to the segment of the object being manipulated, both within the tote and when it is attached to the robotic arm.
- Picks: 235,000 pick activities with images of the picked object in tote and in robotic arm.
- Reference-images: Up to 6 images (1.jpg-6.jpg) per product-id.
Object segmentation
With this dataset, instance segmentation is used to identify and define distinct objects that are stored in containers. The outcomes of instance segmentation can be used to provide information to subsequent robotic processes, such as the identification of objects and generation of grasping strategies.
- Mix-Object-Tote (14 GB): This subset consists of close-up images of mixed objects that are stored in either yellow or blue totes. Mix-Object-Tote comprises a total of 44,253 images of size 2448 by 2048 pixels and 467,225 annotations, with an average of 10.5 instances per tote.
- Zoomed-Out-Tote-Transfer-Set (1.5 GB): This subset includes mixed objects placed in a yellow tote that were captured with sensors positioned further away from the tote, under different lighting conditions. The dataset contains 5,837 images of size 2046 by 2046 pixels and 43,401 annotations, with an average of 7.5 objects per tote.
- Same-Object-Transfer-Set (3 GB): This subset consists of multiple same objects placed in close proximity within various storage units. The Same-Object-Transfer-Set comprises 3,323 images of size 2048 by 1500 pixels and 12,664 annotations, with an average of 3.8 objects per scene.
Dataset quick facts
Up next, let’s download the dataset, install FiftyOne, and import the dataset into the App so we can visualize it!
https://www.youtube.com/watch?v=5FTd4aYWr-k
Step 1: Download the dataset
In order to load the ARMBench dataset into FiftyOne, you’ll need to request a download link from Amazon. For the purposes of this blog, we’ll be focusing on the Image Defect Detection subset of data that is part of the larger ARMBench dataset.
Step 2: Install FiftyOne
https://www.youtube.com/watch?v=7mmH-ql_-zg
Step 3: Import the dataset
Now that you have the dataset downloaded and FiftyOne installed, let’s import the dataset, make it compatible with FiftyOne, and launch the FiftyOne App.
To get started, we import FiftyOne and set up some paths and utility functions. The images and annotations live in subfolders under data_root.
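Something like the following works as a starting point. The exact paths and folder layout below are placeholder assumptions about how the download sits on disk; adjust them to match your copy:

```python
import glob
import json
import os

import fiftyone as fo

# Root of the extracted ARMBench Image Defect Detection download
# (placeholder path -- point this at your own copy)
data_root = "/path/to/armbench-image-defect-detection"

# Assumed layout: each subfolder under data_root/data holds the images
# (and per-image JSON annotations) for a single pick-and-place activity
data_dir = os.path.join(data_root, "data")
group_dirs = sorted(glob.glob(os.path.join(data_dir, "*")))
```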
The basic building block of a FiftyOne dataset is a sample – in this case, an image along with its annotations and metadata. This dataset has additional structure, as each data subfolder contains multiple images (typically, four) taken of a single object. This structure lends itself naturally to using a grouped dataset in FiftyOne. The max_groups parameter limits the number of groups (or subfolders) of data that are imported, as this is a large dataset.
parse_data_dir parses a single subfolder of data. Each image has a corresponding JSON file that contains a segmentation polygon of the object, as well as labels indicating whether the transfer was without defect, or, if a defect occurred, the type and subtype of defect observed.
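Here is a minimal sketch of what such a helper might look like. The per-view file naming ("1.jpg" / "1.json", and so on) and the contents of the JSON files are assumptions about the layout rather than a faithful reproduction:

```python
from PIL import Image  # used only to read image dimensions


def parse_data_dir(subdir, slice_ids=("1", "2", "3", "4")):
    """Parse one subfolder of data into a list of per-image annotation dicts."""
    records = []
    for slice_id in slice_ids:
        image_path = os.path.join(subdir, f"{slice_id}.jpg")
        anno_path = os.path.join(subdir, f"{slice_id}.json")
        if not (os.path.exists(image_path) and os.path.exists(anno_path)):
            continue

        with open(anno_path, "r") as f:
            anno = json.load(f)  # segmentation polygon + defect labels

        # Augment with image metadata and the camera view ("slice") identifier
        width, height = Image.open(image_path).size
        anno.update(
            filepath=image_path,
            width=width,
            height=height,
            slice=slice_id,
        )
        records.append(anno)

    return records
```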
We augment the dictionaries read from the annotation files with image metadata and a slice identifier, which just specifies the camera view (1-4) for that image. The image height and width will be important because the annotation polygons in this dataset are stored as absolute pixel values, whereas FiftyOne stores vertices normalized to image dimensions. We will perform the conversion when we create our samples.
Let’s create our FiftyOne dataset!
The basic recipe for loading a dataset into FiftyOne is simple: create a Dataset, then create and add Samples! Here we have groups as well, and we follow this basic recipe for adding samples to a grouped dataset.
We’ll use FiftyOne tags to store our defect annotations, as well as membership in train and test splits. This makes it a snap to visualize and filter by these elements in the App.
We have a couple of options for our object polygons. We could represent these in FiftyOne as either polylines or instance segmentations. The bounding box will come in handy later, so we use instance segmentations in the end. But to get there, we first load each polygon into a Polyline and then do a conversion, as Polylines easily accept the list-of-vertices format of these annotations. FiftyOne has support for a huge variety of label types and dataset formats, making it a snap to work with all types of data in the way that works best for you.
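Putting the pieces together, a condensed version of the loading loop might look like the sketch below. The annotation keys ("polygon", "defects") are hypothetical stand-ins for whatever parse_data_dir returns, and the mask size is an arbitrary choice:

```python
max_groups = 500  # only import a subset of the subfolders

dataset = fo.Dataset("ARMBench-Image-Defect-Detection")
dataset.add_group_field("group", default="1")

samples = []
for subdir in group_dirs[:max_groups]:
    group = fo.Group()
    for rec in parse_data_dir(subdir):
        sample = fo.Sample(
            filepath=rec["filepath"],
            group=group.element(rec["slice"]),
        )

        # Defect annotations (and train/test membership) become sample tags;
        # "defects" is a hypothetical key for the labels parsed from the JSON
        sample.tags.extend(rec.get("defects", ["nominal"]))

        if "polygon" in rec:  # multi-pick activities have no segmentation
            # Normalize the absolute-pixel polygon to [0, 1] coordinates, load
            # it as a Polyline, then convert to an instance segmentation
            w, h = rec["width"], rec["height"]
            points = [(x / w, y / h) for x, y in rec["polygon"]]
            polyline = fo.Polyline(points=[points], closed=True, filled=True)
            sample["object"] = fo.Detections(
                detections=[polyline.to_detection(mask_size=(512, 512))]
            )

        samples.append(sample)

dataset.add_samples(samples)
```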
Step 4: Launch the FiftyOne App to visualize the dataset
With our dataset created, let’s launch the FiftyOne App in a browser. You should see the following initial view of the ARMBench-Image-Defect-Detection dataset by default in the App:
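If you’re following along in code, the App launch itself is a one-liner:

```python
# Launch the App; the returned session keeps the browser and Python in sync
session = fo.launch_app(dataset)
```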
Tags
We’ve loaded our annotations as FiftyOne tags. In the sidebar, click on Tags > sample tags to populate these tags on the samples in the App. By selecting certain tags, you can restrict your view to show (for instance) only those tags of interest. Here, we are focusing on open book jackets.
Histograms
We can explore the distribution of tags or annotations using FiftyOne’s histogram panel. In this view, we’ve excluded nominal samples and are focusing on defects. The histogram shows us a breakdown of the distribution of the various defect annotations.
Note that the defect tags are not mutually exclusive. As the paper describes, the defects in this dataset fall into two overall categories: package_defect and multi_pick. Some of the tags then further detail the type of defect within these categories. Open book jackets are relatively common, while crushed boxes (fortunately for customers!) are less so.
Sample details
Click on any of the samples to get a larger view with additional view tools and details available. In this image, we’ve used the crop tool to zoom tightly around the object of interest.
The image carousel at the top of the interface shows the other views or slices for this sample. Any of these images may be selected to become the primary displayed image. Note that while the first three slices show the same defect tags, the fourth view is listed as nominal. The partial box defect is only visible from certain angles, making multi-view grouped datasets like this one crucial in this type of application.
Embeddings and the FiftyOne Brain
By browsing our data in the FiftyOne App, we can get a pretty good sense of what our data looks like, including how the various defects manifest and how camera views differ from each other.
The ARMBench dataset comes with tasks or challenges associated with each data subset. In our case, the task at hand is defect detection. This is a difficult problem, as defects are often rare and unpredictable in nature, challenging supervised learning approaches. The difficulty is magnified here given the huge variety of objects and packaging present. Can analyzing embeddings in FiftyOne help shed some light on the challenge of finding and identifying pick and place defects?
To start things off, we’ll import the FiftyOne Brain. We’ll focus on the fourth slice of our data, which from our exploration in FiftyOne appears to have less background and lighting variation than the others. Our analysis will compute embeddings on detection patches, so we’ll also filter out the relatively small number of samples that do not have object detections. (This includes the multi-pick defects, as these do not have accompanying segmentations.) For simplicity, we clone a new, smaller dataset restricted to this slice of samples.
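A sketch of that selection, assuming the slice names ("1"–"4") and the object field from the loading code above:

```python
import fiftyone.brain as fob

# Flatten the grouped dataset to the fourth camera view, keep only samples
# that have an object detection, and clone the result into a smaller dataset
slice4 = (
    dataset
    .select_group_slices("4")
    .exists("object")
    .clone("ARMBench-defects-slice4")
)
```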
To assist in visualizing our embeddings, we’ll assign a label field to our object detections based on our tags. In addition to the nominal label, there are two basic types of defects, multi-pick and package-defect. Package-defects are broken down further for books, boxes, and bags. To simplify the visualizations, we’ll collapse the book-related and bag-related defects into single categories.
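A sketch of both steps follows; the tag-to-label mapping and the names "clip_embeddings" and "img_viz" are illustrative choices rather than anything prescribed by the dataset:

```python
def label_from_tags(tags):
    # Collapse the fine-grained defect tags into a few coarse categories.
    # The substring checks are illustrative -- match them to your actual tags
    if any("book" in t for t in tags):
        return "book_defect"
    if any("bag" in t for t in tags):
        return "bag_defect"
    if "nominal" in tags:
        return "nominal"
    return "package_defect"


for sample in slice4.iter_samples(autosave=True):
    label = label_from_tags(sample.tags)
    for det in sample["object"].detections:
        det.label = label

# Compute a 2D visualization of CLIP embeddings over the detection patches
fob.compute_visualization(
    slice4,
    patches_field="object",
    embeddings="clip_embeddings",   # field in which the 512-d vectors are stored
    brain_key="img_viz",            # identifier for this visualization run
    model="clip-vit-base32-torch",  # CLIP from the FiftyOne Model Zoo
)
```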
In this method call:
- patches_field specifies that we will compute embeddings on the bounding box patches (or crops) for the detections stored in the object field.
- embeddings names a field to store our computed embedding vectors. For the CLIP model, these are 512-dimensional vectors.
- brain_key gives an identifier to this visualization, so we can refer to it later and in the FiftyOne App.
- model specifies a model from the FiftyOne Model Zoo. The pixels from our detection crops are passed through this model to generate our embedding vectors.
In the FiftyOne App, we’ll enter a Patches view to focus on our detected objects and their corresponding embeddings. Using the FiftyOne Embeddings panel, it’s easy to visualize these patch embeddings, colored by label type:
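For reference, the patches view shown above can also be created programmatically, reusing the session from earlier:

```python
# One patch "sample" per detected object, matching the App's Patches view
session.view = slice4.to_patches("object")
```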
As it turns out, our book defects are clustered quite visibly in the lower-left-hand corner! Using the lasso tool, we can select this group of samples. In the samples grid, it is clear that the majority of these samples are indeed defects, and in particular, book defects:
Setting rigor aside for a moment, this lasso-ed selection of 1830 objects captures 1560 book-related defects out of a total of 1922 in the entire dataset, for a recall of 1560/1922 ~ 81%. The remaining 270 selected samples represent false positives in a pool of 31257 negatives, for a false positive rate of 270/31257 ~ 0.9%. (Recall and false positive rate are the preferred metrics described in the paper.) Of course, this is only a rough exploration, but it is suggestive of the rich structure and information available in these embeddings.
What if we just focus on samples with defects? We’ll re-use our computed embeddings, but re-compute the visualization against the smaller subset of data:
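A sketch of that, assuming the label field and embeddings field from above (the "defects_viz" brain key is an arbitrary name):

```python
from fiftyone import ViewField as F

# Keep only patches whose detection label indicates a defect
defects_view = slice4.filter_labels("object", F("label") != "nominal")

# Recompute the 2D visualization, reusing the stored CLIP embeddings
fob.compute_visualization(
    defects_view,
    patches_field="object",
    embeddings="clip_embeddings",
    brain_key="defects_viz",
)
```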
Again, some clear structure is evident in our embeddings plot. Bag-related defects, for instance, are clustered quite tightly, as selected and shown here.
Pretty cool stuff! It’s not the end of the story, but this analysis has definitely generated some insights for us and kick-started our effort on detecting defects in this novel dataset.
Let’s take a step back and return to our first embeddings plot, which includes all samples from slice 4. While we have detailed labels for defects, the large mass of nominal samples lacks annotations. You may have noticed however that the embeddings plot already gives some clues about structure in this sea of data. Selecting a cluster near the top of the mass, for instance, reveals a distinct clustering of plastic bottles!
The other ‘lobes’ of the visualization are semantically meaningful as well, representing distinct clusters of cardboard boxes and objects wrapped in plastic.
As a final example of the tools available in the FiftyOne Brain, let’s utilize the natural language capabilities of our CLIP embeddings to search by the text prompt “medicine bottle”, returning the top 100 matches. The returned results are quite consistent, and overwhelmingly located in the cluster of bottles we found earlier. Depending on our analysis, we could leverage this zero-shot labeling capability to automatically add annotations to our dataset to give us more to work with in the large sea of nominal samples.
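One way to do this kind of prompt-based search programmatically is with a similarity index over the object patches; the "clip_sim" brain key below is, again, just an arbitrary name:

```python
# Index the object patches with the same CLIP model so they can be queried
# with natural-language prompts
fob.compute_similarity(
    slice4,
    patches_field="object",
    model="clip-vit-base32-torch",
    brain_key="clip_sim",
)

# Show the 100 patches most similar to the text prompt in the App
bottles = slice4.to_patches("object").sort_by_similarity(
    "medicine bottle", k=100, brain_key="clip_sim"
)
session.view = bottles
```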
You can see more in this short video.
https://www.youtube.com/watch?v=6n5OqLfXbe8
Start working with the dataset
That’s a wrap for now! We hope you enjoyed this quick exploration of defect detection and the new ARMBench dataset. We’ll be adding this massive dataset to the FiftyOne Dataset Zoo in the near future, so you’ll be able to explore it on your own in just a couple lines of Python!