How Computer Vision Is Changing Manufacturing

Industrial components with bounding boxes. Image from MVTec Industrial 3D Object Detection Dataset, courtesy of MVTec.

Welcome to the second installment of Voxel51’s computer vision industry spotlight blog series. Each month, we highlight how different industries – from construction to climate tech, from retail to robotics, and more – are using computer vision, machine learning, and artificial intelligence to drive innovation. We’ll dive deep into the main computer vision tasks being put to use, current and future challenges, and companies at the forefront.

In this edition, we’ll focus on manufacturing! Read on to learn about computer vision in manufacturing and industrial automation.

Industry overview

Key facts and figures:

Manufacturing is integral in the production of most of the physical goods that make up our modern world, from food and beverages to laptops, toys, and tools. The sector is also a critical component for both developing and developed economies, driving technological advancements and offering employment to millions. 

The manufacturing industry is currently undergoing a massive transformation referred to as the fourth industrial revolution (4IR), headlined by the adoption of computer vision, artificial intelligence, robotics, and the industrial internet of things (IIoT). 4IR technologies present an estimated multi-trillion dollar opportunity and will enable factories to operate more accurately, efficiently, and safely. Computer vision is already playing a central role in this transformation, with companies using machine vision techniques to automate strenuous tasks, identify issues in both product and machinery, and improve safety conditions for workers.

Before we dive into several popular applications of computer vision-based AI technologies in manufacturing, here are some of the industry’s key challenges.

Key industry challenges in manufacturing

  • Labor shortages: According to projections by Deloitte, there will be more than two million unfilled manufacturing jobs in the United States by 2030. Globally, the industry is expected to reach a deficit of up to 7.9 million jobs over the same period, according to a study by Korn Ferry.
  • Inflation: Rising costs for materials, energy, and transportation put pressure on manufacturers to streamline their operations.
  • Scope and diversity: By its very nature, manufacturing touches almost every product we encounter, from screws, to shoes, to cars. Each product has its own manufacturing requirements and challenges, and each factory uses its own combination of cameras and sensors, so there are no one-size-fits-all solutions.

Continue reading for some ways computer vision is enabling industrial automation and improving safety in manufacturing.

Applications of computer vision in manufacturing

Bin picking

Bin picking relies on the combination of robotics and computer vision. Image: ATRIA Innovation

A common industrial robotics application, bin picking is the action of selecting an object from a bin, picking it up, and placing it in another location. 

For a robot to be successful with this task, it needs to precisely navigate and interact with objects of different shapes, sizes, and materials in a potentially cluttered, occluded, and poorly lit environment. Machine vision systems make this possible by mapping the environment and guiding the robotic arm’s motion.

In the simplest cases, camera images are passed into object detection routines. In many cases, however, grabbing an object from a bin requires mapping depth information and performing 3D object detection to situate the object in three-dimensional space. Some computer vision systems use point clouds from LiDAR sensors to generate these 3D representations. One additional complication is that some objects can only be easily grabbed and held in certain orientations. To overcome this, some machine vision systems estimate the “pose” of objects and use this information to orient the robotic arm for picking.
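
To make the depth-mapping step a little more concrete, here is a minimal sketch of how pixels in a depth image can be back-projected into 3D camera coordinates using the standard pinhole camera model. The function names, the toy depth image, and the intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not values from any particular sensor or vendor system.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Convert pixel (u, v) with depth in meters to camera-frame (x, y, z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (list of rows) into a list of 3D points.

    Pixels with zero depth are treated as missing readings and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d > 0:
                points.append(backproject(u, v, d, fx, fy, cx, cy))
    return points


# Hypothetical 2x2 depth image in meters; 0.0 marks a missing reading.
depth = [[1.0, 0.0], [2.0, 2.0]]
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(len(points), "valid 3D points")
```

A real bin picking pipeline would run 3D detection or pose estimation on the resulting point cloud. This sketch only covers the geometry of getting from depth pixels to points.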

Here are a few papers on computer vision in bin picking:

Palletizing and depalletizing

Pallets stacked in a warehouse. Image courtesy of Arno Senoner.

Pallets have been called the unsung heroes of our modern age. These flat, typically wooden, platforms are essential to the scale and economy of global logistics and transportation. Before pallets make their way to shipping containers, they are loaded up with goods. These stacks of goods can reach up to 15 feet tall and weigh more than two tons. The process of loading and stacking products onto a pallet is known as palletizing. Similarly, once products have reached their destination, they must be unloaded from the pallet. This unloading process is referred to as depalletizing.

To reduce injury and error, manufacturers have been automating palletizing and depalletizing tasks through the use of robotic arms equipped with computer vision systems. Computer vision is a great fit for this problem, as the items being loaded and unloaded are typically nearly identical, so object detection models can be trained to very high accuracy.

Another crucial enabling element is calibration. As the robot arm loads or unloads items, it takes note of the disparity between where it estimated the object to be located and where it was actually located as feedback to refine its future predictions.
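
As a rough illustration of this feedback loop, the sketch below keeps a running mean of the error between predicted and actual object positions and applies it to future predictions. This is a deliberately simple stand-in: the `OffsetCorrector` class is hypothetical, and production systems may use far more elaborate calibration than a constant offset.

```python
class OffsetCorrector:
    """Tracks the running mean error between predicted and actual positions."""

    def __init__(self):
        self.count = 0
        self.mean_error = [0.0, 0.0, 0.0]

    def update(self, predicted, actual):
        """Record one pick by folding (actual - predicted) into the running mean."""
        self.count += 1
        for i in range(3):
            err = actual[i] - predicted[i]
            self.mean_error[i] += (err - self.mean_error[i]) / self.count

    def correct(self, predicted):
        """Shift a new prediction by the mean error observed so far."""
        return tuple(p + e for p, e in zip(predicted, self.mean_error))


corrector = OffsetCorrector()
# Two hypothetical picks where the object sat further along x than predicted.
corrector.update(predicted=(0.0, 0.0, 0.0), actual=(1.0, 0.0, 0.0))
corrector.update(predicted=(0.0, 0.0, 0.0), actual=(3.0, 0.0, 0.0))
print(corrector.correct((10.0, 0.0, 0.0)))
```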

A few resources to get you started:

Machine tending

A robotic arm loading boxes onto a conveyor belt. Image courtesy of Mech-Mind Robotics.

Machine tending is the process of automatically loading raw material or components for input into a machine. This includes placing parts on a conveyor belt, as well as preparing materials for welding, grinding, milling, or injection molding.

Automated machine tending has multiple advantages, from reducing injury risk to improving consistency. Computer vision-enabled automation in machine tending also allows for higher precision in these applications. With real-time monitoring, object localization can be used to precisely situate the input materials relative to the machine being tended, and a robotic arm can adjust positioning accordingly. 

As with many computer vision applications in manufacturing, data availability and quality in machine tending is a significant challenge. Machine vision systems for machine tending often need to be built with limited labeled training data. This means that data cleaning and curation are essential, and techniques like data augmentation and transfer learning can be incredibly important.
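
To give a flavor of data augmentation, here is a minimal sketch that turns one labeled image into several training variants using a horizontal flip and brightness shifts. Images are represented as plain lists of grayscale pixel rows to keep the example dependency-free. The function names and the brightness delta of 30 are assumptions for illustration, and real pipelines typically use an image library with a much richer set of transforms.

```python
def flip_horizontal(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, delta):
    """Add delta to every pixel, clamping to the valid 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]


def augment(image):
    """Return the original image plus three simple variants."""
    return [
        image,
        flip_horizontal(image),
        adjust_brightness(image, 30),
        adjust_brightness(image, -30),
    ]


# Hypothetical 2x2 grayscale image.
image = [[0, 100], [200, 250]]
variants = augment(image)
print(len(variants), "training images from one original")
```

Note that a flip also requires flipping any bounding boxes or masks attached to the image, which this sketch leaves out.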

Here are a few preliminary resources:

Defect detection

Examples of manufactured products without (first two rows) and with (last row) defects. Image originally appears in MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection, and is courtesy of MVTec.

Computer vision has already become indispensable in ensuring quality control in industrial processes. Instance segmentation, for example, is used in conjunction with high-resolution sensor data to check whether a manufactured part has the desired spatial dimensions, within an allowed tolerance.
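
A dimensional check of this kind can be sketched in a few lines. The example below measures the width of a part from a binary segmentation mask and compares it against a nominal dimension. The mask, the `mm_per_px` scale factor, and the tolerance values are all hypothetical, and a real system would also need to account for lens distortion and part orientation.

```python
def mask_width_px(mask):
    """Width in pixels of the bounding box around nonzero mask entries."""
    cols = [u for row in mask for u, val in enumerate(row) if val]
    if not cols:
        raise ValueError("mask is empty")
    return max(cols) - min(cols) + 1


def within_tolerance(mask, mm_per_px, nominal_mm, tol_mm):
    """Return (passes, measured_mm) for the part described by the mask."""
    measured = mask_width_px(mask) * mm_per_px
    return abs(measured - nominal_mm) <= tol_mm, measured


# Hypothetical mask for a part that spans three pixel columns.
mask = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
ok, measured = within_tolerance(mask, mm_per_px=0.5, nominal_mm=1.5, tol_mm=0.1)
print(ok, measured)
```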

One area of quality control where computer vision features prominently is defect detection. Manufacturers want to identify defective parts and products as early in their journey as possible. In some applications, where the possible varieties of defects are known, object detection and classification are used to identify problems.

In other cases, the full variety of possible defects is either unknown, or too complicated to easily categorize. In manufacturing, defects can range from minutiae like small scratches to entirely missing components, like a missing screw. This can also be exacerbated by class imbalance, where there are many more examples of “normal” products than “defective” products.

Anomaly detection provides an alternative, unsupervised approach that takes only normal examples as input and classifies a new product as normal, or “nominal”, if it is likely to have come from the same distribution as the previously seen normal samples. To make this determination, anomaly detection models learn an approximate representation of this nominal distribution, which may involve using density-based models like DBSCAN, support vector machines, or deep learning models like autoencoders and generative adversarial networks (GANs). Libraries like Intel’s Anomalib provide tools for implementing and benchmarking anomaly detection algorithms.
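
The core idea can be shown with a far simpler model than the ones named above. The sketch below fits an independent Gaussian to each feature of a set of nominal samples, then flags a new sample as anomalous if any feature lies more than a chosen number of standard deviations from the nominal mean. The `NominalModel` class, the two-feature measurements, and the threshold of 3.0 are assumptions for illustration. This is not how Anomalib or any specific production system works.

```python
import statistics


class NominalModel:
    """Per-feature Gaussian model fitted on nominal (defect-free) samples only."""

    def fit(self, samples):
        columns = list(zip(*samples))
        self.means = [statistics.fmean(c) for c in columns]
        # Fall back to a tiny spread so a constant feature does not divide by zero.
        self.stds = [statistics.pstdev(c) or 1e-9 for c in columns]
        return self

    def score(self, x):
        """Anomaly score: the largest absolute z-score across features."""
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stds))

    def is_nominal(self, x, threshold=3.0):
        return self.score(x) <= threshold


# Hypothetical measurements (e.g. width and height in mm) of good parts.
nominal = [(10.0, 5.0), (10.2, 5.1), (9.8, 4.9), (10.0, 5.0)]
model = NominalModel().fit(nominal)
print(model.is_nominal((10.1, 5.05)))  # close to the nominal samples
print(model.is_nominal((12.0, 5.0)))   # width far outside the nominal spread
```

Because the model never sees a defective example, it does not depend on the class balance between normal and defective products.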

Here are a few papers on defect detection and anomaly detection in manufacturing:

Predictive maintenance

Saw blade condition monitoring, with red triangle markers detecting the head of each tooth. Image courtesy of A computer vision system for saw blade condition monitoring.

Tools and machinery in manufacturing plants gradually accumulate wear and tear which, if left untreated, could lead to complete breakdown, as well as potential injuries and lost productivity. To avoid such failure, manufacturers have historically performed routine maintenance, cleaning, refurbishing, and changing out old parts on a regular basis. 

But routine maintenance has two predominant downsides: first, it introduces downtime and costly upkeep when maintenance may not be necessary; second, it is not able to pick up on issues that arise and escalate between maintenance intervals.

Predictive maintenance (PdM) seeks to address these issues with more active monitoring. Computer vision and predictive analytics help manufacturers to save on maintenance costs and avert catastrophe. PdM is already applied to a wide variety of machines and mechanical parts, from blades and bearings to gears and gaskets.

In some cases, as with the saw blade pictured above, computer vision techniques like segmentation, object detection, and classification suffice to identify and predict all likely failure modes. Common signs include cracks, corrosion, and leaks. As with defect detection, when the spectrum of failure modes is complex or unknown, anomaly detection is applied to the state of the machines. 
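
Once a vision system produces a wear measurement on each inspection, the "predictive" step can be as simple as extrapolating the trend. The sketch below fits a least-squares line to hypothetical wear readings and estimates how many cycles remain before a wear limit is reached. The function names, the data, and the assumption that wear grows linearly are illustrative choices, and real predictive analytics are usually more sophisticated.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cycles_until_limit(cycles, wear, limit):
    """Estimate cycles remaining after the last reading until wear hits the limit.

    Returns None when the fitted trend is flat or decreasing.
    """
    slope, intercept = fit_line(cycles, wear)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - cycles[-1])


# Hypothetical tooth wear (mm) measured by the vision system every 100 cycles.
cycles = [0, 100, 200, 300]
wear = [0.0, 0.1, 0.2, 0.3]
print(cycles_until_limit(cycles, wear, limit=0.5))
```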

Companies at the cutting edge of computer vision in manufacturing

Mech-Mind Robotics

Left: randomly piled track links in a bin. Right: 3D reconstruction of the scene with point clouds generated by Mech-Mind’s Mech-Eye industrial 3D cameras. Image courtesy of Mech-Mind Robotics. 

With more than 700 employees, 1000+ customers, and more than $200M in funding, Mech-Mind Robotics is the largest 3D vision company in China, and one of the world’s largest providers of 3D vision cameras and machine vision software for robotic automation.

Mech-Mind’s integrated hardware and software solutions are used in a wide range of manufacturing applications, including bin picking, machine tending, palletizing and depalletizing, assembly, and gluing. The Mech-Eye industrial 3D camera uses structured light technology to generate high resolution, high accuracy point clouds.

Mech-Mind’s Mech-Vision machine vision software provides a platform for customers to build industrial computer vision applications with Mech-Eye cameras and customers’ robots. Mech-Vision has built-in support for common computer vision tasks like pose adjustment, and 2D and 3D matching, wherein the customer generates a point cloud model of the object to be recognized, either from a CAD file or directly from a camera image, and this model is recognized in the scene. Built-in support for common industrial robots means that the robot calibration process, while traditionally time consuming (on the scale of hours), can be completed in less than 20 minutes.

To round things out, Mech-Mind’s deep learning software allows customers to fine-tune computer vision models for their specific use cases. Customers load in their own data, which are automatically pre-labeled, and can then be rapidly edited and revised. Typically, Mech-Mind’s deep learning software only needs 20-50 images of an object to train a model to recognize it in a scene. 

Instrumental

Instrumental dashboard displaying normal and anomalous products. Image courtesy of Instrumental.

Founded by two Stanford and MIT grads and former Apple employees in 2015 and based in Palo Alto, CA, Instrumental is leading the way in ensuring product quality in electronics manufacturing. They use computer vision in conjunction with predictive analytics to provide real-time monitoring and alerts, as well as root cause analysis for prior failures.

Instrumental’s AI-based computer vision suite supports both new product introduction (NPI) manufacturing, which is characterized by low volume, and mass production (MP) manufacturing, which is high volume.

Even within electronics manufacturing, the wide variety of products and substantial variation from product to product mean that general-purpose computer vision models have seen very little success. Nevertheless, manufacturers want to detect defects and issues in their particular use case as quickly as possible.

Instrumental’s suite of computer vision tools is designed to achieve high performance application-specific defect detection given as few samples as possible. To do this, they use techniques like data augmentation, transfer learning, and active learning to build a robust dataset that they use to train an anomaly detection model. Their models are built easily with no coding. Once deployed, these models run real-time inference on the edge in the factory and create a record that can be shared, inspected, and evaluated.
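
The source does not describe Instrumental's active learning strategy, so the following is only a generic sketch of one common approach, uncertainty sampling: out of a pool of unlabeled images, ask a human to label the ones the current model is least sure about. The function name and the scores are hypothetical.

```python
def select_for_labeling(scores, k):
    """Return indices of the k samples whose scores are closest to 0.5.

    `scores` holds the model's predicted probability of "defective" for each
    unlabeled sample. A score near 0.5 means the model is least certain.
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:k]


# Hypothetical defect probabilities for five unlabeled images.
scores = [0.95, 0.52, 0.10, 0.45, 0.70]
print(select_for_labeling(scores, k=2))
```

Labeling the most uncertain samples first is one way to build a useful dataset from as few samples as possible.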

Protex AI

Founded in 2020 and backed by Y Combinator, Notion Capital, and Playfair Capital, Irish startup Protex AI is helping enterprise safety teams to revolutionize how they make proactive safety decisions that contribute to a safer work environment.

Their AI-powered technology is enabling businesses to gain greater visibility of unsafe behaviors in their facilities. The privacy-preserving platform plugs into existing CCTV infrastructure and uses its computer vision technologies to capture unsafe events autonomously in settings such as warehouses, manufacturing facilities, and ports.

Protex AI provides a simple interface so that each user can create their own “rules”, including setting exclusion zones, speed limits for forklifts, or even minimum distances workers must maintain between themselves and machines. Protex then uses computer vision techniques, including object detection, object tracking, and pose estimation, to check these rules. For rules involving speeds or distances, the vision system employs calibration. Typically, calibration is performed using inputs from multiple cameras, but Protex uses special routines to estimate calibration from a single CCTV camera.
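
To illustrate what checking such rules might involve, here is a minimal sketch that evaluates an exclusion zone, a speed limit, and a minimum distance. It assumes detection, tracking, and calibration have already produced floor-plane positions in meters. The function names, rule labels, and all numbers are hypothetical and are not Protex AI's API.

```python
import math


def in_zone(point, polygon):
    """Ray-casting test: is the point inside the polygon (list of vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def speed_mps(prev_pos, curr_pos, dt_s):
    """Speed in meters per second from two tracked positions dt_s apart."""
    return math.dist(prev_pos, curr_pos) / dt_s


def check_rules(worker, forklift_prev, forklift_now, dt_s,
                zone, speed_limit, min_distance):
    """Return the list of rule names violated in the current frame."""
    violations = []
    if in_zone(worker, zone):
        violations.append("exclusion_zone")
    if speed_mps(forklift_prev, forklift_now, dt_s) > speed_limit:
        violations.append("speed_limit")
    if math.dist(worker, forklift_now) < min_distance:
        violations.append("min_distance")
    return violations


zone = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical 4 m x 4 m exclusion zone
print(check_rules(worker=(2, 2), forklift_prev=(10, 10), forklift_now=(10, 13),
                  dt_s=1.0, zone=zone, speed_limit=2.0, min_distance=3.0))
```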

Due to privacy concerns surrounding customer image and video data, Protex AI runs all of their models on the edge on NVIDIA-powered devices. For power and compute efficiency, they quantize their model weights.
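
The source does not say which quantization scheme Protex AI uses. As a generic illustration, the sketch below applies symmetric int8 quantization to a list of float weights: each weight is mapped to an integer in [-127, 127] using a single scale factor, which cuts storage to one byte per weight at the cost of a small rounding error. The function names and weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats to integers in [-127, 127].

    Returns the quantized values and the scale needed to recover floats.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate the original floats from quantized values."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer of a model.
weights = [-1.0, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(max(abs(w - r) for w, r in zip(weights, restored)))  # rounding error
```

The rounding error per weight is bounded by half the scale, which is why quantized models often lose little accuracy.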

As use cases can differ greatly, Protex AI deploys custom models for each customer. Their base model is trained on hundreds of thousands of images, and then a unique version is fine-tuned on a given customer’s data. In their line of work, data quantity is not an issue. The most important factor in model performance is having a clean, high quality dataset.  

Cognex

Cognex’s Blue Read optical character recognition (OCR) tool. Image courtesy of Cognex. 

More than forty years old but still on the cutting edge, Nasdaq-listed (CGNX) Cognex is a world leader in machine vision for industrial automation. Their 2000+ employee team has a hand in almost every step of industrial automation processes, from sensors and barcode scanners to industrial cameras and fully integrated vision systems.

Cognex has machine vision tools for rule-based applications, such as monitoring object location and detecting edges, as well as deep learning tools for cloud connected and edge devices. Their VisionPro Deep Learning software supports standard tasks like defect detection and segmentation, and assembly verification, as well as burgeoning tasks like material classification.

Beyond specific tasks, Cognex’s VisionPro software expedites time to deployment with AutoML capabilities. The label checker automatically verifies the vast majority of labels and flags the remaining images for manual review, minimizing the number of samples a user needs to assess. 

During training, parameter autotune will use input example images to determine the optimal set of hyperparameters. In optical character recognition (OCR), for instance, it can be difficult to recognize text due to the wide spectrum of fonts and potential distortions. Traditional OCR systems require that users specify segmentation hyperparameters to achieve high precision and recall. Cognex Blue Read eliminates this requirement by comparing an input image to the library of hundreds of fonts on which it was trained, and automatically selecting the best hyperparameters.

RIOS Intelligent Machines

RIOS machine tending and material handling robots.

RIOS Intelligent Machines is on a mission to transform labor-intensive factories into smart factories powered by robotics and AI. The company helps its global customers automate their factories, warehouses, and supply chain operations by deploying AI-powered end-to-end robotic workcells that integrate within existing workflows. The Menlo Park, CA-based company was founded by former Xerox PARC engineers who saw a massive failure of traditional robots and predicted that factories’ overreliance on labor would soon reach a breaking point.

RIOS has developed some of the most advanced hardware and AI/software platforms in robotics, including human-like tactile sensors for robots, a haptics intelligence platform, and high-performance end-of-arm tooling and food-grade grippers. Their AI Controlled Robotics platform delivers fixed, programmable, flexible, and integrated automation. They also offer palletizing robots to load products onto and unload them from pallets, plus robotic packaging systems.

A few more 

It’s impossible to highlight every company doing amazing work at the intersection of computer vision and manufacturing and industrial automation. Here are a few more companies that are pushing the boundaries:

  • PreML GmbH: German startup founded in 2020 focused on automated visual quality inspection.
  • Prophesee: French series C startup pioneering event-based vision.
  • Datalogic: Italy-based leader in automated data capture, barcode readers, sensors, and vision systems.
  • Stemmer Imaging: Based in Germany, S9I is Europe’s largest imaging technology provider, with a hand in everything from photography to factory floor vision systems.
  • Pickit 3D: 2016 spinout of NASA robotics software provider Intermodalics, focused on 3D vision systems for robotic guidance. 
  • Matroid: End-to-end no-code computer vision solutions for quality assurance, assembly verification, and safety and compliance founded by Stanford adjunct professor Reza Zadeh.

Computer vision datasets for manufacturing

Due to the highly proprietary nature of manufacturing and industrial automation processes, public computer vision datasets are few and far between. Hopefully these datasets will help you get started: 

If you would like to see any of these, or other computer vision manufacturing datasets added to the FiftyOne Dataset Zoo, get in touch and we can work together to make this happen!

Join the FiftyOne community!

Developers of manufacturing and industrial automation applications can benefit from FiftyOne’s ability to easily filter through the huge amounts of visual data collected daily from factories and other sources. Using FiftyOne, this data can be curated into datasets for model training, or shared with experts for annotation or analysis of CV models. Join the thousands of engineers and data scientists already using FiftyOne to solve some of the most challenging problems in computer vision today!