Visual AI in Manufacturing Meetup - July 15, 2026

Name: Visual AI in Manufacturing Meetup - July 15, 2026
Start: 2026-07-15
End: 2026-07-15

Jul 15, 2026

9:00 AM - 11:00 AM PT

Online. Register for the Zoom!

About this event

Join our virtual meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

Schedule

Agentic Open-Vocabulary Annotation for Industrial Defects

In manufacturing inspection, defects can be open-ended and expensive to label. This leaves detectors brittle as new failure modes and imaging conditions appear. This talk shows how open-vocabulary segmentation and the FiftyOne Agent turn first-pass masks into governed, versioned annotations at scale, with conditional ontologies and risk-level guardrails keeping the loop auditable.

The talk also ties the approach to CVPR 2026's VAND 4.0 work on zero-shot VLMs and robustness under distribution shift.

Enabling Multimodal Agents on the Edge

The next generation of AI agents is moving beyond cloud-based, text-only models and will interact with the physical, multimodal world in real time. In the vision domain, for example, AI agents rely on Vision-Language Models (VLMs) as their backbone. However, deploying massive VLMs with billions of parameters on edge devices remains a significant engineering hurdle.

Drawing on our recent ICML and CVPR research papers, this session explores advances in agentic model optimization, specifically how distillation and pruning transform 'heavyweight' models into lean, edge-ready engines. Lastly, I will present the UI agent our lab's team is developing, running on an actual phone.

When the Camera Can’t Be Trusted: Health-Aware Visual AI for Reliable Near-Miss Detection

Near-miss detection systems are often evaluated as though every camera frame is equally trustworthy, even though blur, poor exposure, occlusion, contamination, and changing lighting can silently degrade the visual evidence used to make safety decisions. This talk presents an online camera-health framework that estimates visual reliability before downstream perception performance significantly deteriorates.

I will discuss how camera-health signals can support condition-aware evaluation, prioritize human review, reduce unreliable alerts, and trigger appropriate fallback behavior. Drawing from research in safety-critical visual perception, the talk will demonstrate how these principles can be adapted to industrial video systems operating across different cameras, shifts, layouts, and environmental conditions.

The presentation will also connect camera-health monitoring with rare-event discovery and failure-driven dataset improvement for more trustworthy near-miss detection.

Agentic VLM Applications in Manufacturing

Vision-Language Models (VLMs) introduce net-new functionality to vision workloads in manufacturing that traditional computer vision models simply do not offer (e.g., open-vocabulary detection, in-context learning). Even so, fine-tuned models like YOLO offer a level of precision and recall that today's VLMs struggle to match out of the box.

Through agentic harnesses that coordinate calls to VLMs, we can start to deliver similar reliability on manufacturing-relevant tasks (e.g., many-class, many-instance detection), while also supporting the net-new functionality (e.g., multimodal search) that makes VLMs distinct. In this talk, we walk through the design of these harnesses, how to serve them efficiently, and how they deliver value in manufacturing.