Best of ICRA - July 22, 2026

Name: Best of ICRA - July 22, 2026
Start: 2026-07-22
End: 2026-07-22

Jul 22, 2026

9:00 AM - 11:00 AM PST

Online. Register for the Zoom!

About this event

The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

👉 Register for any day to get access to all three days of the Best of ICRA.

Each session features a curated lineup of speakers sharing cutting-edge research across robotics, computer vision, and AI — straight from papers accepted at one of the field's top conferences.

Whether you're a researcher, engineer, or practitioner, you'll leave with a sharper view of where the field is heading.

Schedule

Contrastive Learning on 3D Point Clouds for Geometric Defect Detection

Reliable 3D defect detection in manufacturing is hard: the input is a point cloud — an unordered set that standard neural backbones cannot process directly; high-quality training data is scarce; and real scans are noisy and arrive in arbitrary orientations. We address these challenges in COSARAD, a contrastive learning framework that learns highly discriminative representations of object surface geometry under weak supervision.

When a test object arrives, we extract its features and compare them against a library of defect-free reference shapes for precise, interpretable defect localization — achieving state-of-the-art accuracy on industrial benchmarks such as Real3D-AD. In my talk, I'll cover the design choices behind the system, why contrastive representation learning is the right fit for sparse 3D data, and open problems in scaling inspection to production.
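As a rough illustration of the comparison step described above, the sketch below scores each point of a test object by the distance from its feature vector to the nearest feature in a defect-free reference library. It is not the COSARAD implementation: the function names, the 2-D toy features, and the threshold are all assumptions made for the example, and a real system would use learned per-point descriptors.

```python
import math

def nearest_reference_distance(feature, reference_bank):
    """Euclidean distance from one feature vector to its closest reference feature."""
    return min(math.dist(feature, ref) for ref in reference_bank)

def score_points(test_features, reference_bank):
    """Score every test point by its distance to the nearest defect-free feature."""
    return [nearest_reference_distance(f, reference_bank) for f in test_features]

def localize_defects(test_features, reference_bank, threshold):
    """Indices of points whose score exceeds the threshold (a hypothetical cutoff)."""
    scores = score_points(test_features, reference_bank)
    return [i for i, score in enumerate(scores) if score > threshold]

# Toy 2-D features standing in for learned per-point descriptors (assumed values).
reference_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
test_features = [(0.1, 0.0), (1.0, 0.1), (3.0, 3.0)]

defect_indices = localize_defects(test_features, reference_bank, threshold=0.5)
print(defect_indices)  # only the third point lies far from every reference
```

Because each score is tied to a specific point, the output can be read directly as a defect map over the object, which is one reason this style of comparison is easy to interpret.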

A Semantic and Occlusion-Aware Gaussian Mixture Probability Hypothesis Density Filter

Reliable multi-target tracking is foundational for safe autonomous driving, yet perception pipelines often struggle with sensor noise, heavy clutter, and severe environmental occlusions. To address these limitations, this talk presents a novel Semantic-Occlusion Aware (S-OA) Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter.

By combining geometric occlusion reasoning with deep learning-derived environmental semantics, the proposed framework adaptively initiates tracks in regions where new targets are likely to appear. Evaluations demonstrate that this context-aware tracking system minimizes track initiation latency and preserves high tracking precision even under intense clutter.

Ultimately, this work demonstrates how embedding spatial and semantic structure into filtering yields a significantly more robust and resilient perception stack for autonomous navigation.
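To make the idea of context-aware track initiation more concrete, here is a minimal sketch of the weight bookkeeping in a GM-PHD prediction step with an adaptive birth model. It is an illustration under stated assumptions, not the S-OA filter from the talk: the `semantic_prior` and `occlusion_edge` factors, the way they are multiplied, and all numeric values are hypothetical. Propagating the means and covariances through a motion model is omitted for brevity.

```python
from dataclasses import dataclass

@dataclass
class Component:
    weight: float  # expected number of targets represented by this Gaussian
    mean: tuple    # (x, y) position only; covariance omitted in this sketch

def adaptive_birth_components(candidate_regions, base_weight):
    """Create birth components, scaled by hypothetical context factors in [0, 1]."""
    births = []
    for region in candidate_regions:
        weight = base_weight * region["semantic_prior"] * region["occlusion_edge"]
        if weight > 0.0:  # skip regions where no target can plausibly appear
            births.append(Component(weight, region["center"]))
    return births

def predict_weights(components, births, survival_prob):
    """Predicted intensity: surviving components plus newly born components."""
    survived = [Component(survival_prob * c.weight, c.mean) for c in components]
    return survived + births

# Assumed toy scene: one existing track and two candidate birth regions.
tracks = [Component(1.0, (5.0, 2.0))]
regions = [
    {"center": (12.0, 0.0), "semantic_prior": 0.9, "occlusion_edge": 1.0},  # road behind an occluder
    {"center": (3.0, 9.0), "semantic_prior": 0.0, "occlusion_edge": 1.0},   # building footprint
]

predicted = predict_weights(tracks, adaptive_birth_components(regions, 0.1), survival_prob=0.95)
expected_targets = sum(c.weight for c in predicted)
```

In a PHD filter the component weights sum to the expected number of targets, so concentrating birth weight where targets can plausibly emerge is one way to start tracks sooner without raising the birth rate everywhere.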

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Autonomous robots struggle to detect objects in unstructured fields, requiring in-domain tuning with laborious manual data collection. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data.

Our method combines cross-modal annotation transfer, early sensor fusion, and a multi-stage detection architecture to train and improve the multi-modal detector. Validated on vineyard trunk detection and paired with a custom LOAM algorithm, it localized over 70% of trees in one pass with a mean error under 0.37 m.

The results show that robust detection is achievable even with minimal initial annotation and minimal human intervention.

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

We introduce vS-Graphs, a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and floors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs.

This solution enhances the reconstructed map's semantic richness, comprehensibility, and localization accuracy.