Welcome to the Best of ICCV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.
Schedule
DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
We explore a quaternion adjugate matrix-based representation for rotational motion in the Perspective-n-Point (PnP) problem. Leveraging quadratic quaternion terms within a Determinant Ratio Matrix (DRaM) estimation framework, we extend its application to perspective scenarios, providing a robust and efficient initialization for iterative PnP pose estimation. Notably, by solving the orthographic projection least-squares problem, DRaM provides a reliable initialization that enhances the accuracy and stability of iterative PnP solvers. Experiments on synthetic and real data demonstrate its efficiency, accuracy, and robustness, particularly under high noise conditions. Furthermore, our nonminimal formulation ensures numerical stability, making it effective for real-world applications.
GECO: Geometrically Consistent Embedding with Lightspeed Inference
Recent advances in feature learning have shown that self-supervised vision foundation models can capture semantic correspondences but often lack awareness of underlying 3D geometry. GECO addresses this gap by producing geometrically coherent features that semantically distinguish parts based on geometry (e.g., left/right eyes, front/back legs).
We propose a training framework based on optimal transport, enabling supervision beyond keypoints, even under occlusions and disocclusions. With a lightweight architecture, GECO runs at 30 fps, 98.2% faster than prior methods, while achieving state-of-the-art performance on PFPascal, APK, and CUB, improving PCK by 6.0%, 6.2%, and 4.1%, respectively. Finally, we show that PCK alone is insufficient to capture geometric quality and introduce new metrics and insights for more geometry-aware feature learning.
Proactive Comorbidity Prediction in HIV: Towards Fair and Trustworthy Care
HIV is a chronic infection that weakens the immune system and exposes patients to a high burden of comorbidities. While antiretroviral therapy has improved life expectancy, comorbidities remain a major challenge, and traditional screening protocols often fail to capture subtle risk patterns early enough. To address this, we develop a novel method trained on lab tests and demographic data from 2,200 patients in SE London. The method integrates feature interaction modeling, attention mechanisms, residual fusion and label-specific attention heads, outperforming TabNet, MLPs and classical machine learning models. Our experiments show that incorporating demographic information improves predictive performance, though demographic recoverability analyses reveal that age and gender can still be inferred from lab data alone, raising fairness concerns. Finally, robustness checks confirm stable feature importance across cross-validation folds, reinforcing the trustworthiness of our approach.
Toward Trustworthy Embodied Agents: From Individuals to Teams
Modern intelligent embodied agents, such as service robots and autonomous vehicles, interact frequently with humans in dynamic, uncertain environments. They may also collaborate with each other as a team through effective communication to enhance task success, safety, and efficiency. These brings a few significant challenges. First, building reliable agents that safely navigate multi-agent scenarios requires scalable and generalizable prediction of surrounding agents’ behaviors and robust decision making under environmental uncertainty in out-of-distribution (OOD) scenarios. Second, effective cooperation between agents requires efficient communication and information fusion strategies and reliable task planning for complex long-horizon tasks. In this talk, I will introduce a series of our recent work that addresses these challenges to enable safe and trustworthy embodied agents and their application to autonomous driving and service robots. Specifically, I will first demonstrate principled uncertainty quantification techniques and how they enable generalizable prediction and planning in out-of-distribution scenarios. Then, I will talk about effective approaches to enable efficient multi-agent communication and cooperation in centralized and decentralized settings.