The Best of ICCV 2025 Series Day 3: Achieving Next Level AI Trust and Accuracy
Nov 17, 2025
13 min read
Day 3 of our Best of ICCV series shifts focus to a critical theme: building trustworthy and dependable AI. From camera tracking in robotics to comorbidity prediction in HIV care, today's ICCV papers tackle a question that matters beyond benchmarks: how do we build accurate AI models that work reliably on noisy data when real stakes are involved?
This is the third blog in a four-part series. We're keeping things accessible and practical, with insights from four excellent papers that push the boundaries of geometric reasoning, computational efficiency, healthcare AI, and multi-agent systems.

DRaM-LHM: How to tell a camera where it is, faster and more accurately [1]

Paper title: DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation
Paper authors: Chen Lin, Weizhi Du, Zhixiang Min, Baochen She, Enrique Dunn, Sonya M. Hanson
Institutions: Flatiron Institute, University of Michigan, Stevens Institute of Technology, Stanford University

What it’s about

Robots, AR headsets, and autonomous vehicles constantly need to answer one question: where am I? Camera pose estimation, figuring out exactly where a camera is positioned and oriented in 3D space, is fundamental to everything from SLAM systems to visual relocalization. But existing methods often struggle with a tradeoff between speed and accuracy, especially when data gets noisy.
This paper introduces DRaM-LHM, a framework for solving the Perspective-n-Point (PnP) problem using quaternion mathematics. The approach leverages a Determinant Ratio Matrix (DRaM) to solve an orthographic projection problem first, providing a robust initialization for iterative pose refinement. By parametrizing rotations through quaternion adjugate matrices, this method enhances AI model accuracy through numerical stability and computational efficiency.

Why it matters for trustworthy AI

Camera pose estimation is everywhere in robotics and AR/VR, but it's surprisingly hard to get right. The challenge lies in representing rotations on the SO(3) manifold while avoiding singularities and maintaining AI accuracy under noise. Poor initialization can cause iterative solvers to converge slowly or get stuck in local minima. DRaM-LHM addresses this by providing a mathematically principled starting point that's both fast to compute and resistant to data noise—critical for real-time applications where every millisecond counts.

How it works

  • Quaternion adjugate representation: Uses a quadratic quaternion mapping that transforms rotations into a 10-dimensional space, reducing polynomial complexity in the optimization
  • Two-stage approach: First solves a simplified orthographic projection problem using DRaM to get a strong initial pose estimate, then refines using object-space collinearity error minimization
  • Bar-Itzhack correction: Applies orthogonal correction to ensure the estimated rotation matrix stays on the proper rotation manifold, even with noisy data (see the sketch after this list)
  • Adaptive initialization: Automatically switches between DRaM and traditional methods based on scene geometry, handling both well-distributed and quasi-planar configurations
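
To make the correction step concrete, here is a minimal Python/NumPy sketch of snapping a noisy rotation estimate back onto the rotation manifold. Note that this uses the familiar SVD projection onto SO(3) rather than the paper's quaternion-based Bar-Itzhack formulation, so treat it as an illustration of the idea, not the authors' algorithm:

```python
import numpy as np

def project_to_so3(M):
    """Project a 3x3 matrix onto the nearest rotation matrix (in the
    Frobenius-norm sense) via SVD, forcing det(R) = +1."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt

# A noisy "almost rotation", as an iterative solver might produce mid-refinement
rng = np.random.default_rng(0)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
R_noisy = R_true + 0.05 * rng.standard_normal((3, 3))

R_fixed = project_to_so3(R_noisy)
print(np.allclose(R_fixed @ R_fixed.T, np.eye(3)))  # True: orthogonal
print(np.isclose(np.linalg.det(R_fixed), 1.0))      # True: proper rotation
```

A projection of this kind is a common safeguard in iterative pose solvers: accumulated numerical error can drift an estimate off the manifold, and re-projection restores orthogonality and a determinant of +1 before the next refinement step.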

Key result

On synthetic data with high noise (σ = 5), DRaM-LHM matched or exceeded the accuracy of SQPnP (the current state of the art) while requiring fewer than 10 iterations. On real-world SLAM datasets (TUM RGB-D and EuRoC MAV), it achieved an AUC score of 0.9353 on challenging sequences, outperforming EPnP, DLS, and UPnP while maintaining competitive runtime and significantly more robust performance under motion blur and camera shake.

Broader impact

DRaM-LHM provides a practical solution for pose estimation in resource-constrained robotics applications. Its ability to handle high levels of data noise makes it valuable for low-cost camera systems, while its computational efficiency enables real-time operation on embedded platforms. The method's mathematical rigor and numerical stability also make it promising for scientific imaging applications like cryo-electron microscopy and aerial mapping, where AI accuracy is paramount.

GECO: Teaching AI how to tell left from right [2]

Paper title: GECO: Geometrically Consistent Embedding with Lightspeed Inference
Paper authors: Regine Hartwig, Dominik Muhle, Riccardo Marin, Daniel Cremers
Institutions: Technical University of Munich, Munich Center for Machine Learning

What it’s about

Self-supervised vision models like DINO and DINOv2 are powerful at finding semantic matches between images, but they have an embarrassing blind spot. Ask them to distinguish a left eye from a right eye, and they often fail spectacularly. This "Janus problem" has plagued 3D reconstruction pipelines, sometimes producing animals with five legs or chairs with misaligned parts.
GECO introduces a lightweight training framework that teaches vision foundation models to understand geometric properties while maintaining blazing-fast inference. Using optimal transport-based supervision on sparse keypoint annotations, GECO produces features that can distinguish symmetric parts (left vs. right eyes, front vs. back legs) without sacrificing the semantic understanding that makes foundation models useful. Critically, it achieves this at 30 fps—nearly 100 times faster than previous geometry-aware methods.

Why it matters for trustworthy AI

Most vision models are trained with image augmentations like horizontal flipping, which actively teaches them to ignore left-right distinctions. This creates problems in applications requiring geometric AI model accuracy: 3D reconstruction, robotic manipulation, medical imaging, and correspondence estimation. Previous solutions like Geo addressed geometric awareness but took over 2 seconds per image, making them impractical for real-time use. GECO proves you can have both geometric understanding and real-time performance.

How it works

  • Optimal transport loss: Replaces traditional argmax matching with a differentiable Sinkhorn algorithm that provides dense supervision across all image patches, not just annotated keypoints (a toy version is sketched after this list)
  • Handling occlusions: Introduces a "dustbin" category in the assignment matrix, explicitly teaching the model what to do when a point's match is hidden in the target view
  • Lightweight architecture: Fine-tunes DINOv2-B using LoRA (Low-Rank Adaptation) with only 1.4 MB of additional parameters, adding less than 0.5 ms to inference time
  • Training on sparse annotations: Works with limited keypoint labels but generates dense feature improvements through the structured optimal transport formulation
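
To see what an optimal transport step with a dustbin looks like in practice, here is a toy NumPy sketch of a Sinkhorn matching layer. The scores, temperature, and marginals are illustrative assumptions, not GECO's actual training configuration (which, like most implementations, would also run in the log domain for numerical stability):

```python
import numpy as np

def sinkhorn_with_dustbin(scores, dustbin_score=0.0, eps=0.1, n_iters=100):
    """Entropic optimal transport over a similarity matrix augmented with a
    'dustbin' row and column, so occluded/unmatched points have a place to go.
    scores: (M, N) similarities between source and target patches."""
    M, N = scores.shape
    S = np.full((M + 1, N + 1), dustbin_score)
    S[:M, :N] = scores
    K = np.exp(S / eps)                 # Gibbs kernel
    # Marginals: each real point carries unit mass; dustbins absorb the rest
    r = np.ones(M + 1); r[-1] = N
    c = np.ones(N + 1); c[-1] = M
    u = np.ones(M + 1)
    for _ in range(n_iters):            # alternating marginal projections
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan, shape (M+1, N+1)

rng = np.random.default_rng(1)
fa, fb = rng.standard_normal((5, 8)), rng.standard_normal((4, 8))
P = sinkhorn_with_dustbin(fa @ fb.T / np.sqrt(8))
print("matches:", P[:5, :4].argmax(axis=1))   # most likely partner per point
print("dustbin mass:", P[:5, -1])             # high value -> likely occluded
```

Because the whole procedure is differentiable, a loss on the transport plan backpropagates into the features for every patch, which is what lets sparse keypoint labels supervise dense feature maps.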

Key result

GECO achieved state-of-the-art performance on multiple benchmarks: 92.1% PCK on the PF-Pascal dataset (+6.0%), 86.7% on APK (+6.2%), and 92.5% on CUB (+4.1%), while running at 30-45 ms per image compared to Geo's 2,127 ms, a roughly 98% reduction in latency. The method also excelled at part segmentation, correctly distinguishing challenging symmetric parts like left/right eyes and wings where baseline models failed.

Broader impact

GECO enables geometry-aware computer vision at the speed required for real-world robotics and interactive systems. Its efficiency makes it practical for applications previously limited by computational constraints: real-time object manipulation, AR/VR systems requiring precise correspondence, and 3D reconstruction pipelines. By achieving geometric awareness without sacrificing the semantic richness of foundation models, GECO opens pathways for more capable vision systems across robotics, graphics, and spatial computing.

Proactive HIV Care: Can AI accurately predict comorbidities before symptoms appear? [3]

Paper title: Proactive HIV Care: AI-Based Comorbidity Prediction from Routine EHR Data
Paper authors: Solomon Russom, Dimitrios Kollias, Qianni Zhang
Institution: Center for Multimodal AI, Digital Environment Research Institute, Queen Mary University of London

What it’s about

People living with HIV face elevated risks for multiple comorbidities—cardiovascular disease, kidney dysfunction, metabolic disorders, mental health conditions. Traditional care relies on symptom-driven screening, but what if AI could accurately identify at-risk patients earlier, before symptoms emerge, using data collected during routine visits?
This paper develops machine learning and deep learning models to predict multiple comorbidities simultaneously from Electronic Health Records (EHR) of HIV patients. Using data from 2,200 patients in South East London spanning 2012-2023, the study evaluates six different model architectures on a multi-label classification task. The research specifically examines whether including demographic information (age, gender, race) alongside laboratory markers improves AI prediction accuracy, while also investigating fairness implications.

Why it matters for AI accuracy

Higher AI accuracy can improve patient outcomes, but only if deployed responsibly. This work highlights both the potential and the ethical complexity of predictive clinical AI. Antiretroviral therapy has transformed HIV into a manageable chronic condition, but longer life expectancy means HIV patients now face complex comorbidity burdens. Current screening approaches are reactive—conditions are identified after symptoms appear. An accurate AI system that proactively identifies high-risk patients from routine blood tests could enable earlier interventions, personalized treatment adjustments, and better resource allocation. However, using demographic data in medical AI raises critical questions about bias, fairness, and whether models might inadvertently discriminate.

How it works

  • Multi-label framework: Trained models to predict 12 ICD-10 disease categories simultaneously from 30 laboratory markers (liver enzymes, kidney function, immune cell counts, etc.) and 7 demographic/social variables (see the sketch after this list)
  • Two experimental settings: Compared "demographic-aware" models (using lab tests + demographics) against "demographic-unaware" models (lab tests only)
  • Multiple architectures: Evaluated Logistic Regression, Random Forest, XGBoost, LightGBM, MLPs, and TabNet using stratified k-fold cross-validation
  • Fairness analysis: Conducted "demographic recoverability" experiments to measure whether gender, age, and race could be inferred from lab data alone
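
As a rough illustration of both experiments, the sketch below trains a multi-label classifier on synthetic tabular data shaped like the study (2,200 patients, 30 lab markers, 12 labels) and then runs a demographic-recoverability probe. Everything here is a stand-in: the real pipeline uses actual EHR features, stratified k-fold cross-validation, and stronger tabular models such as XGBoost:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in: 30 "lab marker" features, 12 comorbidity labels,
# roughly matching the shapes described in the paper (2,200 patients)
X, Y = make_multilabel_classification(n_samples=2200, n_features=30,
                                      n_classes=12, n_labels=3,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One binary classifier per comorbidity label; XGBoost/LightGBM would slot in
# the same way, logistic regression just keeps this sketch dependency-light
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
print("macro F1:", f1_score(Y_te, clf.predict(X_te), average="macro"))

# Demographic-recoverability probe: try to predict a protected attribute
# from the lab markers alone (here a synthetic binary stand-in)
g = (X[:, 0] > np.median(X[:, 0])).astype(int)
probe = LogisticRegression(max_iter=1000).fit(X[:1760], g[:1760])
print("recoverability accuracy:", probe.score(X[1760:], g[1760:]))
```

The probe is the key fairness device: if a "demographic-blind" model's inputs let a simple classifier recover gender or age, the model can still learn demographic shortcuts even though those columns were never supplied.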

Key result

Demographic-aware models consistently outperformed demographic-unaware counterparts across all architectures. XGBoost achieved the highest macro F1 score of 45.8% compared to 43.5% without demographics. However, recoverability experiments revealed that gender could be predicted with 92.8% accuracy and age with 45.4% accuracy from laboratory data alone, indicating that demographic information is already implicitly encoded in biomarkers—raising important fairness considerations even for "demographic-blind" models.

Broader impact

This work demonstrates AI's potential for proactive, population-scale health screening in HIV care, enabling clinicians to identify at-risk patients before conditions become symptomatic. The demographic recoverability findings, however, highlight a crucial tension: while demographic context improves clinical predictions (reflecting real biological differences in disease risk), it also raises concerns about proxy discrimination and bias. The research underscores the need for careful consideration of fairness, interpretability, and privacy when deploying clinical AI systems, particularly for vulnerable patient populations.

Toward Trustworthy Embodied Agents: Building autonomous AI agents you can actually trust [4]

Paper title: Toward Trustworthy Embodied Agents: From Individuals to Teams
Paper author: Jiachen Li
Institution: University of California Riverside

What it’s about

Modern intelligent embodied agents—service robots, delivery drones, autonomous vehicles—increasingly operate alongside humans in dynamic, unpredictable environments. They don't just need to work; they need to work safely, even when facing scenarios they've never encountered before. And when multiple agents collaborate, the challenges multiply.
This work addresses fundamental challenges in building trustworthy embodied AI agents, from individual systems to multi-agent teams. The research tackles two critical problems: first, enabling agents to safely navigate multi-agent scenarios through scalable behavior prediction and robust decision-making under uncertainty, particularly in out-of-distribution (OOD) scenarios; second, facilitating effective cooperation between agents through efficient communication strategies and reliable task planning for complex, long-horizon objectives.

Why it matters for trustworthy AI

Trustworthy AI requires more than perception accuracy; it requires systems that know when they're uncertain, adapt gracefully, and coordinate reliably with others. As robots and autonomous vehicles move from controlled environments into the real world, they encounter situations not covered in their training data. A delivery robot navigating a crowded sidewalk faces pedestrians behaving unpredictably. An autonomous vehicle at an intersection must reason about other drivers' intentions while managing uncertainty about their actions. When these systems fail to generalize or coordinate effectively, the consequences can be dangerous. Building agents that handle noisy data and uncertainty gracefully while cooperating reliably is essential for safe, accurate AI deployment at scale.

How it works

  • Principled uncertainty quantification: Develops techniques to measure and reason about prediction uncertainty, enabling agents to make safer decisions when facing unfamiliar scenarios (see the sketch after this list)
  • Generalizable prediction: Creates models that can forecast surrounding agents' behaviors even in OOD conditions, improving robustness across diverse environments
  • Multi-agent communication: Designs efficient information fusion strategies for both centralized (single decision-maker) and decentralized (distributed control) cooperative systems
  • Long-horizon planning: Addresses task decomposition and coordination for complex objectives requiring sustained multi-agent collaboration
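
The paper's specific methods aren't reproduced here, but the pattern of "quantify uncertainty, then degrade gracefully" is easy to sketch. Below is a hypothetical ensemble-disagreement example: several forecasters predict another agent's next position, and the spread across their predictions gates whether the agent trusts the forecast or falls back to a conservative behavior. The forecasters and threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "ensemble": K forecasters predicting another agent's next 2D
# position. In practice these would be independently trained networks;
# perturbed constant-velocity extrapolators keep the sketch simple.
def make_forecaster(noise):
    def forecast(track):
        velocity = track[-1] - track[-2]
        return track[-1] + velocity + noise * rng.standard_normal(2)
    return forecast

ensemble = [make_forecaster(noise=0.1 * k) for k in range(1, 6)]

def forecast_with_uncertainty(track):
    preds = np.stack([f(track) for f in ensemble])
    # Disagreement across members as a rough epistemic-uncertainty proxy
    return preds.mean(axis=0), preds.std(axis=0).max()

track = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]])
mean, spread = forecast_with_uncertainty(track)

SPREAD_LIMIT = 0.5  # hypothetical threshold, tuned on validation data
if spread > SPREAD_LIMIT:
    print("high uncertainty: fall back to a conservative maneuver")
else:
    print("confident forecast:", mean)
```

The point is the control flow, not the forecaster: when a calibrated uncertainty estimate exceeds a validated threshold, the agent should switch to a safe fallback rather than act on a prediction it has no business trusting.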

Key result

The presented techniques enable embodied agents to maintain safety and task performance in scenarios significantly different from their training distributions. The uncertainty quantification methods provide calibrated confidence estimates that allow agents to recognize when they're operating beyond their competence, enabling graceful degradation rather than catastrophic failure. In multi-agent settings, the communication strategies improve coordination efficiency while reducing the overhead of exchanging noisy, redundant information.

Broader impact

This work provides foundational techniques for deploying embodied AI in safety-critical applications. By addressing both single-agent robustness and multi-agent coordination, it advances the state of autonomous driving, service robotics, and collaborative robot systems. The emphasis on uncertainty quantification and OOD generalization directly addresses regulatory and safety concerns that currently limit real-world deployment, while the multi-agent cooperation frameworks enable more efficient and scalable autonomous systems across logistics, manufacturing, and transportation domains.

Wrapping up day 3

The four papers we've explored today establish that AI trust and accuracy are becoming a priority in computer vision research. Whether it's a camera tracking system that maintains accuracy under extreme data noise, features that can be computed 100× faster while preserving geometric understanding, clinical AI that proactively identifies disease risk while grappling with fairness concerns, or autonomous agents that quantify their own uncertainty, each contribution addresses the gap between what works in the lab and what's needed in production.
Together, they signal important directions for the field:
  • From approximation to precision: Mathematical frameworks that maintain geometric and numerical AI accuracy under real-world conditions
  • From slow to real-time: Achieving state-of-the-art results at speeds that enable practical deployment
  • From reactive to proactive: Systems that anticipate problems before they become critical
  • From isolated to collaborative: Agents that work together reliably and communicate efficiently
As computer vision moves deeper into healthcare, robotics, and autonomous systems, we need more than impressive metrics. We need AI methods that are fast enough for real-time use, robust enough for noisy sensor data, interpretable enough for clinical trust, and accurate enough for safety-critical decisions.
Continue exploring the breakthroughs from The Best of ICCV 2025 Series, and register for our virtual meetup to connect and discover how accurate AI models that handle noisy data are shaping the next era of trust in computer vision.

References

[1] Lin, C., Du, W., Min, Z., She, B., Dunn, E., and Hanson, S.M. "DRaM-LHM: A Quaternion Framework for Iterative Camera Pose Estimation," in Proceedings of the IEEE/CVF Int. Conf. Computer Vision (ICCV), 2025.
[2] Hartwig, R., Muhle, D., Marin, R., and Cremers, D. "GECO: Geometrically Consistent Embedding with Lightspeed Inference," in Proceedings of the IEEE/CVF Int. Conf. Computer Vision (ICCV), 2025. arXiv:2508.00746.
[3] Russom, S., Kollias, D., and Zhang, Q. "Proactive HIV Care: AI-Based Comorbidity Prediction from Routine EHR Data," in Proceedings of the IEEE/CVF Int. Conf. Computer Vision (ICCV), 2025. arXiv:2503.24381.
[4] Li, J. "Toward Trustworthy Embodied Agents: From Individuals to Teams," in Proceedings of the IEEE/CVF Int. Conf. Computer Vision (ICCV), 2025.
