Best of CVPR

July 10, 2025 | 9 AM Pacific

When

July 10, 2025 at 9 AM Pacific

Where

Online. Register for the Zoom!

OFER: Occluded Face Expression Reconstruction

Pratheba Selvaraju, PhD

Max Planck Institute for Intelligent Systems

Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions, where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces by training two diffusion models to generate the shape and expression coefficients of a parametric face model, conditioned on the input image. To maintain consistency across diverse expressions, the challenge is to select the best-matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores.

SmartHome-Bench: Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal LMMs

Xinyi Zhao

University of Washington

Video anomaly detection is crucial for ensuring safety and security, yet existing benchmarks overlook the unique context of smart home environments. We introduce SmartHome-Bench, a dataset of 1,203 smart home videos annotated according to a novel taxonomy of seven anomaly categories, such as Wildlife, Senior Care, and Baby Monitoring. We evaluate state-of-the-art closed- and open-source multimodal LLMs with various prompting techniques, revealing significant performance gaps. To address these limitations, we propose the Taxonomy-Driven Reflective LLM Chain (TRLC), which boosts detection accuracy by 11.62%.

Interactive Medical Image Analysis with Concept-based Similarity Reasoning

Vu Minh Hieu Phan, PhD

University of Adelaide

What if you could tell an AI model exactly “where to focus” and “where to ignore” on a medical image? Our work enables radiologists to interactively guide AI models at test time for more transparent and trustworthy decision-making. This paper introduces the novel Concept-based Similarity Reasoning network (CSR), which offers (i) patch-level prototypes with intrinsic concept interpretation, and (ii) spatial interactivity. First, the proposed CSR provides localized explanations by grounding prototypes of each concept on image regions. Second, our model introduces novel spatial-level interaction, allowing doctors to engage directly with specific image areas, making it an intuitive and transparent tool for medical imaging.

Multi-view Anomaly Detection: From Static to Probabilistic Modelling

Mathis Kruse

Leibniz Universität Hannover

The advent of 3D Gaussian Splatting has revitalized interest in multi-view image data. Applying these techniques to fields such as anomaly detection has been a logical next step. However, some limitations of these models may warrant a return to previously applied probabilistic techniques. This talk explores new approaches, difficulties, and possibilities in this field.

Find a Meetup Near You

Join the AI and ML enthusiasts who have already become members

The goal of the AI, Machine Learning, and Computer Vision Meetup network is to bring together a community of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies. If that’s you, we invite you to join the Meetup closest to your timezone.