Best of CVPR – July 10, 2025

Jul 10, 2025

9 AM Pacific

Online. Register for the Zoom!

About this event

Welcome to the Best of CVPR series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference, streamed live from the authors to you.

Schedule

OFER: Occluded Face Expression Reconstruction

Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions, where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces by training two diffusion models to generate the shape and expression coefficients of a parametric face model, conditioned on the input image. Maintaining consistency across diverse expressions requires selecting the best-matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores.
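The ranking step described above can be illustrated with a minimal sketch. It assumes that each candidate shape is a list of coefficients, that a higher predicted score means a more accurate shape, and that `toy_score` is a hypothetical stand-in for a learned scoring network; none of these names or choices come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Shape = Sequence[float]


def rank_shapes(
    candidates: List[Shape],
    score_fn: Callable[[Shape], float],
) -> List[Tuple[float, Shape]]:
    """Sort candidate shapes from highest to lowest predicted score."""
    scored = [(score_fn(shape), shape) for shape in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(shape: Shape) -> float:
    # Hypothetical scorer: prefers coefficients close to zero.
    # A real system would use a trained network here instead.
    return -sum(c * c for c in shape)


# Pretend these are samples drawn from a shape diffusion model.
candidates = [[0.9, -0.4], [0.1, 0.2], [0.5, 0.5]]
ranked = rank_shapes(candidates, toy_score)
best_score, best_shape = ranked[0]
```

In this toy setup the top-ranked sample is then reused as the single shape across all sampled expressions, which is one way to read the consistency goal stated in the abstract.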

SmartHome-Bench: Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal LMMs

Video anomaly detection is crucial for ensuring safety and security, yet existing benchmarks overlook the unique context of smart home environments. We introduce SmartHome-Bench, a dataset of 1,203 smart home videos annotated according to a novel taxonomy of seven anomaly categories, such as Wildlife, Senior Care, and Baby Monitoring. We evaluate state-of-the-art closed- and open-source multimodal LLMs with various prompting techniques, revealing significant performance gaps. To address these limitations, we propose the Taxonomy-Driven Reflective LLM Chain (TRLC), which boosts detection accuracy by 11.62%.

Interactive Medical Image Analysis with Concept-based Similarity Reasoning

What if you could tell an AI model exactly "where to focus" and "where to ignore" on a medical image? Our work enables radiologists to interactively guide AI models at test time for more transparent and trustworthy decision-making. This paper introduces the novel Concept-based Similarity Reasoning network (CSR), which offers (i) patch-level prototypes with intrinsic concept interpretation, and (ii) spatial interactivity. First, the proposed CSR provides localized explanations by grounding the prototypes of each concept on image regions. Second, our model introduces a novel spatial-level interaction that allows doctors to engage directly with specific image areas, making it an intuitive and transparent tool for medical imaging.
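As a rough illustration of patch-level prototype matching with spatial guidance, the sketch below scores a concept by the cosine similarity between each image patch feature and a concept prototype, and lets a user exclude patches from consideration. The feature vectors, the `concept_score` helper, and the use of a maximum over patches are all assumptions made for this example, not details taken from the CSR paper.

```python
import math
from typing import Collection, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def concept_score(
    patches: List[Sequence[float]],
    prototype: Sequence[float],
    ignore: Collection[int] = (),
) -> Tuple[Optional[float], Optional[int]]:
    """Return (similarity, patch index) of the best-matching allowed patch."""
    best_sim: Optional[float] = None
    best_idx: Optional[int] = None
    for idx, feature in enumerate(patches):
        if idx in ignore:  # user marked this region as "ignore"
            continue
        sim = cosine(feature, prototype)
        if best_sim is None or sim > best_sim:
            best_sim, best_idx = sim, idx
    return best_sim, best_idx


# Three hypothetical patch features and one hypothetical concept prototype.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototype = [1.0, 0.0]

full = concept_score(patches, prototype)                # all patches allowed
guided = concept_score(patches, prototype, ignore={0})  # patch 0 excluded
```

Excluding patch 0 moves the explanation to the next most similar region, which mirrors the "where to ignore" interaction described above in a very simplified form.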

Multi-view Anomaly Detection: From Static to Probabilistic Modelling

The advent of 3D Gaussian Splatting has revitalized interest in multi-view image data, and applying these techniques to fields such as anomaly detection has been a logical next step. However, some limitations of these models may warrant a return to previously applied probabilistic techniques. This talk explores new approaches, difficulties, and possibilities in this field.