Welcome to the Best of 3DV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. View more CV events here.
Schedule
Navigating a 3D Vision Conference with VLMs and Embeddings
Attending the 3D Vision Conference means confronting 177 accepted papers across 3.5 days, far more than any one person can absorb. Skimming titles the night before isn't enough.
This talk shows how to build a systematic, interactive map of an entire conference using modern open-source tools. We load all 177 papers from 3DV 2026 (full PDF page images plus metadata) into a FiftyOne grouped dataset. We then run three annotation passes using Qwen3.5-9B on each cover page: topic classification, author affiliation extraction, and project page detection. Document embeddings from Jina v4 are computed across all 3,019 page images, pooled to paper-level vectors, and fed into FiftyOne Brain for UMAP visualization, similarity search, representativeness scoring, and uniqueness scoring.
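As a rough sketch of how the pooling and FiftyOne Brain steps fit together (the dataset name and the page-embedding field below are hypothetical placeholders, not the talk's actual code):

```python
import numpy as np
import fiftyone as fo
import fiftyone.brain as fob

# Sketch only: assumes a dataset with one sample per paper whose
# (hypothetical) "page_embeddings" field stores that paper's per-page
# Jina vectors as a list of lists.
papers = fo.load_dataset("3dv2026-papers")

# Mean-pool each paper's page vectors into a single paper-level vector
paper_vectors = np.stack([
    np.asarray(sample["page_embeddings"]).mean(axis=0)
    for sample in papers
])

# Conference-wide UMAP map plus per-paper similarity, uniqueness,
# and representativeness scores
fob.compute_visualization(papers, embeddings=paper_vectors, method="umap", brain_key="paper_umap")
fob.compute_similarity(papers, embeddings=paper_vectors, brain_key="paper_sim")
fob.compute_uniqueness(papers, embeddings=paper_vectors)
fob.compute_representativeness(papers, embeddings=paper_vectors)
```

Mean pooling is only one choice here; max pooling or weighting pages by content are drop-in alternatives, which is part of the pooling discussion in the talk.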
The result is an interactive dataset you can query, filter, and explore in the FiftyOne App. Sort by uniqueness to find distinctive work, filter by topic and sort by representativeness to understand each research area, and cross-reference with scheduling metadata to build a personal agenda.
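For a sense of what those queries look like in the FiftyOne Python API (the dataset name and topic label are made-up examples, not the talk's actual fields):

```python
import fiftyone as fo
from fiftyone import ViewField as F

papers = fo.load_dataset("3dv2026-papers")  # hypothetical dataset name

# Ten most distinctive papers across the whole conference
most_unique = papers.sort_by("uniqueness", reverse=True).limit(10)

# Most representative papers within one (made-up) topic label
topic_core = (
    papers
    .match(F("topic") == "gaussian splatting")
    .sort_by("representativeness", reverse=True)
)

session = fo.launch_app(topic_core)  # browse the view interactively
```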
I demonstrate the end-to-end pipeline and discuss design decisions regarding grouped datasets, reasoning model output parsing, and embedding pooling strategies.
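On the grouped-dataset decision, a minimal sketch of the idea, one group per paper with one slice per PDF page image, could look like this (paths and slice names are illustrative assumptions):

```python
import fiftyone as fo

# Illustrative sketch: one group per paper, one slice per page image.
# Paths and slice names are assumptions, not the talk's actual layout.
dataset = fo.Dataset("3dv2026-pages")
dataset.add_group_field("group", default="page_01")

group = fo.Group()
dataset.add_samples([
    fo.Sample(
        filepath=f"/data/3dv2026/paper_0001/page_{i:02d}.png",
        group=group.element(f"page_{i:02d}"),
    )
    for i in range(1, 9)
])

print(dataset.group_slices)  # e.g. ['page_01', ..., 'page_08']
```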
INTERIORAGENT: LLM Agent for Interior Design-Aware 3D Layout Generation
Creating interior layout designs has numerous applications, including virtual reality, architectural visualization, and real estate planning. Generating realistic and functional indoor scenes requires a nuanced understanding of spatial configurations and human-centered design principles. We propose INTERIORAGENT, an LLM-agent-driven framework for text-to-3D indoor scene generation that produces scenes with visual quality and functional utility that significantly surpass prior works. We achieve this through several key advantages of INTERIORAGENT: (1) encoding of interior design principles with a novel scene description language, (2) aesthetics and functionality through synthesis tools that satisfy design principles, (3) realism and prompt adherence with optimization tools that ensure ergonomics and iterative constraint satisfaction, and (4) extensibility with a framework that allows incorporating even mature, complex tools like diffusion models, LLMs, and 3D generation repositories. We evaluate INTERIORAGENT through a user study, where participants strongly favor its generated scenes over prior state-of-the-art methods. Additionally, we demonstrate novel applications uniquely enabled by INTERIORAGENT, including language-based scene editing and seamless tool integration for new tasks. Code and data will be publicly released.
Generating dynamic 3D content that moves and deforms over time is a key frontier in visual computing, with applications in VR/AR, robotics, and digital humans. In this talk, I present our series of works on physically realistic 4D generation: from neural surface deformation with explicit velocity fields (ICLR 2025) to our 4Deform framework for robust shape interpolation (CVPR 2025). Both methods use implicit neural representations with physically constrained velocity fields that enforce volume preservation, spatial smoothness, and geometric consistency. I will also introduce TwoSquared (3DV 2026, oral), which achieves full 4D generation from just two 2D image pairs — demonstrating a practical path toward controllable, physically plausible 4D content creation.
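For context, volume preservation for a deformation driven by a velocity field $v(x, t)$ is classically expressed as a divergence-free condition; the works above enforce constraints of this kind (typically as soft penalties, and the exact losses differ per paper):

$$\nabla \cdot v(x, t) = 0 \quad \Longleftrightarrow \quad \text{the flow induced by } v \text{ preserves volume.}$$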
Finding NeMO: A Geometry-Aware Representation of Template Views for Few-Shot Perception
How can we perceive and use objects given only a few images, without training a new model? We present NeMO, a novel object representation that enables 6DoF object pose estimation, detection, and segmentation from only a handful of RGB images of an unknown object.