Munich AI, ML and Computer Vision Meetup - April 22, 2026
This event has ended, but you can still catch up! Watch the on-demand recordings and register for our future events.
Apr 22, 2026
5:30 - 8:30 PM CEST
Impact Hub Munich, Gotzinger Str. 8, 81371 München, Germany
About this event
Join the Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Schedule
Learning Disentangled Motion Representations for Open-World Motion Transfer
Recent progress in image- and text-to-video generation has made it possible to synthesize visually compelling videos, yet these models typically lack an explicit, reusable notion of motion. In this talk, I will present recent work on learning high-level, content-independent motion representations directly from open-world video data, with a focus on our NeurIPS spotlight paper introducing DisMo. By disentangling motion semantics from appearance and object identity, such representations enable open-world motion transfer across semantically unrelated entities and provide a flexible interface for adapting and fine-tuning modern video generation models. Beyond generation, I will discuss how abstract motion representations support downstream motion understanding tasks and why they offer a promising direction for more controllable, general, and future-proof video models. The talk will conclude with a broader perspective on the opportunities and challenges of motion-centric representations in computer vision and video learning.
Towards Generating Fully Navigable 3D Scenes
3D world generation is a longstanding goal of computer vision, with applications in VR, gaming, movies, robotics, and digital twins. Recent progress in generative models, in particular image and video diffusion models, enables automatic generation of photorealistic 3D environments. This talk describes a simple yet effective framework for exploiting these models for 3D scene generation. We'll briefly cover early approaches (Text2Room, ViewDiff) and then dive deep into our recent state-of-the-art approach, WorldExplorer.
The Future of 3D Vision Data: From Human Annotation to AI-Generated Data
Dataset accuracy is one of the most important, yet often overlooked, aspects of 3D computer vision. This talk will start by revisiting my earlier work on 6D pose and depth estimation to show how ground-truth errors can distort evaluations, then present practical techniques for accurate data annotation and demonstrate their impact. Finally, we discuss leveraging a diffusion model as a scalable approach to creating a large-scale synthetic dataset that replicates realistic sensor noise.
Navigating a 3D Vision Conference with VLMs and Embeddings
This talk shows how to build a systematic, interactive map of an entire conference using modern open-source tools. We load all 177 papers from 3DV 2026 (full PDF page images plus metadata) into a FiftyOne grouped dataset. We then run three annotation passes using Qwen3.5-9B on each cover page: topic classification, author affiliation extraction, and project page detection. Document embeddings from Jina v4 are computed across all 3,019 page images, pooled to paper-level vectors, and fed into FiftyOne Brain for UMAP visualization, similarity search, representativeness scoring, and uniqueness scoring.

The result is an interactive dataset you can query, filter, and explore in the FiftyOne App. Sort by uniqueness to find distinctive work, filter by topic and sort by representativeness to understand each research area, and cross-reference with scheduling metadata to build a personal agenda.

I demonstrate the end-to-end pipeline and discuss design decisions regarding grouped datasets, reasoning model output parsing, and embedding pooling strategies.
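The paper-level pooling step mentioned above can be sketched in plain NumPy. This is an illustrative sketch only, not code from the talk: the function name, embedding dimension, and toy data are all assumptions, and it shows one simple strategy (mean-pool per-page vectors, then L2-normalize so dot products act as cosine similarity).

```python
import numpy as np

def pool_paper_embeddings(page_embeddings, paper_ids):
    """Mean-pool page-level embeddings into one vector per paper.

    page_embeddings: (num_pages, dim) array of per-page vectors
    paper_ids: length-num_pages sequence mapping each page to its paper
    Returns {paper_id: (dim,) L2-normalized paper vector}.
    """
    page_embeddings = np.asarray(page_embeddings, dtype=np.float64)
    pooled = {}
    for pid in sorted(set(paper_ids)):
        # Select all pages belonging to this paper and average them
        mask = np.array([p == pid for p in paper_ids])
        vec = page_embeddings[mask].mean(axis=0)
        # Normalize so downstream similarity search can use dot products
        pooled[pid] = vec / np.linalg.norm(vec)
    return pooled

# Toy usage: 5 "page" vectors from 2 papers, 8-dim embeddings
rng = np.random.default_rng(0)
pages = rng.normal(size=(5, 8))
ids = ["paper_a", "paper_a", "paper_a", "paper_b", "paper_b"]
papers = pool_paper_embeddings(pages, ids)
```

Alternatives such as max-pooling or weighting the cover page more heavily are among the design choices the talk discusses.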