Munich AI, ML and Computer Vision Meetup - April 22, 2026
Apr 22, 2026
5:30 - 8:30 PM CEST
Impact Hub Munich
Gotzinger Str. 8
81371 München, Germany
About this event
Join the Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Schedule
Learning Disentangled Motion Representations for Open-World Motion Transfer
Recent progress in image- and text-to-video generation has made it possible to synthesize visually compelling videos, yet these models typically lack an explicit, reusable notion of motion. In this talk, I will present recent work on learning high-level, content-independent motion representations directly from open-world video data, with a focus on our NeurIPS spotlight paper introducing DisMo. By disentangling motion semantics from appearance and object identity, such representations enable open-world motion transfer across semantically unrelated entities and provide a flexible interface for adapting and fine-tuning modern video generation models. Beyond generation, I will discuss how abstract motion representations support downstream motion understanding tasks and why they offer a promising direction for more controllable, general, and future-proof video models. The talk will conclude with a broader perspective on the opportunities and challenges of motion-centric representations in computer vision and video learning.
Towards Generating Fully Navigable 3D Scenes
3D world generation is a longstanding goal of computer vision with applications in VR, gaming, movies, robotics, and digital twins. Recent progress in generative models, in particular image and video diffusion models, enables automatic generation of photorealistic 3D environments. This talk describes a simple yet effective framework for exploiting these models for 3D scene generation. We'll briefly cover early approaches (Text2Room, ViewDiff) and then dive deep into our recent state-of-the-art approach, WorldExplorer.
Finding Motion in Commotion: Estimating and Anticipating Motion in Everyday Visual Scenes
Motion is an intrinsic property of video data. How do we harness motion from the abundance of videos to advance vision foundation models? This talk will examine key challenges and emerging opportunities in motion estimation and motion-aware representation learning at scale. Drawing on our latest results from NeurIPS and ICCV, the talk will show how motion-centric learning can enable more versatile and generalisable vision foundation models.
Small Models, Big Intelligence: How vLLM Semantic Router Uses Sub-2B Language Models for Production-Scale Routing
The vLLM Semantic Router introduces a groundbreaking approach to intelligent LLM request routing through its MoM (Mixture of Models) family, a collection of specialized small language models that make split-second routing decisions for production systems. This system operates between users and models, capturing signals from requests, responses, and context to make intelligent routing decisions, including model selection, safety filtering (jailbreak, PII), semantic caching, and hallucination detection. In this talk, we'll explore how the router leverages tiny but powerful models like ModernBERT (encoder-based) and Qwen3 (0.6B-1.7B parameter decoder models) to achieve sub-10ms latency classification at over 10,000 queries per second. We'll dive into the technical architecture showing how these small models handle domain classification, jailbreak detection, PII protection, and hallucination detection, proving that for routing intelligence, size isn't everything.
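The routing flow described above — classify an incoming request with a small model, then dispatch it to an appropriate target model — can be sketched generically. Everything below (the `classify_domain` stub, the label-to-model table, and the model names) is hypothetical illustration, not the actual vLLM Semantic Router API; a real deployment would run a sub-2B encoder such as a ModernBERT classifier instead of keyword matching.

```python
# Hypothetical sketch of label-based request routing; NOT the vLLM
# Semantic Router API. The classifier stub stands in for a small
# encoder model making the domain prediction.

ROUTE_TABLE = {  # classifier label -> model tier (illustrative names)
    "code": "qwen-coder-large",
    "math": "reasoning-model",
    "chat": "general-small",
}
DEFAULT_MODEL = "general-small"

def classify_domain(prompt: str) -> str:
    """Stub classifier standing in for a sub-2B encoder model."""
    lowered = prompt.lower()
    if "def " in lowered or "import " in lowered:
        return "code"
    if any(tok in lowered for tok in ("integral", "prove", "equation")):
        return "math"
    return "chat"

def route(prompt: str) -> str:
    """Pick a target model for the request from its predicted domain."""
    label = classify_domain(prompt)
    return ROUTE_TABLE.get(label, DEFAULT_MODEL)

print(route("import numpy as np"))        # -> qwen-coder-large
print(route("What's the weather like?"))  # -> general-small
```

The same dispatch point is where safety filters (jailbreak, PII) and a semantic-cache lookup would sit, each driven by its own small classifier before the request ever reaches a large model.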
Navigating a 3D Vision Conference with VLMs and Embeddings
This talk shows how to build a systematic, interactive map of an entire conference using modern open-source tools. We load all 177 papers from 3DV 2026 (full PDF page images plus metadata) into a FiftyOne grouped dataset. We then run three annotation passes using Qwen3.5-9B on each cover page: topic classification, author affiliation extraction, and project page detection. Document embeddings from Jina v4 are computed across all 3,019 page images, pooled to paper-level vectors, and fed into FiftyOne Brain for UMAP visualization, similarity search, representativeness scoring, and uniqueness scoring.
The result is an interactive dataset you can query, filter, and explore in the FiftyOne App. Sort by uniqueness to find distinctive work, filter by topic and sort by representativeness to understand each research area, and cross-reference with scheduling metadata to build a personal agenda.
I demonstrate the end-to-end pipeline and discuss design decisions regarding grouped datasets, reasoning model output parsing, and embedding pooling strategies.
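The page-to-paper pooling step mentioned above can be sketched as a simple mean over per-page vectors. This is a minimal illustration, not the talk's actual code: the toy 4-dimensional vectors stand in for high-dimensional Jina v4 page-image embeddings, and mean pooling is only one of the pooling strategies a pipeline like this might compare.

```python
# Minimal sketch of pooling page-level embeddings into one
# paper-level vector via a mean. Dimensions are illustrative;
# real document embeddings would be much higher-dimensional.

def mean_pool(page_embeddings):
    """Average a list of equal-length page vectors into one paper vector."""
    n = len(page_embeddings)
    dim = len(page_embeddings[0])
    return [sum(vec[i] for vec in page_embeddings) / n for i in range(dim)]

# Toy example: a 3-page paper with 4-dimensional page embeddings.
pages = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 2.0, 0.0],
    [2.0, 4.0, 2.0, 2.0],
]
paper_vector = mean_pool(pages)
print(paper_vector)  # -> [2.0, 2.0, 2.0, 2.0]
```

The resulting paper-level vectors are what downstream steps (UMAP visualization, similarity search, uniqueness and representativeness scoring) operate on, so the pooling choice directly shapes the map of the conference.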