Seattle AI, ML, and Computer Vision Meetup - July 15, 2026

Jul 15, 2026
5:30 PM - 8:30 PM PT
Union.ai Offices, 400 112th Ave NE #115, Bellevue, WA 98004
About this event
Join our in-person meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Schedule
Motivation and Challenges in working with Multimodal Timeseries Data
Physical AI is having its moment, with many companies and research teams focused on this space. But working with physical AI data means wrestling with high-cardinality, complex, unsynchronized multi-sensor streams that are hard to explore, align, and curate. In this talk, we will break down the challenges that come with multimodal time series data, then look at the research directions the industry is pursuing: the ones that are unlocked when you can actually work with your data effectively.
Orchestrating Scalable AI Workflows with Flyte and Union.ai
Modern AI systems require infrastructure that can reliably orchestrate training, inference, and production workflows at scale. This session explores approaches to AI orchestration, distributed compute, and resilient ML infrastructure for real-world machine learning and computer vision applications.

Topics may include production AI pipelines, workflow automation, scalable deployment strategies, and operating AI systems securely within cloud environments. Attendees will gain a high-level look at emerging patterns shaping the next generation of AI infrastructure and operational workflows.
STELLAR: Learning Sparse Visual Concepts for Unified Vision Models
Modern vision models often split into two regimes: models that learn strong semantics for recognition, and models that preserve spatial detail for reconstruction.
In this talk, we present STELLAR, a self-supervised framework for learning sparse visual concepts as a unified representation for vision models. The key idea is to factorize visual features into semantic concept tokens (the "what") and spatial assignment maps (the "where"), allowing the model to align concepts across views while preserving the geometry needed for reconstruction.
This sparse, low-rank representation creates a compact interface that supports recognition, dense prediction, and image reconstruction, while also suggesting future directions for efficient visual encoding, video self-supervision, generative modeling, and world-model-style visual reasoning.
We discuss the core method, empirical results, and why concept-centric visual representations may be a useful building block for the next generation of unified vision systems.
Building Foundation Models for Robotic Perception
3D spatial understanding is a critical skill for robotics, one that typically requires tedious manual design, expensive data collection, and per-domain training. This presentation will focus on the development and application of foundation models to address several fundamental challenges in robotic perception, and on how they facilitate robotic loco-manipulation skills.
First, we introduce FoundationStereo (CVPR'25 best paper candidate), a novel architecture optimized for zero-shot performance. The model leverages a 1M-pair self-curated synthetic dataset, bridges the sim-to-real gap using monocular priors, and incorporates an advanced filtering module for long-range context reasoning.
Second, we address its computational bottlenecks with Fast-FoundationStereo (CVPR'26). We propose a "divide-and-conquer" acceleration strategy that retains the teacher model's robustness while achieving a 10x speedup, making it suitable for real-time applications.