
Paris AI, ML, and CV Meetup - July 28, 2026

Jul 28, 2026
5:30 PM - 8:30 PM CEST
InstaDeep SAS, 42 Rue de Paradis, 75010 Paris, France
About this event
Join our in-person meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Schedule
Building Real-World Computer Vision Systems with Voxel51
This talk will explore practical workflows for building, evaluating, and improving modern computer vision systems. We’ll dive into real-world approaches to dataset curation, model analysis, multimodal AI workflows, and production-ready vision pipelines using open-source technologies.

The session is designed for engineers, researchers, and AI practitioners looking to better understand how teams are developing and scaling computer vision applications today. Expect practical demos, technical insights, and discussions around the evolving AI tooling ecosystem.
Computer Vision at Nanoscale - Detecting, Segmenting, and Analyzing Nanoparticles in Microscopic Images
This talk covers detecting, segmenting, and analyzing nanoparticles in microscopic images, using YOLO models, classical CV, and Gradio.
Fine-Tuning VLMs for Domain-Specific Tasks
Vision-language models are becoming powerful general-purpose tools, but their real value often appears when they are adapted to domain-specific tasks. In this talk, I will present practical strategies for fine-tuning VLMs on specialized data, from dataset design and annotation quality to model selection, evaluation, and deployment constraints. I will discuss common challenges such as limited data, visual ambiguity, domain shift, hallucinations, and maintaining robustness in production. The session will focus on applied lessons for building reliable VLM systems that go beyond demos and solve concrete business or research problems.
Towards Resolution- and Modality-Agnostic Transformers for Earth Observation
Vision Transformers (ViTs) dominate computer vision. However, their reliance on rigid patch projectors hinders transfer to Earth Observation (EO), where inputs vary widely in modality, scale, and resolution. We introduce UniverSat, a ViT-style backbone built around a Universal Patch Encoder that maps patches from arbitrary spatial, spectral, and temporal resolutions, and from both optical and non-optical sensors, into a shared embedding space with a single set of weights. This enables training one model on heterogeneous multimodal corpora with self-supervision, yielding robust sensor-agnostic spatial features.
Efficient Image Generation through Smarter Data, Objectives, and Alignment
State-of-the-art image generation models require massive web scrapes and expensive post-training alignment. This talk explores three recent works that challenge the "bigger is better" paradigm to build efficient and controllable models. First, we show how ImageNet alone (only 1/1000th the training data of Stable Diffusion) can match billion-scale models using a fraction of the compute. Second, we introduce a frequency-balanced training objective that overcomes spectral bias, learning high-fidelity textures up to 40% faster. Finally, we present MIRO, a multi-reward pretraining method that bakes human preferences directly into the model, bypassing costly post-hoc RLHF and outperforming models 30x its size.