AI, ML and Computer Vision Meetup - October 30, 2025
Oct 30, 2025
9 AM Pacific
Online. Register for the Zoom!
About this event
Join the Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Schedule
Responsible Generative AI: Ensuring Safety in Text-to-Image and Image-to-Text Generation
This talk addresses key safety challenges in generative AI, focusing on text-to-image and image-to-text generation systems. For text-to-image generation with diffusion-based models, I will cover detecting harmful prompts, removing inappropriate content during generation, and tracing the origins of problematic images. In the image-to-text domain, I will show how images can be used to mislead the multi-task prompting and chain-of-thought processes of multimodal LLMs, and to jailbreak their alignment. The talk provides a comprehensive overview of current practices and emerging solutions for ensuring the safety and reliability of generative AI systems.
Privacy-preserving in Computer Vision through Optics Learning
Cameras are now ubiquitous, powering computer vision systems that assist us in everyday tasks and critical settings such as operating rooms. Yet, their widespread use raises serious privacy concerns: traditional cameras are designed to capture high-resolution images, making it easy to identify sensitive attributes such as faces, nudity, or personal objects. Once acquired, such data can be misused if accessed by adversaries. Existing software-based privacy mechanisms, such as blurring or pixelation, often degrade task performance and leave vulnerabilities in the processing pipeline.
In this talk, we explore an alternative question: how can we preserve privacy before or during image acquisition? By revisiting the image formation model, we show how camera optics themselves can be learned and optimized to acquire images that are unintelligible to humans yet remain useful for downstream vision tasks like action recognition. We will discuss recent approaches to learning camera lenses that intentionally produce privacy-preserving images, blurry and unrecognizable to the human eye, but still effective for machine perception. This paradigm shift opens the door to a new generation of cameras that embed privacy directly into their hardware design.
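To make the idea concrete, here is a minimal toy sketch of the image-formation view described above: the lens is modeled as a point spread function (PSF) that the sensor convolves with the scene. The Gaussian PSF, the 64x64 random "scene", and the frequency-band metric are all illustrative assumptions, not the speakers' actual method; in the real approach the PSF parameters would be optimized end-to-end with the downstream task network rather than fixed by hand.

```python
import numpy as np

def apply_psf(image, psf):
    """Simulate acquisition through a lens modeled by a point spread
    function: the sensor records the (circular) convolution of the
    scene with the PSF, computed here via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(psf, image.shape)))

def gaussian_psf(size, sigma):
    """Wide Gaussian kernel standing in for a learned privacy-preserving lens
    (hypothetical stand-in; a learned PSF need not be Gaussian)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def band_energy(img, lo):
    """Total spectral magnitude at spatial frequencies with radius >= lo."""
    F = np.fft.fftshift(np.fft.fft2(img))
    c = img.shape[0] // 2
    r = np.hypot(*np.meshgrid(np.arange(img.shape[0]) - c,
                              np.arange(img.shape[1]) - c))
    return np.abs(F[r >= lo]).sum()

rng = np.random.default_rng(0)
scene = rng.random((64, 64))                    # stand-in for a captured scene
blurred = apply_psf(scene, gaussian_psf(64, sigma=6.0))

# High-frequency detail (what makes faces identifiable) is strongly
# suppressed by the optics, before any software ever sees the image.
ratio = band_energy(blurred, 16) / band_energy(scene, 16)
print(ratio)  # far below 1: fine detail never reaches the pipeline
```

The point of the sketch is the ordering of operations: the degradation happens in the optics, so an adversary with full access to the captured data still never obtains the high-resolution image.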
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Can we match vision and language embeddings without any supervision? According to the platonic representation hypothesis, as model and dataset scales grow, the distances between corresponding representations become increasingly similar across embedding spaces. Our study demonstrates that pairwise distances alone are often sufficient to enable unsupervised matching, allowing vision-language correspondences to be discovered without any parallel data.
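A toy sketch of why pairwise distances can suffice: if two embedding spaces share the same pairwise geometry, each item's sorted vector of distances to all other items is invariant to rotation and shuffling, so matching those profiles recovers the correspondence. The synthetic data, the orthogonal map, and the greedy row-wise matcher below are illustrative assumptions for the idealized noise-free case, not the method presented in the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 20, 8

# "Vision" embeddings of n concepts (hypothetical toy data).
X = rng.normal(size=(n, d))

# "Language" embeddings: the same geometry under an unknown rotation and an
# unknown shuffle -- an idealized version of two spaces that share pairwise
# structure but have no known correspondence.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal map
perm = rng.permutation(n)
Y = X[perm] @ Q

def pdist(Z):
    """All pairwise Euclidean distances within one embedding space."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

# Sorted distance profiles are identical for corresponding items, so the
# cheapest profile match reveals the hidden permutation -- no parallel
# (paired) data is ever used.
prof_x = np.sort(pdist(X), axis=1)
prof_y = np.sort(pdist(Y), axis=1)
cost = np.linalg.norm(prof_x[:, None, :] - prof_y[None, :, :], axis=-1)
match = cost.argmin(axis=1)          # greedy row-wise assignment

print(np.array_equal(perm[match], np.arange(n)))  # True: X[i] <-> Y[match[i]]
```

With noisy, real-scale embeddings the profiles only approximately agree, which is where more robust distance-based matching objectives come in; the sketch shows the underlying invariance the approach exploits.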