July 9, 2025 | 9 AM Pacific
July 9, 2025 at 9 AM Pacific
Online. Register for the Zoom!
Porsche AG
Foundation models hold the potential to generalize the driving task and support language-based interaction in autonomous driving. However, they continue to struggle with specific reasoning tasks essential for robotic navigation. Current benchmarks typically provide only aggregate performance scores, making it difficult to assess the underlying capabilities these models require. Drive4C addresses this gap by introducing a closed-loop benchmark that evaluates semantic, spatial, temporal, and physical understanding—enabling more targeted improvements to advance foundation models for autonomous driving.
TUM
Predicting future human motion is a key challenge in generative AI and computer vision, as generated motions should be realistic and diverse at the same time. This talk presents a novel approach that leverages top-performing latent generative diffusion models with a novel paradigm. Nonisotropic Gaussian diffusion leads to better performance, fewer parameters, and faster training at no additional computational cost. We will also discuss how such benefits can be obtained in other application domains.
Corteva Agriscience
We propose an efficient few-shot adaptation method for the Grounding-DINO open-set object detection model, designed to improve performance on domain-specific specialized datasets like agriculture, where extensive annotation is costly. The method circumvents the challenges of manual text prompt engineering by removing the standard text encoder and instead introduces randomly initialized, trainable text embeddings. These embeddings are optimized directly from a few labeled images, allowing the model to quickly adapt to new domains and object classes with minimal data. This approach demonstrates superior performance over zero-shot methods and competes favorably with other few-shot techniques, offering a promising solution for rapid model specialization.
Nanyang Technological University
University of Texas
Optical imaging capable of resolving nanoscale features would revolutionize scientific research and engineering applications across biomedicine, smart manufacturing, and semiconductor quality control. However, due to the physical phenomenon of diffraction, the optical resolution is limited to approximately half the wavelength of light, which impedes the observation of subwavelength objects such as the native state coronavirus, typically smaller than 200 nm. Fortunately, deep learning methods have shown remarkable potential in uncovering underlying patterns within data, promising to overcome the diffraction limit by revealing the mapping pattern between diffraction images and their corresponding ground truth object images. However, the absence of suitable datasets has hindered progress in this field —— collecting high-quality optical data of subwavelength objects is highly difficult as these objects are inherently invisible under conventional microscopy, making it impossible to perform standard visual calibration and drift correction. Therefore, we provide the first general optical imaging dataset based on the “building block” concept for challenging the diffraction limit. Drawing an analogy to modular construction principles, we construct a comprehensive optical imaging dataset comprising subwavelength fundamental elements, i.e., small square units that can be assembled into larger and more complex objects. We then frame the task as an image-to-image translation task and evaluate various vision methods. Experimental results validate our “building block” concept, demonstrating that models trained on basic square units can effectively generalize to realistic, more complex unseen objects. Most importantly, by highlighting this underexplored AI-for-science area and its potential, we aspire to advance optical science by fostering collaboration with the vision and machine learning communities.
Join the AI and ML enthusiasts who have already become members
The goal of the AI, Machine Learning, and Computer Vision Meetup network is to bring together a community of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies. If that’s you, we invite you to join the Meetup closest to your timezone.