SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation
The growing integration of computer vision and machine learning into the retail industry, both online and in physical stores, has driven the adoption of multimodal recommender systems to help users navigate increasingly complex product landscapes. These systems leverage diverse data sources, such as product images, textual descriptions, and user-generated content, to better model user preferences and item characteristics. While fusing multimodal data helps address data sparsity and cold-start problems, it also introduces challenges such as information inconsistency, noise, and increased training instability. In this paper, we analyze these robustness issues through the lens of flat local minima and propose a strategy that incorporates BLIP, a vision-language model with strong denoising capabilities, to mitigate noise in multimodal inputs. Our method, Sharpness-Aware Mirror Gradient with BLIP-Based Denoising (SGBD), is a concise yet effective training strategy that implicitly enhances robustness during optimization. Extensive theoretical and empirical evaluations demonstrate its effectiveness across various multimodal recommendation benchmarks. SGBD offers a scalable way to improve recommendation performance in real-world retail environments, where noisy, high-dimensional, and fast-evolving product data is the norm, making it a promising paradigm for training robust multimodal recommender systems in the retail industry.
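To make the "flat local minima" idea concrete, here is a minimal sketch of one generic sharpness-aware update step on a toy loss. It is not the SGBD mirror-gradient procedure and involves no BLIP denoising; the abstract does not specify those details. The function names, the quadratic loss, and the values of `lr` and `rho` are all assumptions chosen for illustration.

```python
import math

def loss(w):
    # Hypothetical toy loss: steep along w[0], shallow along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One generic sharpness-aware step (illustrative, not the SGBD update)."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction is defined, so leave w unchanged.
        return list(w)
    # Ascent: move to the approximate worst-case point in an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent: update the original weights using the gradient at the perturbed point.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
initial_loss = loss(w)
for _ in range(100):
    w = sam_step(w)
final_loss = loss(w)
```

Because the descent gradient is taken at a perturbed point, the update penalizes weights whose loss rises quickly nearby, which is the intuition behind preferring flatter minima. A real recommender would replace the toy loss with the model's training loss over multimodal features.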
About the Speaker
Kathy Wu holds a Ph.D. in Applied Mathematics and dual M.S. degrees in Computer Science and Quantitative Finance from the University of Southern California (USC), Los Angeles, CA, USA. At USC, she served as a course lecturer, teaching ML Foundations and ML for Business Applications in the science and business schools. Her academic research spans high-dimensional statistics, deep learning, and causal inference.
Kathy brings industry experience from Meta, LinkedIn, and Morgan Stanley in the Bay Area and New York City, where she focused on AI methodologies and real-world applications. She is currently an Applied Scientist at Amazon, within the Global Store organization, leading projects in E-Commerce Recommendation Systems, Search Engines, Multi-Modal Vision-Language Models (VLMs), and LLM/GenAI for retail.
Her work has been published in top-tier conferences including ICCV, CVPR, ICLR, SIGIR, and WACV. At ICCV 2025, she won the Best Paper Award in Retail Vision.