Feb 6, 2025 at 9 AM Pacific
Welcome to the Best of NeurIPS virtual series, which highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Streamed live from the authors to you.
University of Basel
Benchmark datasets in computer vision often contain issues such as off-topic samples, near-duplicates, and label errors, compromising model evaluation accuracy. This talk will discuss SelfClean, a data-cleaning framework that leverages self-supervised representation learning and distance-based indicators to detect these issues effectively.
By framing the task as a ranking or scoring problem, SelfClean minimizes human effort while outperforming competing methods in identifying synthetic and natural contamination across natural and medical domains. With this methodology, we identified issues affecting up to 16% of the samples in current benchmark datasets and enhanced the reliability of model performance evaluation.
Read the paper, “Intrinsic Self-Supervision for Data Quality Audits”
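To make the idea concrete, here is a minimal, illustrative sketch (not the authors’ implementation) of distance-based indicators computed on self-supervised embeddings: near-duplicates surface as unusually small nearest-neighbor distances, off-topic samples as unusually large ones, and label errors as samples whose neighbors disagree with their label. The function name, thresholds, and choice of cosine distance are assumptions for illustration.

```python
# Illustrative sketch of SelfClean-style ranking indicators on self-supervised
# embeddings; all heuristics here are placeholders, not the paper's exact method.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rank_data_issues(embeddings: np.ndarray, labels: np.ndarray, k: int = 10):
    """Rank samples by suspicion of being near-duplicates, off-topic, or mislabeled."""
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(embeddings)
    dists, idxs = nn.kneighbors(embeddings)           # column 0 is the sample itself

    # Near-duplicates: smallest distance to any other sample.
    near_duplicate_ranking = np.argsort(dists[:, 1])

    # Off-topic samples: largest average distance to the k nearest neighbors.
    off_topic_ranking = np.argsort(-dists[:, 1:].mean(axis=1))

    # Label errors: highest fraction of neighbors whose label disagrees.
    neighbor_labels = labels[idxs[:, 1:]]
    disagreement = (neighbor_labels != labels[:, None]).mean(axis=1)
    label_error_ranking = np.argsort(-disagreement)

    return near_duplicate_ranking, off_topic_ranking, label_error_ranking
```

A human auditor would then review only the top of each ranking, which is what keeps the manual effort small.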
Vrije Universiteit Brussel
We interpret CLIP’s zero-shot image classification by examining shared textual concepts learned by its vision and language encoders. We analyze 13 CLIP models across various architectures, sizes, and datasets. The approach offers a human-friendly way to understand CLIP’s classification decisions.
Read the paper, “Interpreting and Analysing CLIP’s Zero-Shot Image Classification via Mutual Knowledge”
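For context, this is the standard CLIP zero-shot classification pipeline that the interpretation study builds on, shown with the Hugging Face CLIP API; the mutual-knowledge analysis itself is not reproduced here, and the image path and label set are placeholders.

```python
# Minimal CLIP zero-shot classification sketch (standard Hugging Face API).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                     # placeholder image
class_names = ["dog", "cat", "airplane"]              # placeholder label set
prompts = [f"a photo of a {c}" for c in class_names]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits, softmaxed over the candidate classes:
# the highest-probability prompt gives the zero-shot prediction.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(class_names, probs[0].tolist())))
```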
NYU
Motivated by how humans perceive scenes, we propose the Multiview Scene Graph (MSG) as a general topological scene representation. MSG constructs a place+object graph from unposed RGB images, and we provide novel metrics to evaluate graph quality. We combine visual place recognition and object association to build MSG in a single Transformer decoder model. We believe MSG can connect the dots across classic vision tasks to promote spatial intelligence and open new doors for topological 3D scene understanding.
Read the paper, “Multiview Scene Graph”
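As a rough sketch of the data structure (not the paper’s Transformer-decoder model), a place+object graph can be assembled from per-image place-recognition and object-association results; the `recognize_place` and `associate_objects` helpers below are hypothetical placeholders.

```python
# Hypothetical sketch of a place+object graph: place nodes come from visual
# place recognition, object nodes from object association, and edges link
# each place to the objects observed there.
import networkx as nx

def build_multiview_scene_graph(images, recognize_place, associate_objects):
    g = nx.Graph()
    for img in images:
        place_id = recognize_place(img)               # e.g. a place/cluster id for this view
        g.add_node(("place", place_id))
        for obj_id in associate_objects(img):         # object track ids seen in this view
            g.add_node(("object", obj_id))
            g.add_edge(("place", place_id), ("object", obj_id))
    return g
```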
Aalto University
Deep neural networks perform exceptionally well on clean images but face significant challenges with corrupted ones. While data augmentation with specific corruptions during training can improve model robustness to those particular distortions, this approach typically degrades performance on both clean images and corruptions not encountered during training. In this talk, we present a novel approach that improves DNN robustness across diverse corruptions while maintaining clean image accuracy. Our key insight is that input perturbations can be effectively simulated through multiplicative perturbations in weight space. Building on this finding, we introduce Data Augmentation via Multiplicative Perturbation (DAMP), a training methodology that optimizes DNNs under random multiplicative weight perturbations. Comprehensive experiments across multiple image classification datasets (CIFAR-10/100, TinyImageNet, and ImageNet) and architectures (ResNet50, ViT-S/16, ViT-B/16) demonstrate that DAMP enhances model generalization under corruptions while maintaining computational efficiency comparable to standard SGD. Notably, DAMP successfully trains a ViT-S/16 on ImageNet from scratch without extensive data augmentations and achieves a top-1 error of 23.7%, which is comparable to a ResNet50.
Read the paper, “Improving robustness to corruptions with multiplicative weight perturbations”
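The core training loop can be sketched in a few lines of PyTorch: at each step the weights are scaled by Gaussian noise centered at 1 before the forward/backward pass, then restored before the optimizer update. This is a minimal sketch in the spirit of DAMP, not the paper’s exact recipe; the noise scale `sigma` and the restore-then-step scheme are illustrative assumptions.

```python
# Minimal sketch of a training step under random multiplicative weight
# perturbations (DAMP-style); hyperparameters are placeholders.
import torch

def damp_step(model, optimizer, loss_fn, x, y, sigma=0.1):
    originals = [p.detach().clone() for p in model.parameters()]

    # Perturb: w <- w * (1 + sigma * eps), with eps ~ N(0, 1) per weight.
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(1.0 + sigma * torch.randn_like(p))

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                       # gradients taken at the perturbed weights

    # Restore the unperturbed weights, then apply the update with those gradients.
    with torch.no_grad():
        for p, w in zip(model.parameters(), originals):
            p.copy_(w)
    optimizer.step()
    return loss.item()
```

Because only the forward/backward pass sees the perturbed weights, the per-step cost stays close to standard SGD, which is what the abstract refers to as comparable computational efficiency.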
Join the AI and ML enthusiasts who have already become members
The goal of the AI, Machine Learning, and Computer Vision Meetup network is to bring together a community of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies. If that’s you, we invite you to join the Meetup closest to your timezone.