Boston AI, ML and Computer Vision Meetup

When and Where

February 28, 2025 | 5:00 – 8:00 PM

Microsoft Research Lab – New England (NERD) at MIT
Deborah Sampson Conference Room
One Memorial Drive, Cambridge, MA, 02142

Register for the event at Microsoft NERD

A Label is Worth a Thousand Images in Dataset Distillation

Sunny Qin

Harvard University

Dataset distillation creates small, highly informative datasets that retain the training efficacy of much larger ones. In this talk, I’ll highlight the importance of soft labels—probabilistic labels that carry more nuanced information than one-hot hard labels—in improving model performance. We’ll also cover the relationship between dataset distillation and knowledge distillation, the role of expert-generated labels, and the challenges and benefits of using soft labels, even in noisy data scenarios.
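To make the hard-vs-soft distinction concrete, here is a minimal toy illustration (not from the talk itself; the three-class probabilities are invented for the example). A soft label spreads probability mass over plausible classes, so the cross-entropy target carries more information per example than a one-hot vector:

```python
import math

def cross_entropy(target, predicted):
    """Cross-entropy H(target, predicted) between two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

predicted = [0.7, 0.2, 0.1]    # model's predicted class probabilities (toy values)
hard_label = [1.0, 0.0, 0.0]   # one-hot: "class 0, full stop"
soft_label = [0.8, 0.15, 0.05] # probabilistic: also encodes class similarity

# Against the hard label, only the probability on class 0 matters;
# the soft label additionally grades how the remaining mass is spread.
print(cross_entropy(hard_label, predicted))
print(cross_entropy(soft_label, predicted))
```

Training on soft labels (e.g., teacher-model outputs) optimizes the same loss, just with richer targets, which is one reason distilled datasets can remain effective at small sizes.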

The Challenges of Learning from Biomedical Data

Eric Ma

Moderna Therapeutics

Abstract coming soon!

ImageNet-RIB Benchmark: Large Pre-Training Datasets Don’t Guarantee Robustness after Fine-Tuning

Jaedong Hwang

MIT

Highly performant large-scale pre-trained models promise to provide a valuable foundation for learning specialized tasks through fine-tuning. By starting with a robust general-purpose model, the goal is to achieve both specialization in the target task and to maintain robustness. To evaluate model robustness to out-of-distribution samples after fine-tuning on downstream datasets, we introduce a new benchmark, ImageNet-RIB (Robustness Inheritance Benchmark). This benchmark comprises a set of related but distinct specialized tasks; pre-trained models are fine-tuned on one task in the set, and their robustness is evaluated on the others, iterating across all tasks. Counterintuitively, we find that model robustness after fine-tuning on related downstream tasks is lowest when the pre-training dataset is the richest and most diverse. This result suggests that starting with the strongest foundation model may not always be the best strategy for achieving optimal performance on specialist tasks.
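The fine-tune-on-one, evaluate-on-the-rest protocol described above can be sketched as a simple loop. This is a hypothetical illustration, not the benchmark's actual code: `fine_tune` and `evaluate` are stand-in callables, and the task names are invented.

```python
# Hypothetical sketch of a robustness-inheritance evaluation loop.
# `fine_tune` and `evaluate` are placeholders, not ImageNet-RIB's real API.
def robustness_matrix(pretrained, tasks, fine_tune, evaluate):
    """Fine-tune on each task in turn, then score the model on every other task."""
    results = {}
    for train_task in tasks:
        model = fine_tune(pretrained, train_task)
        results[train_task] = {
            eval_task: evaluate(model, eval_task)
            for eval_task in tasks
            if eval_task != train_task
        }
    return results

# Toy usage with stub functions and invented task names
tasks = ["sketch", "painting", "cartoon"]
fine_tune = lambda base, task: f"{base}+{task}"          # stub: returns a tag
evaluate = lambda model, task: 1.0 / (1 + len(task))     # stub: fake score
matrix = robustness_matrix("pretrained-model", tasks, fine_tune, evaluate)
print(matrix)
```

Iterating across all tasks this way yields an off-diagonal matrix of out-of-distribution scores, from which robustness after fine-tuning can be compared across pre-training datasets.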

Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences

Harpreet Sahota

Voxel51

Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a twist: we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis.

Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:

  • Analyze dataset distributions and potential biases
  • Identify edge cases and interesting samples
  • Compare similar samples across datasets
  • Explore multi-modal annotations and complex label structures

Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward.

Find a Meetup Near You

Join the AI and ML enthusiasts who have already become members

The goal of the AI, Machine Learning, and Computer Vision Meetup network is to bring together a community of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies. If that’s you, we invite you to join the Meetup closest to your time zone.