AI, Machine Learning and Computer Vision Meetup

March 20, 2025 | 8:30 AM Pacific

Register for the Zoom

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers

Satvik Dixit

Carnegie Mellon University

Current audio language models lag behind text-based LLMs and Vision Language Models (VLMs) in reasoning capabilities. Incorporating audio information into VLMs could let us leverage their advanced language reasoning capabilities for audio input. To explore this, the talk will cover how VLMs (such as GPT-4o and Claude-3.5-sonnet) can recognize audio content from spectrograms and how this approach could enhance audio understanding within VLMs.
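For readers new to spectrograms, here is a minimal sketch of how a waveform becomes the time-frequency grid that is later rendered as an image. It is not the speaker's pipeline: the frame size, hop length, sample rate, and test tone are all assumptions chosen for illustration, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_size//2) per frame."""
    # Hann window: tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k (fine for a toy example, slow for real audio).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical input: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal)
# The strongest bin in the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

In practice the magnitudes would be log-scaled and drawn as an image before being passed to a VLM; that rendering step is left out here.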

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Aditi Singh, Ph.D.

Cleveland State University

In this talk, we will explore Agentic Retrieval-Augmented Generation, or Agentic RAG, a groundbreaking method that enhances Large Language Models (LLMs) by combining intelligent retrieval with autonomous agents. We will discover how Agentic RAG leverages advanced agentic behaviors such as reflection, planning, tool use, and multi-agent collaboration to dynamically refine retrieval strategies and adapt workflows, significantly improving real-time responsiveness and complex task management.
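The retrieve, reflect, and refine loop described above can be sketched as a toy program. Everything here is hypothetical: the function names, the keyword-overlap retriever (a stand-in for a vector store), and the term-coverage check (a stand-in for LLM self-critique) are assumptions for illustration, not the survey's method.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by word overlap with the query; drop non-matches."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0][:k]


def reflect(question_terms, evidence):
    """Return the question terms that the gathered evidence does not cover."""
    covered = set(" ".join(evidence).lower().split())
    return [term for term in question_terms if term not in covered]


def agentic_rag(question, corpus, max_steps=3):
    """Retrieve, reflect on gaps, and refine the query until covered."""
    terms = question.lower().split()
    query = question
    evidence = []
    steps = 0
    while steps < max_steps:
        steps += 1
        for doc in retrieve(query, corpus):
            if doc not in evidence:
                evidence.append(doc)
        missing = reflect(terms, evidence)
        if not missing:
            break
        # Refinement step: the next query targets only what is still missing.
        query = " ".join(missing)
    return evidence, steps


corpus = [
    "spectrograms turn audio into images",
    "distillation compresses large models",
    "retrieval finds relevant documents",
    "agents plan and use tools",
]
evidence, steps = agentic_rag("agents retrieval distillation", corpus)
```

With this corpus, the first pass retrieves only two documents, the reflection step notices that "agents" is uncovered, and a second, narrower query fills the gap.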

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Vishaal Udandarao

University of Tübingen

Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher ensembles, and weight inheritance. In this talk, I will describe an alternative yet simple approach: active data curation as effective distillation for contrastive multimodal pretraining.

Our simple online batch selection method, ACID, outperforms strong KD baselines across various model, data, and compute configurations. Further, we find that such an active data curation strategy is in fact complementary to standard KD, and that the two can be effectively combined to train highly performant, inference-efficient models. Our simple and scalable pretraining framework, ACED, achieves state-of-the-art results across 27 zero-shot classification and retrieval tasks with up to 11% fewer inference FLOPs.

We further demonstrate that our ACED models yield strong vision encoders for training generative multimodal models in the LiT-Decoder setting, outperforming larger vision encoders for image-captioning and visual question-answering tasks.
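To make "online batch selection" concrete, here is a minimal sketch of one common scoring rule: from a large candidate batch, keep the examples where the small learner's loss most exceeds a stronger reference model's loss. The abstract does not specify ACID's exact criterion, so this rule, the function name, and the loss values are assumptions for illustration only.

```python
def select_batch(learner_losses, reference_losses, keep):
    """Pick the `keep` candidates with the largest learner-minus-reference loss.

    A large gap suggests the example is still hard for the learner yet
    solvable by the reference model, so it is likely worth training on.
    """
    scores = [l - r for l, r in zip(learner_losses, reference_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the chosen indices in their original order.
    return sorted(ranked[:keep])


# Hypothetical per-example losses for a candidate batch of five samples.
learner_losses = [2.0, 0.5, 1.5, 3.0, 0.9]
reference_losses = [1.9, 0.4, 0.3, 1.0, 0.8]

selected = select_batch(learner_losses, reference_losses, keep=2)
```

Here samples 2 and 3 are kept: the learner struggles with them while the reference model does not, whereas the other three are about equally easy or hard for both models.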

Dataset Safari: Adventures from 2024’s Top Computer Vision Conferences

Harpreet Sahota

Voxel51

Datasets are the lifeblood of machine learning, driving innovation and enabling breakthrough applications in computer vision and AI. This talk presents a curated exploration of the most compelling visual datasets unveiled at CVPR, ECCV, and NeurIPS 2024, with a unique twist – we’ll explore them live using FiftyOne, the open-source tool for dataset curation and analysis.

Using FiftyOne’s powerful visualization and analysis capabilities, we’ll take a deep dive into these collections, examining their unique characteristics through interactive sessions. We’ll demonstrate how to:

  • Analyze dataset distributions and potential biases
  • Identify edge cases and interesting samples
  • Compare similar samples across datasets
  • Explore multi-modal annotations and complex label structures

Whether you’re a researcher, practitioner, or dataset enthusiast, this session will provide hands-on insights into both the datasets shaping our field and practical tools for dataset exploration. Join us for a live demonstration of how modern dataset analysis tools can unlock deeper understanding of the data driving AI forward.

Find a Meetup Near You

Join the AI and ML enthusiasts who have already become members

The goal of the AI, Machine Learning, and Computer Vision Meetup network is to bring together a community of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies. If that’s you, we invite you to join the Meetup closest to your timezone.