ECCV 2024 Redux: Day 4 – Nov 22, 2024

Nov 22, 2024 at 9:00 AM Pacific

Zero-shot Video Anomaly Detection: Leveraging Large Language Models for Rule-Based Reasoning

Yuchen Yang
Johns Hopkins University

Video Anomaly Detection (VAD) is critical for applications such as surveillance and autonomous driving. However, existing methods lack transparent reasoning, limiting public trust in real-world deployments. We introduce a rule-based reasoning framework that leverages Large Language Models (LLMs) to induce detection rules from few-shot normal samples and apply them to identify anomalies, incorporating strategies such as rule aggregation and perception smoothing to enhance robustness. The abstract nature of language enables rapid adaptation to diverse VAD scenarios, ensuring flexibility and broad applicability.
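The induce-then-apply idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: word-set "rules" stand in for LLM-induced rules, and every function name here (`induce_rules`, `aggregate_rules`, `is_anomalous`, `smooth`) is hypothetical. Aggregation is assumed to be simple voting across rule sets, and smoothing is assumed to be a sliding-window majority vote over per-frame decisions.

```python
from collections import Counter
from typing import List, Sequence, Set


def induce_rules(normal_descriptions: Sequence[str]) -> Set[str]:
    """Toy rule induction: every word seen in normal samples counts as allowed.
    (Stand-in for prompting an LLM; purely illustrative.)"""
    return {word for text in normal_descriptions for word in text.lower().split()}


def aggregate_rules(rule_sets: Sequence[Set[str]], min_votes: int) -> Set[str]:
    """Keep only rules proposed by at least `min_votes` of the rule sets."""
    votes = Counter(word for rules in rule_sets for word in rules)
    return {word for word, count in votes.items() if count >= min_votes}


def is_anomalous(description: str, allowed: Set[str]) -> bool:
    """Flag a frame whose description contains any word not covered by the rules."""
    return any(word not in allowed for word in description.lower().split())


def smooth(flags: Sequence[bool], window: int = 3) -> List[bool]:
    """Majority vote over a centered sliding window to suppress one-frame flickers."""
    half = window // 2
    smoothed = []
    for i in range(len(flags)):
        neighborhood = flags[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighborhood) * 2 > len(neighborhood))
    return smoothed


# Three small batches of "normal" frame descriptions (made-up data).
batches = [
    ["person walking", "person standing"],
    ["person walking", "person sitting"],
    ["person walking", "person standing"],
]
allowed = aggregate_rules([induce_rules(b) for b in batches], min_votes=2)

frames = ["person walking", "person walking", "person cycling", "person walking",
          "person walking", "person cycling", "person cycling", "person cycling"]
raw_flags = [is_anomalous(f, allowed) for f in frames]
print(smooth(raw_flags))
```

In this toy run, the isolated "cycling" frame is smoothed away while the sustained run at the end stays flagged, which is the kind of robustness the smoothing step is meant to provide.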

ECCV 2024 Paper: Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

About the Speaker

Yuchen Yang is a Ph.D. candidate in the Department of Computer Science at Johns Hopkins University. Her research aims to deliver functional, trustworthy solutions for machine learning and AI systems.

Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Xiaoyu Zhu
Carnegie Mellon University

In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose Diff2Scene, a method that leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.

ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

About the Speaker

Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests include computer vision, multimodal learning, and generative models.