We just wrapped up Day 4 of ECCV 2024 Redux. If you missed it or want to revisit it, here’s a recap!
In this blog post you’ll find the playback recordings, highlights from the presentations and Q&A, as well as the upcoming Meetup schedule so that you can join us at a future event.
Zero-shot Video Anomaly Detection: Leveraging Large Language Models for Rule-Based Reasoning
Video Anomaly Detection (VAD) is critical for applications such as surveillance and autonomous driving. However, existing methods lack transparent reasoning, limiting public trust in real-world deployments. We introduce a rule-based reasoning framework that leverages Large Language Models (LLMs) to induce detection rules from few-shot normal samples and apply them to identify anomalies, incorporating strategies such as rule aggregation and perception smoothing to enhance robustness. The abstract nature of language enables rapid adaptation to diverse VAD scenarios, ensuring flexibility and broad applicability.
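The pipeline the abstract describes (induce rules from normal samples, aggregate rule checks per frame, then smooth the per-frame scores over time) can be illustrated with a minimal sketch. Note this is a toy stand-in, not the paper's implementation: the rules here are hand-coded, whereas the paper induces them with an LLM, and the smoothing window and threshold values are illustrative assumptions.

```python
from statistics import mean

# Toy rule set describing "normal" behavior. In the paper these rules are
# induced by an LLM from few-shot normal samples; here they are hard-coded.
NORMAL_RULES = [
    lambda frame: frame["activity"] in {"walking", "standing"},
    lambda frame: frame["speed"] < 3.0,
]

def rule_score(frame, rules=NORMAL_RULES):
    """Rule aggregation: anomaly score = fraction of rules the frame violates."""
    violations = sum(0 if rule(frame) else 1 for rule in rules)
    return violations / len(rules)

def smooth(scores, window=2):
    """Perception smoothing: moving average over per-frame anomaly scores,
    so a single noisy frame does not dominate the decision."""
    return [mean(scores[max(0, i - window + 1): i + 1])
            for i in range(len(scores))]

def detect(frames, threshold=0.5):
    """Flag frames whose smoothed anomaly score reaches the threshold."""
    scores = smooth([rule_score(f) for f in frames])
    return [s >= threshold for s in scores]

frames = [
    {"activity": "walking", "speed": 1.2},
    {"activity": "walking", "speed": 1.0},
    {"activity": "running", "speed": 6.0},  # violates both rules
    {"activity": "walking", "speed": 1.1},
]
print(detect(frames))  # → [False, False, True, True]
```

One property visible even in this toy version: smoothing trades responsiveness for robustness, since the anomaly's influence carries into the following frame.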
ECCV 2024 Paper: Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Speaker: Yuchen Yang is a Ph.D. candidate in the Department of Computer Science at Johns Hopkins University. Her research aims to deliver functional, trustworthy solutions for machine learning and AI systems.
Q&A
- When you lacked training data for detecting visual anomalies, did you attempt to use data synthesis for those edge cases specifically?
- Did you fine-tune the model for reasoning?
- Did you use multiple agents for the rule-deduction reasoning module?
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding tasks. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.
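The open-vocabulary step described above (matching per-point visual features against text embeddings of arbitrary class names, with no 3D labels involved) can be sketched in a few lines. This is only an illustration under stated assumptions: real Diff2Scene extracts features from a frozen text-to-image diffusion model, whereas here the embeddings are random stand-in vectors and points are labeled by simple cosine similarity.

```python
import math
import random

random.seed(0)
DIM = 16
classes = ["chair", "table", "floor"]
# Stand-in "text embeddings"; in the actual method these would come from a
# frozen text-image generative model.
text_emb = {c: [random.gauss(0, 1) for _ in range(DIM)] for c in classes}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def label_points(point_feats, text_emb):
    """Assign each 3D point the class name whose embedding it is closest to.
    Because class names are just text, the vocabulary is open-ended."""
    return [max(text_emb, key=lambda c: cosine(f, text_emb[c]))
            for f in point_feats]

# Simulated point features: class embeddings perturbed by small noise.
points = [[x + random.gauss(0, 0.1) for x in text_emb["chair"]],
          [x + random.gauss(0, 0.1) for x in text_emb["floor"]]]
print(label_points(points, text_emb))
```

Swapping in a new class list requires only embedding its names, which is what makes the approach open-vocabulary rather than tied to a fixed label set.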
ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Speaker: Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
Q&A
- Why do you work on point clouds instead of meshes?
- Have you considered using GANs instead of diffusion models in your model architecture?
- What are some real-world applications, and how long does inference take?
- Can the output of your system be used to construct a “better” 3D image that includes labels (segmented 3D objects), so that objects can be removed from the 3D image? And can the diffusion model then be deployed to fill in the space where an object was removed?
- Can your system segment and remove “the whole floor” and then display added features existing on the “other side” of the floor, like an x-ray?
Join the AI, Machine Learning and Computer Vision Meetup!
The goal of the Meetups is to bring together communities of data scientists, machine learning engineers, and open source enthusiasts who want to share and expand their knowledge of AI and complementary technologies.
Join one of the 12 Meetup locations closest to your timezone.