Visual and multimodal AI applications are quickly evolving from research experiments into core drivers of real-world innovation. Powering everything from autonomous vehicles and robots to industrial automation and analytics, these systems succeed in production only when they are built on high-quality visual data and scalable workflows.
At CVPR 2025, the premier conference in computer vision, we’re excited to showcase practical tools, integrations, and workflows that meet the rising demand for building faster, more accurate AI models and datasets.
Join us in Nashville, TN, from June 17-21 to get hands-on with the tools engineered to simplify your data processes and accelerate model development.
Here's a glance at what we’ll be showcasing at the conference:
- Demos
- Workshops & Tutorials
- Lightning Talks: practical tips and discussion focused on solving data curation and model performance challenges in use cases across medical, agtech, and manufacturing
Come for the talks and demos, and stay for the drinks, swag, and insightful conversations! 🎉
Simulation-to-Reality with NVIDIA Omniverse + FiftyOne
Synthetic data is becoming a critical component in building robust visual AI systems, especially in domains like AVs, industrial robotics, and physical automation, where collecting diverse real-world data is often costly, impractical, or even unsafe. It is also a powerful alternative in scenarios where precise control over scene composition, object placement, and conditions is necessary.
By generating synthetic datasets with NVIDIA Omniverse and OpenUSD and curating them with FiftyOne, ML engineers can achieve comprehensive coverage and significantly augment training sets, without the burden of extensive manual data collection or labeling.
Come see a live demo of FiftyOne and NVIDIA Omniverse to learn how you can bridge the gap between synthetic and real-world data curation:
- Visualize and inspect synthetic scenes at scale
- Organize and curate multimodal datasets for training and testing
- Analyze 3D reconstructions of real scenes using NVIDIA’s Neural Reconstruction Engine
- Augment your data with NVIDIA Cosmos world foundation models to imagine your samples like never before
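If you want a feel for the FiftyOne side of this workflow before the show, here is a minimal sketch of loading a folder of rendered frames and curating them in the App. The directory path, dataset name, and tag below are hypothetical placeholders; the Omniverse rendering and Cosmos augmentation steps from the demo are not shown.

```python
import fiftyone as fo

# Hypothetical folder of frames rendered from an Omniverse/OpenUSD scene
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/omniverse_renders",
    dataset_type=fo.types.ImageDirectory,
    name="synthetic-scenes",
)

# Tag the samples so synthetic and real data can be curated side by side later
dataset.tag_samples("synthetic")

# Explore, filter, and curate the scenes interactively in the FiftyOne App
session = fo.launch_app(dataset)
```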
📍 See it in action at CVPR booth #1417 with Voxel51 and NVIDIA, and meet the experts from both companies.
Zero-shot auto-labeling with near-human labeling performance at a fraction of the cost
Manual labeling has traditionally been one of the biggest bottlenecks in getting computer vision models into production due to the time, cost, and accuracy implications. As foundation models advance, zero-shot labeling is emerging as a viable, scalable path to building high-quality datasets—without the manual burden.
We’re excited to introduce a new approach to AI-assisted annotation that combines Voxel51’s expertise in data curation with automated labeling and QA workflows. Our research on Verified Auto-Labeling shows that this approach achieves 95% of human-level performance while being 5,000X faster and cutting costs by up to 100,000X. To put that in perspective: labeling a dataset like COCO, with ~850K objects, takes human annotators a total of 1,653 hours and $30,598, compared to ~27 minutes and just $0.42 with Verified Auto-Labeling, with nearly the same resulting model performance. That changes the paradigm of computer vision data workflows!
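Verified Auto-Labeling itself is what we’ll demo at the booth, but the underlying pattern, using a zero-shot foundation model to propose labels and FiftyOne to review them, can be sketched roughly as follows. The image directory, class list, and field name are hypothetical, and OWL-ViT stands in here for whichever zero-shot detector you prefer.

```python
import fiftyone as fo
from PIL import Image
from transformers import pipeline

# Hypothetical label set and image folder -- swap in your own
classes = ["car", "traffic light", "pedestrian"]
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/unlabeled_images",
    dataset_type=fo.types.ImageDirectory,
)

for sample in dataset:
    image = Image.open(sample.filepath)
    w, h = image.size
    detections = []
    for pred in detector(image, candidate_labels=classes):
        box = pred["box"]  # absolute pixel coords: xmin, ymin, xmax, ymax
        detections.append(
            fo.Detection(
                label=pred["label"],
                # FiftyOne expects relative [x, y, width, height] in [0, 1]
                bounding_box=[
                    box["xmin"] / w,
                    box["ymin"] / h,
                    (box["xmax"] - box["xmin"]) / w,
                    (box["ymax"] - box["ymin"]) / h,
                ],
                confidence=pred["score"],
            )
        )
    sample["auto_labels"] = fo.Detections(detections=detections)
    sample.save()

# Review and verify the proposed labels before training on them
session = fo.launch_app(dataset)
```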
Curious to see how it works?
📍 Stop by booth #1417 to get hands-on with the product.
Solving the Video Understanding Challenge with Voxel51, Twelve Labs, and Databricks
Video analysis techniques have historically involved a labor-intensive pipeline that includes manual annotation of frames or the use of scripts scrubbing through hours of footage using timestamps or scene changes. These manual approaches are expensive and error-prone, making it extremely challenging to detect precise scenarios, e.g., “a red car stopping at a traffic light” across an unlabeled video collection.
State-of-the-art embedding techniques encode video content into searchable representations so you can perform similarity searches without the need for explicit video annotations.
Learn about an integrated approach to understanding your video content using rich embeddings, fast similarity searches, and a visual user interface that streamlines the exploration of relevant video segments:
- Generate rich, multimodal embeddings (video+audio+text) using Twelve Labs’ foundation models
- Index and search those embeddings for fast, cloud-scale retrieval with Databricks Vector Search
- Run searches, visualize results, and refine datasets with FiftyOne
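As a rough sketch of the query pattern, here is how a natural-language similarity search looks in FiftyOne using its built-in CLIP similarity index over a folder of extracted frames. The directory path and brain key are placeholders; in the booth demo the embeddings come from Twelve Labs’ video foundation models with a Databricks Vector Search index rather than the local CLIP index shown here.

```python
import fiftyone as fo
import fiftyone.brain as fob

# Hypothetical folder of frames extracted from your video collection
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/extracted_frames",
    dataset_type=fo.types.ImageDirectory,
)

# Build a similarity index with a multimodal (text + image) embedding model
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="frame_sim",
)

# Natural-language query over the index -- no annotations required
view = dataset.sort_by_similarity(
    "a red car stopping at a traffic light",
    k=25,
    brain_key="frame_sim",
)

# Visually inspect and refine the matches
session = fo.launch_app(view)
```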
📍 See this workflow in action at the Voxel51 booth
Voxel51 Workshops and Talks
Join Voxel51 researchers to explore high-impact computer vision challenges ranging from anomaly detection to real-world deployment in agriculture.
VAND 3.0: Visual Anomaly and Novelty Detection
Researchers from academia and industry (including Bosch) will present and discuss recent developments, opportunities, and open challenges in anomaly detection. Help drive the development and benchmarking of new algorithms by participating in the anomaly detection challenge! Organized by top researchers from AWS AI Labs, Voxel51, Intel, Durham University, and many others.
When: June 18, 2025
Where: In person and on Zoom
Agriculture-Vision: Visual AI in the Field
Learn how researchers and practitioners are applying computer vision to real-world agricultural challenges—from crop segmentation to weed detection and disease prediction.
Also check out the “Exploring the Next Generation of Data” (NeXD25) workshop, where the NVIDIA team addresses the challenges of curating high-quality, scalable, and unbiased data for foundation models in safety-critical applications.
Interesting CVPR Research Papers
Here are some interesting CVPR papers we’re watching and think you should be watching too!
- “OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit,” B. Wang, R. An, J.-K. So, S. Kurdiumov, E. A. Chan, G. Adamo, Y. Peng, Y. Li, and B. An
- “Few-Shot Adaptation of Grounding DINO for Agricultural Domain,” R. Singh, R. B. Puhl, K. Dhakal, and S. Sornapudi
- “FLAIR: VLM with Fine-grained Language-informed Image Representations,” R. Xiao, S. Kim, M.-I. Georgescu, Z. Akata, and S. Alaniz
- “RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings,” A. Dhakal, S. Sastry, S. Khanal, A. Ahmad, E. Xing, and N. Jacobs
- “Interactive Medical Image Analysis with Concept-based Similarity Reasoning,” Ta Duc Huy, Sen Kim Tran, Phan Nguyen, Nguyen Hoang Tran, Tran Bao Sam, Anton van den Hengel, Zhibin Liao, Johan W. Verjans, Minh-Son To, and Vu Minh Hieu Phan
Best of CVPR 2025 Series: 12 must-read papers
Read more in the three-part Best of CVPR 2025 series (Part 1, Part 2, Part 3), which highlights papers that rethink how vision models handle complexity and ambiguity, including papers that address safety, trust, and usability across use cases and industries.
Visual agents: Key research wave at CVPR
CVPR 2025 marks a turning point for visual agents: this year's papers signal a shift from academic curiosity to tangible systems capable of interacting with and controlling visual environments.
Read the full blog for the key research papers driving this development.
Compositional Image Retrieval: The future of visual search
Compositional Image Retrieval (CIR) brings image search closer to how humans naturally describe visual concepts, opening new doors for e-commerce and creative applications.
Check out the blog for an in-depth look.
Community Happy Hour
Presenting a paper at CVPR and using FiftyOne in the course of your research? Meet the FiftyOne ML experts and have a drink. Spots are limited.
Register here!
🎸 We can’t wait to meet you!