Visual and multimodal AI applications are quickly evolving from research experiments into core drivers of real-world innovation. Powering everything from autonomous vehicles and robots to industrial automation and analytics, these systems succeed in production only when they are built on high-quality visual data and scalable workflows.
At CVPR 2025, the premier conference in computer vision, we’re excited to showcase practical tools, integrations, and workflows that meet the rising demand for building faster, more accurate AI models and datasets.
Join us in Nashville, TN, from June 17-21 to get hands-on with the tools engineered to simplify your data processes and accelerate model development.
Here's a glance at what we’ll be showcasing at the conference:
- Demos
- Workshops & Tutorials
- Lightning Talks: practical tips and discussion focused on solving data curation and model performance challenges in use cases across medical, agtech, and manufacturing
Come for the talks and demos, and stay for the drinks, swag, and insightful conversations! 🎉
Simulation-to-Reality with NVIDIA Omniverse + FiftyOne
Synthetic data is becoming a critical component in building robust visual AI systems, especially in domains like AVs, industrial robotics, and physical automation, where collecting diverse real-world data is often costly, impractical, or even unsafe. It is also a powerful alternative in scenarios where precise control over scene composition, object placement, and conditions is necessary.
By generating synthetic datasets with NVIDIA Omniverse and OpenUSD and curating them with FiftyOne, ML engineers can achieve comprehensive coverage and significantly augment training sets, without the burden of extensive manual data collection or labeling.
Come see a live demo of FiftyOne and NVIDIA Omniverse to learn how you can bridge the gap between synthetic and real-world data curation:
- Visualize and inspect synthetic scenes at scale
- Organize and curate multimodal datasets for training and testing
- Analyze 3D reconstructions of real scenes using NVIDIA’s Neural Reconstruction Engine
- Augment your data with NVIDIA Cosmos world foundation models to imagine your samples like never before
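If you want a feel for the FiftyOne side of this workflow before the show, here is a minimal sketch of loading a folder of rendered frames and curating them in the App. The directory path, dataset name, and tag below are hypothetical placeholders; the Omniverse rendering and Cosmos augmentation steps from the demo are not shown.

```python
import fiftyone as fo

# Hypothetical folder of frames rendered from an Omniverse/OpenUSD scene
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/omniverse_renders",
    dataset_type=fo.types.ImageDirectory,
    name="synthetic-scenes",
)

# Tag the samples so synthetic and real data can be curated side by side later
dataset.tag_samples("synthetic")

# Explore, filter, and curate the scenes interactively in the FiftyOne App
session = fo.launch_app(dataset)
```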
📍 See it in action at CVPR booth #1417 with Voxel51 and NVIDIA, and meet the experts from both companies.
Zero-shot auto-labeling with near-human labeling performance at a fraction of the cost
Manual labeling has traditionally been one of the biggest bottlenecks in getting computer vision models into production due to the time, cost, and accuracy implications. As foundation models advance, zero-shot labeling is emerging as a viable, scalable path to building high-quality datasets—without the manual burden.
We’re excited to introduce a new approach to AI-assisted annotation that combines Voxel51’s expertise in data curation with automated labeling and QA workflows. Our research on Verified Auto-Labeling shows that this approach achieves 95% of human-level performance while being 5,000X faster and cutting costs by up to 100,000X. To put that in perspective: labeling a dataset like COCO, with ~850K objects, takes human annotators a total of 1,653 hours and $30,598, compared to ~27 minutes and just $0.42 with Verified Auto-Labeling, with nearly the same resulting model performance. That changes the paradigm of computer vision data workflows!
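Verified Auto-Labeling itself is what we’ll demo at the booth, but the underlying pattern, using a zero-shot foundation model to propose labels and FiftyOne to review them, can be sketched roughly as follows. The image directory, class list, and field name are hypothetical, and OWL-ViT stands in here for whichever zero-shot detector you prefer.

```python
import fiftyone as fo
from PIL import Image
from transformers import pipeline

# Hypothetical label set and image folder -- swap in your own
classes = ["car", "traffic light", "pedestrian"]
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/unlabeled_images",
    dataset_type=fo.types.ImageDirectory,
)

for sample in dataset:
    image = Image.open(sample.filepath)
    w, h = image.size
    detections = []
    for pred in detector(image, candidate_labels=classes):
        box = pred["box"]  # absolute pixel coords: xmin, ymin, xmax, ymax
        detections.append(
            fo.Detection(
                label=pred["label"],
                # FiftyOne expects relative [x, y, width, height] in [0, 1]
                bounding_box=[
                    box["xmin"] / w,
                    box["ymin"] / h,
                    (box["xmax"] - box["xmin"]) / w,
                    (box["ymax"] - box["ymin"]) / h,
                ],
                confidence=pred["score"],
            )
        )
    sample["auto_labels"] = fo.Detections(detections=detections)
    sample.save()

# Review and verify the proposed labels before training on them
session = fo.launch_app(dataset)
```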
Curious to see how it works?
📍 Stop by booth #1417 to get hands-on with the product.
Solving the Video Understanding Challenge with Voxel51, Twelve Labs, and Databricks
Video analysis techniques have historically involved a labor-intensive pipeline that includes manual annotation of frames or the use of scripts scrubbing through hours of footage using timestamps or scene changes. These manual approaches are expensive and error-prone, making it extremely challenging to detect precise scenarios, e.g., “a red car stopping at a traffic light” across an unlabeled video collection.
State-of-the-art embedding techniques encode video content into searchable representations so you can perform similarity searches without the need for explicit video annotations.
Learn about an integrated approach to understanding your video content using rich embeddings, fast similarity searches, and a visual user interface that streamlines the exploration of relevant video segments:
- Generate rich, multimodal embeddings (video+audio+text) using Twelve Labs’ foundation models
- Index and search those embeddings for fast, cloud-scale retrieval with Databricks Vector Search
- Run searches, visualize results, and refine datasets with FiftyOne
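As a rough sketch of the query pattern, here is how a natural-language similarity search looks in FiftyOne using its built-in CLIP similarity index over a folder of extracted frames. The directory path and brain key are placeholders; in the booth demo the embeddings come from Twelve Labs’ video foundation models with a Databricks Vector Search index rather than the local CLIP index shown here.

```python
import fiftyone as fo
import fiftyone.brain as fob

# Hypothetical folder of frames extracted from your video collection
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/extracted_frames",
    dataset_type=fo.types.ImageDirectory,
)

# Build a similarity index with a multimodal (text + image) embedding model
fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="frame_sim",
)

# Natural-language query over the index -- no annotations required
view = dataset.sort_by_similarity(
    "a red car stopping at a traffic light",
    k=25,
    brain_key="frame_sim",
)

# Visually inspect and refine the matches
session = fo.launch_app(view)
```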
📍 See this workflow in action at the Voxel51 booth
Voxel51 Workshops and Talks
Join Voxel51 researchers to explore high-impact computer vision challenges ranging from anomaly detection to real-world deployment in agriculture.
VAND 3.0: Visual Anomaly and Novelty Detection
Researchers from academia and industry (including Bosch) will present and discuss recent developments, opportunities, and open challenges in anomaly detection. Help drive the development and benchmarking of new algorithms by participating in the anomaly detection challenge! Organized by top researchers from AWS AI Labs, Voxel51, Intel, Durham University, and many others.
When: June 18, 2025
Where: In person and on Zoom
Agriculture-Vision: Visual AI in the Field
Learn how researchers and practitioners are applying computer vision to real-world agricultural challenges—from crop segmentation to weed detection and disease prediction.
Also check out the “Exploring the Next Generation of Data” (NeXD25) workshop, where the NVIDIA team addresses the challenges of curating high-quality, scalable, and unbiased data for foundation models in safety-critical applications.
Interesting CVPR Research Papers
Here are some interesting CVPR papers we’re watching and think you should be watching too!
- “OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit,” B. Wang, R. An, J.-K. So, S. Kurdiumov, E. A. Chan, G. Adamo, Y. Peng, Y. Li, and B. An
- “Few-Shot Adaptation of Grounding DINO for Agricultural Domain,” R. Singh, R. B. Puhl, K. Dhakal, and S. Sornapudi
- “FLAIR: VLM with Fine-grained Language-informed Image Representations,” R. Xiao, S. Kim, M.-I. Georgescu, Z. Akata, and S. Alaniz
- “RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings,” A. Dhakal, S. Sastry, S. Khanal, A. Ahmad, E. Xing, and N. Jacobs
- “Interactive Medical Image Analysis with Concept-based Similarity Reasoning,” Ta Duc Huy, Sen Kim Tran, Phan Nguyen, Nguyen Hoang Tran, Tran Bao Sam, Anton van den Hengel, Zhibin Liao, Johan W. Verjans, Minh-Son To, and Vu Minh Hieu Phan
Best of CVPR 2025 Series: 12 must-read papers
Read more in the three-part Best of CVPR 2025 series (Part 1, Part 2, Part 3), which highlights papers that rethink how vision models handle complexity and ambiguity, including papers that address safety, trust, and usability across use cases and industries.
Visual agents: Key research wave at CVPR
CVPR 2025 marks a turning point for visual agents: this year's papers signal a shift from academic curiosity to tangible systems capable of interacting with and controlling visual environments.
Read the full blog for the key research papers driving this development.
Compositional Image Retrieval: The future of visual search
Compositional Image Retrieval (CIR) brings image search closer to how humans naturally describe visual concepts, opening new doors for e-commerce and creative applications.
Check out the blog for an in-depth look.
Community Happy Hour
Presenting a paper at CVPR and using FiftyOne in the course of your research? Meet the FiftyOne ML experts and have a drink. Spots are limited.
Register here!
🎸 We can’t wait to meet you!