Meet the Most Comprehensive Real-World 3D Dataset Ever Created
Meta AI’s uCO3D dataset, which I saw at CVPR 2025, is the most significant advance in real-world 3D object data collection we’ve seen in years. With 170,000 meticulously captured objects across more than 1,000 categories, uCO3D finally resolves the persistent dilemma that has plagued 3D vision research: choosing between scale and quality. Previous datasets like CO3Dv2 offered decent quality but limited diversity (just 50 categories), while MVImgNet provided more objects but sacrificed the crucial 360° coverage needed for complete 3D understanding. Meta’s approach combines the strengths of both while eliminating their respective limitations.
This dataset will undoubtedly become the new gold standard for training 3D vision models.
Diversity That Reflects Reality
The 1,000+ object categories in uCO3D deliver an unprecedented breadth of real-world objects that previous datasets simply couldn’t match.
By adopting the LVIS taxonomy, which deliberately includes long-tail categories, uCO3D captures the true diversity of objects in our world rather than just focusing on common items. This matters enormously because real-world applications need to recognize and understand the full spectrum of objects we encounter, not just the most frequent ones. The category distribution is thoughtfully organized into 50 super-categories, each containing approximately 20 subcategories, making it both comprehensive and structured.
Researchers no longer need to accept limited category coverage as an inevitable constraint.
Technical Innovations That Set uCO3D Apart
Every stage of uCO3D’s capture and processing pipeline incorporates state-of-the-art techniques.
Instead of using the industry-standard COLMAP for structure-from-motion, Meta AI employed VGGSfM to achieve significantly more accurate camera parameters and point clouds. The segmentation pipeline combines text-conditioned Segment-Anything (langSAM) with XMem to achieve temporal consistency, resolving the flickering-mask issues that plagued previous datasets. Perhaps most impressively, each object includes a complete 3D Gaussian Splat reconstruction, enabling photorealistic novel-view synthesis and canonical-view rendering that was previously impossible with real-world data.
These technical choices aren’t just incremental improvements — they’re revolutionary advances.
A New Standard in 3D Data
uCO3D fundamentally redefines what’s possible in 3D computer vision.
The combination of unprecedented scale, diversity, and quality addresses all the major limitations that have held back progress in this field. The additional innovations like 3D Gaussian Splat reconstructions and rigorous quality control set a new standard that future datasets will struggle to match. The performance improvements demonstrated across multiple benchmarks provide empirical validation that this dataset delivers on its promises.
This is quite simply the dataset that the 3D vision community has been waiting for.
Getting Started with the Manageable Preview Subset
Meta AI brilliantly provides a 52-video preview subset that lets you dive in without downloading the full 19TB dataset.
At just 9.6 GB, this subset is ideal for developing your processing pipeline while still representing the dataset’s diversity. You can grab it with a simple command from their GitHub repository:
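The exact command lives in the uCO3D GitHub repository, so rather than guess at it, here is a minimal stand-in sketch that downloads and extracts an archive. The URL is a placeholder you would replace with the preview-subset link from the repo; the function name is my own.

```python
import urllib.request
import zipfile
from pathlib import Path

# Placeholder URL -- substitute the preview-subset link from the uCO3D GitHub repo
PREVIEW_URL = "https://example.com/uco3d_preview.zip"

def download_preview(url: str = PREVIEW_URL, dest: str = "uco3d_preview") -> Path:
    """Download the ~9.6 GB preview archive and extract it into `dest`."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / "preview.zip"
    if not archive.exists():  # skip the download on reruns
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    return dest_dir
```

Running `download_preview()` once leaves the extracted subset in `uco3d_preview/`, ready for the parsing steps below.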
The preview subset contains the same rich data structure as the full dataset, making it invaluable for initial experimentation.
Understanding the Data Structure
Each object in uCO3D follows a consistent organization that makes programmatic processing straightforward.
The dataset organizes objects in a category/subcategory/object_id hierarchy, with each object directory containing a high-resolution video (*_video.mp4), segmentation masks (mask_video.mp4), a point cloud (segmented_point_cloud.ply), and camera parameters. The full dataset additionally includes 3D Gaussian Splat reconstructions that enable photorealistic novel view synthesis. This consistent structure makes it easy to iterate through objects and extract the data you need.
This logical organization is crucial for efficiently working with such a large-scale dataset.
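Given that layout, a small helper (my own, not part of the dataset tooling) can index one object directory into a dictionary of asset paths. Note one assumption: I treat the `*` prefix in `*_video.mp4` as the object id, which you may need to adjust for your copy.

```python
from pathlib import Path

def index_object(object_dir: Path) -> dict:
    """Map a category/subcategory/object_id directory to its expected assets.

    The RGB video prefix (the `*` in `*_video.mp4`) is assumed to be
    the object id -- adjust if your copy of the dataset differs.
    """
    category, subcategory = object_dir.parts[-3], object_dir.parts[-2]
    return {
        "object_id": object_dir.name,
        "category": category,
        "subcategory": subcategory,
        "rgb_video": object_dir / f"{object_dir.name}_video.mp4",
        "mask_video": object_dir / "mask_video.mp4",
        "point_cloud": object_dir / "segmented_point_cloud.ply",
    }

def iter_objects(root: Path):
    """Yield asset records for every object under a local dataset root."""
    for obj_dir in sorted(root.glob("*/*/*")):
        if obj_dir.is_dir():
            yield index_object(obj_dir)
```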
Converting PointClouds to FiftyOne 3D Scenes
Our first step in parsing uCO3D is converting the point cloud PLY files into interactive FiftyOne 3D scenes.
This function transforms raw PLY files into interactive 3D scenes that you can rotate, zoom, and explore in your browser.
Building a Unified Dataset with Proper Relationships
The heart of our parsing script establishes relationships between different views of the same object.
This powerful approach links RGB videos, segmentation masks, and 3D point clouds into a unified dataset with meaningful relationships.
Running the Complete Parsing Process
The main function ties everything together into a streamlined end-to-end process.
This code creates a persistent FiftyOne dataset that you can load in future sessions, making it a one-time processing effort.
Exploring Your Processed Dataset
Once parsed, FiftyOne transforms how you interact with uCO3D’s rich multimodal data.
FiftyOne’s interactive visualization tools let you explore videos, masks, and 3D point clouds with unprecedented ease.
Why This Dataset Matters for Your Research
Models trained on uCO3D consistently outperform those trained on previous datasets across all benchmarks.
Few-view reconstruction models such as LightplaneLRM show PSNR improvements of more than a full point when trained on uCO3D instead of CO3Dv2 or MVImgNet. Novel view synthesis using CAT3D-like approaches sees LPIPS error reductions of 5–20%. Most impressively, uCO3D’s 3D Gaussian Splat reconstructions enable training text-to-3D models, such as Instant3D, on real-world data for the first time, resulting in dramatically more realistic generations.
These performance gains translate directly to better applications across AR/VR, robotics, and e-commerce.
Load the dataset in FiftyOne format directly from the Hugging Face Hub
I’ve already parsed this dataset for you! To download the parsed dataset directly from Hugging Face:
Unleashing the Full Potential of uCO3D
The combination of uCO3D’s revolutionary data quality and FiftyOne’s exploration capabilities creates a development environment that accelerates 3D vision research.
By structuring the dataset with proper relationships between different views, our parsing approach makes it simple to build multi-view training pipelines. The integration with FiftyOne lets you visually inspect reconstruction quality, helping you understand edge cases and failure modes. For anyone working in 3D computer vision, this parsed dataset becomes an invaluable resource that dramatically shortens the path from idea to implementation.
This integration represents the new gold standard for working with large-scale 3D datasets.
A New Era for 3D Computer Vision
uCO3D fundamentally redefines what’s possible in 3D vision research, and our parsing pipeline makes it accessible.
With its unprecedented combination of scale, diversity, and quality, uCO3D enables a new generation of 3D models that can generalize across thousands of object categories. By converting this dataset into FiftyOne format, we’ve created a structured, interactive environment that makes it easy to explore the data, understand its characteristics, and build sophisticated training pipelines. For researchers and developers in 3D vision, this parsed dataset eliminates countless hours of preprocessing, allowing you to focus on the actual innovation.
The future of 3D vision is here — it’s just waiting for you to parse it.