Streamline Your Visual Data Discovery with FiftyOne Data Lens
April 2, 2025 – Written by Kirti Joshi
TL;DR: This post outlines a novel workflow to simplify and streamline the often tedious process of exploring and retrieving data from external sources. Find the right visual samples to answer your data questions.

As AI developers, we constantly grapple with understanding our models’ performance and, more often than not, face the toughest question: “Why isn’t my model performing better?” The answer is almost always in the data. Whether it’s imbalances, gaps in diversity, or poor-quality samples, the process of identifying the why and the what next always leads us to ask ourselves:
Do I have enough data? Is it diverse enough to handle real-world edge cases? Should I collect new data, leverage existing data, or generate data synthetically? These questions rarely have foolproof answers, yet we must navigate them to meet project goals and deadlines.
In our typical data workflows, the manual processes of finding and including the right data stall progress. Additionally, dependencies on other teams for data access create further delays.
To solve these pressing issues that most teams face, we created Data Lens, the newest workflow within FiftyOne Enterprise, designed to simplify and accelerate your data discovery. At its core, Data Lens is a data exploration and retrieval tool that helps you quickly find exactly the data you need from billions of available samples. You can query, preview, and import relevant samples directly into FiftyOne without manual bottlenecks or slow searches.
With Data Lens, you can:
- Integrate with data lakes, databases, and cloud storage providers
- Rapidly search massive unstructured data collections
- Preview media and labels instantly
- Import data directly into your dataset for further analysis
How Does Data Lens Work?
Data Lens provides a simple workflow to find and select samples that enhance your datasets and improve model performance.
Imagine you’re building an AI model to detect flowers, and you need specific images from your BigQuery data warehouse to address gaps where your model struggles to identify certain flower varieties. Here’s how Data Lens makes this process easy:
Define Your Search Experience
Data Lens provides a flexible framework to customize your search parameters. Whether you need specific annotations, image categories, or broader dataset subsets, you can set up the queries for your search to return the most relevant data.
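To make the idea of a customizable search more concrete, here is a minimal sketch of how user-facing search parameters could be translated into a parameterized query. It does not use the Data Lens API: the `FlowerSearchParams` class, the `build_query` helper, and the `flower_images` table with its columns are all hypothetical names chosen for the flower example, and the `?` placeholder style is SQLite-like (a warehouse such as BigQuery uses its own parameter syntax).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FlowerSearchParams:
    """Hypothetical search parameters a user might fill in for a flower query."""

    species: Optional[str] = None   # e.g. "orchid"; None means any species
    min_width: int = 0              # minimum image width in pixels
    require_labels: bool = False    # only return images that have annotations
    max_results: int = 100          # cap on the number of returned rows


def build_query(
    params: FlowerSearchParams, table: str = "flower_images"
) -> Tuple[str, List[object]]:
    """Translate search parameters into a parameterized SQL query.

    The table name is assumed to come from trusted configuration; all
    user-supplied values are passed as bound parameters, not interpolated.
    """
    clauses: List[str] = []
    values: List[object] = []
    if params.species is not None:
        clauses.append("species = ?")
        values.append(params.species)
    if params.min_width > 0:
        clauses.append("width >= ?")
        values.append(params.min_width)
    if params.require_labels:
        clauses.append("label_count > 0")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT filepath, species FROM {table}{where} LIMIT ?"
    values.append(params.max_results)
    return sql, values


sql, values = build_query(FlowerSearchParams(species="orchid", min_width=512))
print(sql)
print(values)
```

Keeping the parameters in one small structure means the same search form can be reused across queries, while the translation step stays the only place that knows about the underlying table.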
Connect Your Data Sources
Once your search experience is defined, connecting Data Lens to an external data source (e.g., database, data lake) is straightforward. You simply:
- Add a new data source and provide its configuration details
- Establish a secure connection to your repository
Data Lens can be connected to any data source, such as:
- Databricks for extensive querying across data lakes
- PostgreSQL for structured database exploration
- Cloud storage providers such as Amazon S3 and Google Cloud Storage
Find, Preview, and Import Data
With your data source connected, you can search billions of samples in real time using filters like metadata, annotations, or image properties. Then, instantly preview them in FiftyOne and pull only the data you need into your dataset.
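The search, preview, and import loop can be pictured as a stream of small batches: results are converted to sample-like records and handed over a few at a time, so a preview can start rendering before the full search finishes. The sketch below illustrates that pattern in plain Python under stated assumptions. The `to_sample` and `search_in_batches` helpers, the row fields, and the bucket path are hypothetical, and a plain list stands in for the target dataset.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def to_sample(row: Row) -> Row:
    """Map a raw source row to a sample-like dict (file path plus fields)."""
    return {
        "filepath": row["uri"],
        "species": row.get("species"),
        "tags": ["data-lens-import"],
    }


def search_in_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield converted samples batch by batch so a preview can render early."""
    iterator = iter(rows)
    while True:
        batch = [to_sample(row) for row in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


# Hypothetical rows, as they might come back from an external source
source_rows = [
    {"uri": f"s3://hypothetical-bucket/flowers/{i:04d}.jpg", "species": "orchid"}
    for i in range(5)
]

dataset: List[Row] = []  # stand-in for the dataset that receives the import
for batch in search_in_batches(source_rows, batch_size=2):
    # A real interface would display this batch for review before importing;
    # this sketch simply keeps every previewed sample.
    dataset.extend(batch)

print(len(dataset))
```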
Once you build your dataset with the desired data samples, continue with your model training and evaluation work to see improvements in model performance!

Handling Multiple Complex Query Scenarios
Data Lens also handles complex scenarios by providing a framework for customized search queries. For instance, ADAS/AV teams working on safety-critical vehicle detection systems frequently need specific data samples to improve model performance under challenging conditions, such as particular lighting, weather, scene types, and objects. With Data Lens, you can quickly set up complex queries such as:
- “Find rainy residential scenes with pedestrians at night.”
- “Show me vehicles driving under bridges.”
The example below shows a data query in Databricks set up to filter samples by metadata such as weather, scene, time of day, and detection labels.
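As a rough, self-contained illustration of this kind of metadata filter (not the actual Databricks query), the sketch below uses an in-memory SQLite table as a stand-in for the warehouse. The `frames` table, its columns, the sample rows, and the `find_frames` helper are all assumptions made for the example. For simplicity, each row holds one detection label per image.

```python
import sqlite3

# In-memory database standing in for an external warehouse table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frames (filepath TEXT, weather TEXT, scene TEXT, "
    "time_of_day TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO frames VALUES (?, ?, ?, ?, ?)",
    [
        ("a.jpg", "rainy", "residential", "night", "pedestrian"),
        ("b.jpg", "rainy", "highway", "night", "car"),
        ("c.jpg", "clear", "residential", "daytime", "pedestrian"),
        ("d.jpg", "rainy", "residential", "night", "car"),
    ],
)


def find_frames(conn, weather, scene, time_of_day, label):
    """Return file paths whose metadata matches every given filter value."""
    rows = conn.execute(
        "SELECT DISTINCT filepath FROM frames "
        "WHERE weather = ? AND scene = ? AND time_of_day = ? AND label = ? "
        "ORDER BY filepath",
        (weather, scene, time_of_day, label),
    )
    return [row[0] for row in rows]


# "Find rainy residential scenes with pedestrians at night."
matches = find_frames(conn, "rainy", "residential", "night", "pedestrian")
print(matches)  # ['a.jpg']
```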

You can also set up a semantic vector search using similarity queries. The example below shows a similarity search for “cars under a bridge” using Databricks vector search. Under the hood, embeddings are used to compare the semantic meaning of each image, allowing you to identify relevant samples without requiring labels.
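The core of an embedding-based search is a nearest-neighbor ranking, which can be sketched in a few lines. This is a toy illustration in plain Python, not Databricks vector search: the three-dimensional vectors are hand-made rather than produced by a real model, and the file names and the `top_k` helper are hypothetical. In practice, a model that maps text and images into a shared embedding space would produce the query vector from the text "cars under a bridge".

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Rank every indexed image by similarity to the query and keep the best k."""
    scored = [(path, cosine_similarity(query, vec)) for path, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings: dimensions loosely mean (car, bridge, flower)
index = {
    "car_under_bridge.jpg": [0.9, 0.8, 0.1],
    "car_on_highway.jpg": [0.9, 0.1, 0.2],
    "empty_bridge.jpg": [0.1, 0.9, 0.1],
    "flower_closeup.jpg": [0.0, 0.1, 0.9],
}

# Hand-made stand-in for the embedding of "cars under a bridge"
query_embedding = [1.0, 1.0, 0.0]

results = top_k(query_embedding, index, k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

A brute-force scan like this is only practical for small collections; dedicated vector search services typically rely on approximate nearest-neighbor indexes to stay fast at scale.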

Preview results directly in FiftyOne, choose the best samples, and import them directly into your dataset.
Bringing Value to Your ML Workflows
Data Lens brings multiple benefits to the data curation and model-building workflows.
- Efficiency: Find the exact data samples in seconds. Save hours or days of manual effort.
- Accuracy: Preview samples to ensure quality and relevance before importing.
- Improved Productivity: Eliminate manual tasks and dependencies. Speed up model iteration significantly.
With Data Lens, you can build custom datasets from different data sources, fine-tune your models with the most relevant and informative samples, and speed up model iteration cycles.
Better data, not just more data, is the key to high-performing models. With Data Lens, you can turn data discovery from a tedious bottleneck into a streamlined and integrated part of your ML workflow.
Getting Started with Data Lens
Data Lens is available now with FiftyOne Enterprise. Check out our Data Lens documentation for easy steps to get started.
Already an Enterprise user? Upgrade to FiftyOne Enterprise 2.7.1 and give it a try. I’m sure you’ll find this workflow very helpful. We value your feedback, so share your thoughts with the FiftyOne community.
Happy modeling! 🚀