Streamline Your Visual Data Discovery with FiftyOne Data Lens

April 2, 2025 – Written by Kirti Joshi


 

TL;DR: This post outlines a novel workflow to simplify and streamline the often tedious process of exploring and retrieving data from external sources. Find the right visual samples to answer your data questions.

 

As AI developers, we constantly grapple with understanding our models’ performance, and more often than not we face the toughest question: “Why isn’t my model performing better?” The answer is almost always in the data. Whether the cause is class imbalance, gaps in diversity, or poor-quality samples, identifying the why and the what-next always leads us to ask ourselves:

Do I have enough data? Is it diverse enough to handle real-world edge cases? Should I collect new data, leverage existing data, or generate data synthetically? These questions rarely have foolproof answers, yet we must navigate them to meet project goals and deadlines.

In our typical data workflows, the manual processes of finding and including the right data stall progress. Additionally, dependencies on other teams for data access create further delays. 

To solve these issues, which most teams face, we created Data Lens, the newest workflow within FiftyOne Enterprise, designed to simplify and accelerate your data discovery. At its core, Data Lens is a data exploration and retrieval tool that helps you quickly find exactly the data you need among billions of available samples. You can query, preview, and import relevant samples directly into FiftyOne without manual bottlenecks or slow searches.

With Data Lens, you can:

  • Integrate with data lakes, databases, and cloud storage providers
  • Rapidly search massive unstructured data collections
  • Preview media and labels instantly
  • Import data directly into your dataset for further analysis

 

How Does Data Lens Work?

Data Lens provides a simple workflow to find and select samples that enhance your datasets and improve model performance.

Imagine you’re building an AI model to detect flowers, and you need specific images from your BigQuery data warehouse to address gaps where your model struggles to identify certain flower varieties. Here’s how Data Lens makes this process easy: 

Define Your Search Experience

Data Lens provides a flexible framework to customize your search parameters. Whether you need specific annotations, image categories, or broader dataset subsets, you can set up the queries for your search to return the most relevant data.
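
To make the idea of a search definition concrete, here is a minimal sketch of how user-facing search parameters might be translated into a parameterized query. It is not the Data Lens API: the `FlowerSearch` class, the `build_query` function, and the `flowers` table with its `filepath`, `variety`, and `width` columns are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowerSearch:
    """Hypothetical search parameters a user might fill in."""
    variety: Optional[str] = None    # e.g. "tulip"
    min_width: Optional[int] = None  # minimum image width in pixels
    max_results: int = 100


def build_query(search: FlowerSearch, table: str = "flowers") -> tuple[str, list]:
    """Translate search parameters into a parameterized SQL query.

    Assumption: "?" placeholders (SQLite style) are used here; a warehouse
    such as BigQuery would use named parameters like @variety instead.
    The table name is interpolated directly, so it must come from trusted
    configuration, never from user input.
    """
    clauses, params = [], []
    if search.variety is not None:
        clauses.append("variety = ?")
        params.append(search.variety)
    if search.min_width is not None:
        clauses.append("width >= ?")
        params.append(search.min_width)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT filepath, variety, width FROM {table}{where} LIMIT ?"
    params.append(search.max_results)
    return sql, params


sql, params = build_query(FlowerSearch(variety="tulip", min_width=640, max_results=25))
print(sql)
print(params)
```

Keeping user values in the parameter list rather than in the SQL string means the same search definition can be reused safely for any input.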

Connect Your Data Sources

Once your search experience is defined, connecting Data Lens to an external data source (e.g., database, data lake) is straightforward. You simply:

  • Add a new data source and provide its configuration details
  • Establish a secure connection to your repository

Data Lens can be connected to any data source, such as:

  • Databricks for extensive querying across data lakes
  • PostgreSQL for structured database exploration
  • Cloud storage providers such as Amazon S3 and Google Cloud Storage

Find, Preview, and Import Data

With your data source connected, you can search billions of samples in real time using filters like metadata, annotations, or image properties. Then, instantly preview them in FiftyOne and pull only the data you need into your dataset.
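
The search, preview, and import loop can be pictured with a short sketch in plain Python. Results stream back in small batches so previews can appear before the full search finishes, and only the samples you select are added to the dataset. The helper names `preview_batches` and `import_selected`, and the use of a plain list as the "dataset", are assumptions made for illustration and do not reflect how Data Lens or FiftyOne implement this internally.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def preview_batches(results: Iterable, batch_size: int) -> Iterator[list]:
    """Yield search results in small batches so previews can render early."""
    it = iter(results)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def import_selected(batch: list, keep: Callable[[dict], bool], dataset: list) -> int:
    """Append only the samples that pass `keep`; return how many were added."""
    chosen = [sample for sample in batch if keep(sample)]
    dataset.extend(chosen)
    return len(chosen)


# Stand-in for rows streaming back from an external source (hypothetical data).
results = (
    {"filepath": f"img_{i}.jpg", "variety": "tulip" if i % 2 else "rose"}
    for i in range(5)
)

dataset: list = []
for batch in preview_batches(results, batch_size=2):
    # In a real workflow the selection would come from the user's preview.
    import_selected(batch, lambda s: s["variety"] == "tulip", dataset)

print([s["filepath"] for s in dataset])
```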

Once you build your dataset with the desired data samples, continue with your model training and evaluation work to see improvements in model performance!

Example of using Data Lens to find samples of “flower types” using Google BigQuery data warehouse

 

Handling Multiple Complex Query Scenarios

Data Lens also handles complex scenarios by providing a framework for customized search queries. For instance, ADAS/AV teams working on safety-critical vehicle detection systems frequently need specific samples to improve model performance under challenging conditions, such as particular lighting, weather, scene types, and objects. With Data Lens, you can quickly set up complex queries such as:

  • “Find rainy residential scenes with pedestrians at night.”
  • “Show me vehicles driving under bridges.”

The example below shows a Databricks query set up to filter samples by metadata (weather, scene, and time of day) and by detection labels.

Data Lens to find “rainy samples in a residential setting with pedestrians” using Databricks data lake 
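
A query of this kind combines scene-level metadata filters with a check on detection labels. The sketch below uses an in-memory SQLite database as a stand-in for the data lake so that it runs anywhere. The `scenes` and `detections` tables, their columns, and the sample rows are hypothetical, and a real Databricks query would differ in dialect and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scenes (
    id INTEGER PRIMARY KEY, filepath TEXT,
    weather TEXT, scene TEXT, timeofday TEXT
);
CREATE TABLE detections (scene_id INTEGER, label TEXT);
INSERT INTO scenes VALUES
    (1, 'a.jpg', 'rainy', 'residential', 'night'),
    (2, 'b.jpg', 'rainy', 'highway',     'night'),
    (3, 'c.jpg', 'clear', 'residential', 'night'),
    (4, 'd.jpg', 'rainy', 'residential', 'night');
INSERT INTO detections VALUES
    (1, 'pedestrian'), (1, 'car'), (2, 'pedestrian'),
    (3, 'pedestrian'), (4, 'car');
""")

# Metadata filters on the scene, plus an EXISTS check on its detection labels.
QUERY = """
SELECT s.filepath
FROM scenes AS s
WHERE s.weather = ? AND s.scene = ? AND s.timeofday = ?
  AND EXISTS (
      SELECT 1 FROM detections AS d
      WHERE d.scene_id = s.id AND d.label = ?
  )
ORDER BY s.id
"""


def find_scenes(conn, weather, scene, timeofday, label):
    """Return file paths of scenes matching all metadata filters and the label."""
    rows = conn.execute(QUERY, (weather, scene, timeofday, label)).fetchall()
    return [row[0] for row in rows]


matches = find_scenes(conn, "rainy", "residential", "night", "pedestrian")
print(matches)
```

Using `EXISTS` rather than a join avoids returning the same scene once per matching detection.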

You can also set up a semantic vector search using similarity queries. The example below shows a similarity search for “cars under a bridge” using Databricks vector search. Under the hood, embeddings are used to compare the semantic meaning of each image, allowing you to identify relevant samples without requiring labels.

Similarity search using Databricks vector search
Preview results directly in FiftyOne, choose the best samples, and import these directly into your dataset
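
For intuition about what an embedding-based similarity search does, here is a toy sketch that ranks images by cosine similarity to a query embedding. The three-dimensional vectors and file names are made up for illustration. In practice the embeddings would come from a vision-language model and the ranking would be done by a vector index such as Databricks vector search, not by a linear scan like this one.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, index, k):
    """Return the k names in `index` whose embeddings are most similar to `query`."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy embeddings (hypothetical); real ones have hundreds of dimensions.
index = {
    "car_under_bridge.jpg": [0.9, 0.1, 0.0],
    "car_on_highway.jpg": [0.6, 0.6, 0.1],
    "pedestrian_crossing.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of the text query "cars under a bridge".
query_embedding = [1.0, 0.0, 0.0]
print(top_k(query_embedding, index, k=2))
```

Because the ranking depends only on the vectors, no labels are needed to surface relevant samples.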

 

Bringing Value to Your ML Workflows

Data Lens brings multiple benefits to the data curation and model-building workflows. 

  • Efficiency: Find the exact data samples in seconds. Save hours or days of manual effort.
  • Accuracy: Preview samples to ensure quality and relevance before importing.
  • Improved Productivity: Eliminate manual tasks and dependencies. Speed up model iteration significantly.

With Data Lens, you can build custom datasets from different data sources, fine-tune your models with the most relevant and informative samples, and speed up model iteration cycles.

Better data, not just more data, is the key to high-performing models. With Data Lens, you can turn data discovery from a tedious bottleneck into a streamlined and integrated part of your ML workflow.

 

Getting Started with Data Lens

Data Lens is available now with FiftyOne Enterprise. Check out our Data Lens documentation for easy steps to get started. 

Already an Enterprise user? Upgrade to FiftyOne Enterprise 2.7.1 and give it a try. I’m sure you’ll find this workflow very helpful. We value your feedback, so share your thoughts with the FiftyOne community.

Happy modeling! 🚀