FiftyOne helps you curate high-quality datasets by combining code-based analysis with visual exploration. With it we can filter samples, remove duplicates, fix annotations, and add metadata programmatically. Then we can launch an interactive app to explore our results visually and slice or aggregate the data as we’d like.
What You Will Need
- A virtual environment with Python 3.9-3.11 installed (or willingness to use a Google Colab notebook)
- Kaggle account with API access
- HuggingFace account
- Basic familiarity with Pandas and PIL
Time Required
90 minutes start-to-finish
Installing FiftyOne
Quick Start: Load ASL‑MNIST into FiftyOne
If you just want to
get the dataset and explore it in FiftyOne (and already have a HuggingFace account set up), you can simply execute:
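Here is a minimal sketch of that quick start. It assumes FiftyOne and the huggingface_hub package are installed (see Step 1), and the repo ID below is a placeholder: replace it with the ID shown on the dataset card linked later in this post.

```python
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

# Placeholder repo ID: replace with the ID from the dataset card for this tutorial
dataset = fouh.load_from_hub("<hub-username>/asl-mnist")

print(dataset)
```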
This is the artifact that we will produce by the end of this tutorial (Steps 7 and 8): a FiftyOne dataset published on the Hugging Face Hub with all the images from the training and test sets.
To visualize it, we can launch the FiftyOne App:
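This is a one-liner, assuming the `dataset` and import from the quick-start snippet above:

```python
# Open the FiftyOne App on the dataset we just loaded
session = fo.launch_app(dataset)
```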
Try filtering the samples by label; here we get a view of the images that share the label “V”.
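You can do this from the App’s sidebar, or programmatically. A sketch of the code equivalent, assuming the label field is called ground_truth (as it will be when we build the dataset in Step 4):

```python
from fiftyone import ViewField as F

# View containing only the samples labeled "V"
v_view = dataset.match(F("ground_truth.label") == "V")
session.view = v_view
```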
However, if what you want is to learn how to get your data from Kaggle to FiftyOne and HuggingFace Hub, please keep reading. There are only eight steps ahead :)
Even though this dataset is already available on HuggingFace, this tutorial is valuable if you want to upload your own datasets or understand the underlying steps.
Step 1: Set Up Your Python Environment
Begin by installing the necessary libraries:
We pin versions to ensure the code in this tutorial runs exactly as shown. For your own projects, you may be able to use more recent versions. These versions are known to be compatible with each other as of July 2025.
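The exact version pins are not reproduced here; as a sketch, the packages used throughout this tutorial can be installed like this (or run the equivalent pip command in your terminal), adding whichever version pins you have verified:

```python
import subprocess
import sys

# Libraries used in this tutorial; add ==<version> pins as needed
packages = ["fiftyone", "kaggle", "pandas", "pillow", "huggingface_hub"]
subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```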
Step 2: Configure the Kaggle API Credentials
To access datasets from Kaggle:
- Log into Kaggle and navigate to your account settings.
- Generate a new API token (kaggle.json).
- Move the downloaded file to ~/.kaggle/ and set permissions:
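For example, on Linux/macOS (a sketch that assumes the token was downloaded to ~/Downloads; adjust the source path to wherever your browser saved kaggle.json):

```python
import shutil
from pathlib import Path

# Create ~/.kaggle and move the downloaded token into it
kaggle_dir = Path.home() / ".kaggle"
kaggle_dir.mkdir(exist_ok=True)

token = kaggle_dir / "kaggle.json"
shutil.move(Path.home() / "Downloads" / "kaggle.json", token)

# The Kaggle client rejects tokens that other users can read
token.chmod(0o600)
```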
Step 3: Download & Process the ASL‑MNIST Dataset
Download and extract the dataset:
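A sketch using the Kaggle Python client. The dataset slug below (datamunge/sign-language-mnist) is the one this dataset is commonly published under on Kaggle; verify it against the page you are downloading from:

```python
import kaggle  # authenticates automatically using ~/.kaggle/kaggle.json

# Download and unzip the CSV files into ./asl_mnist/
kaggle.api.dataset_download_files(
    "datamunge/sign-language-mnist",  # verify this slug on the Kaggle page
    path="asl_mnist",
    unzip=True,
)
```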
Quirks of ASL-MNIST
The images are not stored as JPEG or PNG files but as rows inside two CSV files. I found this format unintuitive, but it is common among variants of the MNIST dataset. Each row of the training CSV contains 784 pixel intensity values (uint8) plus a column with the ground truth label: the letter that the pictured gesture represents (the gestures in this dataset do not represent numbers). The images are also really tiny: only 28x28 pixels, which is where the “MNIST” part of the name comes from, since they share the dimensions of the original MNIST dataset of grayscale handwritten digits. FiftyOne supports importing datasets in more standard formats, such as COCO, or from directory structures where the folder name maps to the label. In this case, we need to do some manual processing to work around the quirks of ASL-MNIST.
In our processed dataset, the test images will carry the label “unknown”.
Knowing all this, we process the CSV files into images with pandas and PIL:
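Here is a sketch of that conversion. It assumes the extracted files are named sign_mnist_train.csv and sign_mnist_test.csv (adjust the paths if your archive nests them in subfolders), that each row holds a label column followed by the 784 pixel columns, and, per the note above, that the test images are written with the label “unknown”:

```python
from pathlib import Path

import numpy as np
import pandas as pd
from PIL import Image

def csv_to_images(csv_path, out_dir, use_labels=True):
    """Write each 784-pixel row of the CSV as a 28x28 grayscale JPG."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(csv_path)
    pixel_cols = [c for c in df.columns if c != "label"]

    for idx, row in df.iterrows():
        # Per the note above, we keep "unknown" for the test split
        label = int(row["label"]) if use_labels else "unknown"
        pixels = row[pixel_cols].to_numpy(dtype=np.uint8).reshape(28, 28)
        Image.fromarray(pixels, mode="L").save(out_dir / f"{label}_{idx}.jpg")

# Adjust these paths to wherever Step 3 extracted the CSVs
csv_to_images("asl_mnist/sign_mnist_train.csv", "asl_images/train", use_labels=True)
csv_to_images("asl_mnist/sign_mnist_test.csv", "asl_images/test", use_labels=False)
```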
Step 4: Build a FiftyOne Dataset from ASL‑MNIST Images
We import the processed jpg images into a
FiftyOne Dataset and map numerical labels to their corresponding letters.
Note: The letters 'J' (9) and 'Z' (25) are not included in this dataset as they involve motion, and the images that we have are static (single frames).
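A sketch of this step, assuming the images were written by the Step 3 snippet above (numeric label as the filename prefix, “unknown” for the test split):

```python
import string
from pathlib import Path

import fiftyone as fo

# Map 0-25 to A-Z; labels 9 ('J') and 25 ('Z') simply never occur in this dataset
label_map = {i: letter for i, letter in enumerate(string.ascii_uppercase)}

asl_dataset = fo.Dataset("asl-mnist")
asl_dataset.persistent = True  # keep the dataset in the local FiftyOne database

samples = []
for split in ("train", "test"):
    for path in Path(f"asl_images/{split}").glob("*.jpg"):
        prefix = path.stem.split("_")[0]  # numeric label, or "unknown" for test
        label = label_map[int(prefix)] if prefix.isdigit() else "unknown"

        sample = fo.Sample(filepath=str(path), tags=[split])
        sample["ground_truth"] = fo.Classification(label=label)
        samples.append(sample)

asl_dataset.add_samples(samples)
print(asl_dataset)
```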
Each image is a Sample within the FiftyOne dataset. We can attach metadata, labels, and tags to each of them. Samples are initialized with a filepath to the corresponding data on disk, which is why we had to save our images to the local hard drive before creating the FiftyOne Dataset (a collection of samples).
Note that asl_dataset.persistent = True only keeps the dataset in your local FiftyOne database across Python sessions; it does not create a portable, self-contained copy of the images and labels. This is why we also export the dataset to disk in Step 6.
Step 5: Explore & Visualize the Dataset in FiftyOne
We can now visualize, query, and analyze our dataset interactively using FiftyOne. The FiftyOne app is a powerful graphical user interface that allows us to browse, tag, aggregate, and interact directly with the dataset.
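For example, with the asl_dataset from Step 4:

```python
import fiftyone as fo

# Launch the App; it opens in your browser, or inline if you are in a notebook
session = fo.launch_app(asl_dataset)
```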
Producing Histograms
After getting the dataset into FiftyOne, try producing an aggregation of its ground_truth.label field by going to the Histograms panel and selecting the field.
You can then click on the “Split Horizontally” button to see the Samples next to the Histogram panel.
Here is a short video demoing this.
I encourage you to try creating histograms on other fields, such as metadata.size_bytes.
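If the metadata fields appear empty in the Histograms panel, populate them first:

```python
# Compute size_bytes, width, height, etc. for every sample
asl_dataset.compute_metadata()
```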
Step 6: Save the FiftyOne Dataset Locally
To save our FiftyOne dataset for future use or for sharing, we can export it locally. The dataset.export() method creates a portable, self-contained archive and allows us to save the dataset in various formats. In this case, we will use the FiftyOneDataset format, which preserves the full FiftyOne dataset structure. The persistence we enabled in Step 4 (asl_dataset.persistent = True) only keeps the dataset in our local FiftyOne database; exporting with export_media=True copies the images into a self-contained folder that we can move to another machine or share with others.
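A sketch of the export call, using the directory name described below:

```python
asl_dataset.export(
    export_dir="asl-mnist-fiftyone-dataset",
    dataset_type=fo.types.FiftyOneDataset,
    export_media=True,  # copy the JPG files into the export folder
)
```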
This will create a new directory named asl-mnist-fiftyone-dataset containing the JSON definition of your dataset along with copies of all the original JPG images. The result is a self-contained dataset that you can share or move without worrying about broken file paths.
This exported dataset can be easily reloaded into FiftyOne later using fo.Dataset.from_dir().
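For example:

```python
# Reload the exported dataset into FiftyOne under a new name
reloaded = fo.Dataset.from_dir(
    dataset_dir="asl-mnist-fiftyone-dataset",
    dataset_type=fo.types.FiftyOneDataset,
    name="asl-mnist-reloaded",
)
```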
For more information on loading and using datasets, see the
documentation.
Following these steps, we have moved from a raw Kaggle dataset stored in CSV files to a portable, visual, and queryable dataset ready for interpretable computer vision workflows.
Step 7: Publish the Dataset to Hugging Face Hub
We can share the dataset with others through the HuggingFace Hub, which is a great resource for both open models and open data. When we push our FiftyOne dataset to the HF Hub, a fiftyone.yml file is generated and the dataset remains in FiftyOne format, with all its fields: annotations, tags, and metadata.
If you are new to HuggingFace, you can
follow this guide to set up your authentication token. You will need to have it set up with write permissions to publish your dataset.
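With the token in place, pushing is a single call; a sketch, assuming the asl_dataset from Step 4 and a repo name of your choosing:

```python
import fiftyone.utils.huggingface as fouh

# Creates <your-username>/asl-mnist on the Hub and uploads the dataset
fouh.push_to_hub(asl_dataset, "asl-mnist")
```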
Be sure to check the data's licensing rights. The American Sign Language dataset has a CC0 License, meaning that it is in the public domain. The original content is CC0, and we apply an MIT license only to the packaging and any additional code or annotations.
We can modify a public domain dataset and redistribute it under a compatible license (such as the
MIT license). Public domain works have no copyright restrictions, so anyone can use, modify, and redistribute them without permission. When we modify public domain content, we “create” a new work that we own the copyright to. We can license our modifications under MIT, but the original public domain portions remain
public domain and cannot be relicensed. Remember to follow best practices to avoid complications:
- Document your changes: Clearly state which portions are your modifications versus the original public domain content.
- Apply the license correctly: Ensure your MIT license applies only to your contributions.
- Verify the source: Double-check that the original dataset is truly in the public domain.
After uploading the dataset, be sure to fill in its dataset card with all the details on the data collection process. Dataset cards serve as documentation for how your dataset was collected, cleaned, and used. You can use
the one that we have created for this example as a template.
Step 8: Verify & Reload the Dataset from Hugging Face Hub
Finally, we can check that the dataset is available on our HuggingFace user account.
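To verify, we can reload it from the Hub into a fresh dataset; replace the placeholder username with your own:

```python
import fiftyone.utils.huggingface as fouh

# Placeholder repo ID: use your own Hub username and repo name
reloaded = fouh.load_from_hub("<hub-username>/asl-mnist")
print(reloaded)
```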
Pro Tip: Move the Dataset to a Hugging Face Organization
A detail that might be important to you: push_to_hub only lets you push to a personal account, not to an organization. To transfer the dataset to an organization, first push it to your personal account and then transfer ownership through the dataset’s page.
Full Google Colab Notebook and Source Code
Key Takeaways
Congratulations! You have successfully taken a dataset from Kaggle, processed it into a usable format, curated it within FiftyOne, and shared it with the community on Hugging Face Hub. This workflow is a powerful pattern for any computer vision project, enabling better data understanding, collaboration, and reproducibility.
Try applying these steps to your own dataset and share it on HuggingFace Hub!
Next steps
In the following blog posts, we will go into training neural networks using the integration of FiftyOne and PyTorch. Be sure to try those techniques on this data!