Welcome to the latest installment of our ongoing blog series exploring computer vision datasets. This one comes from a new Kaggle competition! In this post we’ll use the open source FiftyOne computer vision toolset to explore Google Research’s Image Matching Challenge 2023 – Reconstruct 3D Scenes from 2D Images competition.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
About the competition
In this challenge, competitors will need to reconstruct 3D scenes from many different views and help uncover the best methods for building accurate 3D models that can be applied to photography, cultural heritage preservation, and more! It doesn’t hurt that there is $50,000 in prizes up for grabs as well!
Suppose we want to create a 3D scene of a famous landmark from images that tourists and professional photographers have taken from various angles and shared on the internet. What if we could combine all of those photos into a more complete, three-dimensional view of the landmark?
To accomplish this, we’ll need to use Structure from Motion (SfM), which is the process of reconstructing the 3D model of an environment from a collection of images. SfM normally deals with two types of data:
- Uniform, high-quality data: This type of data is typically captured by trained operators or with additional sensor data, such as the cars used by Google Maps. This results in homogeneous, high-quality data.
- Uneven, varying quality data: This will include assorted images taken using cameras of varying quality, with a wide variety of viewpoints, along with lighting, weather, and other variables.
About the dataset
According to the competition organizers, the competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a sample submission) will be made available to your notebook. You should expect to find roughly 1,100 images in the hidden test set. The number of images in a scene may vary from <10 to ~250.
Dataset quick facts
- Download Dataset: Download from Kaggle
- License: Varies. Check the LICENSE.txt files in the image directories for details
- Dataset Size: 12.64 GB
- FiftyOne Dataset Name:
image-matching-challenge-2023
Up next, let’s download the dataset, install FiftyOne and import the dataset into the App so we can visualize it!
Step 1: Download the dataset
In order to load the “Image Matching Challenge 2023” dataset into FiftyOne, you’ll need to first download the source data from Kaggle. (Don’t forget to accept the terms of the competition first!). After unzipping the download, your directory should look like this:
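Based on how the loader in Step 3 below parses the file paths, the unzipped layout is roughly the following (a sketch; the actual location names vary by type, and the exact placement of the LICENSE.txt files may differ):

```
image-matching-challenge-2023/
└── train/
    ├── haiper/
    ├── heritage/
    ├── phototourism/
    └── urban/
        └── <location>/      # e.g. pantheon_exterior, cyprus
            ├── images/
            │   └── *.jpg
            └── LICENSE.txt
```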
Step 2: Install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes less than a minute!

pip install fiftyone umap-learn

For example, on macOS:
- Verify your version of Python
- Create and activate a virtual environment
- Install IPython (optional)
- Upgrade your setuptools
- Install FiftyOne
Learn more about how to get up and running with FiftyOne in the Docs.
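Putting those steps together on macOS or Linux might look like the following (the environment name fiftyone-env is just an example):

```shell
# Verify your version of Python (check the FiftyOne docs for supported versions)
python3 --version

# Create and activate a virtual environment
python3 -m venv fiftyone-env
source fiftyone-env/bin/activate

# Install IPython (optional, for a nicer interactive shell)
pip install ipython

# Upgrade pip and setuptools, then install FiftyOne
# (umap-learn is needed for the embeddings step later in this post)
pip install --upgrade pip setuptools
pip install fiftyone umap-learn
```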
Step 3: Import the dataset
Now that you have the dataset downloaded and FiftyOne installed, let’s import the dataset and launch the FiftyOne App.
import glob
import os

import fiftyone as fo

# Download and unzip `image-matching-challenge-2023.zip` and put the path here
dataset_dir = "/path/to/image-matching-challenge-2023"

samples = []
for filepath in glob.glob(os.path.join(dataset_dir, "**"), recursive=True):
    if filepath.endswith((".jpg", ".jpeg", ".png", ".JPG")):
        # Relative path components, minus the trailing `images` dir and filename
        folders = filepath[len(dataset_dir) + 1:].split("/")[:-2]
        sample = fo.Sample(
            filepath=filepath,
            tags=[folders[0]],
            type=folders[1],
            location=folders[2],
        )
        samples.append(sample)

dataset = fo.Dataset(
    "image-matching-challenge-2023",
    persistent=True,
)
dataset.add_samples(samples)
dataset.compute_metadata()
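To make the path parsing above concrete, here is what the slicing line extracts for a hypothetical file path (the root and filename below are placeholders, not actual files from the dataset):

```python
# Hypothetical path illustrating the parsing in the snippet above
dataset_dir = "/path/to/image-matching-challenge-2023"
filepath = "/path/to/image-matching-challenge-2023/train/haiper/bike/images/image_0001.jpg"

# Strip the dataset root, split into components, and drop the trailing
# `images` directory and the filename
folders = filepath[len(dataset_dir) + 1:].split("/")[:-2]

print(folders)  # ['train', 'haiper', 'bike']
```

The first component becomes the sample’s tag, the second its type, and the third its location.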
Step 4: Optional, but awesome! Add Embeddings
Visualizing datasets in a low-dimensional embedding space is a powerful workflow that can reveal patterns and clusters in your data that can answer important questions about the critical failure modes of your model and how to augment your dataset to address these failures.
Using the FiftyOne Brain’s embeddings visualization capability can help uncover hidden patterns in the data, enabling us to take the required actions to improve the quality of the dataset and associated models.
If you’d like to take advantage of embeddings with this dataset, just bolt the following snippet onto the previous Step 3 snippet.
import fiftyone.brain as fob

fob.compute_visualization(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_viz",
)
Tips and Tricks: If you opt to include the embeddings snippet above, depending on your hardware setup, it will take a few minutes to calculate them. So, this would be a great time to check out all the cool upcoming computer vision events sponsored by Voxel51!
Also, if you are running Python 3.11, you might see an error related to umap-learn, as one of its dependencies, numba, is not yet supported. If you encounter this, simply use a supported Python version such as 3.9. For example:
conda create -n env39 python=3.9
conda activate env39
pip install fiftyone umap-learn
Step 5: Launch the App to visualize the dataset
session = fo.launch_app(dataset)
The code snippet above will launch the FiftyOne App in your default browser. You should see the following initial view of the image-matching-challenge-2023 dataset in the App:
Ok, let’s do a quick exploration of the dataset!
Sample details
Click on any of the samples to get additional detail like tags, metadata, labels, and primitives.
Filtering by location
The dataset includes 23 distinct locations, including Notre Dame, the Taj Mahal, Buckingham Palace, the Lincoln Memorial, and more. For example, let’s filter the samples by those labeled pantheon_exterior.
Filtering by type
The dataset includes four ‘types’: phototourism, haiper, heritage, and urban. For example, let’s filter the samples by those of the heritage type.
Filtering by id or filepath
FiftyOne also makes it very easy to filter the samples to find the ones that meet your specific criteria. For example, we can filter by a specific id:
Embeddings
If you opted to calculate embeddings in Step 4, you can view them by clicking on the “+” to the right of the “Samples” tab and selecting “Embeddings.” In this example we’ll select img_viz as our Brain Key and the location label, then lasso the green cluster, which captures the images labeled cyprus.
Cool! So, what can you do with embeddings? You can:
- Identify anomalous/incorrect image labels
- Find examples of scenarios of interest
- Pre-annotate unlabeled data for training
Learn more about how to work with embeddings in the FiftyOne Docs.
Start working with the dataset
Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks including:
- Creating dataset views
- Creating aggregations
- Creating interactive plots
- Annotating datasets
- Evaluating models
You can also start making use of the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, like visualizing embeddings and computing similarity, uniqueness, and mistakenness.
Sponsors and acknowledgements
This competition is being sponsored by Haiper (Canada) Ltd., Google, and Kaggle. The individuals responsible for putting together the competition include:
Eduard Trulls (Google), Dmytro Mishkin (Czech Technical University in Prague/HOVER Inc), Jiri Matas (Czech Technical University in Prague), Fabio Bellavia (University of Palermo), Luca Morelli (University of Trento/Bruno Kessler Foundation), Fabio Remondino (Bruno Kessler Foundation), Weiwei Sun (University of British Columbia), and Kwang Moo Yi (University of British Columbia/Haiper).