Welcome to the latest installment of our ongoing blog series exploring computer vision datasets. This one comes from a new Kaggle competition! In this post we’ll use the open source FiftyOne computer vision toolset to explore Google Research’s Image Matching Challenge 2023 – Reconstruct 3D Scenes from 2D Images competition.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
About the competition
In this challenge, competitors will need to reconstruct 3D scenes from many different views and help uncover the best methods for building accurate 3D models that can be applied to photography, cultural heritage preservation, and more! It doesn’t hurt that there is $50,000 in prizes up for grabs as well!
Suppose we want to create a 3D scene of a famous landmark from images that tourists and professional photographers have taken from various angles and shared on the internet. What if we could combine all of those photos into a more complete, three-dimensional view of the landmark?
To accomplish this, we’ll need to use Structure from Motion (SfM), which is the process of reconstructing the 3D model of an environment from a collection of images. SfM normally deals with two types of data:
- Uniform, high-quality data: This type of data is typically captured by trained operators or with additional sensor data, such as the cars used by Google Maps. This results in homogeneous, high-quality data.
- Uneven, varying quality data: This will include assorted images taken using cameras of varying quality, with a wide variety of viewpoints, along with lighting, weather, and other variables.
About the dataset
According to the competition organizers, the competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a sample submission) will be made available to your notebook. You should expect to find roughly 1,100 images in the hidden test set. The number of images in a scene may vary from <10 to ~250.
Dataset quick facts
- Download Dataset: Download from Kaggle
- License: Varies. Check the LICENSE.txt files in the image directories for details
- Dataset Size: 12.64 GB
- FiftyOne Dataset Name:
image-matching-challenge-2023
Up next, let’s download the dataset, install FiftyOne and import the dataset into the App so we can visualize it!
Step 1: Download the dataset
In order to load the “Image Matching Challenge 2023” dataset into FiftyOne, you’ll need to first download the source data from Kaggle. (Don’t forget to accept the terms of the competition first!). After unzipping the download, your directory should look like this:
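Based on how the loader in Step 3 below parses the file paths, the unzipped layout is roughly the following (a sketch; the actual location names vary by type, and the exact placement of the LICENSE.txt files may differ):

```
image-matching-challenge-2023/
└── train/
    ├── haiper/
    ├── heritage/
    ├── phototourism/
    └── urban/
        └── <location>/      # e.g. pantheon_exterior, cyprus
            ├── images/
            │   └── *.jpg
            └── LICENSE.txt
```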
Step 2: Install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes less than a minute!

pip install fiftyone umap-learn

For example, on macOS:
- Verify your version of Python
- Create and activate a virtual environment
- Install IPython (optional)
- Upgrade your setuptools
- Install FiftyOne
Learn more about how to get up and running with FiftyOne in the Docs.
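Putting those steps together on macOS or Linux might look like the following (the environment name fiftyone-env is just an example):

```shell
# Verify your version of Python (check the FiftyOne docs for supported versions)
python3 --version

# Create and activate a virtual environment
python3 -m venv fiftyone-env
source fiftyone-env/bin/activate

# Install IPython (optional, for a nicer interactive shell)
pip install ipython

# Upgrade pip and setuptools, then install FiftyOne
# (umap-learn is needed for the embeddings step later in this post)
pip install --upgrade pip setuptools
pip install fiftyone umap-learn
```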
Step 3: Import the dataset
Now that you have the dataset downloaded and FiftyOne installed, let’s import the dataset and launch the FiftyOne App.
import glob
import os

import fiftyone as fo

# Download and unzip `image-matching-challenge-2023.zip` and put the path here
dataset_dir = "/path/to/image-matching-challenge-2023"

samples = []
for filepath in glob.glob(os.path.join(dataset_dir, "**"), recursive=True):
    if filepath.endswith((".jpg", ".jpeg", ".png", ".JPG")):
        # Relative path components, minus the trailing `images` dir and filename
        folders = filepath[len(dataset_dir) + 1:].split("/")[:-2]
        sample = fo.Sample(
            filepath=filepath,
            tags=[folders[0]],
            type=folders[1],
            location=folders[2],
        )
        samples.append(sample)

dataset = fo.Dataset(
    "image-matching-challenge-2023",
    persistent=True,
)
dataset.add_samples(samples)
dataset.compute_metadata()
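To make the path parsing above concrete, here is what the slicing line extracts for a hypothetical file path (the root and filename below are placeholders, not actual files from the dataset):

```python
# Hypothetical path illustrating the parsing in the snippet above
dataset_dir = "/path/to/image-matching-challenge-2023"
filepath = "/path/to/image-matching-challenge-2023/train/haiper/bike/images/image_0001.jpg"

# Strip the dataset root, split into components, and drop the trailing
# `images` directory and the filename
folders = filepath[len(dataset_dir) + 1:].split("/")[:-2]

print(folders)  # ['train', 'haiper', 'bike']
```

The first component becomes the sample’s tag, the second its type, and the third its location.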
Step 4: Optional, but awesome! Add Embeddings
Visualizing datasets in a low-dimensional embedding space is a powerful workflow that can reveal patterns and clusters in your data that can answer important questions about the critical failure modes of your model and how to augment your dataset to address these failures.
Using the FiftyOne Brain’s embeddings visualization capability can help uncover hidden patterns in the data, enabling us to take the required actions to improve the quality of the dataset and associated models.
If you’d like to take advantage of embeddings with this dataset, just bolt the following snippet onto the previous Step 3 snippet.
import fiftyone.brain as fob

fob.compute_visualization(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="img_viz",
)
Tips and Tricks: If you opt to include the embeddings snippet above, depending on your hardware setup, it will take a few minutes to calculate them. So, this would be a great time to check out all the cool upcoming computer vision events sponsored by Voxel51!
Also, if you are running Python 3.11, you might see an error related to umap-learn, as one of its dependencies, numba, is not yet supported. If you encounter this, simply use a supported Python version such as 3.9. For example:
conda create -n env39 python=3.9
conda activate env39
pip install fiftyone umap-learn
Step 5: Launch the App to visualize the dataset
session = fo.launch_app(dataset)
The code snippet above will launch the FiftyOne App in your default browser. You should see the following initial view of the image-matching-challenge-2023 dataset in the App:
Ok, let’s do a quick exploration of the dataset!
Sample details
Click on any of the samples to get additional detail like tags, metadata, labels, and primitives.
Filtering by location
The dataset includes 23 distinct locations, including Notre Dame, the Taj Mahal, Buckingham Palace, the Lincoln Memorial, and more. For example, let’s filter the samples by those labeled pantheon_exterior.
Filtering by type
The dataset includes four ‘types’: phototourism, haiper, heritage, and urban. For example, let’s filter the samples by those of the heritage type.
Filtering by id or filepath
FiftyOne also makes it very easy to filter the samples to find the ones that meet your specific criteria. For example, we can filter by a specific id:
Embeddings
If you opted to calculate embeddings in Step 4, you can view them by clicking on the “+” to the right of the “Samples” tab and selecting “Embeddings.” In this example we’ll select img_viz as our Brain Key and the location label, then lasso the green cluster, which captures the images labeled cyprus.
Cool! So, what can you do with embeddings? You can:
- Identify anomalous/incorrect image labels
- Find examples of scenarios of interest
- Pre-annotate unlabeled data for training
Learn more about how to work with embeddings in the FiftyOne Docs.
Start working with the dataset
Now that you have a general idea of what the dataset contains, you can start using FiftyOne to perform a variety of tasks including:
- Creating dataset views
- Creating aggregations
- Creating interactive plots
- Annotating datasets
- Evaluating models
You can also start making use of the FiftyOne Brain, which provides powerful machine learning techniques you can apply to your workflows, like visualizing embeddings and computing similarity, uniqueness, and mistakenness.
Sponsors and acknowledgements
This competition is being sponsored by Haiper (Canada) Ltd., Google, and Kaggle. The individuals responsible for putting together the competition include:
Eduard Trulls (Google), Dmytro Mishkin (Czech Technical University in Prague/HOVER Inc), Jiri Matas (Czech Technical University in Prague), Fabio Bellavia (University of Palermo), Luca Morelli (University of Trento/Bruno Kessler Foundation), Fabio Remondino (Bruno Kessler Foundation), Weiwei Sun (University of British Columbia), and Kwang Moo Yi (University of British Columbia/Haiper).