Welcome to the latest installment of our ongoing blog series where we highlight a dataset from the FiftyOne Dataset Zoo! FiftyOne provides a Dataset Zoo that contains a collection of common datasets that you can download and load into FiftyOne via a few simple commands. In this post, we explore the Families in the Wild dataset and some of its use cases.
Wait, what’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
The FiftyOne Dataset Zoo comprises more than 30 datasets, with new datasets being added all the time! They cover a variety of use cases including:
- Video
- Images
- Location
- Point-cloud
- Action-recognition
- Classification
- Detection
- Segmentation
- Relationships
- and more!
About the Families in the Wild dataset
Families in the Wild (FIW) is a public benchmark for recognizing families via facial images. Developed and maintained by the Synergetic Media Learning (SMILE) Lab at Northeastern University, it is the largest and most comprehensive database available for kinship recognition. The dataset contains more than 26k images of 5k faces collected from almost 1k families. Each family is assigned a unique Family ID (FID), ranging from F0001 to F1018. (Note: Some families in the dataset have been merged or removed since it was first released in 2016.)
The Families in the Wild dataset includes the families of publicly recognizable people like Jimmy Fallon, Bob Dylan, Michael Jackson, Margaret Thatcher, and Bruce Lee. The images span a variety of styles and eras, from old black and white photos to modern color images.
What is visual kinship recognition?
This discipline within computer vision aims to:
- Identify who is related and who is not related
- Identify different family relationship types like mother, father, child, and sibling
- Create fine-grained classifications so that family trees can be constructed that include uncles, aunts, cousins, grandparents, etc.
Visual kinship recognition can be applied to use cases like:
- Search identification: Identifying fugitives or criminal suspects
- Modern-day refugee crisis: Reuniting lost family members who have been separated due to war, famine, or natural catastrophes
- Missing children: Finding relatives of missing or exploited children
- Genealogy services and research: Using instead of or in conjunction with DNA-based approaches
As you can imagine, visual kinship recognition is a challenging problem to solve because of intra- and inter-class variation, individuals who look alike but are not related, insufficient data distribution, and a lack of labeled data. The goal of the FIW project was to build a large-scale kinship dataset to address some of these challenges!
Dataset quick facts
- Dataset Source: Hosted and maintained by the SMILE Lab at Northeastern University
- License: MIT
- Last Update: March 18, 2022
- GitHub: README and development kit
- Dataset Name in FiftyOne: fiw
- Dataset Size: 173.00 MB
- Tags: image, kinship, verification, classification, search-and-retrieval, facial-recognition
- Supported Splits: test, val, train
- ZooDataset class: FIWDataset
Note: For convenience, FiftyOne provides get_pairwise_labels() and get_identifier_filepaths_map() utilities for FIW.
Acknowledgements
Want more details about the dataset? For statistics, task evaluations, benchmarks, and more, please check out the following journal article by the FIW authors:
Robinson, JP, M. Shao, and Y. Fu. “Survey on the Analysis and Modeling of Visual Kinship: A Decade in the Making.” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2021
There is also an in-depth presentation on YouTube by Joseph Robinson about the dataset and benchmark.
Ok, let’s get started exploring!
First things first, install FiftyOne
If you don’t already have FiftyOne installed on your laptop, it takes just a few minutes! For example, on macOS:
- Verify your version of Python
- Create and activate a virtual environment
- Upgrade your Setuptools
- Install FiftyOne
Learn more about how to get up and running with FiftyOne in the Docs.
Next, import the dataset
Now that you are up and running, importing the dataset and launching the FiftyOne App takes just a few more lines of code:
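import fiftyone as fo
import fiftyone.zoo as foz

# Download (if needed) and load the test split of FIW from the Dataset Zoo
dataset = foz.load_zoo_dataset("fiw", split="test")

# Launch the FiftyOne App to browse the samples
session = fo.launch_app(dataset)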
How are faces cropped?
Faces were cropped from the source images using the five-point face detector MTCNN. The source images are mostly family photos, along with several profile pictures (facial shots) of individuals. The number of members per family varies from 3 to 26, and the number of faces per subject ranges from 1 to more than 10.
Levels and label types
Various levels and types of labels are associated with samples in this dataset. Family-level labels contain a list of members, each assigned a member ID (MID) unique to that respective family. For example: F0870.MID2 refers to member 2 of family 870.
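As a quick illustration of the identifier format described above, here is a minimal sketch that splits a string like F0870.MID2 into its family and member numbers. The helper name parse_member_id is hypothetical (it is not part of FiftyOne), and the pattern assumes four-digit family numbers, as suggested by the F0001 to F1018 range.

```python
import re

# Assumed format: "F" + four-digit family number + ".MID" + member number
MEMBER_ID_PATTERN = re.compile(r"^F(\d{4})\.MID(\d+)$")


def parse_member_id(identifier):
    """Split an identifier like "F0870.MID2" into (family, member) integers.

    Hypothetical helper for illustration only; not a FiftyOne utility.
    """
    match = MEMBER_ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"Unrecognized member identifier: {identifier!r}")
    return int(match.group(1)), int(match.group(2))


family, member = parse_member_id("F0870.MID2")
print(family, member)  # prints: 870 2
```

Keeping the two numbers separate makes it easy to group faces by family first and by member second.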
Each member has annotations specifying gender and relationship to all other members in that respective family.
The relationships in FIW are encoded as integer IDs; for example, 1 denotes child, 4 denotes parent, and 5 denotes spouse.
Viewed in the FiftyOne App:
Within FiftyOne, each sample corresponds to a single face image and contains primitive labels of the Family ID, Member ID, etc. The relationship labels are stored as multi-label Classifications, where each classification represents one relationship that the member has with another member in the family. The number of relationships will differ from one person to the next, but all faces of the same person will have the same relationship labels.
The labels for the Kinship Verification task are also loaded into this dataset through FiftyOne. As with relationships, the kinship labels are stored as classifications. Whereas a relationship label only specifies a role for the associated person, such as “parent”, a kinship label specifies both sides of the pair: for example, fd represents a Father-Daughter kinship and md a Mother-Daughter kinship.
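To make the difference between one-sided relationship labels and two-sided kinship labels concrete, here is a small sketch that expands a kinship code into the two roles it names. The lookup table and the describe_kinship helper are hypothetical and cover only the two codes mentioned in this post; extend the table with any other codes you find in the dataset.

```python
# Only the two kinship codes named in this post; the table is illustrative
KINSHIP_ROLES = {
    "fd": ("Father", "Daughter"),
    "md": ("Mother", "Daughter"),
}


def describe_kinship(code):
    """Return a readable "Role-Role" string for a known kinship code.

    Hypothetical helper for illustration only; not a FiftyOne utility.
    """
    if code not in KINSHIP_ROLES:
        raise KeyError(f"Kinship code not in the illustrative table: {code!r}")
    first_role, second_role = KINSHIP_ROLES[code]
    return f"{first_role}-{second_role}"


print(describe_kinship("fd"))  # prints: Father-Daughter
```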
To make it easier to browse the dataset in the FiftyOne App, each sample also contains a face_id field containing a unique integer for each face of a member, always starting at 0. This allows you to filter on face_id in the App to show only a single image of each person.
For your reference, the relationship labels are stored on disk in a matrix that provides the relationship of each member with other members of the family as well as names and genders. The i-th row represents the i-th family member’s relationships to the other members. In other words, the entry in the i-th row and the j-th column corresponds to the i-th family member’s relationship to the j-th member of the family.
For example, FID0001.csv contains:
This family has three members, as listed under the MID column (far-left). Each MID reads across its row. We can see that MID1 is related to MID2 by 4 -> 1 (Parent -> Child), which of course can be viewed as the inverse, i.e., MID2 -> MID1 is 1 -> 4. It can also be seen that MID1 and MID3 are spouses of one another, i.e., 5 -> 5.
Note: The spouse label will likely be removed in a future version of this dataset, as it adds no value to the problem of kinship.
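The row and column convention of the relationship matrix can be sketched in a few lines of Python. The matrix below is a hypothetical three-member family that matches the entries described above (MID1 is a parent of MID2, and MID1 and MID3 are spouses). The MID2/MID3 entries and the use of 0 for "self" are assumptions made for illustration, not values taken from the actual file.

```python
# Codes confirmed by the example above; SELF = 0 is an assumption
SELF, CHILD, PARENT, SPOUSE = 0, 1, 4, 5

# Each directed relationship and its inverse (parent <-> child, spouse <-> spouse)
INVERSE = {SELF: SELF, CHILD: PARENT, PARENT: CHILD, SPOUSE: SPOUSE}

# Hypothetical matrix keyed by MID: matrix[i][j] is member i's relationship
# to member j. The MID2/MID3 entries are illustrative assumptions.
matrix = {
    1: {1: SELF, 2: PARENT, 3: SPOUSE},
    2: {1: CHILD, 2: SELF, 3: CHILD},
    3: {1: SPOUSE, 2: PARENT, 3: SELF},
}


def relationship(matrix, i, j):
    """Return member i's relationship code to member j."""
    return matrix[i][j]


def is_consistent(matrix):
    """Check that every entry (i, j) is the inverse of entry (j, i)."""
    return all(
        matrix[j][i] == INVERSE[matrix[i][j]]
        for i in matrix
        for j in matrix[i]
    )


print(relationship(matrix, 1, 2))  # prints: 4 (MID1 is a parent of MID2)
print(relationship(matrix, 2, 1))  # prints: 1 (MID2 is a child of MID1)
print(is_consistent(matrix))       # prints: True
```

A consistency check like this is a cheap way to catch labeling mistakes, since each relationship must agree with its inverse on the other side of the diagonal.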
What’s next?
- If you like what you see on GitHub, give the project a star.
- Get started! We’ve made it easy to get up and running in a few minutes.
- Join the FiftyOne Slack community. We’re always happy to help.