A Comprehensive Guide Using FiftyOne and Ultralytics YOLOv5
Drones are revolutionizing industries from agriculture to surveillance, and the ability to accurately detect and analyze objects from aerial imagery is becoming increasingly valuable. Harnessing the potential of drone data through advanced object detection techniques is an exciting endeavor.
This tutorial dives deep into the powerful synergy between FiftyOne and Ultralytics YOLOv5, two cutting-edge tools that, when combined, offer a robust solution for training models on drone data. Whether you’re a seasoned machine learning practitioner or a drone enthusiast venturing into the realm of AI, this guide will equip you with the knowledge and tools to navigate the complexities of drone data training and object detection.
Wait, What’s FiftyOne?
FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster.
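If you want a quick feel for it before diving into drone data, here is a minimal sketch (assuming FiftyOne has been installed with pip install fiftyone) that loads a small sample dataset from the FiftyOne Dataset Zoo and launches the App:

import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset from the FiftyOne Dataset Zoo
dataset = foz.load_zoo_dataset("quickstart")

# Launch the FiftyOne App to browse the samples and labels interactively
session = fo.launch_app(dataset)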
Getting Started With Your Data
Drone data is especially sensitive in computer vision, and it is important to keep all relevant factors in mind; one small misrepresentation can severely hurt model accuracy. When curating or collecting data, take into account the angle, height, lens type, and more, so that the data stays consistent with how you plan to use your drone for its computer vision task. That task may be search and rescue, surveying, agriculture, or traffic detection. Each of these tasks has different flight requirements, and each will be “seeing” the work it is doing differently.
One issue that is easy to picture is the changing height of the terrain below the drone, which can lead to troublesome data. If the use case calls for a specific height above the target, that height should also be kept consistent during data collection. If the drone should be 50 feet above the ground but you are flying over a hill, the drone should raise its altitude accordingly. Neglecting to do so will not only change the appearance of the images, but you may also miss coverage in your scan. None of these issues will totally impede training your model, but they can impact performance.
After all the data has been collected, the only levers left are additional annotation and data curation. Setting guidelines for what data is and is not acceptable for the use case is a great start.
For this walkthrough on using FiftyOne and Ultralytics to train an ML model on drone data, I will be using the Kaggle Roundabout Aerial Images dataset. You can apply the same workflow to any annotated detection dataset, though, so feel free to use your own.
Training With YOLOv5
One of the most painful experiences in training detection models is converting data from one format to another. Thankfully, the days of parsing through multi-gigabyte COCO JSON files or scrambling together VOC XML files are over. FiftyOne allows you to natively convert your dataset to any supported format quickly and easily. We can go from raw data all the way to training in a few steps.
Step 1: Load Your Data
Let’s start by loading our VOC dataset.
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.random as four
import fiftyone.utils.yolo as fouy

from fiftyone import Dataset
from fiftyone.types import VOCDetectionDataset

# Path to the dataset directory
dataset_dir = "./original/original"

# Create a VOCDetectionDataset
dataset = Dataset.from_dir(
    dataset_dir,
    dataset_type=VOCDetectionDataset,
    label_field="ground_truth",
    name="drone_original",
    overwrite=True,
)
dataset.persistent = True
Now that all of our data is loaded, it is time for us to convert it into the YOLO format.
Step 2: Convert Into YOLO Format
To do this, we will shuffle our data using FiftyOne's random_split utility. With it, we can easily split our data into an 85/15 train/val split and export both splits to the YOLO format. The export creates a yaml file that points to the proper directories and will assist us in training. With these few steps, our data is already prepared for training with Ultralytics!
four.random_split(dataset, {"val": 0.15, "train": 0.85})

val_view = dataset.match_tags("val")
train_view = dataset.match_tags("train")

val_view.export(
    export_dir="yolo_drone/",
    split="val",
    dataset_type=fo.types.YOLOv5Dataset,
)

train_view.export(
    export_dir="yolo_drone/",
    split="train",
    dataset_type=fo.types.YOLOv5Dataset,
)
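For reference, the generated yolo_drone/dataset.yaml points YOLOv5 at the exported images and class list. A rough, illustrative sketch of its contents (the exact keys and class indices come from your export, so treat this as an example rather than the literal file):

# yolo_drone/dataset.yaml (illustrative sketch, not the literal generated file)
path: yolo_drone          # dataset root
train: ./images/train/    # training images
val: ./images/val/        # validation images

names:
  0: vehicle
  1: cycle
  2: truck
  3: bus
  4: van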
Step 3: Training With Ultralytics
With our data ready to go, the next step is to train a model. Providing great tooling as well as an extremely powerful model, Ultralytics is a great option for training your object detection models. To get started, clone their YOLOv5 repository and install its requirements by running the following commands in your terminal:
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
Ultralytics offers many different training options, such as Multi-GPU training, hyperparameter search, and pruning. For the sake of this demo, we will just run the default training, but to achieve the highest accuracy you should tweak the training to best fit your use case and data. For tutorials and tips on how to take advantage of YOLOv5 features, hop on over to their docs. For the default training, follow along with:
python3 /path/to/yolov5/train.py --data ./yolo_drone/dataset.yaml --weights yolov5s.pt --img 640
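If you have multiple GPUs available, YOLOv5 also supports Multi-GPU DistributedDataParallel training. A sketch based on the Ultralytics docs, assuming two GPUs (devices 0 and 1); check their Multi-GPU tutorial for the exact invocation for your setup:

python -m torch.distributed.run --nproc_per_node 2 /path/to/yolov5/train.py \
    --data ./yolo_drone/dataset.yaml --weights yolov5s.pt --img 640 \
    --batch-size 64 --device 0,1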
As is the case with most training runs, this can take quite a while on most machines and requires a graphics card. After the model has finished training and you are satisfied with the results, we can hop back into FiftyOne for some insights on how well the model trained.
Step 4: Post Training
Our model has been trained and our weights have been saved. Now, in order to deploy or run inference with our trained model, we need to export it to the format of our choosing. I chose TorchScript due to its fast, compiled nature and its ease of use.
First, locate your weights at /path/to/yolov5/runs/train/exp#/weights/best.pt. Then, to export to TorchScript, run the following command:
python /path/to/yolov5/export.py --weights YOUR_WEIGHTS.pt --include torchscript
This will save the TorchScript file to the same location. The Ultralytics training run should have provided us with several evaluation metrics, such as loss and mAP. However, it is beneficial to take a deeper look and find exactly which images our model struggled with, so we can develop a strategy to address this in future experiments. Let's hop into FiftyOne for the answers!
Prepping the Model for Inference
import torch
import torchvision

# Load the model
model = torch.load("/path/to/yolov5/runs/train/exp1/weights/best.torchscript")

# Define the postprocessing function
def non_max_suppression(
    prediction,
    conf_thres=0.25,
    iou_thres=0.45,
    classes=None,
    agnostic=False,
    multi_label=False,
    labels=(),
    max_det=300,
    nm=0,  # number of masks
):
    """Non-Maximum Suppression (NMS) on inference results to reject overlapping detections

    Returns:
        list of detections, one (n,6) tensor per image [xyxy, conf, cls]
    """

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'
    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation model, output = (inference_out, loss_out)
        prediction = prediction[0]  # select only inference output

    device = prediction.device
    mps = 'mps' in device.type  # Apple MPS
    if mps:  # MPS not fully supported yet, convert tensors to CPU before NMS
        prediction = prediction.cpu()
    bs = prediction.shape[0]  # batch size
    nc = prediction.shape[2] - nm - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Settings
    # min_wh = 2  # (pixels) minimum box width and height
    max_wh = 7680  # (pixels) maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 0.5 + 0.05 * bs  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    mi = 5 + nc  # mask start index
    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box/Mask
        box = xywh2xyxy(x[:, :4])  # (center_x, center_y, width, height) to (x1, y1, x2, y2)
        mask = x[:, mi:]  # zero columns if no masks

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, 5 + j, None], j[:, None].float(), mask[i]), 1)
        else:  # best class only
            conf, j = x[:, 5:mi].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        i = i[:max_det]  # limit detections
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if mps:
            output[xi] = output[xi].to(device)

    return output


def xywh2xyxy(x):
    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right
    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x
    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y
    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x
    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y
    return y


def format_detections(preds):
    # Convert NMS output rows into relative-coordinate FiftyOne Detections
    detections = []
    for x in preds:
        label = x[5].cpu().detach().numpy()
        score = x[4].cpu().detach().numpy()
        box = x[:4].cpu().detach().numpy()
        x1, y1, x2, y2 = box
        rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]
        detections.append(
            fo.Detection(
                label=classes[int(label)],
                bounding_box=rel_box,
                confidence=score,
            )
        )
    return detections
With our model loaded and ready to go, let's take a slice of our dataset and introspect a bit. We start by creating a prediction view of 100 samples:
predictions_view = dataset.take(100, seed=51)
We follow up by creating an inference loop that runs inference on the 100 images and adds the detections to the samples. This will allow us to compare the detections against the ground truth in the FiftyOne App later.
from PIL import Image
import cv2
from torchvision.transforms import functional as func

device = "cuda"
classes = ["vehicle", "cycle", "truck", "bus", "van"]

model = model.to(device)

with fo.ProgressBar() as pb:
    for sample in pb(predictions_view):
        # Load image
        image = cv2.imread(sample.filepath)
        image = cv2.resize(image, (640, 640))
        image = func.to_tensor(image).to(device)
        c, h, w = image.shape

        # Perform inference
        preds = model(image.unsqueeze(0))
        out = non_max_suppression(preds)
        detections = format_detections(out[0])

        # Save predictions to dataset
        sample["yolov5"] = fo.Detections(detections=detections)
        sample.save()

# Point the App at our prediction view
session.view = predictions_view
With the App restarted and our new view in place, we can extract fresh insights from our data. Immediately, we can observe our detections superimposed on the ground truth, making a quick qualitative evaluation possible. Hiding the ground truth labels gives a clear picture of missed detections. It is also valuable to experiment with the label confidence sliders, which offer a glimpse of our highest and lowest confidence predictions.
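The same slicing can be done programmatically. A minimal sketch that filters the yolov5 predictions by confidence (the 0.75 and 0.25 thresholds are arbitrary values chosen for illustration) and sends the result to the App:

from fiftyone import ViewField as F

# Keep only yolov5 predictions above an (arbitrary) confidence threshold
high_conf_view = predictions_view.filter_labels("yolov5", F("confidence") > 0.75)

# Or, inspect the least confident predictions instead
low_conf_view = predictions_view.filter_labels("yolov5", F("confidence") < 0.25)

session.view = high_conf_view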
Upon analyzing the data, I can draw conclusions that were previously inaccessible without a thorough examination:
- The model performs strongly with high confidence on cars going around the roundabout
- The model struggles with closely bunched cars, such as parked cars
- The model can potentially mistake rectangular objects for vehicles
Investigating the Embeddings
With the FiftyOne Brain, we can take an even deeper look at our data and predictions by using embeddings. By executing a couple of commands, we can open a powerful visualization tool that shows the groups within your data and how your model is performing on them. Take a look below:
import fiftyone.brain as fob

# Grab the mAP and IoUs
results = predictions_view.evaluate_detections(
    "yolov5",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

# Compute ground_truth embeddings
results = fob.compute_visualization(
    predictions_view,
    patches_field="ground_truth",
    brain_key="gt_viz",
)

# Compute yolov5 embeddings
results = fob.compute_visualization(
    predictions_view,
    patches_field="yolov5",
    brain_key="yolo_viz",
)

session.view = predictions_view
We can grab awesome insights into where we may need to add to our dataset. We could try adding more parked-car views or street views other than roundabouts, or even consider decreasing the height of the drone to increase the information we get about each car. Looking through our highest and lowest confidence predictions compared to the ground truth teaches us more about our data than a simple training run or evaluation ever could.
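It is also worth poking at the evaluation results directly. A minimal sketch, assuming the eval_key="eval" run from above (note that the results variable was reassigned by the visualization calls, so we reload the evaluation by its key):

# Reload the detection evaluation results by their eval key
eval_results = dataset.load_evaluation_results("eval")

# Per-class precision/recall/F1 for the yolov5 predictions
eval_results.print_report()

# Overall mAP, available because we passed compute_mAP=True
print("mAP:", eval_results.mAP())

# Surface the samples with the most false positives, i.e. where the model struggled most
worst_view = predictions_view.sort_by("eval_fp", reverse=True)
session.view = worst_view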
If you’d like to learn more, check out this talk I gave in August at the Computer Vision Meetup, which covers many of the topics in this blog.
In summary, this tutorial has explored the exciting realm of harnessing the potential of drone data, spotlighting the synergy between FiftyOne and Ultralytics YOLOv5. By meticulously curating drone data and leveraging these powerful tools, both AI enthusiasts and machine learning practitioners can navigate the complexities of object detection training. The steps provided, from data preparation and model training to post-training analysis, underscore the importance of tailoring your approach to your specific drone use case. By offering insights into model performance and using embeddings for deeper analysis, this tutorial empowers users to improve accuracy and make informed decisions at the dynamic intersection of drone technology and artificial intelligence.