Saying that machine learning (ML) is transforming industries worldwide doesn’t quite capture its true impact. ML has fundamentally changed how businesses operate and solve problems, and experts predict the global ML market will reach USD 500 billion by 2030.
Machine learning models assist in early disease detection and personalized healthcare treatments. Financial institutions also use ML for fraud detection and algorithmic trading. Autonomous vehicles depend on these models for safe navigation. Even customer service is witnessing a revolution with ML powering chatbots and recommendation systems.
However, the success of these applications relies heavily on data quality. ML models can only perform as well as the data used for training. High-quality, well-prepared datasets result in more reliable models with high generalizability.
But what determines data quality? Annotation (also known as labeling) is one of the most critical factors. Annotations describe the content of each data sample, helping ML models understand what the sample contains.
Here, we will discuss what annotation is and why accurate dataset annotation is important for ML development. We will also cover the risks linked to poor annotation practices, key strategies for high-quality labeling, and how FiftyOne can improve how you annotate data, particularly for computer vision (CV) systems.
Dataset annotation adds key information to data samples that enables ML models to learn patterns and make predictions. For example, training a CV model may involve adding accurate, relevant labels to images. The model can then learn patterns that help it identify the content within each image.
Multiple annotation strategies exist for different data types. The list below highlights the methods used for labeling visual, text, and audio data for training and evaluating ML models.
Visual annotation labels visual media samples to train and evaluate models. Types of visual data include still images, video, and 3D scenes, as well as standardized formats like DICOM for medical imaging. Common computer vision tasks requiring annotation include classification, object detection, and segmentation.
The goal in classification is to apply one or more labels to an entire image. In object detection, annotators mark objects with bounding boxes, which train the model to find individual objects and their locations within the image. Keypoint annotation identifies landmarks within an object, such as the individual joints of a body, for predicting movement. Segmentation is more complex still, requiring pixel-level annotation to capture precise object boundaries. Semantic segmentation, for example, assigns a specific label to each pixel based on the object class it belongs to (e.g., “car,” “dog,” or “stop sign”). All these image annotation types are essential for applications like autonomous driving and facial recognition.
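To make these label types concrete, the following sketch shows how each one might be represented programmatically using FiftyOne's label classes (the file path, field names, and mask size are placeholders):

```python
import numpy as np
import fiftyone as fo

# A single image sample with several common annotation types attached
sample = fo.Sample(filepath="/path/to/image.jpg")  # hypothetical path

# Classification: one label for the whole image
sample["classification"] = fo.Classification(label="street_scene")

# Object detection: bounding boxes in relative [x, y, width, height] coordinates
sample["detections"] = fo.Detections(
    detections=[fo.Detection(label="car", bounding_box=[0.1, 0.2, 0.3, 0.25])]
)

# Keypoints: landmark locations within an object
sample["keypoints"] = fo.Keypoints(
    keypoints=[fo.Keypoint(label="person", points=[(0.45, 0.30), (0.50, 0.55)])]
)

# Semantic segmentation: a per-pixel class mask
sample["segmentation"] = fo.Segmentation(
    mask=np.zeros((480, 640), dtype=np.uint8)  # placeholder mask
)
```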
Text annotation, meanwhile, labels written content for tasks such as sentiment analysis, part-of-speech (PoS) tagging, and named entity recognition (NER). Sentiment analysis identifies emotions or opinions in a text; for instance, it can label customer feedback as “positive,” “negative,” or “neutral.” PoS tagging labels each word in a text with its corresponding grammatical role, while NER extracts and labels entities like names, dates, or locations.
Text annotations are essential for building natural language processing (NLP) applications like chatbots and virtual assistants.
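As a rough, library-agnostic illustration, text annotations are often stored as labels and character spans alongside the raw text; the sentence, labels, and offsets below are made up for the example:

```python
# A minimal sketch of common text annotation formats
text = "Acme Corp opened a new office in Berlin on 3 May 2024."

annotations = {
    # Document-level sentiment label
    "sentiment": "neutral",
    # Part-of-speech tags per token (truncated for brevity)
    "pos_tags": [("Acme", "PROPN"), ("Corp", "PROPN"), ("opened", "VERB")],
    # Named entities as character spans with labels
    "entities": [
        {"text": "Acme Corp", "start": 0, "end": 9, "label": "ORG"},
        {"text": "Berlin", "start": 33, "end": 39, "label": "LOC"},
        {"text": "3 May 2024", "start": 43, "end": 53, "label": "DATE"},
    ],
}
```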
Finally, audio annotation labels audio files to train models that process and analyze sounds and voices. Key tasks include speech-to-text transcription, text-to-speech generation, and speaker identification. Labeling audio typically involves converting audio files into waveforms or spectrograms, which annotators use to label specific regions by frequency, amplitude, and duration.
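A minimal sketch of region-level audio labels might look like the following, assuming librosa is installed and "interview.wav" is a hypothetical recording:

```python
import librosa

# Load the raw waveform at its native sampling rate
waveform, sample_rate = librosa.load("interview.wav", sr=None)

# Region annotations expressed as start/end times (seconds) with labels
segments = [
    {"start": 0.0, "end": 4.2, "label": "speaker_1", "transcript": "Hello and welcome."},
    {"start": 4.2, "end": 9.8, "label": "speaker_2", "transcript": "Thanks for having me."},
]

# Convert a labeled region to sample indices for downstream feature extraction
first = segments[0]
region = waveform[int(first["start"] * sample_rate) : int(first["end"] * sample_rate)]
```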
Human expertise helps ensure annotations are accurate, relevant, and context-aware. Skilled annotators understand nuances, cultural differences, and domain-specific knowledge that automated tools might miss. Their input helps maintain data quality and align annotations with real-world contexts.
However, while human expertise is invaluable, even the most skilled annotators make mistakes. Misunderstandings or inconsistent interpretations of the data can affect annotations. Numerous examples in ML datasets illustrate how poor annotation can derail model training.
In January 2021, Koksal et al. published a paper on how annotation inconsistencies affect drone detection using the well-known detection model YOLOv3. They used the CVPR-2020 Anti-UAV Challenge dataset to train YOLOv3 and assess its detection and tracking performance. The dataset contains 300 videos and 580,000 manually annotated bounding boxes.
The researchers introduced multiple annotation errors, including additional bounding boxes, missing boxes, and shifted boxes. They assessed the impact of these errors on the model’s tracking accuracy.
The results show that with original annotations, tracking accuracy was 73.6%. However, the combined effect of the annotation errors reduced accuracy to 54.2%. The performance degradation clearly highlights the importance of annotation quality, even in robust models like YOLOv3.
Although specific annotation procedures may differ by use case, the following sections outline best practices to help you ensure high-quality, reliable annotations for any application.
Choosing the correct annotation schema is vital. Start by clearly defining the primary task to balance speed and accuracy. For instance, classification tasks, which assign labels to an entire image, are quicker to complete but offer less detail. In contrast, bounding boxes or pixel-level annotations of objects within images require more time and precision but provide richer data for complex models. Documenting attributes and labels is equally critical and ensures consistent labeling across the dataset.
Also, consider adopting an existing data schema, like the format used by the COCO dataset, which offers predefined categories and annotation structures. Using established schemas saves time, ensures compatibility with industry standards, and supports benchmarking against existing ML models. The schema should also address ambiguities and edge cases in the data. For example, annotators need instructions on handling occluded or unclear objects, such as applying fallback labels like “unknown” or “partial.”
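For reference, a COCO-style detection annotation file pairs image records with category definitions and per-object boxes. The hand-written example below (expressed as a Python dict) shows the general structure:

```python
# A minimal, hand-written example of the COCO detection annotation structure
coco_style = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "car", "supercategory": "vehicle"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # COCO boxes are absolute pixel [x, y, width, height]
            "bbox": [64, 96, 192, 120],
            "iscrowd": 0,
            "area": 192 * 120,
        },
    ],
}
```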
A diverse pool of annotators can help reduce bias in the annotation process. Different cultural, linguistic, and demographic backgrounds can offer varied perspectives. This reduces the risk of annotations reflecting a narrow or skewed worldview. For example, in image annotation, multiple annotators can reduce biases related to ethnicity, gender, or social context, resulting in more inclusive and fair datasets.
Double-annotation and consensus-based approaches can also boost annotation quality. Double annotation requires multiple annotators to label the same data. The method can quickly reveal discrepancies and inaccuracies that a single annotator may miss.
A consensus-based approach takes this further by requiring annotators to agree on the final label. The process may include discussions or automated algorithms that resolve disagreements.
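One simple way to operationalize double annotation is to measure inter-annotator agreement and flag disagreements for review. The sketch below uses scikit-learn's Cohen's kappa on a small, made-up set of labels:

```python
from sklearn.metrics import cohen_kappa_score

# Labels assigned to the same six samples by two annotators (hypothetical data)
annotator_a = ["car", "dog", "car", "cat", "dog", "car"]
annotator_b = ["car", "dog", "cat", "cat", "dog", "car"]

# Inter-annotator agreement: values near 1.0 indicate strong agreement
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")

# A simple consensus rule: keep labels where annotators agree,
# flag the rest for discussion or adjudication
consensus = [a if a == b else "NEEDS_REVIEW" for a, b in zip(annotator_a, annotator_b)]
print(consensus)
```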
Given the importance of data quality, we need good software to manage the datasets being annotated. FiftyOne is a tool that enables visual AI builders to refine data and models in one place. It includes a variety of features to both visualize datasets and improve data quality. For example, FiftyOne makes it easy to find and remediate mistakes in your data, such as annotation errors like incorrect or missing object labels.
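Getting started takes only a few lines; for example, you can load a small sample dataset from the FiftyOne zoo and open it in the App:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset and explore it interactively in the FiftyOne App
dataset = foz.load_zoo_dataset("quickstart")
session = fo.launch_app(dataset)
```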
The previous sections emphasized the importance of best practices for improving annotation pipelines. Now, let’s go over some real-world scenarios where accurate and consistent annotations directly impact model performance.
We mentioned earlier that FiftyOne is a toolkit for managing and refining the datasets used in your visual AI projects. FiftyOne helps teams curate the samples needing annotation and perform quality control to ensure annotations are correct and relevant to your models, while also integrating with your chosen annotation software.
FiftyOne makes it easy to navigate through large datasets both programmatically and in the App in order to curate the data needing annotation. At a minimum, this could mean examining and filtering sample metadata for vital statistics like image size and aspect ratio. It could also mean verifying quality metrics like object clarity and potential duplicate samples before passing the data to annotators.
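As a sketch of this kind of curation, the snippet below computes per-sample metadata and filters out small images before annotation (it assumes a dataset named "my-dataset" already exists in FiftyOne):

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Assumes a dataset named "my-dataset" already exists in FiftyOne
dataset = fo.load_dataset("my-dataset")

# Populate image metadata (width, height, size on disk) for every sample
dataset.compute_metadata()

# Filter out tiny images that are unlikely to be worth annotating
large_images = dataset.match(
    (F("metadata.width") >= 640) & (F("metadata.height") >= 480)
)
print(large_images)
```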
FiftyOne includes a rich plugin ecosystem to assist annotation teams. For example, a team can use the zero-shot model prediction plugin to perform initial predictions on the dataset that are later validated by humans. Then, once annotation teams are ready to begin their work, FiftyOne can hook into any defined annotation backend. For instance, teams can generate dataset views containing samples likely needing new or edited annotations, upload those samples to a platform like CVAT, perform the annotations, and then reload the samples back into FiftyOne.
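A round trip to an annotation backend might look like the following sketch, which assumes a dataset named "my-dataset", configured CVAT credentials, and samples previously tagged "needs_annotation":

```python
import fiftyone as fo

dataset = fo.load_dataset("my-dataset")

# Select the samples that need (re)annotation, e.g. those tagged earlier
view = dataset.match_tags("needs_annotation")

# Upload the view to the annotation backend (CVAT here) under a unique key
anno_key = "fix_labels_round_1"
view.annotate(anno_key, backend="cvat", label_field="ground_truth")

# ... annotators edit the labels in CVAT ...

# Pull the edited labels back into FiftyOne and clean up the run record
dataset.load_annotations(anno_key)
dataset.delete_annotation_run(anno_key)
```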
FiftyOne works with visual media like images, videos, 3D scenes, and even geolocation data, and likewise includes many ways to identify possible labeling mistakes. One method is to compute embeddings from the data. Embeddings are vector representations of image properties. They can be used to identify distributions of unique, similar, or ambiguous samples within a dataset, and also visually identify outliers that could indicate mistaken labels.
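For example, you could compute embeddings with a zoo model and index them for visualization and uniqueness analysis; the dataset name below is a placeholder:

```python
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = fo.load_dataset("my-dataset")  # assumed to already exist

# Compute image embeddings with a zoo model
model = foz.load_zoo_model("clip-vit-base32-torch")
embeddings = dataset.compute_embeddings(model)

# Index the embeddings for interactive visualization (e.g., spotting outliers)
fob.compute_visualization(dataset, embeddings=embeddings, brain_key="img_viz")

# Uniqueness scores help surface near-duplicates and unusual samples
fob.compute_uniqueness(dataset, embeddings=embeddings)
print(dataset.bounds("uniqueness"))
```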
You could also use FiftyOne’s interactive plots to create dynamic charts and dashboards. These are viewable in the App and can be useful for catching annotation mistakes. In the figure below, the pie chart shows the ratio of model-predicted false positives to total positives in the sample set. Selecting the false positive region of the pie chart creates an equivalent dataset view of false positive samples. From there you can further drill into the samples to determine if some of these false positives are the result of annotation errors.
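Programmatically, the same drill-down might look like this sketch, which assumes the dataset has "ground_truth" labels and model "predictions":

```python
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("my-dataset")  # assumed to already exist

# Evaluate the detections; each prediction is marked "tp", "fp", or "fn"
results = dataset.evaluate_detections(
    "predictions", gt_field="ground_truth", eval_key="eval"
)

# Build a view containing only the false-positive predictions, then inspect
# them in the App to check whether they are really annotation errors
fp_view = dataset.filter_labels("predictions", F("eval") == "fp")
session = fo.launch_app(fp_view)
```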
The enterprise version of FiftyOne is built for collaboration, with the goal of making it as easy as possible for multiple users to work together to build high-quality datasets and computer vision models. A data scientist, for instance, might use FiftyOne to tag samples as likely label mistakes and then share that dataset with a data quality assurance team to verify, before sending the tagged images for reannotation with one of FiftyOne’s native annotation integrations. The data science and QA teams can then iterate between correcting sample annotations and re-training the model with the improved data.
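One way to set up such a hand-off is sketched below: mistakenness scores flag suspect labels, the most suspicious samples are tagged, and the QA team pulls up exactly those samples (field and tag names are illustrative):

```python
import fiftyone as fo
import fiftyone.brain as fob

# Assumes "my-dataset" has "ground_truth" labels and model "predictions"
dataset = fo.load_dataset("my-dataset")

# Estimate which ground-truth labels are most likely to be wrong
fob.compute_mistakenness(dataset, "predictions", label_field="ground_truth")

# Tag the most suspicious samples so a QA team can review them
suspicious = dataset.sort_by("mistakenness", reverse=True).limit(100)
suspicious.tag_samples("possible_label_mistake")

# The QA team can later pull up exactly these samples for verification
review_view = dataset.match_tags("possible_label_mistake")
```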
As use cases become more complex, the availability of labeled data becomes more limited. This phenomenon is making model development a major challenge. However, ongoing research into alternative approaches is leading to innovative annotation methods that require minimal labeled data for training models.
The following section provides an overview of these emerging trends and how FiftyOne integrates with them to offer a comprehensive end-to-end labeling solution.
Active learning allows models to suggest which data points to annotate next. Instead of labeling random samples, the model predicts labels for a subset of unlabeled data. It then identifies predictions for which it has low confidence.
It then asks the user to provide labels for those data points. The technique reduces the amount of labeled data needed and makes the process more resource-efficient and cost-effective.
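The following sketch shows an uncertainty-sampling loop with the modAL library on toy scikit-learn data; in a real pipeline, the features would come from your unlabeled pool and a human would supply each queried label:

```python
import numpy as np
from modAL.models import ActiveLearner
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for image features; in practice these would come
# from your own unlabeled pool
X, y = load_digits(return_X_y=True)
X_seed, y_seed = X[:50], y[:50]
X_pool, y_pool = X[50:], y[50:]

# The learner starts from a small labeled seed set and, by default,
# uses uncertainty sampling to decide what to ask about next
learner = ActiveLearner(
    estimator=RandomForestClassifier(),
    X_training=X_seed,
    y_training=y_seed,
)

for _ in range(10):
    # The model picks the pool sample it is least confident about...
    query_idx, _ = learner.query(X_pool)
    # ...a human would supply the label here; we use the known label as a stand-in
    learner.teach(X_pool[query_idx], y_pool[query_idx])
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)
```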
Semi-supervised learning (SSL) trains models on labeled and unlabeled data. In this approach, annotators provide a small amount of labeled data to help the model learn and improve.
Additionally, large amounts of unlabeled data help refine the model’s understanding. SSL is beneficial when labeled data is scarce or costly to obtain. It is a more efficient and scalable alternative to traditional, fully supervised learning.
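A minimal self-training example (one common SSL strategy) using scikit-learn is shown below; unlabeled samples are marked with -1, and the base model pseudo-labels its own high-confidence predictions:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Toy setup: pretend only ~10% of the data is labeled; the rest is marked -1,
# which scikit-learn's semi-supervised API treats as "unlabeled"
X, y = load_digits(return_X_y=True)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) > 0.10] = -1

# Self-training: the base model labels its own high-confidence predictions
# on the unlabeled pool and retrains on them iteratively
model = SelfTrainingClassifier(SVC(probability=True), threshold=0.9)
model.fit(X, y_partial)

print(f"Accuracy on all data: {model.score(X, y):.2f}")
```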
FiftyOne integrates with active learning and SSL frameworks. This includes a plugin for the modAL library for building active learning pipelines. You can easily integrate it into your data annotation projects to enhance labeling accuracy with minimal additional coding effort. Additionally, the modAL plugin is compatible with industry-standard annotation tools like CVAT, Labelbox, and Label Studio, ensuring a smooth and efficient annotation process.
FiftyOne also supports state-of-the-art models like CoTracker3, an innovative point tracker that predicts a point’s trajectory through a video from its initial location. Unlike other tracking frameworks that rely on synthetic data, CoTracker3 uses SSL to predict locations from real-world video.
With FiftyOne’s model evaluation features, developers can use CoTracker3’s SSL ability to improve a model’s generalization performance.
With rising concerns around AI’s reliability in mission-critical applications like healthcare and autonomous cars, high-quality annotations are key ingredients in determining a model’s success. Accurate and consistent annotations ensure models learn effectively and perform well in unknown situations.
Organizations can invest in tools like FiftyOne to help developers build scalable labeling pipelines for multiple use cases. The platform empowers AI experts by streamlining and integrating the annotation process with advanced techniques like active and semi-supervised learning.
If you want to take your AI workflows to the next level, you can explore the following resources to help you get started:
Talk to an expert about FiftyOne for your enterprise.
Like what you see on GitHub? Give the Open Source FiftyOne project a star.
Get answers and ask questions in a variety of use case-specific channels on Discord.