Bounding Box
If you’ve worked with any visual data for machine learning, you may have seen rectangles called bounding boxes. But what are bounding boxes? A bounding box in computer vision is a rectangle (2‑D) or cuboid (3‑D) that encloses an object so models know where it is. Think of it as the tightest frame that fully contains the target, defined by its corner coordinates. Labeling these boxes is usually step one when teaching models to detect objects.

Why Use Bounding Boxes?

Bounding boxes provide ground truth for object‑detection models and a yardstick for evaluation. Metrics such as Intersection over Union (IoU) measure the overlap between predicted and ground‑truth boxes, and benchmarks like COCO compute mean Average Precision (mAP) from these overlaps to rank models. Without accurate boxes, a detector can’t learn where things are.
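To make IoU concrete, here is a minimal sketch in plain Python. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the function name and format are illustrative, not tied to any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with (x1, y1) the top-left corner.
    """
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes contribute no intersection area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A perfect prediction scores 1.0; partial overlap scores between 0 and 1
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

Benchmarks typically count a prediction as correct only when its IoU with a ground‑truth box clears a threshold (COCO averages over thresholds from 0.5 to 0.95).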

2‑D vs 3‑D Bounding Boxes

Most people start with 2‑D axis‑aligned rectangles in images or video. In multimodal stacks — e.g., autonomous driving — we extend the idea to 3‑D cuboids in LiDAR point clouds. A 3‑D box captures length, width, height, and rotation, letting downstream planners reason in real‑world space.

Working with Bounding Boxes in FiftyOne

Voxel51’s FiftyOne includes a browser‑based App that lets you overlay detections and ground truth. Filter by box area, confidence, or label to surface edge cases instantly. Need more labels? Pipe a view to CVAT or Labelbox via the annotation API, collect fresh boxes, then re‑evaluate — all without leaving your workflow. If you’re working with point clouds, the 3‑D viewer renders cuboids in WebGL so you can orbit around scenes and inspect every angle.
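The kind of filtering described above can be sketched in plain Python. This is not the FiftyOne API itself, just an illustration of the idea; the dict layout mirrors how detection labels commonly store a relative [x, y, w, h] box plus a confidence score, and the function name and thresholds are hypothetical.

```python
def filter_detections(detections, min_conf=0.5, min_area=0.0):
    """Keep detections above a confidence and normalized-area threshold.

    Each detection is a dict with a relative [x, y, w, h] bounding box
    and a confidence score in [0, 1].
    """
    kept = []
    for det in detections:
        _, _, w, h = det["bounding_box"]
        # Area in normalized units: w * h is the fraction of the image covered
        if det["confidence"] >= min_conf and w * h >= min_area:
            kept.append(det)
    return kept

preds = [
    {"label": "car", "bounding_box": [0.1, 0.1, 0.4, 0.3], "confidence": 0.92},
    {"label": "car", "bounding_box": [0.7, 0.8, 0.02, 0.02], "confidence": 0.95},
    {"label": "sign", "bounding_box": [0.5, 0.2, 0.2, 0.2], "confidence": 0.30},
]

# Drop low-confidence predictions and tiny boxes (often noise or edge cases)
print(filter_detections(preds, min_conf=0.5, min_area=0.005))
```

Surfacing the boxes that fail such filters is exactly the kind of view you would then send out for re‑annotation.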

From Dataset to Deployment

Whether you’re boxing stop signs in dash‑cam footage or drawing cuboids around drones in LiDAR, bounding boxes are the scaffolding that lets vision models learn and localize. Clean, well‑curated boxes lead to better‑performing models.

More resources: Evaluating Object Detections in FiftyOne | Visualizing 3D Object Detections Blog
