If you’ve worked with visual data for machine learning, you’ve probably seen rectangles called bounding boxes. But what exactly are they? A bounding box in computer vision is a rectangle (2-D) or cuboid (3-D) that encloses an object so models know where it is. Think of it as the tightest frame that fully contains the target, defined by its coordinates: either two opposite corners, or one corner plus a width and height. Labeling these boxes is usually step one when teaching models to detect objects.
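As a quick illustration, the snippet below shows two common ways to write down the same 2-D box. The relative [top-left-x, top-left-y, width, height] format matches what FiftyOne uses for detections, while many other tools store absolute pixel corners; the image size and coordinates here are made up for the example.

```python
# Two common ways to describe the same 2-D box.

# Absolute pixel corners: (x_min, y_min, x_max, y_max)
corners = (120, 80, 360, 240)

# FiftyOne's convention: [top-left-x, top-left-y, width, height],
# all relative to the image size (values in [0, 1])
img_w, img_h = 640, 480  # example image dimensions
x_min, y_min, x_max, y_max = corners
rel_box = [
    x_min / img_w,            # top-left x
    y_min / img_h,            # top-left y
    (x_max - x_min) / img_w,  # box width
    (y_max - y_min) / img_h,  # box height
]
print(rel_box)  # [0.1875, 0.1666..., 0.375, 0.3333...]
```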
Why Use Bounding Boxes?
Bounding boxes provide the ground truth that object-detection models train on, and a yardstick for evaluating them. Intersection over Union (IoU) measures the overlap between a predicted box and the true box, and benchmarks like COCO aggregate these overlaps into mean Average Precision (mAP) to rank models. Without accurate boxes, a detector can’t learn where things are.
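To make the metric concrete, here is a minimal IoU computation for two axis-aligned boxes in (x_min, y_min, x_max, y_max) pixel coordinates; the function and box format are illustrative rather than taken from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes
    given as (x_min, y_min, x_max, y_max)."""
    # Coordinates of the intersection rectangle
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes don't overlap at all
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143
```

A prediction typically counts as correct when its IoU with a ground-truth box clears a threshold such as 0.5; COCO-style mAP averages precision over IoU thresholds from 0.50 to 0.95.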
2‑D vs 3‑D Bounding Boxes
Most people start with 2-D axis-aligned rectangles in images or video. In multimodal stacks such as autonomous driving, the idea extends to 3-D cuboids in LiDAR point clouds. A 3-D box captures length, width, height, and rotation, letting downstream planners reason in real-world space.
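As a sketch of what such a label looks like in code, the fields below follow FiftyOne's documented 3-D detection format (a center location, per-axis dimensions, and per-axis rotations in radians); the file path and numbers are invented for illustration.

```python
import fiftyone as fo

# A 3-D cuboid is a center point plus extents and a rotation,
# rather than corner coordinates.
cuboid = fo.Detection(
    label="car",
    location=[12.4, -3.1, 0.9],  # box center (x, y, z) in scene units
    dimensions=[4.5, 1.9, 1.6],  # extent along each axis
    rotation=[0.0, 0.0, 1.57],   # per-axis rotation in radians (~90° yaw)
)

# Attach it to a sample (hypothetical point-cloud path)
sample = fo.Sample(
    filepath="scene001.pcd",
    ground_truth=fo.Detections(detections=[cuboid]),
)
```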
Working with Bounding Boxes in FiftyOne
Voxel51’s FiftyOne includes a browser-based App that lets you overlay detections and ground truth. Filter by box area, confidence, or label to surface edge cases instantly. Need more labels? Pipe a view to CVAT or Labelbox via the annotation API, collect fresh boxes, then re-evaluate, all without leaving your workflow. If you’re working with point clouds, the 3-D viewer renders cuboids in WebGL so you can orbit around scenes and inspect every angle.
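Here is a minimal sketch of that loop, assuming the quickstart zoo dataset with its "predictions" and "ground_truth" fields; adapt the field names and thresholds to your own data.

```python
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# Small demo dataset that ships with predictions and ground truth
dataset = foz.load_zoo_dataset("quickstart")

# Surface edge cases: low-confidence predictions on tiny boxes.
# bounding_box is [x, y, width, height] in relative coordinates,
# so width * height gives the relative box area.
rel_area = F("bounding_box")[2] * F("bounding_box")[3]
view = dataset.filter_labels(
    "predictions", (F("confidence") < 0.3) & (rel_area < 0.01)
)

# Score predictions against ground truth (COCO-style IoU matching)
dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
    compute_mAP=True,
)

# To send these samples out for re-labeling, a view can be piped to an
# annotation backend, e.g.:
# view.annotate("fix_boxes", backend="cvat", label_field="ground_truth")

session = fo.launch_app(view)  # inspect the filtered boxes in the App
```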
From Dataset to Deployment
Whether you’re boxing stop signs in dash-cam footage or drawing cuboids around drones in LiDAR, bounding boxes are the scaffolding that lets vision models learn and localize. Clean, well-curated boxes lead to better-performing models.