(Part 1) Generating Anchor boxes for Yolo-like network for vehicle detection using KITTI dataset.

In this post, I will present steps for computing anchor boxes for YOLO9000 (or YOLOv2). YOLOv2 is a combined classification-bounding box prediction framework where we directly predict the objects in each cell and the corrections on anchor boxes. More specifically, YOLOv2 divides the entire image into 13X13 grid cells, next places 5 anchor boxes at each location and finally predicts corrections on these anchor boxes. YOLOv2 makes 5 predictions corresponding to corrections on location of center (x and y), height and width, and finally the intersection over union (IOU) between predicted bounding boxes and ground truth boxes. A unique feature of YOLOv2 is that all the predictions are have magnitude less than 1, as a result the chance of one type of cost dominating the optimization is less likely. A unique feature of YOLOv2 is that the anchor boxes are designed specifically for the given dataset using K-means clustering. Unlike other anchor boxes (or prior) based methods, like Single Shot Detection, YOLOv2 does not assume the aspect ratios or shapes of the boxes.  <a href="https://vivek-yadav.medium.com/part-1-generating-anchor-boxes-for-yolo-like-network-for-vehicle-detection-using-kitti-dataset-b2fe033e5807">Click Here</a>