Fine-Tuning a YOLOS Model for Object Detection on a Fashion Dataset

In the Fashionpedia dataset, bounding boxes are stored in corner format, (x1, y1, x2, y2), which differs from the center format required by YOLOS, (x_center, y_center, width, height). The following helper functions convert between the two formats.

```python
import torch


def xyxy_to_xcycwh(box):
    # Convert corner format (x1, y1, x2, y2) to
    # center format (x_center, y_center, width, height).
    x1, y1, x2, y2 = box.unbind(dim=1)
    width = x2 - x1
    height = y2 - y1
    xc = x1 + width * 0.5
    yc = y1 + height * 0.5
    return torch.stack([xc, yc, width, height], dim=1)


def cxcywh_to_xyxy(x):
    # Convert center format (x_center, y_center, width, height)
    # back to corner format (x1, y1, x2, y2).
    x_c, y_c, w, h = x.unbind(dim=1)
    x1 = x_c - 0.5 * w
    y1 = y_c - 0.5 * h
    x2 = x_c + 0.5 * w
    y2 = y_c + 0.5 * h
    return torch.stack([x1, y1, x2, y2], dim=1)
```

We also need to preprocess the images in the dataset, which are stored as PIL images. For this we use the YOLOS feature extractor, which converts each image into a tensor of pixel values; a sketch of this step follows below.

For the full walkthrough, see the original article: [Fine-Tuning a YOLOS Model for Object Detection on a Fashion Dataset](https://pub.aimind.so/fine-tunning-a-yolos-model-for-object-detection-on-a-fashion-dataset-94bc59fa192e).
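As a quick sanity check of the two conversion helpers defined above, the snippet below round-trips a pair of boxes through both functions. The sample coordinates are made up purely for illustration and assume the helpers are already in scope.

```python
import torch

# Hypothetical boxes in corner format (x1, y1, x2, y2).
boxes = torch.tensor([[10.0, 20.0, 50.0, 80.0],
                      [0.0, 0.0, 100.0, 40.0]])

centers = xyxy_to_xcycwh(boxes)
# tensor([[ 30.,  50.,  40.,  60.],
#         [ 50.,  20., 100.,  40.]])

# Converting back should recover the original corner coordinates.
assert torch.allclose(cxcywh_to_xyxy(centers), boxes)
```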
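And here is a minimal sketch of the preprocessing step, using the `YolosFeatureExtractor` class from the `transformers` library. The `hustvl/yolos-small` checkpoint and the image path are assumptions for illustration; the original article may use a different checkpoint.

```python
from PIL import Image
from transformers import YolosFeatureExtractor

# Assumed checkpoint; swap in whichever YOLOS checkpoint you fine-tune.
feature_extractor = YolosFeatureExtractor.from_pretrained("hustvl/yolos-small")

# Hypothetical image path standing in for a Fashionpedia sample.
image = Image.open("example.jpg").convert("RGB")

# The feature extractor resizes and normalizes the image and returns
# a batch of pixel values as a PyTorch tensor.
encoding = feature_extractor(images=image, return_tensors="pt")
print(encoding["pixel_values"].shape)  # e.g. torch.Size([1, 3, H, W])
```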