US 12,462,530 B2
Optimized single shot detector (SSD) model for object detection
KeYong Yu, Singapore (CN)
Assigned to ACCENTURE GLOBAL SOLUTIONS LIMITED, Dublin (IE)
Filed by ACCENTURE GLOBAL SOLUTIONS LIMITED, Dublin (IE)
Filed on Dec. 28, 2022, as Appl. No. 18/089,663.
Prior Publication US 2024/0221361 A1, Jul. 4, 2024
Int. Cl. G06V 10/764 (2022.01); G06V 10/77 (2022.01); G06V 10/82 (2022.01)

CPC G06V 10/765 (2022.01) [G06V 10/7715 (2022.01); G06V 10/82 (2022.01); G06V 2201/07 (2022.01)]

20 Claims

1. A system comprising:

a processor;

a memory coupled to the processor, wherein the memory comprises processor-executable instructions, which on execution, cause the processor to:

receive an image of a plurality of objects;

determine a plurality of feature layers and a plurality of feature cell sizes corresponding to the received image, based on an aspect ratio of the received image;

determine an aspect ratio of one or more anchor boxes from a trained model file;

determine, based on the aspect ratio of one or more anchor boxes, a position and a number of the one or more anchor boxes to be tiled in each feature cell of the plurality of feature layers, wherein the number of one or more anchor boxes to be tiled is based on the aspect ratio of the one or more anchor boxes and the plurality of feature cell sizes;

assign the one or more anchor boxes as a horizontal tile or a vertical tile in each feature cell, when the anchor box aspect ratio is less than a first pre-defined threshold value and greater than a second pre-defined threshold value, respectively;

generate one or more feature maps using an object detection model comprising a neural network (NN) model, wherein the one or more feature maps comprises one or more feature map tensors;

generate, for each layer of the one or more feature maps, a prediction tensor of a pre-defined dimension, from the one or more feature map tensors, using a prediction convolution layer, wherein, for each layer of the one or more feature maps, the prediction convolution layer is created based on each feature cell size, the position of the one or more anchor boxes, and the aspect ratio of the one or more anchor boxes; and

detect and classify the plurality of objects, based on the generated prediction tensor.
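
The tiling and prediction steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation, only one possible reading under stated assumptions: the aspect ratio is taken as width divided by height, the two thresholds (0.8 and 1.25) are hypothetical placeholders, the number of boxes per cell is derived by how many boxes of that aspect ratio fit inside a square cell, and the prediction tensor uses the conventional SSD layout of 4 box offsets plus one score per class for each anchor. All names (`AnchorBox`, `tile_mode`, `anchors_per_cell`, `tile_cell`, `grid_shape`, `prediction_shape`) are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical values; the claim only states that the thresholds are pre-defined.
FIRST_THRESHOLD = 0.8    # aspect ratio below this -> horizontal tiling
SECOND_THRESHOLD = 1.25  # aspect ratio above this -> vertical tiling


@dataclass(frozen=True)
class AnchorBox:
    cx: float  # center x, in pixels
    cy: float  # center y, in pixels
    w: float   # width, in pixels
    h: float   # height, in pixels


def tile_mode(aspect_ratio: float) -> str:
    """Pick a tiling direction from the anchor aspect ratio (assumed w / h)."""
    if aspect_ratio < FIRST_THRESHOLD:
        return "horizontal"  # narrow boxes sit side by side
    if aspect_ratio > SECOND_THRESHOLD:
        return "vertical"    # wide boxes are stacked
    return "single"


def anchors_per_cell(aspect_ratio: float) -> int:
    """Assumed rule: as many boxes as fit inside one square feature cell."""
    mode = tile_mode(aspect_ratio)
    if mode == "horizontal":
        return max(1, math.floor(1.0 / aspect_ratio))
    if mode == "vertical":
        return max(1, math.floor(aspect_ratio))
    return 1


def tile_cell(col: int, row: int, cell_size: float,
              aspect_ratio: float) -> List[AnchorBox]:
    """Place the anchor boxes for the feature cell at (col, row)."""
    n = anchors_per_cell(aspect_ratio)
    mode = tile_mode(aspect_ratio)
    x0, y0 = col * cell_size, row * cell_size  # top-left corner of the cell
    if mode == "horizontal":
        w, h = cell_size / n, cell_size
        return [AnchorBox(x0 + (k + 0.5) * w, y0 + h / 2, w, h) for k in range(n)]
    if mode == "vertical":
        w, h = cell_size, cell_size / n
        return [AnchorBox(x0 + w / 2, y0 + (k + 0.5) * h, w, h) for k in range(n)]
    half = cell_size / 2
    return [AnchorBox(x0 + half, y0 + half, cell_size, cell_size)]


def grid_shape(image_w: int, image_h: int, cell_size: int) -> Tuple[int, int]:
    """Rows and columns of one feature layer; follows the image aspect ratio."""
    return math.ceil(image_h / cell_size), math.ceil(image_w / cell_size)


def prediction_shape(rows: int, cols: int, n_anchors: int,
                     n_classes: int) -> Tuple[int, int, int]:
    """Conventional SSD head: each anchor predicts 4 offsets + class scores."""
    return rows, cols, n_anchors * (n_classes + 4)


if __name__ == "__main__":
    rows, cols = grid_shape(512, 256, 32)
    n = anchors_per_cell(0.5)
    print(tile_cell(0, 0, 32.0, 0.5))
    print(prediction_shape(rows, cols, n, n_classes=3))
```

In this sketch a 512 x 256 image with 32-pixel cells gives a non-square grid, and the number of output channels of the prediction convolution grows with the number of anchors tiled per cell, which is how the cell size, anchor position, and anchor aspect ratio feed into the prediction layer under the assumptions above.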