US 11,790,040 B2
Method for object detection and recognition based on neural network
Yongduan Song, Chongqing (CN); Shilei Tan, Chongqing (CN); Li Huang, Chongqing (CN); Ziqiang Jiang, Chongqing (CN); Jian Liu, Chongqing (CN); and Lihui Tan, Chongqing (CN)
Assigned to DIBI (CHONGQING) INTELLIGENT TECHNOLOGY RESEARCH INSTITUTE CO., LTD., Chongqing (CN)
Filed by Dibi (Chongqing) Intelligent Technology Research Institute Co., Ltd., Chongqing (CN)
Filed on Jul. 7, 2021, as Appl. No. 17/368,946.
Claims priority of application No. 202110268857.1 (CN), filed on Mar. 12, 2021.
Prior Publication US 2022/0292311 A1, Sep. 15, 2022
Int. Cl. G06F 18/214 (2023.01); G06N 3/08 (2023.01); G06T 3/40 (2006.01); G06F 18/2415 (2023.01)
CPC G06F 18/2148 (2023.01) [G06F 18/2415 (2023.01); G06N 3/08 (2013.01); G06T 3/4046 (2013.01)] 2 Claims
OG exemplary drawing
 
1. A method for object detection and recognition based on a neural network, comprising:
S100: constructing a new YOLOv5 network model by adding a fourth detection layer after the three detection layers of an existing YOLOv5 network model;
S200: training the new YOLOv5 network model, wherein a specific training process comprises:
S210: constructing a training data set: acquiring N images, resizing each of the N images to make it suitable for model training, and labeling each of the N images with ground truth boxes and object class labels, wherein all of the N labeled images constitute the training data set;
S220: setting thresholds for a center-to-center distance and an aspect ratio of the new YOLOv5 network model;
S230: initializing parameters in the new YOLOv5 network model;
inputting all samples of the training data set into the new YOLOv5 network model, and performing calculation through the following formula:

s_i = { s_i,  IoU − R_CIoU(M, B_i) < ε
      { 0,    IoU − R_CIoU(M, B_i) ≥ ε
IoU represents the overlap ratio (intersection over union) of a predicted box and a ground truth box, and is expressed by:

IoU = |b ∩ b^gt| / |b ∪ b^gt|
R_CIoU represents the normalized distance between the center point of the predicted box and the center point of the ground truth box, together with an aspect-ratio consistency penalty, and is expressed by:

R_CIoU = ρ²(b, b^gt) / c² + αv,  where  v = (4/π²)(arctan(ω^gt/h^gt) − arctan(ω/h))²  and  α = v / ((1 − IoU) + v)
where s_i represents a classification score of an object of each class, ε represents a manually set NMS threshold, M represents the predicted box with the highest score, B_i represents a predicted box in the list of predicted boxes, b represents the predicted box, b^gt represents the ground truth box, ρ²(b, b^gt) represents the squared Euclidean distance between the center point of the predicted box and the center point of the ground truth box, c represents the diagonal length of the smallest enclosing rectangular box covering the predicted box and the ground truth box, ω^gt and h^gt respectively indicate the width and height of the ground truth box, and ω and h respectively indicate the width and height of the predicted box;
S240: during the training in S230, to avoid over-suppression of predicted boxes, when the value IoU − R_CIoU computed between the predicted box M with the highest score and another box B_i is less than the set threshold ε, keeping the score s_i of the box B_i unchanged; otherwise, directly setting s_i to 0, to filter out the predicted box; and
calculating loss functions, wherein the loss functions include an object loss function, a class loss function, and a box loss function; performing repeated iterative training to minimize the loss functions and obtain optimal parameters of the network model; and
S300: detecting a to-be-detected image: resizing the to-be-detected image through the method in S210, inputting the resized image into the trained new YOLOv5 network model for prediction, outputting the predicted box of an object and the probability value corresponding to each class to which the object may belong, and setting the class corresponding to the maximum probability value as the predicted class of the object in the to-be-detected image.
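The suppression rule of S230–S240 can be illustrated with a minimal sketch. The function and variable names below are illustrative, not from the patent, and the sketch is simplified: only the single highest-scoring box M suppresses its neighbors, whereas a full NMS pass would repeat the rule for each surviving box in turn.

```python
import math

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def r_ciou(box_a, box_b):
    """Normalized center-point distance plus aspect-ratio penalty (CIoU terms)."""
    # squared Euclidean distance rho^2 between the two center points
    cx_a, cy_a = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cx_b, cy_b = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2
    # squared diagonal c^2 of the smallest enclosing box
    ex1 = min(box_a[0], box_b[0]); ey1 = min(box_a[1], box_b[1])
    ex2 = max(box_a[2], box_b[2]); ey2 = max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio consistency term v and trade-off parameter alpha
    w_a, h_a = box_a[2] - box_a[0], box_a[3] - box_a[1]
    w_b, h_b = box_b[2] - box_b[0], box_b[3] - box_b[1]
    v = (4 / math.pi ** 2) * (math.atan(w_b / h_b) - math.atan(w_a / h_a)) ** 2
    alpha = v / ((1 - iou(box_a, box_b)) + v) if v > 0 else 0.0
    return rho2 / c2 + alpha * v if c2 > 0 else 0.0

def ciou_nms(boxes, scores, eps=0.5):
    """Apply the S240 rule: if IoU - R_CIoU < eps, keep s_i unchanged;
    otherwise set s_i to 0 so the box is filtered out."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept_scores = list(scores)
    m = order[0]  # predicted box M with the highest score
    for i in order[1:]:
        if iou(boxes[m], boxes[i]) - r_ciou(boxes[m], boxes[i]) >= eps:
            kept_scores[i] = 0.0  # suppressed as a duplicate of M
    return kept_scores
```

For example, with two heavily overlapping boxes and one distant box, only the lower-scoring duplicate of M is zeroed; the distant box keeps its score because its IoU with M is 0 and the distance penalty makes IoU − R_CIoU negative.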