US 11,790,640 B1
Method for detecting densely occluded fish based on YOLOv5 network
Jun Yue, Yantai (CN); Cheng Dong, Yantai (CN); Zhenbo Li, Beijing (CN); Jun Zhang, Yantai (CN); Guangjie Kou, Yantai (CN); Shixiang Jia, Yantai (CN); Ning Li, Yantai (CN); and Hao Sun, Yantai (CN)
Assigned to Ludong University, Yantai (CN)
Filed by Ludong University, Yantai (CN)
Filed on Mar. 15, 2023, as Appl. No. 18/184,529.
Claims priority of application No. 202210709818.5 (CN), filed on Jun. 22, 2022.
Int. Cl. G06V 10/80 (2022.01); G06N 3/04 (2023.01); G06V 10/20 (2022.01)
CPC G06V 10/806 (2022.01) [G06V 10/255 (2022.01)] 6 Claims
OG exemplary drawing
 
1. A method for detecting densely occluded fish based on a You Only Look Once 5 (YOLOv5) network, comprising a data set establishment and processing part, a model training part and a model testing part, wherein the data set establishment and processing part comprises data collection of fish pictures, data labeling and data division of the fish pictures, and the data division is to divide data into a training set, a verification set and a test set,
wherein the data in the training set is expanded by changing brightness, contrast and saturation of the fish pictures, and then the data is input into a neural network model used by the YOLOv5 network for training; when the model is trained, a mosaic method is used as an algorithm to enhance the data; four pictures selected from the training set are scaled and cropped respectively, a scaled picture size is 0.5 to 1.5 times an original picture size, and a cropping range is 1/10 of a left side or a right side of one picture; then these four pictures are placed in an order of an upper left corner, an upper right corner, a lower left corner and a lower right corner, and spliced into one picture, which is input into the network as one picture for training, and the picture size is scaled to 640×640;
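The mosaic enhancement described above can be sketched in Python with numpy (a hedged sketch only: the nearest-neighbour scaling, the random choice of left/right side, and the paste/clip behaviour at the quadrant boundary are illustrative assumptions not specified in the claim):

```python
import numpy as np

def mosaic(imgs, out_size=640, rng=None):
    """Splice four training pictures into one mosaic picture.

    Each picture is scaled by a random factor in [0.5, 1.5], cropped by
    1/10 on its left or right side, and pasted into the upper-left,
    upper-right, lower-left and lower-right quadrant of a 640x640 canvas.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # UL, UR, LL, LR
    for img, (y, x) in zip(imgs, corners):
        # scale to 0.5-1.5 times the original size (nearest-neighbour resize)
        s = rng.uniform(0.5, 1.5)
        h = max(1, int(img.shape[0] * s))
        w = max(1, int(img.shape[1] * s))
        rows = np.arange(h) * img.shape[0] // h
        cols = np.arange(w) * img.shape[1] // w
        scaled = img[rows][:, cols]
        # crop 1/10 of the left side or the right side of the picture
        cut = w // 10
        scaled = scaled[:, cut:] if rng.random() < 0.5 else scaled[:, :w - cut]
        # paste into the quadrant, clipping anything that overruns it
        h2, w2 = min(half, scaled.shape[0]), min(half, scaled.shape[1])
        canvas[y:y + h2, x:x + w2] = scaled[:h2, :w2]
    return canvas
```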
the data division is to divide the pictures into the training set, the verification set and the test set according to a ratio of 8:1:1 after the data labeling is completed;
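The 8:1:1 division above can be sketched as follows (a minimal sketch; the deterministic shuffle seed and the treatment of picture identifiers as a flat list are assumptions, not from the patent):

```python
import random

def split_dataset(items, seed=0):
    """Divide labeled pictures into training, verification and test sets
    according to a ratio of 8:1:1 after labeling is completed."""
    items = sorted(items)
    random.Random(seed).shuffle(items)   # reproducible shuffle before splitting
    n = len(items)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```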
in the model training part, a prediction result output by the model during training is evaluated by using a loss function, an error of the prediction result is obtained, and parameters in the neural network of the model are updated to improve an accuracy of the model; the loss function is an improved loss function, into which an improved repulsion loss function is introduced to enhance an ability of the model to detect mutually occluded fish;
the improved repulsion loss function is L_RepGT, and the Smooth_ln and Smooth_L1 functions used in the L_RepGT function make prediction boxes of different fishes repel each other, so as to achieve an effect of being far away from each other and reduce an overlap degree between the prediction boxes, thereby reducing the number of missed fish detections and improving a detection accuracy;
the improved repulsion loss function is as follows:

$$L_{RepGT}=\frac{\sum_{P\in\mathcal{P}_{+}}\left[\lambda_{1}\,Smooth_{ln}\!\left(IoG\!\left(B^{P},G_{Rep}^{P}\right)\right)+\lambda_{2}\,Smooth_{L1}\!\left(IoG\!\left(B^{P},G_{Rep}^{P}\right)\right)\right]}{\left|\mathcal{P}_{+}\right|}$$

wherein λ₁ and λ₂ are weight values of each function, and 𝒫₊={P} represents a set of all positive samples in one picture; B^P represents the prediction box, and G_Rep^P represents the truth box that, among the truth boxes of other targets, has the greatest intersection over union with B^P, excluding the truth box corresponding to B^P;
G_Rep^P is defined as follows:

$$G_{Rep}^{P}=\operatorname*{argmax}_{G\in\mathcal{G}\setminus\{G_{Attr}^{P}\}}IoU(G,P)$$

wherein $G_{Attr}^{P}=\operatorname*{argmax}_{G\in\mathcal{G}}IoU(G,P)$, and 𝒢={G} represents a set of all the truth boxes in one picture; expressions of Smooth_ln(·), Smooth_L1(·) and IoG(·) are as follows, wherein σ∈[0,1];

$$Smooth_{ln}(x)=\begin{cases}-\ln(1-x), & x\le\sigma\\[4pt]\dfrac{x-\sigma}{1-\sigma}-\ln(1-\sigma), & x>\sigma\end{cases}$$

$$Smooth_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\[2pt]|x|-0.5, & |x|\ge 1\end{cases}$$

$$IoG(B,G)=\frac{area(B\cap G)}{area(G)}$$
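The components of the improved repulsion loss named above can be sketched in Python with numpy (a minimal sketch; the defaults λ₁=λ₂=0.5 and σ=0.5, and the (x1, y1, x2, y2) box format, are illustrative assumptions, not values fixed by the claim):

```python
import numpy as np

def iog(box, gt):
    """Intersection over Ground-truth: overlap area divided by truth-box area."""
    ix = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    iy = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    g_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return ix * iy / g_area

def smooth_ln(x, sigma=0.5):
    """Smooth_ln penalty: logarithmic at or below sigma, linear above it."""
    if x <= sigma:
        return -np.log(1 - x)
    return (x - sigma) / (1 - sigma) - np.log(1 - sigma)

def smooth_l1(x):
    """Standard Smooth-L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def rep_gt_loss(pred_boxes, rep_gts, lam1=0.5, lam2=0.5, sigma=0.5):
    """L_RepGT over positive samples: penalizes overlap between each
    prediction box B^P and the most-overlapping truth box of another target,
    so that prediction boxes of different fishes repel each other."""
    terms = [lam1 * smooth_ln(iog(p, g), sigma) + lam2 * smooth_l1(iog(p, g))
             for p, g in zip(pred_boxes, rep_gts)]
    return sum(terms) / len(terms)
```

A larger overlap with another fish's truth box yields a larger loss, which is what drives the boxes apart during optimization.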
the pictures in the training set are input into the neural network model, features are extracted through a backbone network in the model, and the extracted features are transported to a feature pyramid for feature fusion; then, the fused features are transported to a detection module; after detection, prediction results on three different scales are output; and the prediction results comprise a category, a confidence and coordinates of a target in the picture, whereby the prediction results of the model are obtained;
a loss on the training set, namely a prediction error, is calculated by using the improved loss function after the prediction results of a first round of training are obtained;
when fish objects are densely occluded, the obtained prediction boxes of different fish objects have a high coincidence degree and a high error value; the neural network is continuously optimized in the following training by using the improved repulsion loss function, so that the prediction boxes of different fish objects move away from each other, the coincidence degree among the prediction boxes is reduced, and the error value is continuously reduced;
the parameters in the neural network are iteratively updated by using a back propagation algorithm;
the pictures in the verification set are input into the neural network model to extract features, the prediction results on the verification set are obtained, an error between the prediction results and real results is calculated, and a prediction accuracy is further calculated;
if a current training is a first round, the model of the current training is saved; if the current training is not the first round, whether the accuracy on the verification set in the current training process is higher than that calculated on the verification set in the last round of training is compared; if the accuracy is higher, the model trained in the current training process is saved; otherwise, a next round of training is entered; and
the above process is one round of training, and this process is repeated 300 times according to the setting; and
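The save-if-better training schedule above can be sketched as follows (a minimal sketch; the callback names `train_one_epoch`, `evaluate` and `save` are hypothetical placeholders standing in for the actual YOLOv5 training, validation and checkpointing steps):

```python
def train_loop(train_one_epoch, evaluate, save, epochs=300):
    """Run the fixed number of training rounds, saving the model whenever
    the accuracy on the verification set improves (round 1 is always saved,
    so the best-so-far checkpoint always exists)."""
    best_acc = None
    for epoch in range(1, epochs + 1):
        train_one_epoch()                 # one round of training on the training set
        acc = evaluate()                  # prediction accuracy on the verification set
        if best_acc is None or acc > best_acc:
            best_acc = acc
            save(epoch, acc)              # keep only models that improve accuracy
    return best_acc
```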
finally, a model testing module is as follows:
1) the pictures in the test set are loaded and the picture size is scaled to 640×640;
2) the model saved in the training process is loaded, and the model has the highest accuracy on the verification set;
3) the pictures in the test set are input into the loaded model, and the prediction results are obtained; and
4) filtered prediction bounding boxes are visualized, and the prediction accuracy and a calculation speed are calculated to test a generalization performance of the model.