CPC G06V 10/774 (2022.01) [G06T 7/73 (2017.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06V 2201/07 (2022.01)]; 17 Claims

1. An object detection method, comprising:
obtaining a video to be detected;
preprocessing the video to be detected to obtain an image to be detected;
inputting the image to be detected into an object detection network;
extracting, by the object detection network, a feature map from the image to be detected;
performing, by the object detection network, an object prediction on the extracted feature map, so as to obtain a position of an object in the image to be detected and a confidence degree corresponding to the position, wherein the object detection network includes multi-layer cascade networks, a feature map extracted by a cascade network in each layer is obtained according to a first feature map and a second feature map, the first feature map is obtained by performing a convolution on a feature map extracted by a previous-layer cascade network, and the second feature map is obtained by performing a linear transformation on the first feature map; and
generating a marked object video according to the position of the object in the image to be detected, the confidence degree corresponding to the position, and the video to be detected;
wherein the object detection network includes a feature extraction network, a feature fusion network and a prediction network; the feature extraction network includes the multi-layer cascade networks; and
inputting the image to be detected into the object detection network, extracting, by the object detection network, the feature map from the image to be detected, and performing, by the object detection network, the object prediction on the extracted feature map, so as to obtain the position of the object in the image to be detected and the confidence degree corresponding to the position, includes:
inputting the image to be detected into the feature extraction network;
performing, by the cascade network in each layer, the convolution on the feature map extracted by the previous-layer cascade network to obtain the first feature map;
performing, by the cascade network in each layer, the linear transformation on the first feature map to obtain the second feature map;
obtaining, by the cascade network in each layer, the feature map extracted by that cascade network according to the first feature map and the second feature map;
inputting a plurality of feature maps extracted by the multi-layer cascade networks into the feature fusion network to fuse the feature maps, so as to obtain fused feature maps; and
inputting the fused feature maps into the prediction network for object prediction, so as to obtain the position of the object in the image to be detected and the confidence degree corresponding to the position.
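The cascade-layer structure recited above (a convolution produces a first feature map, a cheap linear transformation of that first map produces a second, and the layer's output is obtained from both) can be sketched in plain Python. This is a minimal illustrative sketch, not the claimed implementation: it assumes a 1×1 pointwise convolution as the "convolution", a per-channel scaling as the "linear transformation", and channel concatenation as the way the layer's feature map is "obtained according to" the two intermediate maps; the claim itself fixes none of these choices, and the helper names (`conv1x1`, `cheap_linear`, `cascade_layer`) are hypothetical.

```python
def conv1x1(fmap, weights):
    """Pointwise convolution: mixes input channels at each spatial location.
    fmap is a nested list [C_in][H][W]; weights is [C_out][C_in].
    Yields the 'first feature map' of one cascade layer."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row in weights:
        ch = [[sum(w_row[c] * fmap[c][i][j] for c in range(c_in))
               for j in range(w)] for i in range(h)]
        out.append(ch)
    return out

def cheap_linear(fmap, scales):
    """Per-channel linear transformation of the first feature map;
    this inexpensive operation yields the 'second feature map'."""
    return [[[scales[c] * v for v in row] for row in fmap[c]]
            for c in range(len(fmap))]

def cascade_layer(prev_fmap, weights, scales):
    """One cascade-network layer: the feature map extracted by the
    previous layer is convolved, the result is linearly transformed,
    and the two intermediate maps are concatenated channel-wise."""
    first = conv1x1(prev_fmap, weights)   # first feature map
    second = cheap_linear(first, scales)  # second feature map
    return first + second                 # this layer's extracted feature map
```

Because the second map reuses the first via a cheap per-channel operation, a layer doubles its output channels at roughly half the convolution cost, which is the usual motivation for this first-map/second-map construction.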