CPC G06V 10/774 (2022.01) [G06T 7/73 (2017.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06V 2201/07 (2022.01)]; 17 Claims

1. An object detection method, comprising:
obtaining a video to be detected;
preprocessing the video to be detected to obtain an image to be detected;
inputting the image to be detected into an object detection network;
extracting, by the object detection network, a feature map from the image to be detected;
performing, by the object detection network, an object prediction on the extracted feature map, so as to obtain a position of an object in the image to be detected and a confidence degree corresponding to the position, wherein the object detection network includes multi-layer cascade networks, a feature map extracted by a cascade network in each layer is obtained according to a first feature map and a second feature map, the first feature map is obtained by performing a convolution on a feature map extracted by a previous-layer cascade network, and the second feature map is obtained by performing a linear transformation on the first feature map; and
generating a marked object video according to the position of the object in the image to be detected, the confidence degree corresponding to the position, and the video to be detected;
wherein the object detection network includes a feature extraction network, a feature fusion network and a prediction network; the feature extraction network includes the multi-layer cascade networks; and
inputting the image to be detected into the object detection network, extracting, by the object detection network, the feature map from the image to be detected, and performing, by the object detection network, the object prediction on the extracted feature map, so as to obtain the position of the object in the image to be detected and the confidence degree corresponding to the position, includes:
inputting the image to be detected into the feature extraction network;
performing, by the cascade network in each layer, the convolution on the feature map extracted by the previous-layer cascade network to obtain the first feature map;
performing, by the cascade network in each layer, the linear transformation on the first feature map to obtain the second feature map;
obtaining, by the cascade network in each layer, the feature map extracted by that cascade network according to the first feature map and the second feature map;
inputting a plurality of feature maps extracted by the multi-layer cascade networks into the feature fusion network to fuse the feature maps, so as to obtain fused feature maps; and
inputting the fused feature maps into the prediction network for object prediction, so as to obtain the position of the object in the image to be detected and the confidence degree corresponding to the position.
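The cascade-layer structure recited above (a convolution produces a first feature map, a cheap linear transformation of that first map produces a second, and the layer's output is obtained from both) can be sketched in plain Python. This is a minimal illustrative sketch, not the claimed implementation: it assumes a 1×1 pointwise convolution as the "convolution", a per-channel scaling as the "linear transformation", and channel concatenation as the way the layer's feature map is "obtained according to" the two intermediate maps; the claim itself fixes none of these choices, and the helper names (`conv1x1`, `cheap_linear`, `cascade_layer`) are hypothetical.

```python
def conv1x1(fmap, weights):
    """Pointwise convolution: mixes input channels at each spatial location.
    fmap is a nested list [C_in][H][W]; weights is [C_out][C_in].
    Yields the 'first feature map' of one cascade layer."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for w_row in weights:
        ch = [[sum(w_row[c] * fmap[c][i][j] for c in range(c_in))
               for j in range(w)] for i in range(h)]
        out.append(ch)
    return out

def cheap_linear(fmap, scales):
    """Per-channel linear transformation of the first feature map;
    this inexpensive operation yields the 'second feature map'."""
    return [[[scales[c] * v for v in row] for row in fmap[c]]
            for c in range(len(fmap))]

def cascade_layer(prev_fmap, weights, scales):
    """One cascade-network layer: the feature map extracted by the
    previous layer is convolved, the result is linearly transformed,
    and the two intermediate maps are concatenated channel-wise."""
    first = conv1x1(prev_fmap, weights)   # first feature map
    second = cheap_linear(first, scales)  # second feature map
    return first + second                 # this layer's extracted feature map
```

Because the second map reuses the first via a cheap per-channel operation, a layer doubles its output channels at roughly half the convolution cost, which is the usual motivation for this first-map/second-map construction.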