| CPC G06V 40/172 (2022.01) [G06F 21/32 (2013.01); G06V 10/454 (2022.01); G06V 10/7715 (2022.01); G06V 10/806 (2022.01); G06V 10/82 (2022.01); G06V 40/161 (2022.01); G06V 40/162 (2022.01); G06V 40/168 (2022.01)] | 20 Claims |

|
1. A face detection device based on a convolutional neural network, comprising:
a feature extractor assembly, comprising:
a first feature extractor, configured to apply a first set of convolution kernels to an input grayscale image and thereby generate a set of basic feature maps;
a second feature extractor, configured to apply a second set of convolution kernels, each of which is smaller in size than each of the first set of convolution kernels, to the set of basic feature maps and thereby generate more than one set of intermediate feature maps, the more than one set of intermediate feature maps being concatenated to form a concatenated layer; and
a third feature extractor, configured to perform at least one convolution operation on the concatenated layer; and
a detector assembly, comprising at least one detector, each having an input derived from one of the second feature extractor and the third feature extractor;
wherein the third feature extractor comprises at least two convolution operations;
wherein the at least one detector comprises a first detector, a second detector and a third detector, an input of the first detector is derived from the second feature extractor, the first detector is configured to output a first detection result, an input of the second detector is derived from one of the at least two convolution operations of the third feature extractor, the second detector is configured to output a second detection result, an input of the third detector is derived from another of the at least two convolution operations of the third feature extractor and the third detector is configured to output a third detection result; and
wherein in response to an area ratio of a human face to an entire input grayscale image being in a first range, the first detector is determined as the most accurate detector and the detection result thereof is output; in response to the area ratio of the human face to the entire input grayscale image being in a second range, the second detector is determined as the most accurate detector and the detection result thereof is output; and in response to the area ratio of the human face to the entire input grayscale image being in a third range, the third detector is determined as the most accurate detector and the detection result thereof is output.
|
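The data flow recited in claim 1 can be illustrated with a small shape trace. This is only a sketch under stated assumptions: the claim fixes no kernel sizes, strides, channel counts, or branch counts, so every number below is hypothetical, as are the names `Shape`, `conv`, `concat`, and `trace`. The only constraints taken from the claim are that the input has one (grayscale) channel, that the second-stage kernels are smaller than the first-stage kernels, that more than one set of intermediate maps is concatenated, and that the three detectors tap the concatenated layer and two different convolutions of the third stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    channels: int
    height: int
    width: int


def conv(shape: Shape, out_channels: int, kernel: int, stride: int = 1) -> Shape:
    """Output shape of a convolution with padding = kernel // 2."""
    pad = kernel // 2
    height = (shape.height + 2 * pad - kernel) // stride + 1
    width = (shape.width + 2 * pad - kernel) // stride + 1
    return Shape(out_channels, height, width)


def concat(shapes: list[Shape]) -> Shape:
    """Channel-wise concatenation; all spatial sizes must agree."""
    first = shapes[0]
    assert all((s.height, s.width) == (first.height, first.width) for s in shapes)
    return Shape(sum(s.channels for s in shapes), first.height, first.width)


def trace(height: int, width: int) -> dict[str, Shape]:
    """Return the feature-map shape seen by each of the three detectors."""
    image = Shape(1, height, width)  # grayscale input: one channel

    # First feature extractor: larger kernels (7x7, stride 4 are assumptions).
    basic = conv(image, out_channels=16, kernel=7, stride=4)

    # Second feature extractor: smaller kernels (1x1 and 3x3, assumed),
    # producing more than one set of intermediate maps that are concatenated.
    branches = [conv(basic, 16, kernel=1), conv(basic, 16, kernel=3)]
    concatenated = concat(branches)

    # Third feature extractor: at least two convolutions (sizes assumed).
    deep_1 = conv(concatenated, out_channels=64, kernel=3, stride=2)
    deep_2 = conv(deep_1, out_channels=64, kernel=3, stride=2)

    return {
        "detector_1": concatenated,  # input derived from the second extractor
        "detector_2": deep_1,        # one convolution of the third extractor
        "detector_3": deep_2,        # another convolution of the third extractor
    }


if __name__ == "__main__":
    for name, shape in trace(128, 128).items():
        print(name, shape)
```

With these assumed settings, each detector sees a coarser feature map than the one before it, which is one plausible reason for tying each detector to a different face-to-image area ratio.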
|
12. A face detection method based on a convolutional neural network, comprising:
applying a first set of convolution kernels to an input grayscale image, thereby generating a set of basic feature maps;
applying a second set of convolution kernels, each of which is smaller in size than each of the first set of convolution kernels, to the set of basic feature maps and thereby generating more than one set of intermediate feature maps, the more than one set of intermediate feature maps being concatenated to form a concatenated layer;
performing at least two convolution operations on the concatenated layer thereby generating a set of deep feature maps; and
determining, by at least one detector, a bounding box classification and a bounding box regression, based on the concatenated layer or the set of deep feature maps;
wherein the at least one detector comprises a first detector, a second detector and a third detector, an input of the first detector is derived from the concatenated layer, an input of the second detector is derived from one of the at least two convolution operations performed on the concatenated layer, and an input of the third detector is derived from another of the at least two convolution operations performed on the concatenated layer; and
wherein the determining, by at least one detector, a bounding box classification and a bounding box regression, based on the concatenated layer or the set of deep feature maps, comprises:
in response to an area ratio of a human face to an entire input grayscale image being in a first range, determining the first detector as the most accurate detector, and outputting a first detection result from the first detector as the bounding box classification and the bounding box regression;
in response to the area ratio of the human face to the entire input grayscale image being in a second range, determining the second detector as the most accurate detector, and outputting a second detection result from the second detector as the bounding box classification and the bounding box regression; and
in response to the area ratio of the human face to the entire input grayscale image being in a third range, determining the third detector as the most accurate detector, and outputting a third detection result from the third detector as the bounding box classification and the bounding box regression.
|
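The range-based selection in claim 12 can be sketched as a lookup over three area-ratio ranges. This is a minimal illustration, not the claimed implementation: the claim gives no numeric bounds and does not say how the area ratio is obtained, so the thresholds in `RANGES`, the `(x, y, w, h)` box layout, and the names `Detection`, `area_ratio`, and `select_detection` are all assumptions.

```python
from typing import NamedTuple, Optional, Sequence


class Detection(NamedTuple):
    score: float                             # bounding box classification
    box: tuple[float, float, float, float]   # bounding box regression (x, y, w, h)


# Hypothetical half-open ranges [low, high); the claim specifies no values.
RANGES = (
    (0.00, 0.05),  # first range  -> first detector
    (0.05, 0.30),  # second range -> second detector
    (0.30, 1.01),  # third range  -> third detector
)


def area_ratio(box: tuple[float, float, float, float],
               image_width: int, image_height: int) -> float:
    """Ratio of the face box area to the area of the whole image."""
    _, _, w, h = box
    return (w * h) / (image_width * image_height)


def select_detection(ratio: float,
                     results: Sequence[Detection]) -> Optional[Detection]:
    """Return the result of the detector whose range contains the ratio.

    `results` holds the first, second and third detection results in order.
    """
    for (low, high), result in zip(RANGES, results):
        if low <= ratio < high:
            return result
    return None  # ratio outside every assumed range


if __name__ == "__main__":
    results = [
        Detection(0.91, (10, 10, 20, 20)),
        Detection(0.88, (8, 8, 60, 60)),
        Detection(0.95, (0, 0, 100, 110)),
    ]
    print(select_detection(0.10, results))
```

Half-open ranges are used here so that every ratio maps to at most one detector; the claim itself does not state whether its ranges overlap.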
|
14. A face unlock system, comprising:
an IR camera, configured to capture an image;
an image decoding device, configured to decode the captured image to form a grayscale image;
a face detection device, wherein the face detection device is based on a convolutional neural network, and comprises:
a feature extractor assembly, comprising:
a first feature extractor, configured to apply a first set of convolution kernels to an input grayscale image and thereby generate a set of basic feature maps;
a second feature extractor, configured to apply a second set of convolution kernels, each of which is smaller in size than each of the first set of convolution kernels, to the set of basic feature maps and thereby generate more than one set of intermediate feature maps, the more than one set of intermediate feature maps being concatenated to form a concatenated layer; and
a third feature extractor, configured to perform at least one convolution operation on the concatenated layer; and
a detector assembly, comprising at least one detector, each having an input derived from one of the second feature extractor and the third feature extractor, and configured to output a bounding box classification result and a bounding box regression result;
wherein the third feature extractor comprises at least two convolution operations; wherein the at least one detector comprises a first detector, a second detector and a third detector, an input of the first detector is derived from the second feature extractor, the first detector is configured to output a first detection result, an input of the second detector is derived from one of the at least two convolution operations of the third feature extractor, the second detector is configured to output a second detection result, an input of the third detector is derived from another of the at least two convolution operations of the third feature extractor and the third detector is configured to output a third detection result; and
wherein in response to an area ratio of a human face to an entire input grayscale image being in a first range, the first detector is determined as the most accurate detector and the detection result thereof is output; in response to the area ratio of the human face to the entire input grayscale image being in a second range, the second detector is determined as the most accurate detector and the detection result thereof is output; and in response to the area ratio of the human face to the entire input grayscale image being in a third range, the third detector is determined as the most accurate detector and the detection result thereof is output; and
a face verification device, configured to determine whether the grayscale image corresponds to an authorized person for unlocking, based on the bounding box classification result and the bounding box regression result.
|
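The four stages of the face unlock system in claim 14 (capture, decode, detect, verify) can be wired together as below. This is a hedged sketch only: the stage callables, the `try_unlock` name, and the `min_score` cutoff are hypothetical and not taken from the claim, which states only that verification is based on the bounding box classification and regression results.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]          # assumed (x, y, w, h) layout
Gray = Sequence[Sequence[int]]           # grayscale image as rows of pixels
DetectionResult = Optional[Tuple[float, Box]]


def try_unlock(
    capture: Callable[[], bytes],                 # IR camera
    decode: Callable[[bytes], Gray],              # image decoding device
    detect: Callable[[Gray], DetectionResult],    # face detection device
    verify: Callable[[Gray, float, Box], bool],   # face verification device
    min_score: float = 0.5,                       # assumed confidence cutoff
) -> bool:
    """Run one unlock attempt; return True only if verification succeeds."""
    raw = capture()
    gray = decode(raw)
    detection = detect(gray)
    if detection is None:
        return False  # no face found
    score, box = detection
    if score < min_score:
        return False  # classification too weak to attempt verification
    return verify(gray, score, box)


if __name__ == "__main__":
    granted = try_unlock(
        capture=lambda: b"raw-ir-frame",
        decode=lambda raw: [[0, 0], [0, 0]],
        detect=lambda gray: (0.9, (0, 0, 2, 2)),
        verify=lambda gray, score, box: True,
    )
    print("unlocked" if granted else "locked")
```

Passing the stages as callables keeps the sketch independent of any particular camera, decoder, or network implementation.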