US 11,836,960 B2
Object detection device, object detection method, and program
Fumiaki Sato, Takatsuki (JP)
Assigned to Konica Minolta, Inc., Tokyo (JP)
Appl. No. 17/621,469
Filed by Konica Minolta, Inc., Tokyo (JP)
PCT Filed May 22, 2020, PCT No. PCT/JP2020/020344 § 371(c)(1), (2) Date Dec. 21, 2021, PCT Pub. No. WO2021/005898, PCT Pub. Date Jan. 14, 2021.
Claims priority of application No. 2019-129183 (JP), filed on Jul. 11, 2019.
Prior Publication US 2022/0351486 A1, Nov. 3, 2022
Int. Cl. G06V 10/00 (2022.01); G06V 10/26 (2022.01); G06V 10/774 (2022.01); G06V 10/762 (2022.01); G06V 10/82 (2022.01)

CPC G06V 10/273 (2022.01) [G06V 10/763 (2022.01); G06V 10/7747 (2022.01); G06V 10/82 (2022.01)]

16 Claims

1. An object detection device comprising a hardware processor that detects an object from an image including the object by neural computation using a convolutional neural network, wherein

the hardware processor:

extracts a feature amount of the object from the image;

obtains a plurality of object rectangles indicating candidates for a position of the object on the basis of the feature amount and obtains information and a certainty factor of a category of the object for each of the object rectangles; and

calculates, for each of the object rectangles, an object tag indicating which object in the image the object rectangle is linked to, on the basis of the feature amount,

the hardware processor further separates the plurality of object rectangles for which the category of the object is the same into a plurality of groups according to the object tags, and deletes an excess object rectangle in each of the separated groups on the basis of the certainty factor,

wherein the hardware processor updates a calculated tag value of the object tag by either one of a first method or a second method,

in the first method, when the tag values calculated for the respective object rectangles are distributed in a plurality of tag regions separated from each other by a predetermined margin or more, the hardware processor groups the tag values in each of the tag regions, and converts the tag values into a same tag value in the same tag region and into different tag values between different tag regions, and

in the second method, the hardware processor generates a plurality of clusters by changing the number of clusters in a given range by a k-means method with respect to the tag values calculated for the respective object rectangles, then calculates an optimum number of clusters by an elbow method, groups the tag values in each of the calculated number of clusters, and converts the tag values into a same tag value in the same cluster and into different tag values between different clusters.
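
As a rough illustration of the first method and of the group-wise deletion recited in claim 1, the following Python sketch may help. It is not the patented implementation. The names `Box`, `group_tags_by_margin`, and `suppress_by_group` are hypothetical, the object tag is assumed to be a single scalar per rectangle, a new tag region is assumed to start wherever two neighbouring sorted tag values differ by the margin or more, and "deleting an excess object rectangle" is assumed here to mean keeping only the rectangle with the highest certainty factor in each group.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical container for one object rectangle (not from the patent text).
    x1: float
    y1: float
    x2: float
    y2: float
    category: str
    score: float  # certainty factor of the category
    tag: float    # scalar object tag (assumption: one value per rectangle)


def group_tags_by_margin(tags, margin):
    """Return one integer group id per tag value.

    Tag values are sorted; a gap of `margin` or more between neighbours
    starts a new tag region. Values in the same region get the same id.
    """
    if not tags:
        return []
    order = sorted(range(len(tags)), key=lambda i: tags[i])
    ids = [0] * len(tags)
    group = 0
    for prev, cur in zip(order, order[1:]):
        if tags[cur] - tags[prev] >= margin:
            group += 1
        ids[cur] = group
    return ids


def suppress_by_group(boxes, margin):
    """Keep the highest-certainty rectangle per (category, tag group)."""
    by_category = {}
    for box in boxes:
        by_category.setdefault(box.category, []).append(box)

    kept = []
    for same_category in by_category.values():
        ids = group_tags_by_margin([b.tag for b in same_category], margin)
        best = {}
        for box, group_id in zip(same_category, ids):
            if group_id not in best or box.score > best[group_id].score:
                best[group_id] = box
        kept.extend(best.values())
    return kept
```

With four "person" rectangles whose tags cluster near 0.1 and near 0.9 and a margin of 0.5, this sketch keeps one rectangle per cluster, so two overlapping people of the same category are not merged into a single detection.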
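
The second method of claim 1 can be sketched in a similar spirit. The claim names the k-means method and the elbow method but does not fix how the elbow is located, so the code below makes several assumptions: tags are scalars, k-means is a plain Lloyd iteration with a deterministic quantile initialisation, the range of cluster counts is 1 to `k_max`, and the elbow is taken as the k whose inertia drop is largest relative to the drop obtained by going to k + 1. The function names `kmeans_1d` and `update_tags_kmeans_elbow` are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalar values; returns (labels, inertia)."""
    srt = sorted(values)
    n = len(srt)
    # Deterministic initialisation at evenly spaced quantiles (assumption).
    centers = [srt[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def assign(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(iters):
        labels = [assign(v) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers

    labels = [assign(v) for v in values]
    inertia = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, inertia


def update_tags_kmeans_elbow(tags, k_max=5):
    """Return a cluster index per tag, with the cluster count chosen by an elbow heuristic."""
    k_max = min(k_max, len(tags))
    results = {k: kmeans_1d(tags, k) for k in range(1, k_max + 1)}
    if k_max < 3:
        # Too few candidates to locate an elbow; fall back to one group (assumption).
        return results[1][0]

    best_k, best_ratio = 2, float("-inf")
    for k in range(2, k_max):
        gain_here = results[k - 1][1] - results[k][1]
        gain_next = max(results[k][1] - results[k + 1][1], 0.0)
        ratio = gain_here / (gain_next + 1e-12)
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return results[best_k][0]
```

The returned cluster indices play the role of the converted tag values: identical within a cluster and different between clusters. Other elbow criteria (for example, distance to the chord of the inertia curve) would fit the claim language equally well and may choose a different cluster count.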