US 12,462,576 B2
Model generation method, model generation apparatus, non-transitory storage medium, mobile object posture estimation method, and mobile object posture estimation apparatus
Aya Hamajima, Sunto-gun (JP); Yusuke Nakano, Nagoya (JP); Katsunori Nakanishi, Nagoya (JP); and Youhei Yamaguchi, Nagoya (JP)
Assigned to TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP); and Kurusugawa Computer Inc., Nagoya (JP)
Filed by TOYOTA JIDOSHA KABUSHIKI KAISHA, Toyota (JP); and Kurusugawa Computer Inc., Nagoya (JP)
Filed on Jul. 22, 2022, as Appl. No. 17/871,216.
Claims priority of application No. 2021-121548 (JP), filed on Jul. 26, 2021.
Prior Publication US 2023/0021591 A1, Jan. 26, 2023
Int. Cl. G06K 9/00 (2022.01); B60W 40/12 (2012.01); G06V 20/58 (2022.01)
CPC G06V 20/58 (2022.01) [B60W 40/12 (2013.01); B60W 2420/403 (2013.01); B60W 2554/4041 (2020.02)]
5 Claims
1. A method comprising:
specifying image coordinates that fall within a two-dimensional image obtained by capturing at least a mobile object and that correspond to at least one point among vertexes of a rectangular shape formed when an outer shape of the mobile object viewed from above is projected on a road, as a key point of the mobile object, and creating the two-dimensional image, to which information on the key point is added, as training data; and
generating a machine learning model that outputs the key point from a two-dimensional image obtained by capturing at least a mobile object, by performing machine learning using the training data, wherein
image coordinates of two or more points among the vertexes of the rectangular shape are specified as key points of the mobile object,
the key points in the two-dimensional image are specified in predetermined order, and
at least one of the key points is in an estimated position,
wherein the machine learning model comprises a neural network structure including:
a Base Net configured to extract features from the two-dimensional image,
a Spatial Net configured to create a multiresolution feature map by performing multiresolution analysis on the extracted features, and
a discriminator configured to output the key points based on the multiresolution feature map,
inferring the key points of a mobile object from a two-dimensional image obtained by capturing at least the mobile object by using the machine learning model; and
performing an operation over the key points to estimate a size, a posture, a location, and a traveling direction of the mobile object based on the key points and information on the order in which the key points are specified, wherein
the size of the mobile object is estimated by obtaining a transformation matrix through calibration between physical coordinates of a landmark point on a road and image coordinates and converting the image coordinates of the key points to the physical coordinates using the transformation matrix,
the location of the mobile object is estimated based on image coordinates of key points and location information of an image capturing range of an image acquisition device, and
the traveling direction of the mobile object is estimated from a time change in a direction of the mobile object and the estimated location of the mobile object.
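The estimation steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that the "transformation matrix" is a planar homography fitted from exactly four road landmarks, that the key points are specified in the order front-left, front-right, rear-right, rear-left, and that all function names (fit_homography, to_road, estimate_pose, travel_direction) are hypothetical. The claim itself fixes none of these details.

```python
from math import atan2, degrees, hypot


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate landmark configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(image_pts, road_pts):
    """Calibration step: fit a 3x3 image-to-road matrix from four landmark pairs.

    Assumption: the road is planar, so a homography with h33 = 1 is sufficient.
    """
    a, b = [], []
    for (u, v), (x, y) in zip(image_pts, road_pts):
        a.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        b.append(x)
        a.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        b.append(y)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_road(hm, pt):
    """Convert one image coordinate (u, v) to physical road coordinates (x, y)."""
    u, v = pt
    w = hm[2][0] * u + hm[2][1] * v + hm[2][2]
    x = (hm[0][0] * u + hm[0][1] * v + hm[0][2]) / w
    y = (hm[1][0] * u + hm[1][1] * v + hm[1][2]) / w
    return (x, y)


def estimate_pose(hm, key_points):
    """Estimate size, location, and direction from four ordered key points.

    Assumed order: front-left, front-right, rear-right, rear-left.
    """
    fl, fr, rr, rl = (to_road(hm, p) for p in key_points)
    width = hypot(fl[0] - fr[0], fl[1] - fr[1])
    length = hypot(fr[0] - rr[0], fr[1] - rr[1])
    center = ((fl[0] + fr[0] + rr[0] + rl[0]) / 4,
              (fl[1] + fr[1] + rr[1] + rl[1]) / 4)
    front = ((fl[0] + fr[0]) / 2, (fl[1] + fr[1]) / 2)
    rear = ((rr[0] + rl[0]) / 2, (rr[1] + rl[1]) / 2)
    # The key-point order tells front from rear, so the direction is unambiguous.
    heading = degrees(atan2(front[1] - rear[1], front[0] - rear[0]))
    return {"width": width, "length": length,
            "location": center, "heading_deg": heading}


def travel_direction(prev_location, curr_location):
    """Direction of motion, in degrees, between two successive estimated locations."""
    return degrees(atan2(curr_location[1] - prev_location[1],
                         curr_location[0] - prev_location[0]))
```

In use, fit_homography would be called once per camera with surveyed landmark pairs, and estimate_pose would be called on the key points inferred for each frame. Comparing heading_deg with travel_direction across frames is one way to combine the time change in direction with the estimated location, for example to tell a reversing vehicle from one moving forward.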