US 12,406,461 B2
Image processing apparatus
Shoma Sakamoto, Tokyo (JP)
Assigned to SUBARU CORPORATION, Tokyo (JP)
Filed by SUBARU CORPORATION, Tokyo (JP)
Filed on Sep. 2, 2021, as Appl. No. 17/465,017.
Claims priority of application No. 2020-149643 (JP), filed on Sep. 7, 2020.
Prior Publication US 2022/0076045 A1, Mar. 10, 2022
Int. Cl. G06V 20/56 (2022.01); B60R 1/31 (2022.01); G06T 7/593 (2017.01); G06V 10/44 (2022.01)
CPC G06V 10/44 (2022.01) [B60R 1/31 (2022.01); G06T 7/593 (2017.01); G06V 20/56 (2022.01); G06T 2207/30252 (2013.01)] 10 Claims
OG exemplary drawing
 
1. An image processing apparatus comprising:
a first extractor configured to extract a first feature quantity of a captured image by inputting the captured image into a first neural network, the first neural network including a first plurality of convolutional layers and a first plurality of pooling layers configured to perform a plurality of convolutional and pooling operations to generate the first feature quantity, the captured image being one of a left image and a right image captured by a stereo camera mounted on a vehicle, the first feature quantity including image-wide features derived from an entirety of the captured image;
a first object identifier configured to identify a first object in the captured image on a basis of the first feature quantity;
a distance image generator configured to generate a distance image on a basis of the left image and the right image;
a region defining unit configured to define an image region in the captured image, wherein the image region corresponds to only a part of the captured image, and the image region is defined based on the distance image;
a second extractor configured to separately perform image processing on an image in the image region to extract a second feature quantity by inputting the image into a second neural network, the second neural network including a second plurality of convolutional layers and a second plurality of pooling layers configured to perform a plurality of convolutional and pooling operations to generate the second feature quantity, the second feature quantity being a region-specific local feature derived directly from the image region independently of the first feature quantity;
a selector configured to select, on a basis of data related to the image region defined by the region defining unit, a part of the first feature quantity, wherein the selection is based on at least one of (i) a location of the image region in the captured image and (ii) parallax values in the image region in the distance image, and the selected part represents contextual feature information relevant to the location or the parallax of the image region; and
a second object identifier configured to identify a second object in the image region on a basis of both (1) the second feature quantity of the image in the image region and (2) the selected part of the first feature quantity selected by the selector, wherein the second object identifier performs recognition using a combination of the second feature quantity and the selected part of the first feature quantity, to enhance object identification accuracy in the image region.
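The data flow of claim 1 can be sketched as toy code. Everything below is an illustrative assumption, not the patented implementation: a single average-pooling stage stands in for each claimed neural network, per-pixel absolute difference stands in for stereo matching, and the function names (`first_extractor`, `define_region`, `selector`, and so on) simply mirror the claim's labeled units.

```python
import numpy as np

def first_extractor(img):
    # Stand-in for the first neural network: one 4x4 average-pooling
    # stage yields an image-wide feature map (the "first feature
    # quantity" derived from the entirety of the captured image).
    h, w = img.shape
    return img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))

def distance_image(left, right):
    # Toy stand-in for the distance image generator: per-pixel
    # absolute difference plays the role of parallax-based distance.
    return np.abs(left - right)

def define_region(dist):
    # Region defining unit: bounding box around above-average
    # "parallax" values, covering only a part of the captured image.
    ys, xs = np.nonzero(dist > dist.mean())
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def second_extractor(patch):
    # Stand-in for the second neural network: a region-specific local
    # feature computed directly from the image region, independently
    # of the first feature quantity.
    return np.array([patch.mean(), patch.std()])

def selector(global_feat, region, scale=4):
    # Select the part of the global feature map whose location matches
    # the image region (option (i) in the claim: selection by the
    # location of the image region in the captured image).
    y0, y1, x0, x1 = region
    return global_feat[y0 // scale:(y1 + scale - 1) // scale,
                       x0 // scale:(x1 + scale - 1) // scale]

def second_object_identifier(local_feat, selected_global):
    # Combine the local feature with the selected contextual part of
    # the global feature; a threshold stands in for recognition.
    combined = np.concatenate([local_feat, selected_global.ravel()])
    return "object" if combined.mean() > 0.5 else "background"

rng = np.random.default_rng(0)
left = rng.random((16, 16))
right = np.roll(left, 2, axis=1)          # crude horizontal parallax
g = first_extractor(left)                 # 4x4 image-wide feature map
region = define_region(distance_image(left, right))
y0, y1, x0, x1 = region
local = second_extractor(left[y0:y1, x0:x1])
label = second_object_identifier(local, selector(g, region))
```

The point of the sketch is the wiring, not the operators: the second identifier sees both a feature computed only from the region and a location-matched slice of the image-wide feature map, which is the combination the claim relies on for improved accuracy in the region.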