US 11,922,667 B2
Object region identification device, object region identification method, and object region identification program
Yeongnam Chae, Tokyo (JP); Mijung Kim, Tokyo (JP); and Preetham Prakasha, Tokyo (JP)
Assigned to Rakuten Group, Inc., Tokyo (JP)
Appl. No. 17/269,570
Filed by Rakuten Group, Inc., Tokyo (JP)
PCT Filed Apr. 28, 2020, PCT No. PCT/JP2020/018114 § 371(c)(1), (2) Date Feb. 19, 2021, PCT Pub. No. WO2021/220398, PCT Pub. Date Nov. 4, 2021.
Prior Publication US 2022/0122340 A1, Apr. 21, 2022
Int. Cl. G06V 10/22 (2022.01); G06T 7/269 (2017.01); G06T 7/70 (2017.01); G06V 10/40 (2022.01)

CPC G06V 10/22 (2022.01) [G06T 7/269 (2017.01); G06T 7/70 (2017.01); G06V 10/40 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20081 (2013.01)]

17 Claims

1. An object region identification device comprising:

at least one memory configured to store computer program code; and

at least one processor configured to access the memory and operate as instructed by the computer program code, the computer program code including:

frame image acquisition code configured to cause at least one of the at least one processor to acquire a first frame image and a second frame image that are temporally successive;

position information acquisition code configured to cause at least one of the at least one processor to input the first frame image to a model configured to identify an object in the first frame image and acquire position information indicating a position in the first frame image, the position affecting identification of the object in the first frame image, wherein the identifying an object includes generating one or more feature maps of the first frame image and a value indicating existence of each of one or more classes in the first frame image, and wherein the acquiring position information includes calculating a gradient of a final layer of the model based on at least one class of the one or more classes, calculating a weight of each of the one or more feature maps based on the calculated gradient, and generating the position information based on at least the one or more feature maps;

motion information acquisition code configured to cause at least one of the at least one processor to acquire motion information indicating a motion of the object in the first frame image based on the first frame image and the second frame image;

region information generation code configured to cause at least one of the at least one processor to generate, based on the acquired position information and motion information, region information indicating a region in the second frame image, the region corresponding to a position of the object; and

processing code configured to cause at least one of the at least one processor to process the region in the second frame image indicated by the generated region information by using a predetermined image processing to output a result of the processing.
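
The position, motion, and region steps recited in claim 1 can be illustrated with a minimal sketch, given here only as one possible reading and not as the patented implementation. It assumes that the gradient-based weighting resembles a class activation map of the Grad-CAM kind (the claim does not name that technique), that the motion information is a dense per-pixel displacement field supplied by some external optical-flow routine, and that the region is a thresholded bounding box. The function names `grad_cam`, `warp_by_flow`, and `bounding_box`, the threshold value, and the toy 4x4 data are all hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Weight each feature map by the mean of its gradient, sum, then clamp at 0."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(feature_maps, gradients):
        # Assumption: the per-map weight is the spatial average of the gradient.
        weight = sum(sum(row) for row in grad) / (height * width)
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return [[max(v, 0.0) for v in row] for row in cam]


def warp_by_flow(position_map, flow):
    """Carry each position value along its (dx, dy) vector into the second frame."""
    height, width = len(position_map), len(position_map[0])
    warped = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                warped[ny][nx] = max(warped[ny][nx], position_map[y][x])
    return warped


def bounding_box(position_map, threshold):
    """Return (x_min, y_min, x_max, y_max) of cells above threshold, or None."""
    hits = [(x, y) for y, row in enumerate(position_map)
            for x, v in enumerate(row) if v > threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy first-frame data: one feature map responding to the object,
# one responding to the background along the top row.
feature_maps = [
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
]
# Hypothetical gradients of the class score with respect to each map.
gradients = [
    [[0.5] * 4 for _ in range(4)],
    [[-0.5] * 4 for _ in range(4)],
]
# Hypothetical motion information: every pixel moves one column to the right.
flow = [[(1, 0)] * 4 for _ in range(4)]

cam = grad_cam(feature_maps, gradients)      # position information, frame 1
warped = warp_by_flow(cam, flow)             # position carried into frame 2
region = bounding_box(warped, threshold=0.25)
print(region)  # (2, 1, 3, 2)
```

In this toy case the object occupies columns 1 to 2 in the first frame and the estimated region in the second frame covers columns 2 to 3, reflecting the one-pixel rightward motion. The processing step of the final claim element would then operate on that region of the second frame, for example by cropping it.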