US 12,443,745 B2
Information processing apparatus and information processing method
Chie Yamada, Tokyo (JP); Noriko Totsuka, Tokyo (JP); Hiroaki Ogawa, Tokyo (JP); Yasuharu Asano, Tokyo (JP); Akira Takahashi, Tokyo (JP); Chika Myoga, Tokyo (JP); Masanobu Nakamura, Tokyo (JP); Kana Nishikawa, Tokyo (JP); Masahiro Yamamoto, Tokyo (JP); and Michael Hentschel, Tokyo (JP)
Assigned to SONY GROUP CORPORATION, Tokyo (JP)
Appl. No. 17/922,725
Filed by SONY GROUP CORPORATION, Tokyo (JP)
PCT Filed Mar. 25, 2021, PCT No. PCT/JP2021/012640; § 371(c)(1), (2) Date Nov. 1, 2022; PCT Pub. No. WO2021/235086, PCT Pub. Date Nov. 25, 2021.
Claims priority of application No. 2020-087001 (JP), filed on May 18, 2020.
Prior Publication US 2023/0252183 A1, Aug. 10, 2023
Int. Cl. G06F 21/62 (2013.01); G06F 40/20 (2020.01); G06T 11/00 (2006.01); G06V 10/25 (2022.01); G06V 10/74 (2022.01); G06V 10/75 (2022.01); G06V 10/774 (2022.01); G06V 40/16 (2022.01)

CPC G06F 21/6245 (2013.01) [G06F 40/20 (2020.01); G06T 11/00 (2013.01); G06V 10/25 (2022.01); G06V 10/759 (2022.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 40/172 (2022.01); G06V 2201/10 (2022.01)]

15 Claims

1. An information processing apparatus, comprising:

at least one processor configured to:

estimate a plurality of candidate regions of object detection from a first image;

estimate a topic of the first image based on text information, wherein the text information accompanies the first image;

evaluate the plurality of candidate regions based on relationships with the topic;

determine, based on a first object detector that relates to the topic, a candidate region of the plurality of candidate regions as a region of interest, and candidate regions of the plurality of candidate regions, other than the candidate region, as regions of non-interest, wherein

the candidate region has a specific relationship with the topic, and

the candidate region is a region in which an object that relates to the topic is detected;

detect, in a case where the topic is an object name, objects corresponding to the topic from the plurality of candidate regions based on a second object detector that relates to the object name;

collect, in a case where the first object detector is not prepared in advance, an image group having tag information that relates to the topic;

detect the objects based on a third object detector, wherein a learning of the third object detector is based on the image group; and

generate a second image based on the detection of the objects.
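
The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (Region, estimate_topic, train_detector, select_detector, run_pipeline) is hypothetical, regions carry a text label as a stand-in for pixel content, topic estimation is reduced to keyword matching, and "learning" a detector is reduced to memorizing the tags of the collected image group. The claim does not say how the second image is generated, so the sketch stops at the split into regions of interest and regions of non-interest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout


@dataclass
class Region:
    box: Box
    label: str  # stand-in for the pixel content of the candidate region


# A detector answers: does this region contain an object related to the topic?
Detector = Callable[[Region], bool]


def estimate_topic(text: str, known_topics: List[str]) -> Optional[str]:
    """Toy topic estimation: first known topic that appears as a word in the text."""
    words = text.lower().split()
    for topic in known_topics:
        if topic in words:
            return topic
    return None


def train_detector(image_group: List[str]) -> Detector:
    """Stand-in for learning: the detector fires on labels seen in the image group."""
    seen = set(image_group)
    return lambda region: region.label in seen


def select_detector(
    topic: Optional[str],
    prepared: Dict[str, Detector],
    tagged_images: Dict[str, List[str]],
) -> Detector:
    """Use a prepared detector if one exists; otherwise learn one from tagged images."""
    if topic in prepared:
        return prepared[topic]
    # Collect the image group whose tag information relates to the topic.
    image_group = tagged_images.get(topic, [])
    return train_detector(image_group)


def run_pipeline(
    candidates: List[Region],
    text: str,
    known_topics: List[str],
    prepared: Dict[str, Detector],
    tagged_images: Dict[str, List[str]],
) -> Tuple[List[Region], List[Region]]:
    """Split candidate regions into regions of interest and regions of non-interest."""
    topic = estimate_topic(text, known_topics)
    detector = select_detector(topic, prepared, tagged_images)
    interest = [r for r in candidates if detector(r)]
    non_interest = [r for r in candidates if not detector(r)]
    return interest, non_interest


if __name__ == "__main__":
    regions = [
        Region((0, 0, 10, 10), "dog"),
        Region((20, 0, 10, 10), "face"),
        Region((40, 0, 10, 10), "car"),
    ]
    # No detector is prepared for "dog", so one is learned from the tagged group.
    roi, other = run_pipeline(
        regions, "My dog at the park", ["dog", "car"], {}, {"dog": ["dog", "puppy"]}
    )
    print([r.label for r in roi], [r.label for r in other])
```

In this sketch, the fallback in select_detector corresponds to the "not prepared in advance" branch of the claim. A downstream step that produces the second image would consume the two returned lists.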