US 12,443,745 B2
Information processing apparatus and information processing method
Chie Yamada, Tokyo (JP); Noriko Totsuka, Tokyo (JP); Hiroaki Ogawa, Tokyo (JP); Yasuharu Asano, Tokyo (JP); Akira Takahashi, Tokyo (JP); Chika Myoga, Tokyo (JP); Masanobu Nakamura, Tokyo (JP); Kana Nishikawa, Tokyo (JP); Masahiro Yamamoto, Tokyo (JP); and Michael Hentschel, Tokyo (JP)
Assigned to SONY GROUP CORPORATION, Tokyo (JP)
Appl. No. 17/922,725
Filed by SONY GROUP CORPORATION, Tokyo (JP)
PCT Filed Mar. 25, 2021, PCT No. PCT/JP2021/012640
§ 371(c)(1), (2) Date Nov. 1, 2022,
PCT Pub. No. WO2021/235086, PCT Pub. Date Nov. 25, 2021.
Claims priority of application No. 2020-087001 (JP), filed on May 18, 2020.
Prior Publication US 2023/0252183 A1, Aug. 10, 2023
Int. Cl. G06F 21/62 (2013.01); G06F 40/20 (2020.01); G06T 11/00 (2006.01); G06V 10/25 (2022.01); G06V 10/74 (2022.01); G06V 10/75 (2022.01); G06V 10/774 (2022.01); G06V 40/16 (2022.01)
CPC G06F 21/6245 (2013.01) [G06F 40/20 (2020.01); G06T 11/00 (2013.01); G06V 10/25 (2022.01); G06V 10/759 (2022.01); G06V 10/761 (2022.01); G06V 10/774 (2022.01); G06V 40/172 (2022.01); G06V 2201/10 (2022.01)] 15 Claims
 
1. An information processing apparatus, comprising:
at least one processor configured to:
estimate a plurality of candidate regions of object detection from a first image;
estimate a topic of the first image based on text information, wherein the text information accompanies the first image;
evaluate the plurality of candidate regions based on relationships with the topic;
determine, based on a first object detector that relates to the topic, a candidate region of the plurality of candidate regions as a region of interest, and candidate regions of the plurality of candidate regions, other than the candidate region, as regions of non-interest, wherein
the candidate region has a specific relationship with the topic, and
the candidate region is a region in which an object that relates to the topic is detected;
detect, in a case where the topic is an object name, objects corresponding to the topic from the plurality of candidate regions based on a second object detector that relates to the object name;
collect, in a case where the first object detector is not prepared in advance, an image group having tag information that relates to the topic;
detect the objects based on a third object detector, wherein a learning of the third object detector is based on the image group; and
generate a second image based on the detection of the objects.
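The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is written under loudly stated assumptions and is not the patented implementation:

- Candidate regions arrive precomputed, each with a hypothetical feature vector.
- The topic is taken to be the most frequent non-stopword in the accompanying text.
- A region is "evaluated" by cosine similarity to a topic centroid.
- A registry of prepared detectors keyed by topic stands in for the first and second object detectors.
- When no detector is prepared, a nearest-centroid classifier built from a tag-filtered image gallery stands in for the learned third detector.
- The second image is produced by blanking the regions of non-interest. This is one assumed choice among many.

Every name here (`Candidate`, `estimate_topic`, `train_from_tagged`, `split_regions`, `mask_regions`) and the 0.9 threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set, Tuple

Box = Tuple[int, int, int, int]            # (x0, y0, x1, y1), half-open pixel bounds
Detector = Callable[[Sequence[float]], bool]

# Hypothetical stopword list used by the stand-in topic estimator.
STOPWORDS = {"a", "an", "the", "of", "at", "in", "on", "with", "and", "was", "my", "our"}


@dataclass
class Candidate:
    box: Box
    feature: Sequence[float]  # assumed appearance embedding of the candidate region


def estimate_topic(text: str) -> str:
    """Stand-in topic estimator: most frequent non-stopword token of the text."""
    words = [w.strip(".,!?#").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else ""


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train_from_tagged(topic: str, gallery: Iterable[Tuple[Set[str], Sequence[float]]],
                      threshold: float = 0.9) -> Optional[Detector]:
    """Fallback detector: collect gallery images tagged with the topic and
    build a nearest-centroid classifier from their features (assumed method)."""
    feats = [f for tags, f in gallery if topic in tags]
    if not feats:
        return None  # nothing to learn from
    centroid = [sum(f[i] for f in feats) / len(feats) for i in range(len(feats[0]))]
    return lambda feature: cosine(feature, centroid) >= threshold


def split_regions(candidates: List[Candidate], topic: str,
                  detectors: Dict[str, Detector], gallery) -> Tuple[List[Candidate], List[Candidate]]:
    """Use a prepared detector for the topic if one exists, otherwise learn one;
    regions where it fires are of interest, the rest are of non-interest."""
    detector = detectors.get(topic) or train_from_tagged(topic, gallery)
    if detector is None:
        return [], list(candidates)
    roi, non_roi = [], []
    for c in candidates:
        (roi if detector(c.feature) else non_roi).append(c)
    return roi, non_roi


def mask_regions(image: List[List[int]], regions: List[Candidate], fill: int = 0) -> List[List[int]]:
    """Produce a second image in which the given regions are overwritten with `fill`."""
    out = [row[:] for row in image]
    for c in regions:
        x0, y0, x1, y1 = c.box
        for y in range(y0, y1):
            for x in range(x0, x1):
                out[y][x] = fill
    return out


if __name__ == "__main__":
    image = [[9] * 4 for _ in range(2)]
    cands = [Candidate((0, 0, 2, 2), [1.0, 0.0]), Candidate((2, 0, 4, 2), [0.0, 1.0])]
    gallery = [({"cake"}, [0.9, 0.1]), ({"cake", "party"}, [1.0, 0.0]), ({"dog"}, [0.0, 1.0])]
    topic = estimate_topic("Birthday cake! The cake was great")      # "cake"
    roi, non_roi = split_regions(cands, topic, {}, gallery)          # no prepared detector
    print(mask_regions(image, non_roi))  # [[9, 9, 0, 0], [9, 9, 0, 0]]
```

In the usage example no detector is registered for "cake", so the sketch falls back to the tag-filtered gallery, mirroring the claimed case in which the first object detector is not prepared in advance. A real system would use learned region proposals, a trained text model, and a trained detector in place of these stand-ins.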