US 12,288,391 B2
Image grounding with modularized graph attentive networks
Zhenfang Chen, Cambridge, MA (US); Chuang Gan, Cambridge, MA (US); Bo Wu, Cambridge, MA (US); and Pin-Yu Chen, White Plains, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 13, 2022, as Appl. No. 17/743,661.
Prior Publication US 2023/0368510 A1, Nov. 16, 2023
Int. Cl. G06V 30/262 (2022.01); G06V 10/422 (2022.01); G06V 10/82 (2022.01)
CPC G06V 10/82 (2022.01) [G06V 10/422 (2022.01); G06V 30/262 (2022.01)]
17 Claims
1. A system, said system comprising:
a memory; and
a processor in communication with said memory, said processor being configured to perform operations, said operations comprising:
receiving an input;
extracting features from said input, wherein extracting features from said input comprises:
using an attention network to extract textual features for a plurality of specific modules, the plurality of specific modules including at least a subject module, a location module, and a relation module, wherein the attention network parses different components of the input for each specific module, including parsing a subject component for the subject module, a location component for the location module, and a relation component for the relation module;
mining object relations using said features;
determining feature vectors using said object relations; and
generating, using said feature vectors, an output indicating a target region, wherein said target region corresponds to said input.
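
The operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify embeddings, scoring functions, or graph construction, so everything below is an assumption made for illustration. In particular, the function names, the per-module query vectors (module_queries), dot-product attention, the dense region graph, and the additive scoring in ground are all hypothetical. The input expression is assumed to be already embedded as word vectors, and candidate regions are assumed to come with feature vectors and bounding boxes.

```python
import math

# Module names taken from claim 1; everything else is a hypothetical sketch.
MODULES = ("subject", "location", "relation")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def module_text_features(word_vecs, module_queries):
    """Attend over the words once per module (assumed dot-product attention).

    Returns {module: (attention weights over words, pooled textual feature)}.
    """
    features = {}
    for name in MODULES:
        weights = softmax([dot(module_queries[name], w) for w in word_vecs])
        features[name] = (weights, weighted_sum(weights, word_vecs))
    return features


def mine_relations(region_feats):
    """Assumed relation mining: attention weights between every pair of regions.

    Row i holds the weights region i assigns to the other regions; the
    self-weight is fixed at 0.
    """
    n = len(region_feats)
    if n == 1:
        return [[0.0]]
    edges = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        weights = softmax([dot(region_feats[i], region_feats[j]) for j in others])
        row = [0.0] * n
        for j, w in zip(others, weights):
            row[j] = w
        edges.append(row)
    return edges


def relation_aware_vectors(region_feats, edges):
    """Feature vector per region: own feature plus weighted neighbour context."""
    vectors = []
    for feat, row in zip(region_feats, edges):
        context = weighted_sum(row, region_feats)
        vectors.append([f + c for f, c in zip(feat, context)])
    return vectors


def ground(word_vecs, module_queries, region_feats, boxes):
    """Return the box of the best-scoring region and all region scores."""
    text = module_text_features(word_vecs, module_queries)
    edges = mine_relations(region_feats)
    vectors = relation_aware_vectors(region_feats, edges)
    # Assumed scoring: sum of dot products with each module's textual feature.
    scores = [sum(dot(text[m][1], v) for m in MODULES) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return boxes[best], scores
```

In this sketch each step maps onto one recited operation: module_text_features corresponds to the attention-based textual feature extraction, mine_relations to mining object relations, relation_aware_vectors to determining feature vectors, and ground to generating the output that indicates the target region. A trained system would learn the query vectors and would likely score each module against a different visual cue (appearance, position, context); the single additive score is used here only to keep the example short.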