CPC G06V 10/426 (2022.01) [G06F 18/217 (2023.01); G06F 18/22 (2023.01); G06F 40/205 (2020.01); G06N 3/02 (2013.01)] | 20 Claims |
1. A computer-implemented method comprising:
generating, by at least one processor, label graph embeddings from connected entity labels in an ungrounded label graph corresponding to a digital image, the ungrounded label graph comprising a connected set of nodes corresponding to entities in the digital image without positional information associated with the entities;
generating, by the at least one processor, visual graph embeddings from entity bounding regions in a visual graph corresponding to the digital image, the visual graph comprising a set of nodes corresponding to positions of the entities in the digital image;
determining, by the at least one processor, similarity metrics between the label graph embeddings and the visual graph embeddings; and
generating, utilizing a first-order graph matching algorithm, a semantic scene graph comprising entity nodes connected via a plurality of relationship edges based on the similarity metrics between the label graph embeddings and the visual graph embeddings.
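The claim does not fix the embedding models, the similarity metric, or the internals of the "first-order graph matching algorithm." The sketch below shows one way the four steps could fit together. It assumes that embeddings are already computed as plain vectors and that cosine similarity serves as the similarity metric. It also reads "first-order" as node-level (unary) assignment, meaning each label node is matched to a distinct visual node using only node-to-node scores and ignoring edge agreement. Every function name, label, vector, and bounding box here is hypothetical and for illustration only; none of it is the patentee's implementation.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) bounding region


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Similarity metric between one label embedding and one visual embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(label_embs: List[Vector], visual_embs: List[Vector]) -> List[List[float]]:
    """Row i, column j holds the similarity of label node i to visual node j."""
    return [[cosine_similarity(l, v) for v in visual_embs] for l in label_embs]


def first_order_match(sim: List[List[float]]) -> Dict[int, int]:
    """Assign each label node to a distinct visual node, maximizing summed similarity.

    Only unary node scores are used (assumed meaning of "first-order").
    Brute force over permutations, so this is suitable for small graphs only.
    """
    n_labels = len(sim)
    n_visual = len(sim[0]) if sim else 0
    if n_labels > n_visual:
        raise ValueError("need at least as many visual nodes as label nodes")
    best_score = -math.inf
    best: Tuple[int, ...] = ()
    for perm in itertools.permutations(range(n_visual), n_labels):
        score = sum(sim[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best = score, perm
    return dict(enumerate(best))


def ground_scene_graph(labels, label_edges, boxes, matching):
    """Attach a bounding region to each label node; keep the label graph's relationship edges."""
    nodes = [{"label": labels[i], "box": boxes[matching[i]]} for i in range(len(labels))]
    edges = [(subj, predicate, obj) for subj, predicate, obj in label_edges]
    return nodes, edges


# Hypothetical ungrounded label graph: "person riding horse" (no positions).
labels = ["person", "horse"]
label_edges = [(0, "riding", 1)]
label_embs = [[1.0, 0.0], [0.0, 1.0]]

# Hypothetical visual graph: visual node 0 resembles "horse", node 1 resembles "person".
visual_embs = [[0.1, 0.9], [0.95, 0.05]]
boxes: List[Box] = [(40, 60, 200, 220), (50, 10, 120, 110)]

sim = similarity_matrix(label_embs, visual_embs)
matching = first_order_match(sim)
nodes, edges = ground_scene_graph(labels, label_edges, boxes, matching)
print(matching)  # {0: 1, 1: 0}
```

In this sketch the brute-force search is exact but grows factorially with the number of nodes. For realistic graphs the same node-level assignment is usually solved with the Hungarian method, for example `scipy.optimize.linear_sum_assignment(sim, maximize=True)`. A matcher that also scored edge agreement (second-order) would go beyond what this sketch assumes.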