US 12,346,827 B2
Generating scene graphs from digital images using external knowledge and image reconstruction
Handong Zhao, San Jose, CA (US); Zhe Lin, Fremont, CA (US); Sheng Li, San Jose, CA (US); Mingyang Ling, San Francisco, CA (US); and Jiuxiang Gu, Singapore (SG)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Jun. 3, 2022, as Appl. No. 17/805,289.
Application 17/805,289 is a continuation of application No. 16/448,473, filed on Jun. 21, 2019, granted, now 11,373,390.
Prior Publication US 2022/0309762 A1, Sep. 29, 2022
Int. Cl. G06N 5/022 (2023.01); G06F 18/243 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2023.01); G06V 10/20 (2022.01); G06V 10/426 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)

CPC G06N 5/022 (2013.01) [G06F 18/24323 (2023.01); G06N 3/045 (2023.01); G06N 3/084 (2013.01); G06V 10/255 (2022.01); G06V 10/426 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)]

20 Claims

1. A computer-implemented method comprising:

determining, by at least one processor, a set of object labels comprising a plurality of bounding boxes that indicate locations of a plurality of entities and, based on annotations of the set of object labels, information indicating object relationships of the plurality of entities;

generating, by the at least one processor utilizing a layout generation model, a scene layout comprising relative positioning information of a plurality of objects at the locations corresponding to bounding boxes of the set of object labels according to the plurality of entities and the information indicating the object relationships of the plurality of entities from the set of object labels; and

generating, by the at least one processor utilizing an image generation neural network, a synthetic digital image comprising the plurality of objects based on the scene layout.
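To make the data flow of claim 1 concrete, here is a minimal sketch. It is not the patented implementation, and every name in it (ObjectLabel, SceneLayout, generate_layout, layout_to_label_map) is hypothetical. Object labels carry bounding boxes and annotated relationships. A stand-in "layout generation model" expresses each box relative to the canvas. Because the claimed image generation neural network cannot be reproduced here, the sketch rasterizes the layout into a coarse label map of the kind a layout-to-image network might take as conditioning input.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ObjectLabel:
    entity: str
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass
class SceneLayout:
    # Each object: (entity, box normalized to [0, 1] relative to the canvas).
    objects: list[tuple[str, tuple[float, float, float, float]]]
    # Annotated relationships: (subject index, predicate, object index).
    relationships: list[tuple[int, str, int]]


def generate_layout(labels, relationships, width, height):
    """Hypothetical layout step: express each box relative to the canvas size."""
    objects = []
    for label in labels:
        x0, y0, x1, y1 = label.box
        objects.append((label.entity, (x0 / width, y0 / height, x1 / width, y1 / height)))
    return SceneLayout(objects, list(relationships))


def layout_to_label_map(layout, grid):
    """Placeholder for the generator's conditioning input (not a neural network):
    each cell holds 1 + index of the last object covering it, 0 for background."""
    canvas = [[0] * grid for _ in range(grid)]
    for idx, (_, (x0, y0, x1, y1)) in enumerate(layout.objects):
        for row in range(int(y0 * grid), min(grid, math.ceil(y1 * grid))):
            for col in range(int(x0 * grid), min(grid, math.ceil(x1 * grid))):
                canvas[row][col] = idx + 1
    return canvas


labels = [ObjectLabel("person", (0, 0, 50, 100)), ObjectLabel("horse", (50, 50, 100, 100))]
layout = generate_layout(labels, [(0, "riding", 1)], width=100, height=100)
label_map = layout_to_label_map(layout, grid=4)
# label_map == [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 2, 2], [1, 1, 2, 2]]
```

Normalizing boxes to the canvas keeps the layout resolution-independent, so a generator could render it at any output size. Later objects overwrite earlier ones in the label map, which is one simple (assumed) way to resolve overlaps.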

10. A system comprising:

one or more computer memory devices; and

one or more processing devices configured to cause the system to:

determine a set of object labels comprising a plurality of bounding boxes that indicate locations of a plurality of entities, information based on annotations of the set of object labels that indicates object relationships of the plurality of entities, and a set of predicate labels indicating relationships of the plurality of bounding boxes;

generate, utilizing a layout generation model, a semantic scene graph comprising nodes indicating relative positioning of a plurality of objects at the locations corresponding to bounding boxes of the set of object labels according to the plurality of entities and the information indicating the object relationships of the plurality of entities from the set of object labels; and

generate, utilizing an image generation neural network, a synthetic digital image comprising the plurality of objects based on the semantic scene graph.
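Claim 10 adds a set of predicate labels indicating relationships of the bounding boxes. The claim does not say how those labels are obtained. Purely as an illustration, the sketch below assumes a rule-based geometric heuristic: it labels each ordered pair of boxes as inside, surrounding, left of, right of, above, or below. The functions predicate and predicate_labels and the label vocabulary are hypothetical and are not taken from the patent.

```python
def predicate(a, b):
    """Geometric predicate for subject box a relative to object box b.

    Boxes are (x0, y0, x1, y1) in image coordinates, where y grows downward.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Containment checks come first; identical boxes count as "inside".
    if bx0 <= ax0 and by0 <= ay0 and ax1 <= bx1 and ay1 <= by1:
        return "inside"
    if ax0 <= bx0 and ay0 <= by0 and bx1 <= ax1 and by1 <= ay1:
        return "surrounding"
    # Otherwise compare box centers along the dominant axis.
    dx = (bx0 + bx1) / 2 - (ax0 + ax1) / 2
    dy = (by0 + by1) / 2 - (ay0 + ay1) / 2
    if abs(dx) >= abs(dy):
        return "left of" if dx > 0 else "right of"
    return "above" if dy > 0 else "below"


def predicate_labels(boxes):
    """Map every ordered pair (i, j), i != j, to a predicate label."""
    return {
        (i, j): predicate(boxes[i], boxes[j])
        for i in range(len(boxes))
        for j in range(len(boxes))
        if i != j
    }


boxes = [(0, 0, 50, 100), (50, 50, 100, 100), (10, 0, 30, 20)]  # person, horse, hat
pairs = predicate_labels(boxes)
# pairs[(0, 1)] == "left of", pairs[(2, 0)] == "inside", pairs[(2, 1)] == "above"
```

A learned relationship classifier could replace this heuristic. The geometric version is shown only because it is deterministic and easy to check.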

17. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising:

determining a set of object labels indicating a plurality of locations of a plurality of entities and, based on annotations of the set of object labels, relationship information indicating object relationships for the plurality of entities;

generating a semantic scene graph comprising object nodes and relationship nodes indicating relative positioning of a plurality of objects at the plurality of locations according to the plurality of entities and the relationship information indicating the object relationships of the plurality of entities from the set of object labels; and

generating, utilizing an image generation neural network, a synthetic digital image comprising the plurality of objects based on the semantic scene graph.
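Claim 17 recites a semantic scene graph with both object nodes and relationship nodes. One common way to represent such a graph, assumed here rather than taken from the patent, is a bipartite directed graph. Each relationship becomes its own node, with an edge from the subject object node into it and an edge from it out to the object node. The class and function names below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    entity: str
    location: tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class RelationshipNode:
    predicate: str


@dataclass
class SemanticSceneGraph:
    objects: list[ObjectNode] = field(default_factory=list)
    relations: list[RelationshipNode] = field(default_factory=list)
    # Directed edges between node ids; an id is ("obj", i) or ("rel", k).
    edges: list[tuple[tuple[str, int], tuple[str, int]]] = field(default_factory=list)


def build_scene_graph(labels, triples):
    """labels: [(entity, location)]; triples: [(subject_idx, predicate, object_idx)]."""
    graph = SemanticSceneGraph()
    for entity, location in labels:
        graph.objects.append(ObjectNode(entity, location))
    for subj, pred, obj in triples:
        k = len(graph.relations)
        graph.relations.append(RelationshipNode(pred))
        graph.edges.append((("obj", subj), ("rel", k)))  # subject -> relationship
        graph.edges.append((("rel", k), ("obj", obj)))  # relationship -> object
    return graph


def to_triples(graph):
    """Recover (subject entity, predicate, object entity) from the bipartite edges."""
    subject_of, object_of = {}, {}
    for src, dst in graph.edges:
        if src[0] == "obj":
            subject_of[dst[1]] = src[1]
        else:
            object_of[src[1]] = dst[1]
    return [
        (graph.objects[subject_of[k]].entity, rel.predicate, graph.objects[object_of[k]].entity)
        for k, rel in enumerate(graph.relations)
    ]


graph = build_scene_graph(
    [("person", (0, 0, 50, 100)), ("horse", (50, 50, 100, 100))],
    [(0, "riding", 1)],
)
# to_triples(graph) == [("person", "riding", "horse")]
```

Making relationships first-class nodes lets a downstream network attach features to predicates as well as to objects. It also lets the same predicate appear many times without ambiguity.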