US 12,112,537 B2
Contrastive captioning for image groups
Quan Hung Tran, San Jose, CA (US); Long Thanh Mai, San Jose, CA (US); Zhe Lin, Fremont, WA (US); and Zhuowan Li, Baltimore, MD (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Oct. 16, 2023, as Appl. No. 18/487,183.
Application 18/487,183 is a division of application No. 16/998,876, filed on Aug. 20, 2020, granted, now 11,790,650.
Prior Publication US 2024/0037939 A1, Feb. 1, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 20/30 (2022.01); G06F 16/535 (2019.01); G06F 16/55 (2019.01); G06F 18/214 (2023.01); G06F 40/205 (2020.01); G06V 10/75 (2022.01); G06V 10/82 (2022.01)
CPC G06V 20/30 (2022.01) [G06F 16/535 (2019.01); G06F 16/55 (2019.01); G06F 18/214 (2023.01); G06F 40/205 (2020.01); G06V 10/751 (2022.01); G06V 10/82 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
generating a dataset on which to train a model; and
training the model on the dataset;
wherein generating the dataset comprises:
parsing each image caption, in a set of image captions corresponding to a set of training images, into a scene graph;
identifying target groups from within the set of training images, wherein each of the target groups comprises a subset of the set of training images having a shared scene graph;
identifying reference groups from within the set of training images, wherein each of the reference groups corresponds to a different one of the target groups and comprises a different subset of the set of training images having scene graphs that only partially overlap with the shared scene graph of a corresponding one of the target groups; and
generating a group caption for each of the target groups based at least on the shared scene graph for a given target group.
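The dataset-generation steps in claim 1 can be illustrated with a small sketch. This is not the patented implementation: here a scene graph is modeled simply as a frozenset of (subject, relation, object) triples, the "parser" is a hand-written stand-in for a real caption parser, and the group caption is a naive join of the shared triples rather than the output of a trained model.

```python
from collections import defaultdict

def parse_scene_graph(caption):
    """Toy parser: treat each 'subject relation object' clause,
    separated by ';', as one triple of the scene graph."""
    triples = set()
    for clause in caption.split(";"):
        parts = clause.strip().split()
        if len(parts) == 3:
            triples.add(tuple(parts))
    return frozenset(triples)

def build_groups(captions):
    """Identify target groups (images whose captions share an identical
    scene graph) and, for each, a reference group of images whose
    graphs only partially overlap the shared graph."""
    graphs = {img: parse_scene_graph(cap) for img, cap in captions.items()}

    # Target groups: subsets of images with the same scene graph.
    by_graph = defaultdict(list)
    for img, g in graphs.items():
        by_graph[g].append(img)
    targets = {g: imgs for g, imgs in by_graph.items() if len(imgs) > 1}

    # Reference groups: images whose graphs intersect a target's shared
    # graph without matching it exactly (partial overlap).
    references = {}
    for shared in targets:
        references[shared] = [img for img, g in graphs.items()
                              if g != shared and g & shared]
    return targets, references

def group_caption(shared_graph):
    """Naive group caption from the shared triples (placeholder for
    the caption-generation model trained on the dataset)."""
    return "; ".join(" ".join(t) for t in sorted(shared_graph))
```

For example, given captions `{"a": "dog chases ball", "b": "dog chases ball", "c": "dog chases ball; cat watches dog", "d": "bird eats seed"}`, images `a` and `b` form a target group (identical graphs), image `c` falls into its reference group (its graph overlaps but does not equal the shared graph), and image `d` is excluded (no overlap).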