CPC G06V 20/30 (2022.01) [G06F 16/535 (2019.01); G06F 16/55 (2019.01); G06F 18/214 (2023.01); G06F 40/205 (2020.01); G06V 10/751 (2022.01); G06V 10/82 (2022.01)] | 20 Claims |
1. A method comprising:
generating a dataset on which to train a model; and
training the model on the dataset;
wherein generating the dataset comprises:
parsing each image caption, in a set of image captions corresponding to a set of training images, into a scene graph;
identifying target groups from within the set of training images, wherein each of the target groups comprises a subset of the set of training images having a shared scene graph;
identifying reference groups from within the set of training images, wherein each of the reference groups corresponds to a different one of the target groups and comprises a different subset of the set of training images having scene graphs that only partially overlap with the shared scene graph of a corresponding one of the target groups; and
generating a group caption for each of the target groups based at least on the shared scene graph for a given target group.
|
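The dataset-generation steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation and rests on several assumptions: a scene graph is represented as a set of (subject, relation, object) triples; the hypothetical `parse_caption` is a toy parser that only handles captions written as three-word clauses joined by "and"; a target group is taken to be the images whose captions parse to an identical graph; and "only partially overlap" is interpreted as sharing at least one triple with the shared graph without containing all of it. The function names, the `min_size` parameter, and the caption template are likewise illustrative. The model-training step is not shown.

```python
from collections import defaultdict


def parse_caption(caption):
    """Toy stand-in for a scene graph parser (assumption, not a real parser).

    Expects clauses of the form "subject relation object" joined by " and ",
    and returns a frozenset of (subject, relation, object) triples.
    """
    triples = set()
    for clause in caption.lower().split(" and "):
        parts = clause.split()
        if len(parts) == 3:  # silently skip clauses the toy parser cannot handle
            triples.add(tuple(parts))
    return frozenset(triples)


def group_caption(shared_graph):
    """Render a shared scene graph as text, using an assumed simple template."""
    return " and ".join(" ".join(triple) for triple in sorted(shared_graph))


def build_groups(captions, min_size=2):
    """Build target groups, reference groups, and group captions.

    captions: dict mapping image id -> caption string.
    Returns a list of dicts with keys "target", "reference", "caption".
    """
    # Step 1: parse each caption into a scene graph.
    graphs = {image: parse_caption(text) for image, text in captions.items()}

    # Step 2: target groups are images that have the same (shared) scene graph.
    by_graph = defaultdict(list)
    for image, graph in graphs.items():
        if graph:
            by_graph[graph].append(image)

    groups = []
    for shared, target in by_graph.items():
        if len(target) < min_size:
            continue
        # Step 3: reference images share some, but not all, of the shared graph.
        reference = [
            image
            for image, graph in graphs.items()
            if graph & shared and not shared <= graph
        ]
        # Step 4: the group caption is derived from the shared scene graph.
        groups.append(
            {
                "target": sorted(target),
                "reference": sorted(reference),
                "caption": group_caption(shared),
            }
        )
    return groups


example_captions = {
    "img1": "woman in chair and woman holding book",
    "img2": "woman holding book and woman in chair",
    "img3": "woman in chair and woman holding phone",
    "img4": "dog on grass",
}
example_groups = build_groups(example_captions)
```

In this example, `img1` and `img2` form a target group because their captions parse to the same graph, `img3` falls into the corresponding reference group because it shares only the "woman in chair" triple, and `img4` is excluded because it has no overlap with the shared graph.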