CPC G06F 16/538 (2019.01) [G06F 16/532 (2019.01); G06N 3/08 (2013.01); G06T 11/20 (2013.01); G06T 2210/12 (2013.01)] | 20 Claims |
1. A system, comprising a processor to:
receive a randomly generated alpha-map, a pair of training images, and a pair of training texts associated with the pair of training images;
generate a blended image based on the randomly generated alpha-map and the pair of training images; and
train a visual language grounding model to separate the blended image into a pair of heatmaps identifying portions of the blended image corresponding to each of the training images using a separation loss.
|
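The three steps of claim 1 can be illustrated with a minimal NumPy sketch. The claim does not say how the alpha-map is generated or what form the separation loss takes, so everything below is an assumption made for illustration: the helper names `random_alpha_map`, `blend`, and `separation_loss` are hypothetical, the alpha-map is taken to be coarse uniform noise upsampled in blocks, and the loss is taken to be a mean squared error that pushes one heatmap toward the alpha-map and the other toward its complement. The visual language grounding model itself and the use of the training texts are not shown.

```python
import numpy as np


def random_alpha_map(height, width, cell=8, rng=None):
    """Hypothetical generator: coarse uniform noise, block-upsampled to (height, width)."""
    rng = np.random.default_rng() if rng is None else rng
    # Ceiling division so the upsampled grid covers the full image.
    coarse = rng.random((-(-height // cell), -(-width // cell)))
    # Repeat every coarse value over a cell x cell block, then crop.
    alpha = np.kron(coarse, np.ones((cell, cell)))
    return alpha[:height, :width]


def blend(image_a, image_b, alpha):
    """Per-pixel convex combination of two H x W x C images using an H x W alpha-map."""
    a = alpha[..., None]  # add a channel axis so alpha broadcasts over C
    return a * image_a + (1.0 - a) * image_b


def separation_loss(heatmap_a, heatmap_b, alpha):
    """Assumed loss form: MSE of each heatmap against alpha and (1 - alpha)."""
    loss_a = np.mean((heatmap_a - alpha) ** 2)
    loss_b = np.mean((heatmap_b - (1.0 - alpha)) ** 2)
    return float(loss_a + loss_b)
```

In a training loop, the blended image and the two training texts would be fed to the model, and the two heatmaps it returns would be scored with `separation_loss` against the alpha-map that produced the blend. Because the alpha-map is known at training time, it provides supervision without manually annotated regions.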