US 11,954,144 B2
	Training visual language grounding models using separation loss
Assaf Arbelle, Lehvot Haviva (IL); Leonid Karlinsky, Mazkeret Batya (IL); Sivan Doveh, Ramat Gan (IL); Joseph Shtok, Binyamina (IL); and Amit Alfassy, Haifa (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Aug. 26, 2021, as Appl. No. 17/412,528.
Prior Publication US 2023/0061647 A1, Mar. 2, 2023
Int. Cl. G06N 3/08 (2023.01); G06F 16/532 (2019.01); G06F 16/538 (2019.01); G06T 11/20 (2006.01)

CPC G06F 16/538 (2019.01) [G06F 16/532 (2019.01); G06N 3/08 (2013.01); G06T 11/20 (2013.01); G06T 2210/12 (2013.01)]

20 Claims

1. A system, comprising a processor to:

receive a randomly generated alpha-map, a pair of training images, and a pair of training texts associated with the pair of training images;

generate a blended image based on the randomly generated alpha-map and the pair of training images; and

train a visual language grounding model to separate the blended image into a pair of heatmaps identifying portions of the blended image corresponding to each of the training images using a separation loss.
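The blending and separation steps of claim 1 can be sketched as follows. This is a minimal, non-authoritative illustration built on several assumptions. The claim does not specify how the alpha-map is generated, how images are represented, or what form the separation loss takes. Here, images and maps are single-channel grids of floats in [0, 1]. The hypothetical `random_alpha_map` draws a Gaussian blob at a random position. The hypothetical `separation_loss` is an assumed form: it scores the heatmap for the first training text against the alpha-map and the heatmap for the second text against its complement. The visual language grounding model itself and its training loop are omitted.

```python
import math
import random
from typing import List

# A single-channel image, alpha-map, or heatmap; values assumed to lie in [0, 1].
Grid = List[List[float]]


def random_alpha_map(height: int, width: int, rng: random.Random) -> Grid:
    """Hypothetical generator: a Gaussian blob at a random centre and scale."""
    cy, cx = rng.uniform(0, height - 1), rng.uniform(0, width - 1)
    sigma = rng.uniform(0.2, 0.5) * max(height, width)
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def blend(alpha: Grid, image_a: Grid, image_b: Grid) -> Grid:
    """Per-pixel convex combination: alpha * A + (1 - alpha) * B."""
    return [
        [a * pa + (1.0 - a) * pb for a, pa, pb in zip(row_al, row_a, row_b)]
        for row_al, row_a, row_b in zip(alpha, image_a, image_b)
    ]


def separation_loss(heat_a: Grid, heat_b: Grid, alpha: Grid) -> float:
    """Assumed form: mean squared error of heatmap A against alpha
    plus heatmap B against (1 - alpha), averaged over pixels."""
    total, count = 0.0, 0
    for ha_row, hb_row, al_row in zip(heat_a, heat_b, alpha):
        for ha, hb, a in zip(ha_row, hb_row, al_row):
            total += (ha - a) ** 2 + (hb - (1.0 - a)) ** 2
            count += 1
    return total / count


# Usage: a white first image and a black second image make the blend equal the alpha-map.
rng = random.Random(0)
alpha = random_alpha_map(4, 4, rng)
image_a = [[1.0] * 4 for _ in range(4)]
image_b = [[0.0] * 4 for _ in range(4)]
blended = blend(alpha, image_a, image_b)

# Ideal heatmaps recover the alpha-map and its complement; swapped heatmaps do not.
heat_a = alpha
heat_b = [[1.0 - a for a in row] for row in alpha]
perfect_loss = separation_loss(heat_a, heat_b, alpha)
swapped_loss = separation_loss(heat_b, heat_a, alpha)
```

In this sketch, a pair of heatmaps equal to the alpha-map and its complement gives zero loss, while swapping them gives a positive loss. This is the sense in which a loss of this assumed form rewards the model for separating the blended image back into the regions contributed by each training image.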