US 12,217,191 B2
Learning unpaired multimodal feature matching for semi-supervised learning
Subhajit Chaudhury, Kawasaki (JP); Daiki Kimura, Tokyo (JP); Gakuto Kurata, Tokyo (JP); and Ryuki Tachibana, Setagaya-ku (JP)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Dec. 2, 2020, as Appl. No. 17/109,550.
Prior Publication US 2022/0172080 A1, Jun. 2, 2022
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06N 20/00 (2019.01)] 25 Claims
OG exemplary drawing
 
1. A computer-implemented method for learning multimodal feature matching comprising:
training an image encoder with a triplet loss that pushes similar images together and dissimilar images apart to obtain encoded images;
training a common classifier on the encoded images by using labeled images to learn text embeddings with corresponding labels; and
training a text encoder while keeping the common classifier in a fixed configuration by using learned text embeddings and corresponding labels for the learned text embeddings, wherein the text encoder is further trained to match a distance of predicted text embeddings, which are encoded by the text encoder, to a fitted Gaussian distribution on the encoded images.
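The three training stages recited in claim 1 each turn on a specific loss: a triplet loss for the image encoder, a standard supervised loss for the common classifier, and a distance-matching loss that pulls predicted text embeddings toward a Gaussian fitted on the encoded images. A minimal numpy-only sketch of those three loss terms follows; all function names, the margin value, and the Mahalanobis-distance formulation of the Gaussian-matching term are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Illustrative triplet loss: pushes encodings of similar images
    # together and dissimilar images apart (claim step 1).
    # Inputs are batches of embedding vectors, shape (batch, dim).
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

def fit_gaussian(image_embeddings):
    # Fit a Gaussian (mean and covariance) on the encoded images,
    # as referenced in claim step 3. Small ridge keeps the
    # covariance invertible.
    mu = image_embeddings.mean(axis=0)
    cov = np.cov(image_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(image_embeddings.shape[1])
    return mu, cov

def gaussian_match_loss(text_embeddings, mu, cov):
    # Hypothetical distance-matching term: mean squared Mahalanobis
    # distance of predicted text embeddings to the fitted Gaussian.
    # Minimizing this pulls text embeddings into the image-embedding
    # distribution while the common classifier stays fixed.
    diff = text_embeddings - mu
    inv_cov = np.linalg.inv(cov)
    mahalanobis_sq = np.einsum("bi,ij,bj->b", diff, inv_cov, diff)
    return mahalanobis_sq.mean()
```

In an actual implementation these terms would be differentiated through the respective encoders (e.g. with an autograd framework), with the classifier's parameters frozen during the third stage as the claim requires.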