US 11,960,843 B2
Multi-module and multi-task machine learning system based on an ensemble of datasets
Zhe Lin, Fremont, CA (US); Trung Huu Bui, San Jose, CA (US); Scott Cohen, Sunnyvale, CA (US); Mingyang Ling, Sunnyvale, CA (US); and Chenyun Wu, Amherst, MA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on May 2, 2019, as Appl. No. 16/401,548.
Prior Publication US 2020/0349464 A1, Nov. 5, 2020
Int. Cl. G06N 20/00 (2019.01); G06F 40/30 (2020.01); G06V 10/25 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06F 18/21 (2023.01); G06F 40/205 (2020.01)

CPC G06F 40/30 (2020.01) [G06N 20/00 (2019.01); G06V 10/25 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06F 18/217 (2023.01); G06F 40/205 (2020.01)]

17 Claims

1. A method of training a machine learning model, comprising:

generating a neural network model to perform a referring expression task based on a plurality of training datasets, wherein each training dataset includes data of a different data format, wherein the neural network model includes an image classification sub-module configured to perform an image classification task and a phrase parsing sub-module configured to perform a phrase parsing task, wherein the image classification task and phrase parsing task are combined to obtain the referring expression task of the neural network model;

receiving the plurality of training datasets, wherein the plurality of training datasets comprises a first training dataset comprising images and a second training dataset comprising natural language phrases, wherein the first training dataset is an image classification dataset, and wherein the second training dataset is a natural language dataset;

determining a data format associated with the first training dataset is an image format;

identifying, using a lookup table, the image classification sub-module as being associated with the image format;

training the image classification sub-module using the images of the image classification dataset, wherein the image classification sub-module is trained to perform the image classification task;

determining a data format associated with the second training dataset is a text format;

identifying, using the lookup table, the phrase parsing sub-module as being associated with the text format; and

training the phrase parsing sub-module using the natural language phrases of the natural language dataset, wherein the phrase parsing sub-module is trained to perform the phrase parsing task.
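The format-based routing recited in claim 1 (determine each training dataset's data format, look up the associated sub-module, and train that sub-module on the dataset) can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The names `SubModule`, `detect_format`, `build_lookup_table`, and `train_on_ensemble` are hypothetical. Format detection is assumed to be a simple type check (strings are treated as text, nested lists of numbers as images). Each sub-module's `train` method only records the examples it receives, standing in for a real optimization loop. Labels and the step that combines the two sub-modules into the referring expression task are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SubModule:
    """Hypothetical stand-in for one trainable sub-module of the larger network."""
    task: str
    seen: List[Any] = field(default_factory=list)

    def train(self, examples: List[Any]) -> None:
        # Placeholder for a real training loop: only record what was routed here.
        self.seen.extend(examples)


def detect_format(examples: List[Any]) -> str:
    """Assumed format check: strings are text; 2-D numeric grids are images."""
    first = examples[0]
    if isinstance(first, str):
        return "text"
    if isinstance(first, list) and first and isinstance(first[0], list):
        return "image"
    raise ValueError(f"unrecognized data format: {type(first).__name__}")


def build_lookup_table() -> Dict[str, SubModule]:
    """Lookup table associating each data format with the sub-module it trains."""
    return {
        "image": SubModule(task="image_classification"),
        "text": SubModule(task="phrase_parsing"),
    }


def train_on_ensemble(datasets: List[List[Any]],
                      table: Dict[str, SubModule]) -> List[str]:
    """Route each dataset to its sub-module via the table; return detected formats."""
    formats = []
    for examples in datasets:
        fmt = detect_format(examples)   # determine the dataset's data format
        table[fmt].train(examples)      # identify the sub-module and train it
        formats.append(fmt)
    return formats


if __name__ == "__main__":
    table = build_lookup_table()
    images = [[[0.0, 0.5], [0.5, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
    phrases = ["the red cup on the left", "man holding an umbrella"]
    print(train_on_ensemble([images, phrases], table))  # ['image', 'text']
    print(len(table["image"].seen), len(table["text"].seen))  # 2 2
```

In this sketch a dictionary keyed by format string plays the role of the claimed lookup table, so supporting an additional data format would amount to adding one entry and one sub-module.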