US 11,995,048 B2
Lifelong schema matching
Handong Zhao, San Jose, CA (US); Yikun Xian, Edison, NJ (US); Sungchul Kim, San Jose, CA (US); Tak Yeon Lee, Cupertino, CA (US); Nikhil Belsare, Foster City, CA (US); Shashi Kant Rai, Santa Clara, CA (US); Vasanthi Holtcamp, Fremont, CA (US); Thomas Jacobs, Cupertino, CA (US); Duy-Trung T Dinh, Cupertino, CA (US); and Caroline Jiwon Kim, San Francisco, CA (US)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Sep. 29, 2020, as Appl. No. 17/036,453.
Prior Publication US 2022/0100714 A1, Mar. 31, 2022
Int. Cl. G06F 16/00 (2019.01); G06F 16/21 (2019.01); G06F 18/2115 (2023.01); G06F 18/214 (2023.01); G06F 18/2431 (2023.01); G06N 3/08 (2023.01); G06V 30/262 (2022.01)
CPC G06F 16/213 (2019.01) [G06F 18/2115 (2023.01); G06F 18/2148 (2023.01); G06F 18/2431 (2023.01); G06N 3/08 (2013.01); G06V 30/274 (2022.01)]
18 Claims
 
1. A method for lifelong schema matching, comprising:
training a neural network classifier based on initial training data comprising a plurality of initial classes;
receiving an additional training set comprising a plurality of additional examples corresponding to an additional class;
embedding each of the plurality of additional examples from the additional training set in an embedding space;
computing a metric indicating how well each of the plurality of additional examples represents the additional class based on the embedding;
selecting a subset of the plurality of additional examples based on the metric;
determining a number of examples for each of a plurality of exemplar training sets corresponding to the plurality of initial classes;
selecting a subset of the initial training data corresponding to each of the plurality of initial classes;
creating the plurality of exemplar training sets based on the determined number of examples, wherein each of the plurality of exemplar training sets includes the subset of the initial training data corresponding to each of the plurality of initial classes, respectively;
retraining the neural network classifier based on the subset of the plurality of additional examples and the plurality of exemplar training sets;
receiving data comprising a plurality of information categories;
classifying each information category according to a schema comprising a plurality of classes, wherein the classification is performed by the neural network classifier trained using the plurality of exemplar training sets; and
adding the data to a database based on the classification, wherein the database is organized according to the schema.
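
The exemplar-selection steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify the metric or how the number of examples per class is determined, so everything concrete here is an assumption made for illustration: the metric is taken to be the negative Euclidean distance from an example's embedding to its class mean, and the per-class count is taken to be an even split of a fixed memory budget. The function names (`class_mean`, `representativeness`, `select_exemplars`, `exemplars_per_class`) are hypothetical and do not come from the patent. The embedding model and the neural network classifier itself are omitted; embeddings are assumed to be given as plain lists of floats.

```python
from math import sqrt
from typing import List, Sequence

Vector = Sequence[float]


def class_mean(embeddings: Sequence[Vector]) -> List[float]:
    """Component-wise mean of the embeddings belonging to one class."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def representativeness(embedding: Vector, mean: Vector) -> float:
    """Assumed metric: negative Euclidean distance to the class mean.

    A higher score means the example lies closer to the class center.
    """
    return -sqrt(sum((a - b) ** 2 for a, b in zip(embedding, mean)))


def select_exemplars(embeddings: Sequence[Vector], k: int) -> List[int]:
    """Return the indices of the k highest-scoring examples of one class."""
    mean = class_mean(embeddings)
    scores = [representativeness(e, mean) for e in embeddings]
    ranked = sorted(range(len(embeddings)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def exemplars_per_class(memory_budget: int, num_classes: int) -> int:
    """Assumed policy: split a fixed exemplar budget evenly across classes."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return memory_budget // num_classes


if __name__ == "__main__":
    # Toy embeddings for one additional class; the last point is an outlier.
    new_class = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [9.0, 0.0]]
    print(select_exemplars(new_class, 2))
    print(exemplars_per_class(memory_budget=20, num_classes=4))
```

Under these assumptions, the retraining set would be formed from the selected subset of the additional examples together with one exemplar set per initial class, each trimmed to the count returned by `exemplars_per_class`. Other metrics or allocation policies would fit the claim language equally well.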