US 12,361,736 B2
Multi-stage machine learning model training for key-value extraction
Yazhe Hu, Bellevue, WA (US); Jeaff Wang, Sammamish, WA (US); Mengqing Guo, Redmond, WA (US); Tao Sheng, Bellevue, WA (US); and Jun Qian, Bellevue, WA (US)
Filed by Oracle International Corporation, Redwood Shores, CA (US)
Filed on Jan. 4, 2023, as Appl. No. 18/149,795.
Prior Publication US 2024/0221407 A1, Jul. 4, 2024
Int. Cl. G06N 3/08 (2023.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06V 30/14 (2022.01); G06V 30/19 (2022.01)
CPC G06V 30/19147 (2022.01) [G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2013.01); G06V 30/1448 (2022.01)] 20 Claims
 
20. A system comprising:
one or more processors; and
memory storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
executing a first training stage for training a machine learning model to generate first vectors encoding semantic information, and at least one of positional and visual information, of first textual content within documents at least by:
accessing a first plurality of training documents associated with a plurality of document categories, the first plurality of training documents including first textual information, first positional information, and first visual information;
training the machine learning model using the first plurality of training documents associated with the plurality of document categories to generate the first vectors encoding the semantic information, and the at least one of the positional and the visual information, based on the first textual information and at least one of the first positional information and the first visual information;
wherein the first training stage generates a first set of parameters for application of the machine learning model;
executing a second training stage for customizing the machine learning model to (a) generate second vectors encoding semantic information, and at least one of positional information and visual information, of second textual content within documents of a particular document category, and (b) extract key-value pairs at least by:
accessing a second plurality of training documents associated with the particular document category, the second plurality of training documents being tagged with key-value pairs and including second textual information, second positional information, and second visual information;
training the machine learning model using the second plurality of training documents associated with the particular document category to identify key-value pairs based at least in part on vector encodings of the second textual content that encode semantic relationships, and at least one of positional and visual relationships, between components of the key-value pairs,
wherein training the machine learning model using the second plurality of training documents results in a trained machine learning model, and
wherein the second training stage fine-tunes the first set of parameters to generate a second set of parameters for application of the machine learning model; and
applying the trained machine learning model with the second set of parameters to a first document of the particular document category to extract a first plurality of key-value pairs from the first document.
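The two training stages and the application step recited in claim 20 can be illustrated with a deliberately small sketch. It is not Oracle's implementation, and every name in it (encode, pair_features, sgd, stage1_examples, stage2_examples, extract, and the sample documents) is hypothetical. To stay self-contained, it assumes a fixed hand-written encoder in place of the learned model that turns text, position, and a visual cue (bold type) into vectors, and it trains only a logistic-regression scorer over ordered token pairs. The first stage fits that scorer on untagged documents from two categories using an assumed proxy label (whether one token is the nearest right-hand neighbor of another on the same row), yielding the first set of parameters. The second stage starts from those parameters and keeps training on a receipt tagged with key-value pairs, yielding the second set, which is then applied to an unseen receipt.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical token record: (text, x, y, bold); x and y normalized to [0, 1].
Token = Tuple[str, float, float, bool]
Doc = List[Token]
Example = Tuple[List[float], float]  # (pair features, 0/1 label)
N_FEATURES = 9  # length of the vector returned by pair_features()


def encode(tok: Token) -> List[float]:
    """Fixed, hand-written stand-in (assumption) for the learned multimodal encoder."""
    text, x, y, bold = tok
    return [float(text.endswith(":")),                # semantic cue: reads like a label
            float(any(ch.isdigit() for ch in text)),  # semantic cue: contains a digit
            float(bold), x, y]                        # visual cue, then position


def pair_features(a: Token, b: Token) -> List[float]:
    """Features of the ordered pair (key candidate a, value candidate b)."""
    va, vb = encode(a), encode(b)
    same_row = float(abs(va[4] - vb[4]) < 0.01)
    return va[:3] + vb[:3] + [vb[3] - va[3], same_row, 1.0]  # last entry: bias


def score(w: List[float], f: List[float]) -> float:
    """Numerically stable logistic score."""
    z = sum(wi * fi for wi, fi in zip(w, f))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd(w_init: List[float], examples: List[Example], lr: float, epochs: int) -> List[float]:
    """Logistic-loss SGD starting from w_init; returns a new list, w_init is not modified."""
    w = list(w_init)
    for _ in range(epochs):
        for f, y in examples:
            err = score(w, f) - y
            for i, fi in enumerate(f):
                w[i] -= lr * err * fi
    return w


def next_right(doc: Doc, i: int) -> Optional[int]:
    """Index of the nearest token to the right of doc[i] on the same row, if any."""
    _, x, y, _ = doc[i]
    cands = [j for j, (_, bx, by, _) in enumerate(doc) if abs(by - y) < 0.01 and bx > x]
    return min(cands, key=lambda j: doc[j][1]) if cands else None


def stage1_examples(docs_by_category: Dict[str, List[Doc]]) -> List[Example]:
    """Untagged documents from several categories, labeled by a proxy layout task."""
    examples: List[Example] = []
    for docs in docs_by_category.values():
        for doc in docs:
            for i, a in enumerate(doc):
                nxt = next_right(doc, i)
                examples += [(pair_features(a, b), float(j == nxt))
                             for j, b in enumerate(doc) if j != i]
    return examples


def stage2_examples(tagged: List[Tuple[Doc, Set[Tuple[str, str]]]]) -> List[Example]:
    """Documents of one category, labeled by their (key text, value text) tags."""
    return [(pair_features(a, b), float((a[0], b[0]) in pairs))
            for doc, pairs in tagged
            for i, a in enumerate(doc) for j, b in enumerate(doc) if i != j]


def extract(w: List[float], doc: Doc, threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Pair each token with its best-scoring partner when the score clears threshold."""
    found = []
    for i, a in enumerate(doc):
        best = max((j for j in range(len(doc)) if j != i),
                   key=lambda j: score(w, pair_features(a, doc[j])), default=None)
        if best is not None and score(w, pair_features(a, doc[best])) > threshold:
            found.append((a[0], doc[best][0]))
    return found


general = {  # first stage: untagged documents from several categories
    "invoice": [[("Invoice", 0.1, 0.9, True), ("No:", 0.3, 0.9, True), ("A-17", 0.5, 0.9, False),
                 ("Amount", 0.1, 0.5, False), ("99", 0.4, 0.5, False)]],
    "id_card": [[("Name:", 0.1, 0.7, True), ("Ada", 0.4, 0.7, False),
                 ("Born", 0.1, 0.3, False), ("1815", 0.4, 0.3, False)]],
}
receipt = [("Total:", 0.1, 0.8, True), ("12.50", 0.5, 0.8, False),
           ("Date:", 0.1, 0.2, True), ("2023-01-04", 0.5, 0.2, False)]
tagged = [(receipt, {("Total:", "12.50"), ("Date:", "2023-01-04")})]

theta1 = sgd([0.0] * N_FEATURES, stage1_examples(general), lr=0.5, epochs=200)  # first set
theta2 = sgd(theta1, stage2_examples(tagged), lr=0.5, epochs=1000)  # fine-tuned second set

new_receipt = [("Date:", 0.2, 0.6, True), ("2024-07-04", 0.6, 0.6, False),
               ("Total:", 0.2, 0.3, True), ("8.00", 0.6, 0.3, False)]
print(extract(theta2, new_receipt))
```

Because sgd copies its starting weights, theta1 survives the second stage unchanged, which mirrors the claim's distinction between the first set of parameters and the fine-tuned second set. A real system would replace the hand-written features and the proxy label with a trained multimodal encoder and its own pre-training objective.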