US 12,118,813 B2
Continuous learning for document processing and analysis
Stanislav Semenov, Moscow (RU)
Assigned to ABBYY Development Inc., Dover, DE (US)
Filed by ABBYY Development Inc., Dover, DE (US)
Filed on Nov. 3, 2021, as Appl. No. 17/518,191.
Claims priority of application No. 2021132196 (RU), filed on Nov. 3, 2021.
Prior Publication US 2023/0138491 A1, May 4, 2023
Int. Cl. G06V 30/41 (2022.01); G06F 40/174 (2020.01); G06N 3/048 (2023.01); G06N 3/08 (2023.01); G06V 30/148 (2022.01)
CPC G06V 30/41 (2022.01) [G06F 40/174 (2020.01); G06N 3/048 (2023.01); G06N 3/08 (2013.01); G06V 30/153 (2022.01)]
20 Claims
 
1. A method comprising:
receiving, by a processing device, one or more documents;
performing optical character recognition on the one or more documents to detect words comprising symbols in the one or more documents;
determining an encoding value for each of the symbols;
applying a first hash function to each encoding value to generate a first set of hashed symbol values;
applying a second hash function to each hashed symbol value of the first set of hashed symbol values to generate a vector array comprising a second set of hashed symbol values;
applying a linear transformation to each value of the second set of hashed symbol values of the vector array;
applying an irreversible non-linear activation function to the vector array to obtain abstract values associated with the symbols; and
saving the abstract values to train a neural network to detect fields in an input document.
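
The symbol-processing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name the encoding, the hash functions, the vector size, the transformation coefficients, or the activation function. The sketch assumes Unicode code points as encoding values, SHA-256 as the first hash, BLAKE2b as the second hash, a fixed element-wise scale and offset as the linear transformation, and ReLU as the irreversible non-linear activation. All function names and constants are hypothetical, and the OCR step is replaced by a plain string.

```python
import hashlib

VECTOR_SIZE = 8   # hypothetical length of the vector array per symbol
SCALE = 2.0       # hypothetical coefficient of the linear transformation
OFFSET = 0.5      # hypothetical offset of the linear transformation


def encoding_value(symbol: str) -> int:
    """Encoding value of a symbol (assumption: its Unicode code point)."""
    return ord(symbol)


def first_hash(value: int) -> int:
    """First hash: map the encoding value to a 64-bit hashed symbol value."""
    digest = hashlib.sha256(value.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def second_hash(hashed: int, size: int = VECTOR_SIZE) -> list:
    """Second hash: expand one hashed value into a vector array in [-1, 1)."""
    vector = []
    for i in range(size):
        data = hashed.to_bytes(8, "big") + bytes([i])
        digest = hashlib.blake2b(data, digest_size=4).digest()
        vector.append(int.from_bytes(digest, "big") / 2**32 * 2.0 - 1.0)
    return vector


def linear_transform(vector: list) -> list:
    """Apply the same scale and offset to each value of the vector array."""
    return [SCALE * x + OFFSET for x in vector]


def relu(vector: list) -> list:
    """Irreversible activation: every negative input collapses to 0.0."""
    return [max(0.0, x) for x in vector]


def abstract_values(symbol: str) -> list:
    """Run one symbol through the encode, hash, hash, transform, activate chain."""
    return relu(linear_transform(second_hash(first_hash(encoding_value(symbol)))))


def abstract_word(word: str) -> list:
    """Abstract values for every symbol of a recognized word."""
    return [abstract_values(symbol) for symbol in word]


if __name__ == "__main__":
    # Stand-in for OCR output; the values would then be saved as training data.
    for symbol, values in zip("Total", abstract_word("Total")):
        print(symbol, [round(v, 3) for v in values])
```

Because ReLU maps every negative input to zero and the hashes are one-way, the saved abstract values cannot be inverted to recover the original symbols, which is the property the word "irreversible" in the claim points to. The same symbol always yields the same vector, so a downstream network could still learn from recurring patterns.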