US 12,001,446 B2
System and method for extracting data from invoices and contracts
Richard Martin, London (GB)
Assigned to THINKING MACHINE SYSTEMS LTD., London (GB)
Filed by Thinking Machine Systems Ltd., London (GB)
Filed on Apr. 12, 2022, as Appl. No. 17/719,196.
Prior Publication US 2023/0325401 A1, Oct. 12, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 16/25 (2019.01); G06F 16/28 (2019.01); G06N 5/022 (2023.01); G06F 40/279 (2020.01); G06V 30/10 (2022.01); G06V 30/416 (2022.01)
CPC G06F 16/254 (2019.01) [G06F 16/285 (2019.01); G06N 5/022 (2013.01); G06F 40/279 (2020.01); G06V 30/10 (2022.01); G06V 30/416 (2022.01)]
20 Claims
 
1. A computer implemented method for processing data, the method comprising:
retrieving a first document containing unstructured or semi-structured data or a combination of both;
extracting first data from the unstructured or semi-structured data contained in the first document;
ordering the first data into a first data structure using a set of translation tables; wherein the first data structure has a first format;
retrieving a second document containing unstructured or semi-structured data or a combination of both;
extracting second data from the unstructured or semi-structured data contained in the second document;
wherein the extracting second data from the second document is performed using an extraction table having a predefined format configured to pre-order the second data;
ordering the second data into a second data structure using the set of translation tables; wherein the second data structure has a second format;
wherein the first data and the second data comprise a plurality of entities, and
comparing entities from the first data structure and the second data structure to identify matching entities,
wherein ordering the first data and the second data comprises executing a natural language processing algorithm configured to process the first data and the second data using the set of translation tables;
wherein the natural language processing algorithm is configured to identify primitive entities and to label each primitive entity with an attribute among a set of attributes;
wherein the set of attributes comprises at least one of a technology attribute, an action attribute, a destination attribute, an origination attribute, a location attribute, a country attribute, and a rule attribute.
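
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (`TRANSLATION_TABLE`, `EXTRACTION_TABLE`, `Entity`, `label_entities`, and the sample contract and invoice text) is a hypothetical chosen for illustration, and a plain dictionary lookup stands in for the natural language processing algorithm. The first document is ordered into a flat list (a first format), the second is pre-ordered by an extraction table and stored as a field-keyed mapping (a second format), and the labelled entities of the two structures are then compared.

```python
import re
from dataclasses import dataclass

# Hypothetical translation table: surface term -> (canonical form, attribute).
# The attribute names are taken from the set listed in the claim.
TRANSLATION_TABLE = {
    "sms": ("SMS", "technology"),
    "text message": ("SMS", "technology"),
    "send": ("send", "action"),
    "sent": ("send", "action"),
    "france": ("FR", "country"),
    "fr": ("FR", "country"),
}

# Hypothetical extraction table: a predefined field layout that pre-orders
# the data pulled from the second (semi-structured) document.
EXTRACTION_TABLE = {
    "service": re.compile(r"^Service:\s*(.+)$", re.MULTILINE),
    "destination": re.compile(r"^Destination:\s*(.+)$", re.MULTILINE),
}


@dataclass(frozen=True)
class Entity:
    """A primitive entity labelled with one attribute."""
    canonical: str
    attribute: str


def label_entities(text, table=TRANSLATION_TABLE):
    """Stand-in for the NLP step: find table terms and label each one."""
    found = []
    lowered = text.lower()
    for term, (canonical, attribute) in table.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((m.start(), Entity(canonical, attribute)))
    found.sort(key=lambda pair: pair[0])  # keep document order
    return [entity for _, entity in found]


def order_first_document(text):
    """First format: a flat list of labelled entities in document order."""
    return label_entities(text)


def order_second_document(text, extraction_table=EXTRACTION_TABLE):
    """Second format: a mapping from extraction-table field to entities."""
    structure = {}
    for field, pattern in extraction_table.items():
        m = pattern.search(text)
        if m:
            structure[field] = label_entities(m.group(1).strip())
    return structure


def matching_entities(first, second):
    """Compare the two structures and return the entities found in both."""
    second_flat = {e for entities in second.values() for e in entities}
    common = set(first) & second_flat
    return sorted(common, key=lambda e: (e.attribute, e.canonical))


# Made-up sample documents, for illustration only.
contract = "The supplier shall send each text message to subscribers in France."
invoice = "Service: SMS\nDestination: FR\nAmount: 120.00"

first_structure = order_first_document(contract)
second_structure = order_second_document(invoice)
matches = matching_entities(first_structure, second_structure)
```

In this sketch the same translation table serves both documents, so "text message" in the contract and "SMS" in the invoice resolve to the same labelled entity and are reported as a match even though the two data structures have different formats.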