CPC G06F 16/254 (2019.01) [G06F 16/285 (2019.01); G06N 5/022 (2013.01); G06F 40/279 (2020.01); G06V 30/10 (2022.01); G06V 30/416 (2022.01)]
20 Claims
1. A computer implemented method for processing data, the method comprising:
retrieving a first document containing unstructured or semi-structured data or a combination of both;
extracting first data from the unstructured or semi-structured data contained in the first document;
ordering the first data into a first data structure using a set of translation tables, wherein the first data structure has a first format;
retrieving a second document containing unstructured or semi-structured data or a combination of both;
extracting second data from the unstructured or semi-structured data contained in the second document;
wherein the extracting second data from the second document is performed using an extraction table having a predefined format configured to pre-order the second data;
ordering the second data into a second data structure using the set of translation tables, wherein the second data structure has a second format;
wherein the first data and the second data comprise a plurality of entities, and
comparing entities from the first data structure and the second data structure to identify matching entities,
wherein ordering the first data and the second data comprises executing a natural language processing algorithm configured to process the first data and the second data using the set of translation tables;
wherein the natural language processing algorithm is configured to identify primitive entities and to label each primitive entity with an attribute among a set of attributes;
wherein the set of attributes comprises at least one of a technology attribute, an action attribute, a destination attribute, an origination attribute, a location attribute, a country attribute, and a rule attribute.
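The claimed pipeline can be illustrated with a minimal Python sketch. It is not the patentee's implementation. A greedy dictionary lookup stands in for the natural language processing algorithm, and the translation table, extraction table, field names, and sample texts are all hypothetical. Each translation-table entry maps a surface phrase to one of the claimed attributes plus a canonical value. The first document is ordered into an attribute-to-values mapping (the first format). The second document is pre-ordered by an extraction table into (field, attribute, value) rows (the second format). Two entities match when their attribute and canonical value agree.

```python
import re
from typing import Dict, List, Set, Tuple

# Attribute labels taken from the claim.
ATTRIBUTES = {"technology", "action", "destination", "origination",
              "location", "country", "rule"}

# Hypothetical translation table: lower-case surface phrase -> (attribute, canonical value).
TRANSLATION_TABLE: Dict[str, Tuple[str, str]] = {
    "encryption software": ("technology", "encryption software"),
    "crypto software": ("technology", "encryption software"),
    "export": ("action", "export"),
    "ship": ("action", "export"),
    "germany": ("country", "DE"),
    "deutschland": ("country", "DE"),
    "ear": ("rule", "EAR"),
}
assert all(attr in ATTRIBUTES for attr, _ in TRANSLATION_TABLE.values())
MAX_PHRASE_WORDS = max(len(phrase.split()) for phrase in TRANSLATION_TABLE)

# Hypothetical extraction table with a predefined format: fields in a fixed order,
# each paired with a regex that captures the field's value from one line.
EXTRACTION_TABLE: List[Tuple[str, str]] = [
    ("item", r"^item:\s*(.+)$"),
    ("activity", r"^activity:\s*(.+)$"),
    ("ship_to", r"^ship to:\s*(.+)$"),
]


def label_primitive_entities(text: str) -> List[Tuple[str, str]]:
    """Stand-in for the NLP step: greedy longest-match lookup of phrases in the
    translation table, returning (attribute, canonical value) pairs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    labels: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in TRANSLATION_TABLE:
                labels.append(TRANSLATION_TABLE[phrase])
                i += n
                break
        else:  # no phrase starting at word i is in the table
            i += 1
    return labels


def order_first(text: str) -> Dict[str, Set[str]]:
    """First data structure (first format): attribute -> set of canonical values."""
    structure: Dict[str, Set[str]] = {}
    for attribute, value in label_primitive_entities(text):
        structure.setdefault(attribute, set()).add(value)
    return structure


def order_second(text: str) -> List[Tuple[str, str, str]]:
    """Second data structure (second format): (field, attribute, value) rows,
    pre-ordered by the field order of the extraction table."""
    rows: List[Tuple[str, str, str]] = []
    for field, pattern in EXTRACTION_TABLE:
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        if match:
            for attribute, value in label_primitive_entities(match.group(1)):
                rows.append((field, attribute, value))
    return rows


def match_entities(first: Dict[str, Set[str]],
                   second: List[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Entities match when attribute and canonical value agree in both structures."""
    first_pairs = {(a, v) for a, values in first.items() for v in values}
    second_pairs = {(a, v) for _, a, v in second}
    return first_pairs & second_pairs


regulation = "Export of encryption software to Germany is controlled under the EAR."
order = "Item: crypto software\nActivity: ship\nShip to: Deutschland"
matches = match_entities(order_first(regulation), order_second(order))
print(sorted(matches))
# → [('action', 'export'), ('country', 'DE'), ('technology', 'encryption software')]
```

Because both orderings pass through the same translation table, synonyms such as "ship" and "export" resolve to the same canonical entity. In this sketch, that shared vocabulary is what allows entities held in two differently formatted structures to be compared directly.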