| CPC G06F 16/383 (2019.01) [G06F 16/2264 (2019.01); G06F 16/278 (2019.01); G06F 16/316 (2019.01)] | 6 Claims |

|
1. A method comprising:
receiving, at a front-end service of a partition and labeling platform comprising a processor and a memory, an electronic document and corresponding metadata, the metadata comprising a file type;
persisting, by the front-end service and to a document store of the memory of the partition and labeling platform, the electronic document with a unique document identification;
persisting, by the front-end service and to a metadata store of the memory of the partition and labeling platform, the metadata corresponding to the electronic document with the unique document identification;
receiving, by an index service of the partition and labeling platform and from the front-end service using a producer API call through a messaging queue, a message comprising the unique document identification;
retrieving, by a data capture service and from the document store based on receiving the document identification from the index service, the electronic document from the document store using the document identification as a lookup key;
determining, by the data capture service, whether the electronic document comprises text and, based on a determination that the electronic document comprises text, performing optical character recognition on the electronic document;
encoding, by the data capture service, a text of the electronic document identified by the optical character recognition;
sending, by the data capture service via an API call to a unit extraction service, the text of the encoded document as a byte stream;
decoding, by the unit extraction service, the byte stream into a string file;
standardizing, by the unit extraction service, a number of hierarchical partitions parsed from the string file by formatting the number of hierarchical partitions;
determining, by the unit extraction service and based on the number of hierarchical partitions, a value of a first key, wherein the first key indicates a logical partition of the electronic document, wherein the value of the first key includes an index number of the logical partition;
determining, by the unit extraction service and based on the partition separation characters, a value of a third key, wherein the third key indicates a lowest-level logical partition of the number of hierarchical partitions that the unit extraction service is configured to determine, and wherein the value of the third key is text from the lowest-level logical partition;
providing, by a machine learning model of the partition and labeling platform, a tag value to the lowest-level logical partition;
assigning, by the machine learning model, a predicted tag value to the lowest-level logical partition based on a similarity of the lowest-level logical partition to historical data;
assigning, by the index service of the partition and labeling platform, a value of the corresponding metadata in the metadata store as a value of a second key;
indexing, by the index service, the first key, the value of the first key, the second key, and the value of the second key in a search index; and
providing, via a platform interface of the partition and labeling platform, a search function that searches the search index.
|
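The unit extraction and indexing steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes the byte stream is UTF-8 and that partitions are marked by numbered headings such as "1." or "1.1"; the claim specifies neither. The names `HEADING`, `Unit`, `extract_units`, and `build_index` are hypothetical, and a plain dictionary stands in for the search index. The machine learning tagging step is omitted.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: partitions are introduced by numbered headings ("1.", "1.1", "2.3.1").
HEADING = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+(.*)$")


@dataclass
class Unit:
    partition_index: str  # value of the first key (index number of the partition)
    file_type: str        # value of the second key (taken from the metadata)
    text: str             # value of the third key (text of the lowest-level partition)


def extract_units(byte_stream: bytes, file_type: str) -> List[Unit]:
    """Decode the byte stream into a string and split it into partitions."""
    text = byte_stream.decode("utf-8")  # assumed encoding
    units: List[Unit] = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Standardize formatting: drop the trailing dot, collapse whitespace.
            body = " ".join(match.group(2).split())
            units.append(Unit(match.group(1), file_type, body))
        elif units and line.strip():
            # A non-heading line continues the most recent partition.
            units[-1].text += " " + " ".join(line.split())
    return units


def build_index(units: List[Unit]) -> Dict[Tuple[str, str], List[Unit]]:
    """Index each unit under its first key and its second key."""
    index: Dict[Tuple[str, str], List[Unit]] = {}
    for unit in units:
        index.setdefault(("partition", unit.partition_index), []).append(unit)
        index.setdefault(("file_type", unit.file_type), []).append(unit)
    return index
```

A search function over this stand-in index reduces to a dictionary lookup, for example `build_index(units)[("partition", "1.1")]`.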