US 12,321,364 B1
System and method for automatic creation of structured data objects from unstructured data
Thomas Meyer Funk, Albany, CA (US); Thomas Corbin Madsen, San Francisco, CA (US); and David Min Kang, San Francisco, CA (US)
Assigned to Keeper Tax Inc., San Francisco, CA (US)
Filed by Keeper Tax Inc., San Francisco, CA (US)
Filed on Sep. 27, 2023, as Appl. No. 18/373,676.
Int. Cl. G06F 16/25 (2019.01); G06F 16/248 (2019.01)
CPC G06F 16/258 (2019.01)
8 Claims
 
1. A computer implemented method comprising the steps of:
receiving, by a processor of a hardware computing device, a set of unstructured electronic data;
parsing the unstructured electronic data to identify a plurality of substrings;
determining that each of the plurality of substrings match an entry in one or more stored data sets; and
assigning, by a processor of a hardware computing device, a label to each of the plurality of substrings in response to determining that each of the plurality of substrings matches the entry, wherein assigning the labels generates a plurality of unstructured labeled substrings, wherein each of the plurality of unstructured labeled substrings include a label portion that stores a particular label and a substring portion that stores a particular substring;
determining, by a processor of a hardware computing device, if a first unstructured labeled substring and a second unstructured labeled substring, of the plurality of unstructured labeled substrings, include particular substring portions that store overlapping substrings;
selecting, in response to determining that the first unstructured labeled substring and the second unstructured labeled substring include the particular substring portions that store the overlapping substrings, a particular unstructured labeled substring of the first unstructured labeled substring and the second unstructured labeled substring based on a comparison of particular labels stored in the label portions of the first unstructured labeled substring and the second unstructured labeled substring; and
generating, by the processor of the hardware computing device, a structured electronic data object for the unstructured electronic data using the particular unstructured labeled substring instead of a non-selected unstructured labeled substring of the first unstructured labeled substring and the second unstructured labeled substring, wherein the generating comprises:
(1) using a particular label of the particular unstructured labeled substring to determine a first field name for the structured electronic data object, wherein the particular substring of the particular unstructured labeled substring is stored in an entry of the structured electronic data object that corresponds to the first field name, and
(2) using at least one different label of a selected unstructured labeled substring, of the plurality of unstructured labeled substrings that does not include the particular substring portions that store overlapping substrings, to determine a second field name for the structured electronic data object, wherein the selected substring of the selected unstructured labeled substring is stored in a second entry of the structured electronic data object that corresponds to the second field name; and
updating an electronic database, including the first field name and the second field name, with the generated structured electronic data object.
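
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading of the claim language. Everything named here is an assumption made for illustration: the contents of `STORED_DATA_SETS`, the `LABEL_PRIORITY` ranking used as the "comparison of particular labels", the `LabeledSubstring` type, the helper function names, the sample transaction string, and the use of an in-memory SQLite table as the "electronic database".

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical stored data sets: label -> known entries (all lowercase).
STORED_DATA_SETS = {
    "merchant": {"uber eats"},
    "category": {"eats"},
    "city": {"san francisco"},
}
# Hypothetical label ranking used to compare labels; lower value wins.
LABEL_PRIORITY = {"merchant": 0, "city": 1, "category": 2}


@dataclass(frozen=True)
class LabeledSubstring:
    label: str      # label portion
    substring: str  # substring portion
    start: int      # start offset in the source text
    end: int        # end offset (exclusive)


def label_substrings(text):
    """Parse the text and label every substring that matches a stored entry."""
    lowered = text.lower()
    found = []
    for label, entries in STORED_DATA_SETS.items():
        for entry in entries:
            pattern = r"\b" + re.escape(entry) + r"\b"
            for m in re.finditer(pattern, lowered):
                found.append(
                    LabeledSubstring(label, text[m.start():m.end()], m.start(), m.end())
                )
    return found


def overlaps(a, b):
    """Two labeled substrings overlap when their character spans intersect."""
    return a.start < b.end and b.start < a.end


def resolve_overlaps(items):
    """Among overlapping items, keep the one whose label ranks highest."""
    kept = []
    for item in sorted(items, key=lambda s: LABEL_PRIORITY[s.label]):
        if not any(overlaps(item, k) for k in kept):
            kept.append(item)
    return kept


def build_object(items):
    """Use each label as a field name and its substring as the field value."""
    return {s.label: s.substring for s in sorted(items, key=lambda s: s.start)}


def store(conn, obj):
    """Write the structured object to the database as (id, field, value) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(object_id INTEGER, field_name TEXT, value TEXT)"
    )
    next_id = conn.execute(
        "SELECT COALESCE(MAX(object_id), 0) + 1 FROM objects"
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [(next_id, field, value) for field, value in obj.items()],
    )
    conn.commit()
    return next_id


if __name__ == "__main__":
    raw = "UBER EATS SAN FRANCISCO CA 0923"  # hypothetical unstructured input
    record = build_object(resolve_overlaps(label_substrings(raw)))
    store(sqlite3.connect(":memory:"), record)
    print(record)  # {'merchant': 'UBER EATS', 'city': 'SAN FRANCISCO'}
```

In this sketch, "UBER EATS" (labeled merchant) and "EATS" (labeled category) overlap, so the label ranking selects the merchant substring and discards the category one. "SAN FRANCISCO" overlaps nothing and supplies the second field name. A fixed priority table is only one way to compare labels; the claim does not specify the comparison rule.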