CPC G06F 16/328 (2019.01) [G06F 40/279 (2020.01); G06V 30/19 (2022.01)] | 21 Claims |
13. A system to automatically index multiple informational fields in a digital data record, the system comprising:
a plurality of trained NLP ML models, wherein each ML model of the plurality of trained NLP ML models is trained on a specific target information field of a plurality of target information fields;
a database to store indexed information;
a memory comprising instructions, coupled to a processor, wherein the instructions are executable by a processor to:
identify, based on rules defining the target information fields, for each target field of the target information fields, at least one page in the digital data record comprising content related to the target field;
extract, for each target field, from the at least one page, at least one portion of text comprising the content;
feed, for each target field, a pre-processed version of the at least one portion of text into an ML model of the plurality of trained NLP ML models, wherein the ML model is trained on the target field;
determine, for each target field, via the ML model trained on the target field, at least one candidate text comprising the content;
extract, for each target field, via the ML model trained on the target field, the at least one candidate text; and
generate, for each target field, via the ML model trained on the target field, an index of the at least one candidate text.
|
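The steps recited in claim 13 can be pictured as a per-field pipeline. What follows is a minimal sketch under stated assumptions, not the claimed implementation: the field names, the keyword rules, the sample record, and every function name are hypothetical, and simple regular expressions stand in for the trained NLP ML models, which the claim does not specify.

```python
import re
from collections import defaultdict

# Hypothetical rules defining the target fields: field name -> keywords
# that mark a page as containing content related to that field.
RULES = {
    "grantor": ["grantor"],
    "recording_date": ["recorded on"],
}

# Stand-ins for the per-field trained models (assumption: plain regexes,
# not real ML). Each callable takes pre-processed text and returns candidates.
MODELS = {
    "grantor": lambda text: re.findall(r"grantor[:\s]+([a-z]+ [a-z]+)", text),
    "recording_date": lambda text: re.findall(r"recorded on (\d{4}-\d{2}-\d{2})", text),
}


def identify_pages(pages, keywords):
    """Return the numbers of pages that mention any keyword for the field."""
    return [i for i, page in enumerate(pages)
            if any(k in page.lower() for k in keywords)]


def extract_portion(page, keywords):
    """Keep only the lines of the page that mention a keyword."""
    return "\n".join(line for line in page.splitlines()
                     if any(k in line.lower() for k in keywords))


def preprocess(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def index_record(pages, rules=RULES, models=MODELS):
    """Run every target field through its own model and collect an index."""
    database = defaultdict(list)  # stands in for the claimed database
    for field, keywords in rules.items():
        for page_no in identify_pages(pages, keywords):
            portion = extract_portion(pages[page_no], keywords)
            for candidate in models[field](preprocess(portion)):
                database[field].append({"text": candidate, "page": page_no})
    return dict(database)


record = [
    "Cover sheet\nNothing of interest here.",
    "DEED OF TRUST\nGrantor: Jane Doe\nRecorded on 2021-03-04 in Book 12.",
]
index = index_record(record)
print(index)
```

In this sketch each field is handled independently, mirroring the claim's requirement that each model is trained on one specific target field; replacing a regex with a real model only requires a callable that maps pre-processed text to a list of candidate strings.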