| CPC G06V 30/2253 (2022.01) [G06V 30/414 (2022.01)] | 20 Claims |

|
1. A computer-implemented method, comprising:
receiving input data in a first branch comprising a first Vision-Language Model (VLM), the input data comprising a facsimile representation of a paper document, the first VLM adapted to identify a first set of fields in the input data using a visualized first set of bounding boxes (BBs);
labeling, by the first VLM, the first set of fields to output a labeled first set of fields;
executing in the first branch a first agentic Artificial Intelligence architecture (AAA-1), the executing causing a localization of an identified field as a desired type of field using a corresponding visualized identified BB from the first set of BBs;
outputting from the first branch, a localized and labeled identified field;
passing the input data to a second branch comprising a second VLM and a Multimodal Large Language Model (MLLM), the second VLM adapted to identify a second set of fields in the input data using a visualized second set of BBs;
passing the input data with the second set of BBs to the MLLM executing in the second branch, the MLLM outputting a set of recognized fields within BBs of the second set of BBs;
localizing, by a second agentic Artificial Intelligence architecture (AAA-2) executing in the second branch, at least one recognized field as a target field;
labeling the target field to output from the second branch a labeled target field; and
combining to form labeled data in a training data set, the input data, the labeled identified field, and the labeled target field.
|
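
The data flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Field`, `TrainingExample`, `run_first_branch`, `run_second_branch`, and `combine` are hypothetical, the `(x0, y0, x1, y1)` box convention is an assumption, and the `toy_*` functions are trivial stand-ins for the two VLMs, the MLLM, and the two agentic architectures, whose internals the claim does not specify.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # assumed convention: (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class Field:
    bbox: BBox
    label: Optional[str] = None  # field type, e.g. "date"
    text: Optional[str] = None   # recognized content (second branch only)


@dataclass(frozen=True)
class TrainingExample:
    input_data: bytes
    labeled_identified_field: Field
    labeled_target_field: Field


def run_first_branch(input_data: bytes,
                     vlm1: Callable[[bytes], List[Field]],
                     aaa1: Callable[[List[Field], str], Field],
                     desired_type: str) -> Field:
    # First VLM identifies and labels fields via bounding boxes.
    labeled_fields = vlm1(input_data)
    # AAA-1 localizes the identified field of the desired type.
    return aaa1(labeled_fields, desired_type)


def run_second_branch(input_data: bytes,
                      vlm2: Callable[[bytes], List[BBox]],
                      mllm: Callable[[bytes, List[BBox]], List[Field]],
                      aaa2: Callable[[List[Field], str], Field],
                      desired_type: str) -> Field:
    # Second VLM proposes a second set of bounding boxes.
    boxes = vlm2(input_data)
    # MLLM recognizes field content within those boxes.
    recognized = mllm(input_data, boxes)
    # AAA-2 localizes one recognized field as the target field.
    target = aaa2(recognized, desired_type)
    # Label the target field before it leaves the branch.
    return replace(target, label=desired_type)


def combine(input_data: bytes, identified: Field, target: Field) -> TrainingExample:
    # One labeled record for the training data set.
    return TrainingExample(input_data, identified, target)


# Toy stand-ins (purely illustrative; real models would replace these).
def toy_vlm1(_data: bytes) -> List[Field]:
    return [Field((10, 10, 200, 40), "name"), Field((10, 60, 200, 90), "date")]


def toy_aaa1(fields: List[Field], desired_type: str) -> Field:
    return next(f for f in fields if f.label == desired_type)


def toy_vlm2(_data: bytes) -> List[BBox]:
    return [(10, 10, 200, 40), (10, 60, 200, 90)]


def toy_mllm(_data: bytes, boxes: List[BBox]) -> List[Field]:
    texts = ["Jane Doe", "2022-01-01"]
    return [Field(b, text=t) for b, t in zip(boxes, texts)]


def toy_aaa2(fields: List[Field], _desired_type: str) -> Field:
    # Crude heuristic: treat text starting with four digits as a date.
    return next(f for f in fields if f.text and f.text[:4].isdigit())
```

With the stand-ins above, `combine(b"scan", run_first_branch(b"scan", toy_vlm1, toy_aaa1, "date"), run_second_branch(b"scan", toy_vlm2, toy_mllm, toy_aaa2, "date"))` yields a single record holding the input data together with the labeled field from each branch.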