US 12,488,609 B1
	Training-free framework for zero-shot check field detection
Sourav Halder, Chicago, IL (US); Jinjun Tong, Corona, CA (US); and Xinyu Wu, Buffalo Grove, IL (US)
Assigned to U.S. Bank National Association
Filed by U.S. Bank National Association, Minneapolis, MN (US)
Filed on Jul. 16, 2025, as Appl. No. 19/270,987.
Application 19/270,987 is a continuation of application No. 19/269,173, filed on Jul. 15, 2025.
Int. Cl. G06V 30/224 (2022.01); G06V 30/414 (2022.01)

CPC G06V 30/2253 (2022.01) [G06V 30/414 (2022.01)]

20 Claims

1. A computer-implemented method, comprising:

receiving input data in a first branch comprising a first Vision-Language Model (VLM), the input data comprising a facsimile representation of a paper document, the first VLM adapted to identify a first set of fields in the input data using a visualized first set of bounding boxes (BBs);

labeling, by the first VLM, the first set of fields to output a labeled first set of fields;

executing in the first branch a first agentic Artificial Intelligence architecture (AAA-1), the executing causing localization of an identified field as a desired type of field using a corresponding visualized identified BB from the first set of BBs;

outputting, from the first branch, a localized and labeled identified field;

passing the input data to a second branch comprising a second VLM and a Multimodal Large Language Model (MLLM), the second VLM adapted to identify a second set of fields in the input data using a visualized second set of BBs;

passing the input data with the second set of BBs to the MLLM executing in the second branch, the MLLM outputting a set of recognized fields within BBs of the second set of BBs;

localizing, by a second agentic Artificial Intelligence architecture (AAA-2) executing in the second branch, at least one recognized field as a target field;

labeling the target field to output from the second branch a labeled target field;

combining the input data, the labeled identified field, and the labeled target field to form labeled data in a training data set.
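The two-branch data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patentee's implementation: the model calls are replaced by hypothetical callables (`vlm_detect`, `vlm_label`, `mllm_recognize`), a bounding box is assumed to be an `(x0, y0, x1, y1)` pixel tuple, and the agentic architectures AAA-1 and AAA-2 are reduced to a toy localizer (`agent_localize`) that picks the first field whose label matches the requested type. All of these names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Assumed box convention: (x0, y0, x1, y1) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class Field:
    bbox: BBox
    label: str  # e.g. "date", "amount", "payee" (hypothetical label set)


# Hypothetical model interfaces standing in for real VLM / MLLM calls.
DetectFn = Callable[[bytes], List[BBox]]    # VLM: image -> candidate boxes
ClassifyFn = Callable[[bytes, BBox], str]   # VLM or MLLM: image + box -> field label


def agent_localize(fields: List[Field], desired: str) -> Optional[Field]:
    """Toy stand-in for an agentic localizer: first field of the desired type, or None."""
    return next((f for f in fields if f.label == desired), None)


def branch_one(image: bytes, vlm_detect: DetectFn, vlm_label: ClassifyFn,
               desired: str) -> Optional[Field]:
    # First VLM proposes boxes and labels the field inside each one.
    labeled = [Field(b, vlm_label(image, b)) for b in vlm_detect(image)]
    # AAA-1 (toy version) localizes the field of the desired type.
    return agent_localize(labeled, desired)


def branch_two(image: bytes, vlm_detect: DetectFn, mllm_recognize: ClassifyFn,
               target: str) -> Optional[Field]:
    # Second VLM proposes boxes; the MLLM recognizes the field within each box.
    recognized = [Field(b, mllm_recognize(image, b)) for b in vlm_detect(image)]
    # AAA-2 (toy version) localizes the target field among the recognized fields.
    return agent_localize(recognized, target)


def build_training_record(image: bytes, identified: Optional[Field],
                          target: Optional[Field]) -> dict:
    # Combine the input data and both branch outputs into one labeled record.
    return {"image": image, "identified": identified, "target": target}
```

Sharing one localizer between the branches keeps the sketch short; an actual agentic architecture would presumably reason over the visualized boxes (for example, by re-prompting a model or calling tools) rather than matching label strings.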