US 12,405,970 B2
Multi-layer approach to improving generation of field extraction models
Shalin Avlani, San Jose, CA (US); Rajesh M. Desai, San Jose, CA (US); Mayank Vipin Shah, San Jose, CA (US); and Xiaoying Gao, San Jose, CA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Oct. 6, 2023, as Appl. No. 18/482,671.
Prior Publication US 2025/0117405 A1, Apr. 10, 2025
Int. Cl. G06F 16/24 (2019.01); G06F 16/2457 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/24573 (2019.01)]
20 Claims
 
1. A computer-implemented method for generating cluster templates used for creating extraction models, comprising:
receiving a plurality of training files associated with a selected class;
performing an automated visual analysis on each of the plurality of training files;
performing an automated contextual analysis on each of the plurality of training files;
performing a first clustering of the plurality of training files into a first plurality of clusters using results from the automated visual analysis;
performing a second clustering of one of the first plurality of clusters into a second plurality of clusters using results from the automated contextual analysis; and
generating cluster templates for the first and second plurality of clusters, wherein
the first and the second plurality of clusters are clusters of the plurality of training files.
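
The two-layer clustering recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not specify any features, distance measures, clustering algorithm, or template format, so everything below is an assumption made for illustration: the `TrainingFile` fields, the greedy threshold clustering, the Euclidean and Jaccard distances, the threshold values, the choice of the largest visual cluster for the second clustering, and the "shared tokens" template are all hypothetical stand-ins, not the patented method.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class TrainingFile:
    name: str
    visual: tuple[float, ...]  # hypothetical result of the visual analysis
    tokens: frozenset[str]     # hypothetical result of the contextual analysis


def greedy_cluster(items, distance, threshold):
    """Put each item in the first cluster whose seed item is within threshold."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if distance(cluster[0], item) <= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no cluster was close enough: start a new one
    return clusters


def visual_distance(a, b):
    """Euclidean distance between layout feature vectors (assumed measure)."""
    return dist(a.visual, b.visual)


def contextual_distance(a, b):
    """Jaccard distance between token sets (assumed measure)."""
    union = a.tokens | b.tokens
    if not union:
        return 0.0
    return 1.0 - len(a.tokens & b.tokens) / len(union)


def make_template(cluster):
    """Hypothetical template: member files plus the tokens they all share."""
    shared = frozenset.intersection(*(f.tokens for f in cluster))
    return {
        "files": sorted(f.name for f in cluster),
        "shared_tokens": sorted(shared),
    }


def build_templates(files, visual_threshold=0.5, contextual_threshold=0.6):
    # First clustering: all training files, using the visual results.
    first = greedy_cluster(files, visual_distance, visual_threshold)
    # Second clustering: one of the first clusters (here, the largest),
    # using the contextual results.
    target = max(first, key=len)
    second = greedy_cluster(target, contextual_distance, contextual_threshold)
    return {
        "first": [make_template(c) for c in first],
        "second": [make_template(c) for c in second],
    }
```

In this sketch, every cluster in both the first and the second plurality is a group of the original training files, and a template is produced for each cluster in both layers, which mirrors the final limitation of the claim.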