US 11,887,395 B2
Automatic selection of templates for extraction of data from electronic documents
Hanieh Borhanazad, Sydney (AU); Jimmy Chandra, Sydney (AU); Jey Jeyaramanan, Sydney (AU); Thuwaragan Sundaramoorthy, Sydney (AU); and Mark Burch, Warrawee (AU)
Assigned to Coupa Software Incorporated, San Mateo, CA (US)
Filed by Coupa Software Incorporated, San Mateo, CA (US)
Filed on Mar. 29, 2023, as Appl. No. 18/192,170.
Application 18/192,170 is a continuation of application No. 16/953,784, filed on Nov. 20, 2020, granted, now 11,663,843.
Claims priority of provisional application 63/057,146, filed on Jul. 27, 2020.
Prior Publication US 2023/0237829 A1, Jul. 27, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 30/418 (2022.01); G06F 40/186 (2020.01); G06V 30/412 (2022.01)
CPC G06V 30/418 (2022.01) [G06F 40/186 (2020.01); G06V 30/412 (2022.01)]
22 Claims
1. A computer-implemented method comprising:
programmatically receiving an input electronic document and a first set of candidate templates;
identifying a plurality of attributes from a top-third portion of the input electronic document;
for each particular candidate template from among candidate templates in the first set of candidate templates, calculating a template similarity ratio value that represents a similarity of the particular candidate template to the input electronic document, resulting in digitally storing a plurality of template similarity ratios;
programmatically ranking the candidate templates according to the template similarity ratios;
matching the candidate templates to the input electronic document, resulting in generating a normalized similarity score for each particular candidate template from among the candidate templates;
determining differences in normalized similarity scores of successive pairs of the candidate templates;
when a particular difference in the normalized similarity scores of a particular pair of the candidate templates exceeds a specified threshold value, establishing a breaking point of the candidate templates at the particular pair;
forming a second set of candidate templates by selecting, from among the candidate templates, only those candidate templates that are ranked above the breaking point;
extracting data from the input electronic document using the second set of candidate templates.
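The ranking and breaking-point steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patentee's implementation, and every concrete choice in it is a labeled assumption: the hypothetical `Token` and `Template` types, page positions normalized so that `y < 1/3` means "top third", `difflib.SequenceMatcher.ratio()` standing in for the template similarity ratio, a raw match score equal to the number of template attributes found in the document, normalization by dividing by the best raw score, and a default threshold of 0.5. The final extraction step is not modeled; the function only returns the second set of candidate templates.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Token:
    """Hypothetical: a word plus its vertical position (0.0 = top of page, 1.0 = bottom)."""
    text: str
    y: float


@dataclass
class Template:
    """Hypothetical: a template name plus the header attributes it expects to find."""
    name: str
    attributes: list[str]


def top_third_attributes(tokens: list[Token]) -> list[str]:
    # Keep only tokens located in the top third of the page (assumed layout model).
    return [t.text.lower() for t in tokens if t.y < 1 / 3]


def similarity_ratio(template: Template, doc_attrs: list[str]) -> float:
    # Assumption: compare sorted attribute strings using difflib's ratio in [0.0, 1.0].
    a = " ".join(sorted(attr.lower() for attr in template.attributes))
    b = " ".join(sorted(doc_attrs))
    return SequenceMatcher(None, a, b).ratio()


def match_score(template: Template, doc_attrs: list[str]) -> float:
    # Assumption: raw match score = count of template attributes present in the document.
    present = set(doc_attrs)
    return float(sum(1 for attr in template.attributes if attr.lower() in present))


def select_templates(tokens: list[Token], candidates: list[Template],
                     threshold: float = 0.5) -> list[Template]:
    doc_attrs = top_third_attributes(tokens)
    # Rank candidates by template similarity ratio, highest first.
    ranked = sorted(candidates, key=lambda t: similarity_ratio(t, doc_attrs), reverse=True)
    # Normalize raw match scores by the best score so they lie in [0, 1].
    raw = [match_score(t, doc_attrs) for t in ranked]
    top = max(raw, default=0.0)
    norm = [s / top if top > 0 else 0.0 for s in raw]
    # Walk successive pairs in ranked order; the first drop larger than
    # the threshold is the breaking point, and only templates above it are kept.
    for i in range(len(ranked) - 1):
        if norm[i] - norm[i + 1] > threshold:
            return ranked[: i + 1]
    return ranked  # no breaking point found: keep every candidate


tokens = [Token("Invoice", 0.05), Token("ACME", 0.10), Token("PO", 0.20), Token("Total", 0.90)]
candidates = [
    Template("globex", ["Globex", "Statement", "Ref"]),
    Template("acme-credit", ["ACME", "Credit", "PO"]),
    Template("acme-invoice", ["ACME", "Invoice", "PO"]),
]
selected = select_templates(tokens, candidates)
print([t.name for t in selected])  # prints ['acme-invoice', 'acme-credit']
```

In this example the normalized scores in ranked order are 1.0, about 0.67 and 0.0. The first gap (about 0.33) stays under the 0.5 threshold, while the second (about 0.67) exceeds it, so the breaking point falls between `acme-credit` and `globex`. Cutting at the largest early drop, instead of keeping a fixed top-k, lets the number of retained templates adapt to how clearly the best matches separate from the rest.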