US 12,265,516 B2
Automated document processing for detecting, extracting, and analyzing tables and tabular data
Stephen M. Thompson, Bonsall, CA (US); Iurii Vymenets, St. Petersburg (RU); Donghan Lee, Anaheim, CA (US); and Markus Georg Lust, Freiburg (DE)
Assigned to TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
Filed by TUNGSTEN AUTOMATION CORPORATION, Irvine, CA (US)
Filed on Dec. 13, 2022, as Appl. No. 18/080,627.
Application 18/080,627 is a continuation of application No. 17/571,327, filed on Jan. 7, 2022, granted, now 11,977,533.
Claims priority of provisional application 63/170,268, filed on Apr. 2, 2021.
Prior Publication US 2023/0237040 A1, Jul. 27, 2023
Int. Cl. G06F 16/22 (2019.01); G06V 10/70 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01)
CPC G06F 16/2282 (2019.01) [G06V 10/70 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01)]
18 Claims
 
1. A computer-implemented method for detecting and classifying columns of tables and/or tabular data arrangements within image data, the method comprising:
detecting one or more tables and/or one or more tabular data arrangements within the image data;
extracting the one or more tables and/or the one or more tabular data arrangements from the image data; and
classifying either:
a plurality of columns of the one or more extracted tables;
a plurality of columns of the one or more extracted tabular data arrangements; or
both the columns of the one or more extracted tables and the columns of the one or more extracted tabular data arrangements; and
wherein the classifying comprises:
performing a pairwise comparison of one or more feature vectors corresponding to columns of tables and/or tabular data arrangements in a training dataset to one or more feature vectors corresponding to columns of tables and/or tabular data arrangements in a test dataset; and
generating a pairwise similarity score for each pair of compared feature vectors; and
wherein each pairwise similarity score is a function of a number of common OCR elements between the corresponding pair of compared feature vectors.
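
The classifying step of claim 1 can be illustrated with a minimal sketch. The claim states only that each pairwise similarity score is a function of the number of common OCR elements; it does not fix the function. Everything below is therefore an assumption made for illustration: representing a column's feature vector as a list of OCR text elements, normalizing the shared count by the larger column (`similarity`), and assigning each test column the label of its best-scoring training column (`classify`). All function names and sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: a column's "feature vector" is modeled as a list of OCR text elements.
Column = List[str]


def common_ocr_elements(a: Column, b: Column) -> int:
    """Count OCR elements shared by two columns (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


def similarity(a: Column, b: Column) -> float:
    """Hypothetical score: shared elements normalized by the larger column.

    The claim requires only that the score depend on the number of
    common OCR elements; this normalization is an illustrative choice.
    """
    if not a or not b:
        return 0.0
    return common_ocr_elements(a, b) / max(len(a), len(b))


def pairwise_scores(
    train: Dict[str, Column], test: Dict[str, Column]
) -> Dict[Tuple[str, str], float]:
    """Score every (training column, test column) pair."""
    return {
        (tr, te): similarity(train[tr], test[te])
        for tr in train
        for te in test
    }


def classify(train: Dict[str, Column], test: Dict[str, Column]) -> Dict[str, str]:
    """Assumed decision rule: label each test column with its best-scoring training column."""
    scores = pairwise_scores(train, test)
    return {te: max(train, key=lambda tr: scores[(tr, te)]) for te in test}


# Hypothetical sample data: training columns keyed by label, test columns by position.
train_columns = {
    "date": ["Date", "01/02/2021", "03/04/2021"],
    "amount": ["Amount", "$10.00", "$25.50", "Total"],
}
test_columns = {
    "col_0": ["Date", "01/02/2021", "05/06/2021"],
    "col_1": ["Amount", "$10.00", "$99.99"],
}

if __name__ == "__main__":
    for pair, score in pairwise_scores(train_columns, test_columns).items():
        print(pair, round(score, 3))
    print(classify(train_columns, test_columns))
```

With this sample data, the test column that shares header and cell text with the "date" training column scores highest against it, and likewise for "amount". A practical system would likely use richer features and a tuned threshold; none of that is specified in the claim text reproduced here.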