CPC G06F 16/2282 (2019.01) [G06V 10/70 (2022.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01)] | 18 Claims
1. A computer-implemented method for detecting and classifying columns of tables and/or tabular data arrangements within image data, the method comprising:
detecting one or more tables and/or one or more tabular data arrangements within the image data;
extracting the one or more tables and/or the one or more tabular data arrangements from the image data; and
classifying either:
a plurality of columns of the one or more extracted tables;
a plurality of columns of the one or more extracted tabular data arrangements; or
both the columns of the one or more extracted tables and the columns of the one or more extracted tabular data arrangements; and
wherein the classifying comprises:
performing a pairwise comparison of one or more feature vectors corresponding to columns of tables and/or tabular data arrangements in a training dataset to one or more feature vectors corresponding to columns of tables and/or tabular data arrangements in a test dataset; and
generating a pairwise similarity score for each pair of compared feature vectors; and
wherein each pairwise similarity score is a function of a number of common OCR elements between the corresponding pair of compared feature vectors.
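The comparison and scoring steps of claim 1 can be illustrated with a minimal sketch. It rests on several assumptions that the claim does not specify. Each column's "feature vector" is modeled as a multiset of lowercased OCR tokens. The hypothetical `pairwise_similarity` normalizes the count of common OCR elements by the size of the larger column. The decision rule of assigning each test column the label of its highest-scoring training column is also an assumed choice. The names `column`, `common_ocr_elements`, `pairwise_similarity`, `score_all_pairs`, and `classify` are illustrative only and are not taken from the patent.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: a column's feature vector is a multiset
# (Counter) of normalized OCR tokens extracted from that column.
def column(*tokens: str) -> Counter:
    """Build an assumed feature vector from a column's OCR tokens."""
    return Counter(t.lower() for t in tokens)

def common_ocr_elements(a: Counter, b: Counter) -> int:
    """Number of OCR elements shared by two columns (multiset intersection size)."""
    return sum((a & b).values())

def pairwise_similarity(a: Counter, b: Counter) -> float:
    """Score as a function of the common-element count.

    Assumption: divide by the larger column's size so scores fall in [0, 1].
    """
    denom = max(sum(a.values()), sum(b.values()))
    return common_ocr_elements(a, b) / denom if denom else 0.0

def score_all_pairs(train: List[Tuple[Counter, str]],
                    test: List[Counter]) -> Dict[Tuple[int, int], float]:
    """Pairwise comparison: every test column against every training column."""
    return {
        (i, j): pairwise_similarity(test_vec, train_vec)
        for i, test_vec in enumerate(test)
        for j, (train_vec, _label) in enumerate(train)
    }

def classify(train: List[Tuple[Counter, str]], test: List[Counter]) -> List[str]:
    """Assumed decision rule: copy the label of the most similar training column."""
    if not train:
        raise ValueError("training dataset is empty")
    scores = score_all_pairs(train, test)
    return [
        train[max(range(len(train)), key=lambda j: scores[(i, j)])][1]
        for i in range(len(test))
    ]

# Toy, made-up example data (not from the patent).
train = [
    (column("Date", "01/02", "03/04"), "date"),
    (column("Amount", "$10", "$20"), "amount"),
]
test = [column("DATE", "01/02", "05/06"), column("Amount", "$20", "$35")]
print(classify(train, test))  # prints ['date', 'amount']
```

A plain overlap count could be used instead of the normalized score. Normalizing keeps long columns from dominating the comparison simply because they contain more tokens.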