| CPC G06F 16/906 (2019.01) [G06F 16/93 (2019.01); G06V 30/414 (2022.01); G06V 30/418 (2022.01)] | 27 Claims |

|
1. A computer-implemented method for classifying a document, comprising:
receiving a plurality of reference documents;
at a hardware processing device, for each of the reference documents:
automatically identifying a plurality of bounding boxes, each surrounding a block of content within the reference document; and
automatically identifying a subset of the bounding boxes for each reference document as representing noise;
at the hardware processing device, generating a feature vector for each of the reference documents based on the bounding boxes identified in the reference document that are not included in the identified subset representing noise;
storing the generated feature vectors at a storage device;
receiving a target document for classification;
at the hardware processing device:
automatically identifying a plurality of bounding boxes for the target document, each surrounding a block of content within the target document;
automatically identifying a subset of the bounding boxes for the target document as representing noise;
generating a feature vector based on the bounding boxes identified in the target document that are not included in the identified subset representing noise;
comparing the feature vector for the target document with the feature vectors for the reference documents, to determine which reference document feature vector is most closely aligned with the target document feature vector; and
classifying the target document based on the comparing step; and
at an output device, outputting results of the classifying step.
|
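The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify how bounding boxes are detected, how noise is identified, how the feature vector is built, or which similarity measure is used, so everything below is an assumption made for illustration: boxes are taken as already detected (for example by a layout or OCR engine) and given in page-normalized coordinates, noise is approximated by a hypothetical area threshold `MIN_AREA`, the feature vector is a hypothetical 4x4 grid of covered area, and "most closely aligned" is approximated with cosine similarity. The names `Box`, `drop_noise`, `feature_vector`, and `classify` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in page-normalized coordinates (all values in 0..1)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Assumption: a box covering less than 0.2% of the page is treated as noise.
MIN_AREA = 0.002


def drop_noise(boxes):
    """Return the boxes that are NOT in the noise subset."""
    return [b for b in boxes if b.w * b.h >= MIN_AREA]


def feature_vector(boxes, grid=4):
    """Accumulate non-noise box area into the grid cell holding each box center."""
    vec = [0.0] * (grid * grid)
    for b in drop_noise(boxes):
        # Clamp centers so a box touching the page edge stays inside the grid.
        cx = min(max(b.x + b.w / 2, 0.0), 1.0 - 1e-9)
        cy = min(max(b.y + b.h / 2, 0.0), 1.0 - 1e-9)
        vec[int(cy * grid) * grid + int(cx * grid)] += b.w * b.h
    return vec


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(target_boxes, reference_vectors):
    """Return the label of the stored reference vector closest to the target."""
    target = feature_vector(target_boxes)
    return max(reference_vectors,
               key=lambda label: cosine(target, reference_vectors[label]))


# Reference documents: vectors are generated once and stored (here, in a dict).
references = {
    "invoice": feature_vector([
        Box(0.05, 0.05, 0.30, 0.10),
        Box(0.10, 0.30, 0.70, 0.50),
        Box(0.50, 0.50, 0.01, 0.01),  # tiny speck, dropped as noise
    ]),
    "letter": feature_vector([
        Box(0.60, 0.05, 0.35, 0.10),
        Box(0.15, 0.20, 0.80, 0.70),
    ]),
}

# Target document: same box -> noise filter -> vector -> compare pipeline.
target = [
    Box(0.06, 0.04, 0.30, 0.12),
    Box(0.10, 0.32, 0.72, 0.50),
    Box(0.90, 0.90, 0.02, 0.02),  # tiny speck, dropped as noise
]

result = classify(target, references)
print(result)  # prints "invoice"
```

In this sketch the reference and target documents pass through the same noise filter before vectorization, mirroring the claim's requirement that both feature vectors are built only from boxes outside the noise subset. Any other noise criterion, layout descriptor, or distance measure could be substituted without changing the overall flow.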