CPC G06V 30/416 (2022.01) [G06F 18/2163 (2023.01); G06F 18/22 (2023.01); G06T 5/30 (2013.01); G06T 5/40 (2013.01); G06V 10/462 (2022.01); G06V 30/18143 (2022.01); G06T 2207/20076 (2013.01); G06T 2207/30176 (2013.01)] | 20 Claims |
1. A method, comprising:
receiving, by a processing device, a first set of document images;
extracting a plurality of keypoint regions from each document image of the first set of document images;
calculating local descriptors for each keypoint region of the extracted keypoint regions;
clustering the local descriptors such that each center of a cluster of local descriptors corresponds to a respective visual word;
generating a codebook containing a set of visual words;
optimizing the codebook by maximizing mutual information (MI) between a target field of a second set of document images and at least one visual word of the set of visual words; and
detecting, using the codebook, one or more fields in a document image.
|
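The clustering, codebook, and mutual-information steps of claim 1 can be illustrated with a minimal sketch. It makes several assumptions that are not in the claim. Local descriptors are taken as already computed and given as lists of floats, so keypoint extraction and descriptor calculation are not shown. Clustering uses plain Lloyd's k-means. "Optimizing the codebook by maximizing MI" is approximated by scoring each visual word by the mutual information between its presence and a binary target-field label, then keeping the highest-scoring words. The function names `nearest`, `kmeans`, `mutual_information`, and `optimize_codebook`, and the `keep` parameter, are hypothetical.

```python
import math
import random
from collections import Counter


def nearest(p, centers):
    """Index of the center closest to descriptor p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k returned centers play the role of visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(c) / len(g) for c in zip(*g)]
    return centers


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of raw counts
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def optimize_codebook(centers, descriptors, in_target, keep):
    """Keep the `keep` visual words whose presence is most informative about
    the target field (assumed selection rule, not taken from the claim).

    in_target[i] is 1 if descriptor i comes from a region inside the target
    field of a labeled (second-set) document image, else 0.
    """
    assigned = [nearest(d, centers) for d in descriptors]
    scores = [mutual_information([int(a == w) for a in assigned], in_target)
              for w in range(len(centers))]
    best = sorted(range(len(centers)), key=lambda w: scores[w], reverse=True)[:keep]
    return [centers[w] for w in best]
```

In this sketch, `kmeans` would be run on descriptors from the first set of document images to produce the initial codebook, and `optimize_codebook` would be run on descriptors from the labeled second set. Detection with the resulting codebook is outside the scope of the sketch.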