US 12,469,318 B2
Training and using a vector encoder to determine vectors for sub-images of text in an image subject to optical character recognition
Zhong Fang Yuan, Xi'an (CN); Tong Liu, Xi'an (CN); Yi Chen Zhong, Shanghai (CN); Xiang Yu Yang, Xi'an (CN); and Guan Chao Li, Shanghai (CN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Sep. 15, 2022, as Appl. No. 17/932,639.
Prior Publication US 2024/0096121 A1, Mar. 21, 2024
Int. Cl. G06V 30/148 (2022.01); G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 30/182 (2022.01)
CPC G06V 30/153 (2022.01) [G06V 10/7747 (2022.01); G06V 10/82 (2022.01); G06V 30/1823 (2022.01)]
20 Claims
 
1. A computer program product for performing optical character recognition processing of an image, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising:
providing a vector encoder trained to encode images, comprising digital images representing text, into vectors in a vector space, wherein vectors of images representing similar text have a high degree of cohesion in the vector space, and wherein vectors of images representing dissimilar text have a low degree of cohesion in the vector space;
processing an input image to determine sub-images, of the input image, wherein the sub-images bound the text represented in the input image;
inputting the sub-images to the vector encoder to output sub-image vectors, wherein the sub-image vectors represent the sub-images in the vector space;
using the vector encoder to generate a search vector for search text, wherein the search vector represents the search text in the vector space;
determining the sub-image vectors that match the search vector; and
applying optical character recognition to at least one region of the input image including the sub-images having the sub-image vectors matching the search vector based on closeness of the sub-image vectors and the search vector in the vector space.
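
The matching and region-selection operations recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the sub-image vectors and the search vector have already been produced by some trained vector encoder, takes cosine similarity as one possible measure of closeness in the vector space, and uses a hypothetical threshold of 0.9. The names SubImage, match_sub_images, and region_for_ocr are illustrative only and do not appear in the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixel coordinates


@dataclass
class SubImage:
    """A bounded piece of text in the input image plus its encoder output.

    The vector is assumed to come from a trained vector encoder; producing it
    is outside the scope of this sketch.
    """
    box: Box
    vector: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here as an assumed measure of closeness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_sub_images(sub_images: List[SubImage],
                     search_vector: List[float],
                     threshold: float = 0.9) -> List[SubImage]:
    """Keep sub-images whose vectors are close to the search vector.

    The threshold value is a hypothetical choice for illustration.
    """
    return [s for s in sub_images
            if cosine_similarity(s.vector, search_vector) >= threshold]


def region_for_ocr(matches: List[SubImage]) -> Optional[Box]:
    """Return one bounding region that contains every matching sub-image.

    Only this region would then be handed to an OCR engine, rather than the
    whole input image. Returns None when nothing matched.
    """
    if not matches:
        return None
    return (min(s.box[0] for s in matches),
            min(s.box[1] for s in matches),
            max(s.box[2] for s in matches),
            max(s.box[3] for s in matches))


if __name__ == "__main__":
    # Toy two-dimensional vectors standing in for real encoder output.
    subs = [
        SubImage(box=(0, 0, 10, 10), vector=[1.0, 0.0]),
        SubImage(box=(20, 0, 30, 10), vector=[0.9, 0.1]),
        SubImage(box=(0, 50, 10, 60), vector=[0.0, 1.0]),
    ]
    hits = match_sub_images(subs, search_vector=[1.0, 0.0])
    print(region_for_ocr(hits))
```

In this sketch, only the region enclosing the matching sub-images is passed on to character recognition. A single enclosing box is one simple reading of "at least one region"; returning a separate region for each match would be equally consistent with the claim language.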