US 11,776,287 B2
Document segmentation for optical character recognition
Udi Barzelay, Haifa (IL); Ophir Azulai, Tivon (IL); and Inbar Shapira, Givat Ada (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Apr. 27, 2021, as Appl. No. 17/241,784.
Prior Publication US 2022/0343103 A1, Oct. 27, 2022
Int. Cl. G06V 30/148 (2022.01); G06T 3/40 (2006.01); G06N 3/08 (2023.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01); G06V 30/18 (2022.01)
CPC G06V 30/153 (2022.01) [G06N 3/08 (2013.01); G06T 3/40 (2013.01); G06V 30/18057 (2022.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01)] 14 Claims
 
11. A computer system for detecting text within an image, the computer system comprising:
one or more computer processors;
one or more computer readable storage media; and
program instructions stored on the one or more computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising:
train a model to detect text within the image, with a plurality of synthesized noisy images containing text, wherein the model comprises a Unet convolutional neural network architecture;
receive an input image;
generate a pixel probability estimate for each pixel in the input image, based at least in part on the Unet convolutional neural network architecture, wherein the probability estimate is the probability that a pixel is part of segmented text;
generate a segmentation map based at least in part on the per-pixel probability estimate for each pixel in the input image;
generate one or more bounding boxes around the detected text, based at least in part on the segmentation map; and
mask one or more sections of the input image outside of the bounding boxes.
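The post-processing steps in the claim (segmentation map, bounding boxes, masking outside the boxes) can be sketched in plain Python as below. This is an illustrative sketch, not the patentee's implementation. The probability map is taken as given rather than produced by a U-Net. The 0.5 threshold, grouping of text pixels into 4-connected components, the inclusive `(top, left, bottom, right)` box format, and the white fill value 255 are all assumptions. The function names `segmentation_map`, `bounding_boxes` and `mask_outside` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical box format (assumption): (top, left, bottom, right), inclusive.
Box = Tuple[int, int, int, int]


def segmentation_map(prob: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Binarize a per-pixel text probability map: 1 = text, 0 = background.

    The 0.5 threshold is an assumed default, not taken from the patent.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob]


def bounding_boxes(seg: List[List[int]]) -> List[Box]:
    """Return one axis-aligned box per 4-connected component of text pixels."""
    h = len(seg)
    w = len(seg[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if seg[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over the component that starts at (r, c),
            # tracking its extreme rows and columns.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and seg[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def mask_outside(image: List[List[int]], boxes: List[Box], fill: int = 255) -> List[List[int]]:
    """Copy the image, replacing every pixel not covered by any box with `fill`.

    fill=255 assumes an 8-bit grayscale image with a white background.
    """
    out = [[fill] * len(row) for row in image]
    for top, left, bottom, right in boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                out[y][x] = image[y][x]
    return out


if __name__ == "__main__":
    prob = [
        [0.9, 0.8, 0.1, 0.0],
        [0.7, 0.2, 0.0, 0.6],
        [0.0, 0.0, 0.1, 0.9],
    ]
    boxes = bounding_boxes(segmentation_map(prob))
    print(boxes)  # [(0, 0, 1, 1), (1, 3, 2, 3)]
    image = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
    print(mask_outside(image, boxes))
```

Pixels inside a box are kept even when they were not themselves classified as text (pixel `(1, 1)` above). This matches the claim's wording, which masks only sections outside the bounding boxes. A production system would likely merge nearby components into word or line boxes, for example by dilating the segmentation map first. That refinement is omitted here.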
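The claim trains the U-Net on "a plurality of synthesized noisy images containing text" but does not say here how those images are produced. The sketch below is a toy stand-in under stated assumptions. "Words" are drawn as solid ink rectangles instead of rendered font glyphs. Noise is modeled as additive Gaussian noise plus salt-and-pepper noise. The sizes and noise levels are arbitrary. The point it illustrates is that a synthesized image comes with an exact per-pixel ground-truth mask, which is the training target a segmentation network needs. All function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive (assumption)


def synthesize_text_image(height: int, width: int, word_boxes: Sequence[Box],
                          ink: int = 0, paper: int = 255) -> Tuple[Image, Image]:
    """Draw each 'word' as a solid ink rectangle; return (image, ground-truth mask)."""
    image = [[paper] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in word_boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                image[y][x] = ink
                mask[y][x] = 1  # label is known exactly because we drew it
    return image, mask


def add_noise(image: Image, rng: random.Random,
              sigma: float = 20.0, salt_pepper: float = 0.02) -> Image:
    """Add Gaussian noise and salt-and-pepper noise, clamping to the 8-bit range."""
    noisy = []
    for row in image:
        new_row = []
        for v in row:
            u = rng.random()
            if u < salt_pepper / 2:
                out = 0    # pepper
            elif u < salt_pepper:
                out = 255  # salt
            else:
                out = round(v + rng.gauss(0.0, sigma))
            new_row.append(min(255, max(0, out)))
        noisy.append(new_row)
    return noisy


def make_training_pair(rng: random.Random, height: int = 32, width: int = 64,
                       n_words: int = 3) -> Tuple[Image, Image]:
    """Return one (noisy image, clean mask) pair with randomly placed 'words'.

    Requires height >= 6 and width >= 16 for the word sizes chosen below.
    """
    boxes = []
    for _ in range(n_words):
        h, w = rng.randint(3, 6), rng.randint(6, 16)
        top, left = rng.randint(0, height - h), rng.randint(0, width - w)
        boxes.append((top, left, top + h - 1, left + w - 1))
    clean, mask = synthesize_text_image(height, width, boxes)
    return add_noise(clean, rng), mask


if __name__ == "__main__":
    image, mask = make_training_pair(random.Random(0))
    print(len(image), len(image[0]), sum(map(sum, mask)))
```

Only the input image is corrupted. The mask is left clean, so the network learns to recover text regions despite the noise. Replacing the rectangles with real rendered text (for example, with an imaging library's text-drawing routine) would be the natural next step. That step is outside this stdlib-only sketch.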