US 12,249,168 B2
Text detection algorithm for separating words detected as one text bounding box
Ophir Azulai, Tivon (IL); Udi Barzelay, Haifa (IL); and Oshri Pesah Naparstek, Karmiel (IL)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jan. 31, 2022, as Appl. No. 17/649,406.
Prior Publication US 2023/0245481 A1, Aug. 3, 2023
Int. Cl. G06K 9/00 (2022.01); G06T 3/4046 (2024.01); G06T 11/60 (2006.01); G06V 30/148 (2022.01); G06V 30/414 (2022.01)

CPC G06V 30/153 (2022.01) [G06T 3/4046 (2013.01); G06T 11/60 (2013.01); G06V 30/414 (2022.01)]

20 Claims

1. A method for text detection, the method comprising:

training a text detection model, wherein the training comprises generating text using a generator created to generate random text with variability in size, font, and/or background, injected with additional noise;

performing text detection on an inputted image using the trained text detection model;

determining whether at least one of a plurality of bounding boxes generated using the inputted image has an aspect ratio above a threshold;

based upon determining that at least one of the plurality of bounding boxes generated using the inputted image has the aspect ratio above the threshold, upscaling any text within the at least one bounding box;

copying the at least one of the plurality of generated bounding boxes having the aspect ratio above the threshold to a new image file having a same file format as the inputted image;

performing text detection on a new image using the trained text detection model;

combining both the plurality of bounding boxes generated using the inputted image and the new image file, wherein upon determining that at least one of the plurality of bounding boxes generated using the inputted image contains more than one word, replacing any corresponding portions of the inputted image with the at least one bounding box generated using the new image; and

outputting an output image, wherein the output image is comprised of at least one bounding box generated using the inputted image and at least one bounding box generated using the new image.
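
The box-splitting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Box` type, the `split_wide_boxes` function, and the `redetect` callback are hypothetical names introduced here; `redetect` stands in for upscaling the region, copying it to a new image, and running the trained detector on that image. The default threshold of 5.0 and scale factor of 2.0 are assumptions, since the claim specifies no values.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def aspect_ratio(self) -> float:
        # Width over height; a zero-height box is treated as infinitely wide.
        return self.w / self.h if self.h > 0 else float("inf")


# Hypothetical callback: given a box from the input image and a scale factor,
# return the boxes found in the upscaled copy, in that copy's coordinates.
Redetect = Callable[[Box, float], List[Box]]


def split_wide_boxes(
    boxes: List[Box],
    redetect: Redetect,
    ratio_threshold: float = 5.0,  # assumed value, not from the claim
    scale: float = 2.0,            # assumed value, not from the claim
) -> List[Box]:
    """Replace wide boxes that contain several words with per-word boxes."""
    combined: List[Box] = []
    for box in boxes:
        # Boxes at or below the threshold are kept as detected.
        if box.aspect_ratio <= ratio_threshold:
            combined.append(box)
            continue

        # Second detection pass on the upscaled copy of this region.
        found = redetect(box, scale)

        if len(found) > 1:
            # More than one word: map each new box back to input-image
            # coordinates and use it in place of the original box.
            for f in found:
                combined.append(
                    Box(
                        box.x + f.x / scale,
                        box.y + f.y / scale,
                        f.w / scale,
                        f.h / scale,
                    )
                )
        else:
            # Single word (or nothing found): keep the original box.
            combined.append(box)
    return combined
```

In a real pipeline, `redetect` would crop the region from the input image, resize it by `scale`, and call the trained text detection model. The sketch returns the combined list of boxes and leaves rendering of the output image to the caller.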