US 12,347,067 B2
System and method for enhancing text in images based on super-resolution
Saptarshi Misra, West Bengal (IN); Anirban Chatterjee, Karnataka (IN); Pranay Dugar, Corvallis, OR (US); Kunal Banerjee, Karnataka (IN); and Lalitdutt Parsai, Indore (IN)
Assigned to Walmart Apollo, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Dec. 12, 2022, as Appl. No. 18/079,225.
Prior Publication US 2024/0193726 A1, Jun. 13, 2024
Int. Cl. G06T 3/4053 (2024.01); G06T 3/4007 (2024.01); G06T 5/50 (2006.01); G06T 7/73 (2017.01); G06V 10/774 (2022.01); G06V 20/62 (2022.01); G06V 10/77 (2022.01)

CPC G06T 3/4053 (2013.01) [G06T 3/4007 (2013.01); G06T 5/50 (2013.01); G06T 7/73 (2017.01); G06V 10/774 (2022.01); G06V 20/62 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06V 10/7715 (2022.01)]

19 Claims

1. A system for enhancing text in images, comprising:

a non-transitory memory having instructions stored thereon; and

at least one processor operatively coupled to the non-transitory memory, and configured to read the instructions to:

obtain a high resolution image;

generate a low resolution image based on the high resolution image, wherein the low resolution image is generated by:

down-sampling, based on a first interpolation method, the high resolution image to one-fourth of a dimension of the high resolution image to generate an intermediate image, and

up-sampling, based on a second interpolation method, the intermediate image to one-half of the dimension of the high resolution image to generate the low resolution image;

generate a super resolution image based on the low resolution image, using a super resolution model with a set of parameters;

based on the high resolution image and the super resolution image, compute a total loss function based on the set of parameters and at least one of: a detection loss function, a recognition loss function, and a gradient loss function;

generate a trained super resolution model with an optimized set of the parameters that minimizes the total loss function; and

enhance text in at least one image using the trained super resolution model.
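
The low resolution image generation recited in claim 1 (down-sampling to one-fourth of the dimension, then up-sampling to one-half) can be sketched as follows. This is an illustrative sketch only, not the claimed implementation. It makes these assumptions: "dimension" is read as both the height and the width; bilinear interpolation stands in for both the first and the second interpolation method, which the claim leaves unspecified; the image is a single-channel list of rows whose height and width are divisible by four. The function names `resize_bilinear` and `make_low_resolution` are hypothetical.

```python
def resize_bilinear(img, new_h, new_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Assumption: bilinear is used only as a stand-in; the claim does not
    name the interpolation methods.
    """
    old_h, old_w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        # Map the output pixel centre back into source coordinates,
        # clamped so that border pixels are replicated.
        src_y = (y + 0.5) * old_h / new_h - 0.5
        src_y = min(max(src_y, 0.0), old_h - 1)
        y0 = int(src_y)
        y1 = min(y0 + 1, old_h - 1)
        wy = src_y - y0
        row = []
        for x in range(new_w):
            src_x = (x + 0.5) * old_w / new_w - 0.5
            src_x = min(max(src_x, 0.0), old_w - 1)
            x0 = int(src_x)
            x1 = min(x0 + 1, old_w - 1)
            wx = src_x - x0
            # Blend the four neighbouring source pixels.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def make_low_resolution(hr):
    """Build a low resolution image from a high resolution one.

    Step 1: down-sample to one-fourth of each side (intermediate image).
    Step 2: up-sample the intermediate image to one-half of each side.
    """
    h, w = len(hr), len(hr[0])
    intermediate = resize_bilinear(hr, h // 4, w // 4)
    return resize_bilinear(intermediate, h // 2, w // 2)
```

Under these assumptions, an 8 x 8 high resolution input yields a 2 x 2 intermediate image and a 4 x 4 low resolution image. The resulting (low resolution, high resolution) pair is the kind of pair on which the super resolution model of the claim would be trained.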