US 12,333,832 B2
Method and system to detect a text from multimedia content captured at a scene
Apurba Das, Bangalore (IN); Pallavi Saha, Bangalore (IN); Jaimin Ashokbhai Bhoi, Bangalore (IN); Nikhil Shaw, Bangalore (IN); and Govind Jee, Bangalore (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Mar. 3, 2023, as Appl. No. 18/117,041.
Claims priority of application No. 202221013459 (IN), filed on Mar. 11, 2022.
Prior Publication US 2023/0290165 A1, Sep. 14, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 30/146 (2022.01); G06V 20/62 (2022.01); G06V 30/12 (2022.01); G06V 30/14 (2022.01); G06V 30/19 (2022.01)
CPC G06V 30/1463 (2022.01) [G06V 20/63 (2022.01); G06V 30/133 (2022.01); G06V 30/1444 (2022.01); G06V 30/147 (2022.01); G06V 30/191 (2022.01)]
18 Claims
 
1. A processor implemented method, comprising:
receiving, via one or more hardware processors, an original image captured from a scene as an input;
processing, via the one or more hardware processors, the original image by a trained model to obtain at least one individual character associated with at least one bounding box on the original image;
determining, via the one or more hardware processors, whether a number of detected characters is equal to a number of expected characters (N) on the original image based on the at least one individual character associated with the at least one bounding box on the original image, wherein performing, in response to determining that the number of detected characters is not equal to the number of expected characters (N) on the original image, the steps of:
determining, via the one or more hardware processors, a gradient at which one or more texts are inclined by the at least one bounding box from the original image;
positioning, via the one or more hardware processors, the original image by the gradient to obtain a rotated image, and wherein the rotated image corresponds to the one or more texts aligned in a vertical orientation; and
processing, via the one or more hardware processors, the rotated image by the trained model to obtain at least one individual character associated with at least one bounding box on the rotated image;
determining, via the one or more hardware processors, whether a number of detected characters is equal to a number of expected characters (N) on the rotated image based on the at least one individual character associated with the at least one bounding box on the rotated image, wherein performing, in response to determining that the number of detected characters is not equal to the number of expected characters (N) on the rotated image, the steps of:
estimating, via the one or more hardware processors, one or more missing character bounding boxes on the original image, and one or more missing character bounding boxes on the rotated image; and
constructing, via the one or more hardware processors, a horizontal text image; and
detecting, via the one or more hardware processors, one or more missing characters in the one or more estimated missing character bounding boxes based on one or more returned texts.
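The control flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patentees' implementation, and the claim does not say how the gradient or the missing bounding boxes are computed. The sketch assumes a hypothetical (x, y, width, height) box format, takes the gradient as the principal axis through the box centres, and estimates missing boxes by interpolating across unusually wide gaps between neighbouring boxes. The names detect_characters, text_gradient_degrees, and estimate_missing_boxes, and the detect and rotate callables that stand in for the trained model and the image rotation, are all illustrative. The final steps of the claim (constructing the horizontal text image and detecting the missing characters from the returned texts) are left out.

```python
import math
from statistics import median
from typing import Any, Callable, List, Tuple

# Hypothetical box format (not from the patent): (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def text_gradient_degrees(boxes: List[Box]) -> float:
    """Angle of the principal axis through the box centres, in degrees.

    Assumption: inclination is estimated from box centres. Image coordinates
    are used as-is, so the sign of the angle follows the image's y axis.
    """
    if len(boxes) < 2:
        return 0.0
    pts = [center(b) for b in boxes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    syy = sum((p[1] - my) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


def estimate_missing_boxes(boxes: List[Box]) -> List[Box]:
    """Interpolate boxes into gaps wider than the typical character pitch.

    Assumption: characters are roughly evenly spaced along the x axis.
    """
    ordered = sorted(boxes, key=lambda b: center(b)[0])
    if len(ordered) < 3:
        return []
    gaps = [center(b)[0] - center(a)[0] for a, b in zip(ordered, ordered[1:])]
    pitch = median(gaps)
    if pitch <= 0:
        return []
    missing: List[Box] = []
    for a, b, gap in zip(ordered, ordered[1:], gaps):
        count = round(gap / pitch) - 1  # boxes that would fit in this gap
        for j in range(1, count + 1):
            t = j / (count + 1)
            missing.append(tuple(a[i] + t * (b[i] - a[i]) for i in range(4)))
    return missing


def detect_characters(
    image: Any,
    detect: Callable[[Any], List[Box]],    # stand-in for the trained model
    rotate: Callable[[Any, float], Any],   # stand-in for image rotation
    expected_n: int,
) -> Tuple[str, List[Box], List[Box], List[Box]]:
    """Return (status, boxes, missing_on_original, missing_on_rotated)."""
    boxes = detect(image)
    if len(boxes) == expected_n:
        return "original", boxes, [], []

    # Count mismatch on the original image: estimate the gradient and retry.
    angle = text_gradient_degrees(boxes)
    rotated_boxes = detect(rotate(image, angle))
    if len(rotated_boxes) == expected_n:
        return "rotated", rotated_boxes, [], []

    # Count mismatch on the rotated image too: estimate missing boxes on both.
    return (
        "estimated",
        rotated_boxes,
        estimate_missing_boxes(boxes),
        estimate_missing_boxes(rotated_boxes),
    )
```

Passing the model and the rotation in as callables keeps the sketch independent of any particular detector or imaging library. In practice, detect would wrap the trained model and rotate would wrap an image-rotation routine.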