US 12,254,707 B2
Pre-training for scene text detection
Chuhui Xue, Singapore (SG); Wenqing Zhang, Singapore (SG); Yu Hao, Beijing (CN); and Song Bai, Singapore (SG)
Assigned to LEMON INC., Grand Cayman (KY); and BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD., Beijing (CN)
Filed by Lemon Inc., Grand Cayman (KY); and Beijing Youzhuju Network Technology Co., Ltd., Beijing (CN)
Filed on Sep. 28, 2022, as Appl. No. 17/955,285.
Prior Publication US 2024/0119743 A1, Apr. 11, 2024
Int. Cl. G06V 20/62 (2022.01); G06V 30/18 (2022.01); G06V 30/19 (2022.01)
CPC G06V 20/63 (2022.01) [G06V 30/18 (2022.01); G06V 30/19093 (2022.01); G06V 30/19147 (2022.01)]
20 Claims
 
1. A method of scene text detection, comprising:
generating, with an image encoding process, a first visual representation of a first image;
generating, with a text encoding process, based on a first plurality of symbols, a first textual representation of a first text unit in the first image, the first plurality of symbols obtained by masking a first symbol of a plurality of symbols in the first text unit;
determining, with a decoding process, a first prediction of the masked first symbol based on the first visual and textual representations; and
updating at least the image encoding process according to at least a first training objective to increase at least similarity of the first prediction and the masked first symbol.
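
The four steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the linear image encoder, averaged-embedding text encoder, additive fusion, softmax decoder, cross-entropy objective, and every name below (`ToyModel`, `mask_symbol`, the `[MASK]` token, the learning rate) are assumptions chosen only to make the data flow concrete. For brevity the sketch updates only the image encoder, whereas the claim requires updating "at least" that process.

```python
import math
import random

MASK = "[MASK]"  # hypothetical placeholder for the masked symbol


def mask_symbol(text_unit, index):
    """Replace the symbol at `index` with MASK; return (symbols, masked symbol)."""
    symbols = list(text_unit)
    target = symbols[index]
    symbols[index] = MASK
    return symbols, target


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyModel:
    """Illustrative stand-in for the image encoder, text encoder, and decoder."""

    def __init__(self, vocab, pixel_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab = vocab
        self.index = {s: i for i, s in enumerate(vocab)}
        self.w_img = init(hidden_dim, pixel_dim)     # image encoding process
        self.emb = init(len(vocab) + 1, hidden_dim)  # text encoding; last row is MASK
        self.w_dec = init(len(vocab), hidden_dim)    # decoding process

    def encode_image(self, pixels):
        # Visual representation: a linear projection of the pixel vector.
        return [sum(w * p for w, p in zip(row, pixels)) for row in self.w_img]

    def encode_text(self, symbols):
        # Textual representation: mean embedding of the partially masked symbols.
        ids = [self.index.get(s, len(self.vocab)) for s in symbols]
        hidden = len(self.emb[0])
        return [sum(self.emb[i][k] for i in ids) / len(ids) for k in range(hidden)]

    def decode(self, visual, textual):
        # Prediction of the masked symbol from both representations.
        fused = [v + t for v, t in zip(visual, textual)]
        logits = [sum(w * f for w, f in zip(row, fused)) for row in self.w_dec]
        return softmax(logits)

    def train_step(self, pixels, text_unit, mask_index, lr=0.5):
        symbols, target = mask_symbol(text_unit, mask_index)
        visual = self.encode_image(pixels)
        textual = self.encode_text(symbols)
        probs = self.decode(visual, textual)
        t = self.index[target]
        loss = -math.log(probs[t])  # cross-entropy against the masked symbol
        # Backpropagate by hand: d(loss)/d(logits) = probs - one_hot(target).
        dlogits = [p - (1.0 if i == t else 0.0) for i, p in enumerate(probs)]
        dfused = [
            sum(dlogits[i] * self.w_dec[i][k] for i in range(len(self.vocab)))
            for k in range(len(visual))
        ]
        # Assumption: only the image encoder is updated in this sketch.
        for k, row in enumerate(self.w_img):
            for j in range(len(row)):
                row[j] -= lr * dfused[k] * pixels[j]
        return loss


# Toy usage: a 3-value "image" containing the text unit "STOP", masking "O".
model = ToyModel(vocab=list("STOP"), pixel_dim=3, hidden_dim=4)
losses = [model.train_step([0.2, 0.8, 0.5], "STOP", mask_index=2) for _ in range(20)]
```

Each call to `train_step` returns the loss measured before that step's update, so a falling sequence in `losses` indicates that the prediction is moving toward the masked symbol, which is the effect the first training objective is meant to produce.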