| CPC G06V 20/63 (2022.01) [G06V 30/18 (2022.01); G06V 30/19093 (2022.01); G06V 30/19147 (2022.01)] | 20 Claims |

|
1. A method of scene text detection, comprising:
generating, with an image encoding process, a first visual representation of a first image;
generating, with a text encoding process, based on a first plurality of symbols, a first textual representation of a first text unit in the first image, the first plurality of symbols obtained by masking a first symbol of a plurality of symbols in the first text unit;
determining, with a decoding process, a first prediction of the masked first symbol based on the first visual and textual representations; and
updating at least the image encoding process according to at least a first training objective to increase at least similarity of the first prediction and the masked first symbol.
|
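The steps recited in claim 1 can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the claimed implementation: plain linear maps stand in for the image encoding, text encoding, and decoding processes, cross-entropy is assumed as one possible first training objective, and only the image-encoder weights are updated. Every name here (`mask_symbol`, `predict`, `train_step`, the `[MASK]` token, the pixel vector) is hypothetical and does not come from the claim.

```python
import math
import random

MASK = "[MASK]"  # hypothetical placeholder for the masked symbol


def mask_symbol(text_unit, index):
    """Mask one symbol of the text unit; return (masked symbols, original symbol)."""
    symbols = list(text_unit)
    target = symbols[index]
    symbols[index] = MASK
    return symbols, target


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def predict(w_img, emb, w_dec, pixels, symbols, vocab):
    """Image encoding, text encoding, then decoding into symbol probabilities."""
    visual = matvec(w_img, pixels)  # stand-in visual representation
    ids = [vocab.index(s) for s in symbols]
    # Stand-in textual representation: mean embedding of the masked symbol sequence.
    textual = [sum(emb[i][k] for i in ids) / len(ids) for k in range(len(visual))]
    fused = [v + t for v, t in zip(visual, textual)]
    return softmax(matvec(w_dec, fused))


def train_step(w_img, emb, w_dec, pixels, symbols, target, vocab, lr=0.05):
    """One gradient step on the image encoder only; returns the loss before the step."""
    probs = predict(w_img, emb, w_dec, pixels, symbols, vocab)
    t = vocab.index(target)
    loss = -math.log(probs[t])  # assumed objective: cross-entropy on the masked symbol
    d_logits = [p - (1.0 if i == t else 0.0) for i, p in enumerate(probs)]
    # Back-propagate through the decoder to the fused representation.
    d_fused = [
        sum(w_dec[i][k] * d_logits[i] for i in range(len(vocab)))
        for k in range(len(w_img))
    ]
    # Update the image-encoder weights in place (gradient is an outer product).
    for k, row in enumerate(w_img):
        for j in range(len(row)):
            row[j] -= lr * d_fused[k] * pixels[j]
    return loss


rng = random.Random(0)
vocab = ["S", "T", "O", "P", MASK]
dim, n_pixels = 4, 6
w_img = rand_matrix(rng, dim, n_pixels)
emb = rand_matrix(rng, len(vocab), dim)
w_dec = rand_matrix(rng, len(vocab), dim)
pixels = [rng.random() for _ in range(n_pixels)]  # stand-in for the first image

symbols, target = mask_symbol("STOP", 2)  # masks the symbol "O"
loss_before = train_step(w_img, emb, w_dec, pixels, symbols, target, vocab)
loss_after = train_step(w_img, emb, w_dec, pixels, symbols, target, vocab)
print(symbols, target, loss_before > loss_after)
```

In this sketch the second call reports the loss measured after the first update, so a lower value indicates that the prediction has moved closer to the masked symbol. A practical system would presumably use learned neural encoders and a decoder trained over many images, which the sketch does not attempt to model.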