US 12,217,017 B2
Translation of text depicted in images
Puneet Jain, Saratoga, CA (US); Orhan Firat, Mountain View, CA (US); and Sihang Liang, Princeton, NJ (US)
Assigned to Google LLC, Mountain View, CA (US)
Appl. No. 17/791,409
Filed by GOOGLE LLC, Mountain View, CA (US)
PCT Filed Jan. 8, 2020, PCT No. PCT/US2020/012646
§ 371(c)(1), (2) Date Jul. 7, 2022,
PCT Pub. No. WO2021/141576, PCT Pub. Date Jul. 15, 2021.
Prior Publication US 2023/0124572 A1, Apr. 20, 2023
Int. Cl. G06F 40/58 (2020.01); G06V 10/44 (2022.01); G06V 10/77 (2022.01); G06V 10/82 (2022.01); G06V 20/62 (2022.01)
CPC G06F 40/58 (2020.01) [G06V 10/454 (2022.01); G06V 10/7715 (2022.01); G06V 10/82 (2022.01); G06V 20/62 (2022.01)]
19 Claims
 
8. A system, comprising:
one or more memory devices storing instructions; and
one or more data processing apparatus that are configured to interact with the one or more memory devices, and upon execution of the instructions, perform operations including:
obtaining a first image that depicts first text written in a source language;
inputting the first image into an image feature extractor of an image translation model that is trained, using a loss function, to extract, from an input image, a set of image features that are a description of a portion of the input image in which the text is depicted;
obtaining, from the feature extractor and in response to inputting the first image into the feature extractor, a first set of image features representing the first text present in the first image;
inputting the first set of image features into a decoder of the image translation model, wherein the decoder is trained, using the loss function, to infer text in a target language from an input set of image features, wherein the inferred text is a predicted translation of text represented by the input set of image features; and
obtaining, from the decoder and in response to inputting the first set of image features into the decoder, a second text that is in a target language and is predicted to be a translation of the first text.
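
The data flow recited in claim 8 can be illustrated with a minimal sketch. It is not the patented implementation: ToyFeatureExtractor, ToyDecoder, translate, the hand-set weights, and the two-word vocabulary are all hypothetical stand-ins, and the decoder here emits a single token rather than a full target-language sequence. The cross_entropy function is included only as one plausible example of a single loss that could be used to train both components; the claim does not name a specific loss.

```python
import math
from typing import List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities


class ToyFeatureExtractor:
    """Hypothetical stand-in for the claimed image feature extractor."""

    def __init__(self, weights: List[List[float]]):
        # One weight vector per output feature; each has length == image width.
        self.weights = weights

    def __call__(self, image: Image) -> List[float]:
        height = len(image)
        width = len(image[0])
        # Average over rows to get a per-column intensity profile.
        profile = [sum(row[c] for row in image) / height for c in range(width)]
        # Linear projection of the profile onto each feature's weight vector.
        return [sum(w * p for w, p in zip(wv, profile)) for wv in self.weights]


class ToyDecoder:
    """Hypothetical stand-in for the claimed decoder (single-token output)."""

    def __init__(self, weights: List[List[float]], vocab: List[str]):
        # One weight vector per target-language vocabulary entry.
        self.weights = weights
        self.vocab = vocab

    def logits(self, features: List[float]) -> List[float]:
        return [sum(w * f for w, f in zip(wv, features)) for wv in self.weights]

    def __call__(self, features: List[float]) -> str:
        scores = self.logits(features)
        # Greedy choice: the highest-scoring target-language token.
        return self.vocab[scores.index(max(scores))]


def cross_entropy(logits: List[float], target_index: int) -> float:
    """Assumed example loss: negative log-softmax of the reference token."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def translate(image: Image, extractor: ToyFeatureExtractor, decoder: ToyDecoder) -> str:
    # Steps of the claim: image -> image features -> predicted target text.
    features = extractor(image)
    return decoder(features)


# Toy usage with hand-set (untrained) weights.
image = [[0.0, 1.0], [0.0, 1.0]]
extractor = ToyFeatureExtractor(weights=[[1.0, 0.0], [0.0, 1.0]])
decoder = ToyDecoder(weights=[[1.0, 0.0], [0.0, 1.0]], vocab=["hola", "gracias"])

predicted = translate(image, extractor, decoder)
loss = cross_entropy(decoder.logits(extractor(image)), target_index=1)
```

In this sketch the loss is computed on the decoder output, so its gradient would depend on the parameters of both the extractor and the decoder. That is one way to read "trained, using the loss function" for both components, offered here as an assumption rather than as the claimed training procedure.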