US 11,809,834 B2
Machine translation using neural network models
Zhifeng Chen, Sunnyvale, CA (US); Macduff Richard Hughes, Los Gatos, CA (US); Yonghui Wu, Fremont, CA (US); Michael Schuster, Saratoga, CA (US); Xu Chen, San Francisco, CA (US); Llion Owen Jones, San Francisco, CA (US); Niki J. Parmar, Sunnyvale, CA (US); George Foster, Ottawa (CA); Orhan Firat, Mountain View, CA (US); Ankur Bapna, Sunnyvale, CA (US); Wolfgang Macherey, Sunnyvale, CA (US); and Melvin Jose Johnson Premkumar, Sunnyvale, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Aug. 27, 2021, as Appl. No. 17/459,041.
Application 17/459,041 is a continuation of application No. 16/521,780, filed on Jul. 25, 2019, granted, now Pat. No. 11,138,392.
Claims priority of provisional application 62/703,518, filed on Jul. 26, 2018.
Prior Publication US 2022/0083746 A1, Mar. 17, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/58 (2020.01); G06N 3/08 (2023.01)

CPC G06F 40/58 (2020.01) [G06N 3/08 (2013.01)]

20 Claims

1. A computer-implemented method for performing machine translation of text from a first language to a second language, the method comprising:

generating, by one or more processors, a set of encoding vectors from a series of feature vectors representing characteristics of a text segment in the first language, by processing the feature vectors with an encoder neural network comprising a set of bidirectional recurrent neural network layers, each encoding vector of the set having a predetermined number of values;

generating, by the one or more processors, multiple context vectors for each encoding vector based on multiple sets of parameters, the multiple sets of parameters being respectively used to generate the context vectors from different subsets of each encoding vector;

generating, by the one or more processors, a sequence of output vectors using a decoder neural network that receives the context vectors, the decoder neural network comprising a recurrent neural network, the output vectors representing distributions over language elements of the second language; and

determining, by the one or more processors, a translation of the text segment into the second language based on the sequence of output vectors.
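
The context-vector and translation steps of claim 1 can be illustrated with a small sketch. It is one possible reading of the claim language, not the patented implementation: each encoding vector is split into equal contiguous subsets, and each subset is transformed by its own parameter matrix to give one context vector per parameter set. The function names (split_into_subsets, context_vectors, greedy_translation), the contiguous split, the plain matrix-vector product, and the greedy argmax selection are all assumptions made for illustration. The encoder and decoder networks are not modeled here; the toy encodings and output distributions stand in for their outputs.

```python
import random


def split_into_subsets(encoding, num_heads):
    """Split one encoding vector into num_heads equal contiguous slices (assumed scheme)."""
    if len(encoding) % num_heads != 0:
        raise ValueError("encoding length must be divisible by num_heads")
    size = len(encoding) // num_heads
    return [encoding[i * size:(i + 1) * size] for i in range(num_heads)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def context_vectors(encodings, head_params):
    """Return, for every encoding vector, one context vector per parameter set.

    head_params[h] is a hypothetical parameter matrix that is applied only
    to subset h of each encoding vector.
    """
    num_heads = len(head_params)
    result = []
    for encoding in encodings:
        subsets = split_into_subsets(encoding, num_heads)
        result.append([matvec(head_params[h], subsets[h]) for h in range(num_heads)])
    return result


def greedy_translation(output_vectors, vocabulary):
    """Pick the most probable language element from each output distribution."""
    return [
        vocabulary[max(range(len(dist)), key=dist.__getitem__)]
        for dist in output_vectors
    ]


# Toy data standing in for encoder output: 3 encoding vectors of 8 values each.
rng = random.Random(0)
encodings = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]

# 4 parameter sets; each maps a 2-value subset to a 4-value context vector.
head_params = [
    [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
    for _ in range(4)
]

contexts = context_vectors(encodings, head_params)

# Toy output distributions standing in for the decoder's output vectors.
toy_outputs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
toy_vocabulary = ["a", "b", "c"]
tokens = greedy_translation(toy_outputs, toy_vocabulary)
```

In this sketch contexts holds, for each of the 3 encoding vectors, 4 context vectors of 4 values each. A practical system would more likely use learned parameters and a beam search over the output distributions rather than random weights and a greedy choice.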