CPC G10L 15/183 (2013.01) [G06F 16/3329 (2019.01); G06F 16/3337 (2019.01); G06F 18/22 (2023.01); G06F 40/47 (2020.01); G06F 40/58 (2020.01); G06N 20/00 (2019.01); G10L 15/005 (2013.01); G10L 15/22 (2013.01); H04L 51/02 (2013.01)] | 10 Claims |
1. A method for training a neural machine translation model to translate from a first language to a second language, the method implemented by one or more processors and comprising:
applying a multi-word textual query in the first language as input across a cross-lingual machine learning model that is different from the neural machine translation model to generate a first embedding of the multi-word textual query in a reduced dimensionality space;
identifying a plurality of additional embeddings in the reduced dimensionality space based on one or more respective proximities of the plurality of additional embeddings to the first embedding in the reduced dimensionality space, wherein the respective proximities are determined using cosine similarity or Euclidean distance, and wherein the plurality of additional embeddings were generated based on a plurality of respective multi-word textual queries in the second language;
selecting one of the multi-word textual queries in the second language, from among those used to generate the plurality of additional embeddings, based on one or more additional criteria, wherein the one or more additional criteria include one or more of:
a shortest edit distance between the multi-word textual query in the first language and the selected one of the textual queries in the second language;
the multi-word textual query in the first language and the selected one of the textual queries in the second language being submitted to automated assistants at the most similar frequencies;
the most similar lengths of the multi-word textual query in the first language and the selected one of the textual queries in the second language; or
the multi-word textual query in the first language and the selected one of the textual queries in the second language having the most shared characters;
generating and storing at least one training example of training data using the multi-word textual query in the first language and the selected one of the multi-word textual queries in the second language; and
training the neural machine translation model using the training data.
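Claim 1 recites steps but no implementation. As one hypothetical illustration of the retrieval step (identifying target-language embeddings nearest to the source-language query embedding), the following Python sketch ranks candidates by cosine similarity. The function names, the stand-in vectors, and the choice of cosine similarity over Euclidean distance are assumptions for illustration only, not part of the patent's disclosure; a real system would obtain the embeddings from the cross-lingual model recited in the claim.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors; Euclidean
        # distance (np.linalg.norm(a - b)) would be the claim's alternative.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def nearest_candidates(query_emb: np.ndarray,
                           candidate_embs: list,
                           candidate_queries: list,
                           k: int = 5) -> list:
        # Score every target-language candidate by its proximity to the
        # source-language query embedding and return the k closest.
        scored = [(text, cosine_similarity(query_emb, emb))
                  for text, emb in zip(candidate_queries, candidate_embs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    # Toy usage with stand-in vectors (hypothetical values).
    query_emb = np.array([0.9, 0.1, 0.0])
    candidates = ["wie ist das wetter heute", "spiele musik ab"]
    candidate_embs = [np.array([0.8, 0.2, 0.1]), np.array([0.0, 0.9, 0.4])]
    print(nearest_candidates(query_emb, candidate_embs, candidates, k=1))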
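The selection step's additional criteria (shortest edit distance, most similar lengths, most shared characters) can likewise be sketched as a simple lexicographic ranking over the retrieved candidates. The scoring order below is an assumption; the claim lists the criteria disjunctively and recites no weighting, and the query-frequency criterion is omitted because it would require assistant query logs that a sketch cannot supply.

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def select_best(source: str, candidates: list) -> str:
        # Rank candidates by: (1) shortest edit distance to the source
        # query, (2) most similar length, (3) most shared characters.
        def score(cand: str) -> tuple:
            edit = levenshtein(source, cand)
            length_gap = abs(len(source) - len(cand))
            shared = len(set(source) & set(cand))
            return (edit, length_gap, -shared)
        return min(candidates, key=score)

    def make_training_example(source: str, target: str) -> dict:
        # A training example for the neural machine translation model is
        # the aligned source/target pair.
        return {"source": source, "target": target}

    selected = select_best("what is the weather today",
                           ["wie ist das wetter heute", "spiele musik ab"])
    print(make_training_example("what is the weather today", selected))

Under this sketch, the resulting pairs would be accumulated into a parallel corpus and used to train the neural machine translation model by any standard supervised procedure; the patent claim does not specify the training algorithm.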