US 12,321,711 B2
Stable real-time translations of audio streams
Dirk Ryan Padfield, Seattle, WA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Mar. 26, 2024, as Appl. No. 18/617,428.
Application 18/617,428 is a continuation of application No. 17/269,800, granted, now 11,972,226, previously published as PCT/US2020/024277, filed on Mar. 23, 2020.
Claims priority of provisional application 62/970,892, filed on Feb. 6, 2020.
Prior Publication US 2024/0265215 A1, Aug. 8, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/58 (2020.01); G10L 15/00 (2013.01); G10L 15/06 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2006.01)

CPC G06F 40/58 (2020.01) [G10L 15/005 (2013.01); G10L 15/063 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

obtaining an audio data stream comprising audio recorded in a first language;

for at least one partial transcription from the one or more partial transcriptions:

translating the at least one partial transcription into a second language that is different from the first language;

providing the translated partial transcription into a machine learning model, wherein the machine learning model is configured to predict whether a portion of an input translated partial transcription is stable, the machine learning model is trained to minimize a loss function and is trained using a set of translations of transcriptions generated from one or more training audio data files recorded in the first language and a corresponding set of stability values that each indicate whether a respective set translation is stable;

generating, using the machine learning model, a prediction indicating that a portion of the translated partial transcription is stable, wherein a portion of an input translated partial transcription is stable when the portion of the input translated partial transcription is not expected to change in another translated partial transcription of audio recorded until a time interval that is later than the time interval for the input translated partial transcription; and

providing, using the prediction, the portion of the translated partial transcription to a device.
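
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name here (`stable_portion`, `stream_stable_translations`, `stability_labels`, `toy_translate`, `toy_stability`, `LEXICON`) is hypothetical. The 0.5 threshold, the choice to treat the stable portion as a leading run of tokens, and the position-wise rule for deriving training labels are all assumptions; the claim does not specify any of them. The toy translator and toy stability scorer stand in for a real translation system and a trained machine learning model.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical interfaces: the claim does not prescribe these signatures.
Translator = Callable[[str], List[str]]              # partial transcription -> translated tokens
StabilityModel = Callable[[List[str]], List[float]]  # translated tokens -> per-token stability score


def stable_portion(tokens: List[str], scores: List[float], threshold: float = 0.5) -> List[str]:
    """Return the leading tokens whose score meets the threshold (assumed rule)."""
    portion = []
    for token, score in zip(tokens, scores):
        if score < threshold:
            break
        portion.append(token)
    return portion


def stream_stable_translations(
    partials: Iterable[str],
    translate: Translator,
    predict_stability: StabilityModel,
    threshold: float = 0.5,
) -> Iterator[List[str]]:
    """For each partial transcription: translate, score, and yield the stable portion."""
    for partial in partials:
        tokens = translate(partial)          # translate into the second language
        scores = predict_stability(tokens)   # model predicts which tokens are stable
        yield stable_portion(tokens, scores, threshold)  # portion provided to a device


def stability_labels(current: List[str], later: List[str]) -> List[int]:
    """Assumed labeling rule: 1 if the token at the same position is unchanged later."""
    return [1 if i < len(later) and later[i] == tok else 0 for i, tok in enumerate(current)]


# Toy stand-ins, for illustration only.
LEXICON = {"hola": "hello", "mundo": "world"}


def toy_translate(partial: str) -> List[str]:
    """Word-by-word lookup; a real system would use a translation model."""
    return [LEXICON.get(word, word) for word in partial.split()]


def toy_stability(tokens: List[str]) -> List[float]:
    """Scores every token but the last as stable; not a trained model."""
    return ([0.9] * (len(tokens) - 1) + [0.1]) if tokens else []


partials = ["hola", "hola mundo", "hola mundo cruel"]
shown = list(stream_stable_translations(partials, toy_translate, toy_stability))
print(shown)  # [[], ['hello'], ['hello', 'world']]
```

In this sketch the most recent token is always withheld, so the text shown to the device only grows as later partial transcriptions arrive. A trained model would replace `toy_stability`, and pairs such as those produced by `stability_labels` could serve as the stability values used in training under the stated assumption.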