US 12,079,587 B1
Multi-task automatic speech recognition system
Alec Radford, San Francisco, CA (US); Jong Wook Kim, San Francisco, CA (US); Tao Xu, San Francisco, CA (US); Greg Brockman, San Francisco, CA (US); Christine McLeavey-Payne, San Francisco, CA (US); and Ilya Sutskever, San Francisco, CA (US)
Assigned to OpenAI OpCo, LLC, San Francisco, CA (US)
Filed by OpenAI OpCo, LLC, San Francisco, CA (US)
Filed on Apr. 18, 2023, as Appl. No. 18/302,289.
Int. Cl. G06F 40/58 (2020.01)
CPC G06F 40/58 (2020.01)
18 Claims
 
1. A system comprising:
  at least one memory storing instructions; and
  at least one processor configured to execute the instructions to perform operations for multi-language, multi-task speech recognition, the operations comprising:
    obtaining a transformer model including an encoder and a decoder, the transformer model trained to transcribe or translate audio data in multiple languages using labeled audio data, the labeled audio data including first audio segments associated with first same-language transcripts of the first audio segments and second audio segments associated with second different-language transcripts of the second audio segments; and
    generating an output transcript from an input audio segment using the transformer model, generation including:
      configuring a decoder input with a language token corresponding to a first language;
      configuring the decoder input with a task token; and
      autoregressively configuring the decoder input with a first timestamp token predicted by the decoder based on an absence of a notimestamp token in the decoder input.
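
Illustrative note (not part of the claim): the generation steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The special-token spellings (`<|startoftranscript|>`, `<|notimestamps|>`, `<|0.00|>`), the helper names, and `toy_decoder` (a scripted stand-in for a trained transformer decoder) are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List

# Hypothetical special-token spellings; the claim names the token kinds only.
SOT = "<|startoftranscript|>"
EOT = "<|endoftext|>"
NO_TIMESTAMPS = "<|notimestamps|>"


def build_decoder_prompt(language: str, task: str, timestamps: bool = True) -> List[str]:
    """Configure the decoder input with a language token and a task token."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    prompt = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Presence of this token asks the decoder to skip timestamp prediction.
        prompt.append(NO_TIMESTAMPS)
    return prompt


def is_timestamp(token: str) -> bool:
    """Return True for tokens shaped like <|1.20|> (assumed format)."""
    if not (token.startswith("<|") and token.endswith("|>")):
        return False
    try:
        float(token[2:-2])
    except ValueError:
        return False
    return True


def toy_decoder(tokens: List[str]) -> str:
    """Scripted stand-in for a trained decoder (assumption, not a real model).

    It emits timestamp tokens only when NO_TIMESTAMPS is absent from its input.
    """
    with_timestamps = NO_TIMESTAMPS not in tokens
    if with_timestamps:
        script = ["<|0.00|>", "hello", "<|1.20|>", EOT]
        prompt_len = 3  # SOT, language token, task token
    else:
        script = ["hello", EOT]
        prompt_len = 4  # SOT, language token, task token, NO_TIMESTAMPS
    return script[len(tokens) - prompt_len]


def generate(
    language: str,
    task: str,
    timestamps: bool = True,
    decoder: Callable[[List[str]], str] = toy_decoder,
    max_steps: int = 16,
) -> List[str]:
    """Autoregressively extend the decoder input with each predicted token."""
    tokens = build_decoder_prompt(language, task, timestamps)
    prompt_len = len(tokens)
    for _ in range(max_steps):
        next_token = decoder(tokens)
        if next_token == EOT:
            break
        tokens.append(next_token)  # the prediction becomes part of the next input
    return tokens[prompt_len:]


if __name__ == "__main__":
    print(generate("en", "transcribe"))
    print(generate("fr", "translate", timestamps=False))
```

In this sketch the first token appended to the decoder input is a timestamp token only when the notimestamp token is absent from the prompt, which mirrors the final limitation of the claim. A real system would replace `toy_decoder` with the trained decoder conditioned on the encoder's audio features.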