US 12,444,409 B2
Hybrid language models for conversational AI systems and applications
Vladimir Bataev, Yerevan (AM); Roman Korostik, Yerevan (AM); Evgenii Shabalin, Moscow (RU); Vitaly Sergeyevich Lavrukhin, Campbell, CA (US); and Boris Ginsburg, Sunnyvale, CA (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Sep. 15, 2023, as Appl. No. 18/468,086.
Claims priority of provisional application 63/417,627, filed on Oct. 19, 2022.
Prior Publication US 2024/0135920 A1, Apr. 25, 2024
Prior Publication US 2024/0233714 A9, Jul. 11, 2024
Int. Cl. G10L 15/00 (2013.01); G10L 15/065 (2013.01); G10L 15/16 (2006.01)

CPC G10L 15/16 (2013.01) [G10L 15/065 (2013.01)]

20 Claims

1. A processor comprising:

one or more circuits to perform automatic speech recognition (ASR) using one or more ASR machine learning models (MLMs), the one or more ASR MLMs trained, at least, by:

generating, using one or more ASR MLMs and first textual data, one or more spectrograms;

generating, using the one or more ASR MLMs and the one or more spectrograms, output data indicating second textual data; and

updating one or more parameters of the one or more ASR MLMs based at least on the output data and ground truth data associated with the first textual data.
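
For illustration only, the three training steps recited in claim 1 (text to spectrogram, spectrogram to predicted text, parameter update against ground truth tied to the input text) can be sketched as a toy loop. This is a minimal sketch under stated assumptions, not the patented implementation: the spectrogram generator is replaced by a fixed, hand-written mapping with one frame per character, the recognizer is a per-frame linear softmax classifier, and every name here (`VOCAB`, `N_BINS`, `text_to_spectrogram`, `ToyASR`, `train`) is hypothetical.

```python
import math

VOCAB = "abc "             # hypothetical toy character vocabulary
N_BINS = len(VOCAB) + 2    # hypothetical number of frequency bins per frame


def text_to_spectrogram(text):
    """Stand-in for a learned text-to-spectrogram generator (assumption):
    emits one frame per character, with energy peaked in that character's bin."""
    spec = []
    for ch in text:
        frame = [0.1] * N_BINS
        frame[VOCAB.index(ch)] = 1.0
        spec.append(frame)
    return spec


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyASR:
    """Per-frame linear classifier: spectrogram frame -> character probabilities."""

    def __init__(self):
        self.w = [[0.0] * N_BINS for _ in VOCAB]

    def probs(self, frame):
        return softmax([sum(wi * x for wi, x in zip(row, frame)) for row in self.w])

    def transcribe(self, spec):
        chars = []
        for frame in spec:
            p = self.probs(frame)
            chars.append(VOCAB[p.index(max(p))])
        return "".join(chars)

    def train_step(self, spec, target_text, lr=0.5):
        """One pass of per-frame SGD on cross-entropy; returns the mean loss."""
        loss = 0.0
        for frame, ch in zip(spec, target_text):
            p = self.probs(frame)
            k = VOCAB.index(ch)
            loss -= math.log(p[k])
            for j, row in enumerate(self.w):
                grad = p[j] - (1.0 if j == k else 0.0)
                for d in range(N_BINS):
                    row[d] -= lr * grad * frame[d]
        return loss / len(target_text)


def train(text, steps=200):
    asr = ToyASR()
    losses = []
    for _ in range(steps):
        spec = text_to_spectrogram(text)            # first text -> spectrogram
        losses.append(asr.train_step(spec, text))   # predict text, compare, update
    return asr, losses


if __name__ == "__main__":
    model, history = train("abc cab")
    print(history[0], history[-1])
    print(model.transcribe(text_to_spectrogram("cab abc")))
```

Because the input text doubles as the ground truth, the loop needs no recorded audio. That is the practical appeal of training a recognizer on synthesized spectrograms, though a real system would use learned models in place of both stand-ins.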