US 12,033,616 B2
Method for training speech recognition model, device and storage medium
Junyao Shao, Beijing (CN); Xiaoyin Fu, Beijing (CN); Qiguang Zang, Beijing (CN); Zhijie Chen, Beijing (CN); Mingxin Liang, Beijing (CN); Huanxin Zheng, Beijing (CN); and Sheng Qian, Beijing (CN)
Assigned to BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed by BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD., Beijing (CN)
Filed on Jan. 10, 2022, as Appl. No. 17/571,805.
Claims priority of application No. 202110308608.0 (CN), filed on Mar. 23, 2021.
Prior Publication US 2022/0310064 A1, Sep. 29, 2022
Int. Cl. G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/183 (2013.01); G10L 15/28 (2013.01)
CPC G10L 15/063 (2013.01) [G10L 15/16 (2013.01); G10L 15/183 (2013.01); G10L 15/28 (2013.01)] 15 Claims
OG exemplary drawing
 
1. A method for training a speech recognition model, the speech recognition model comprising an acoustic decoding model and a language model, the method comprising:
obtaining a fusion probability of each of at least one candidate text corresponding to a speech based on the acoustic decoding model and the language model;
selecting a preset number of one or more candidate texts based on the fusion probability of each of the at least one candidate text, and determining a predicted text based on the preset number of one or more candidate texts; and
obtaining a loss function based on the predicted text and a standard text corresponding to the speech, and training the speech recognition model based on the loss function,
wherein the obtaining the loss function based on the predicted text and the standard text corresponding to the speech comprises:
obtaining an accumulated error number of the predicted text based on the predicted text and the standard text corresponding to the speech, the accumulated error number being obtained based on a historical error number and a current error number; and
obtaining the loss function based on the accumulated error number of the predicted text.
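A minimal sketch of the fusion scoring and candidate selection steps recited in the claim, assuming each candidate text already has a log probability from the acoustic decoding model and from the language model. The log-linear interpolation weight `lm_weight`, the helper name `fuse_and_select`, and the use of a fixed beam size are illustrative assumptions and are not taken from the patent.

```python
import torch

def fuse_and_select(acoustic_log_probs, lm_log_probs, beam_size, lm_weight=0.3):
    """Combine acoustic-model and language-model scores into a fusion score
    for each candidate text, then keep a preset number (beam_size) of the
    best-scoring candidates.

    acoustic_log_probs, lm_log_probs: 1-D tensors, one log probability per
    candidate text. The weighted sum used here is one common fusion scheme
    (an assumption); the claim only states that a fusion probability is
    obtained from both models.
    """
    fusion_log_probs = acoustic_log_probs + lm_weight * lm_log_probs
    top_scores, top_indices = torch.topk(fusion_log_probs, k=beam_size)
    return top_scores, top_indices
```

For example, with four candidate texts and `beam_size=2`, the function returns the two highest fusion scores and their indices, from which the predicted text (or texts) can be determined as in the second step of the claim.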
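The accumulated-error loss in the final steps of the claim could be sketched as follows, assuming the current error number is the edit distance between a predicted text and the standard text, and that the accumulated error number (historical plus current) weights each selected candidate's negative log fusion probability. The historical-error bookkeeping and the exact form of the loss are assumptions made for illustration only.

```python
import torch

def edit_distance(pred_tokens, ref_tokens):
    """Levenshtein distance between the predicted and standard token
    sequences, used here as the current error number for one step."""
    m, n = len(pred_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred_tokens[i - 1] == ref_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def accumulated_error_loss(fusion_log_probs, predicted_texts, standard_text,
                           historical_errors=0):
    """Hypothetical loss: weight each selected candidate's negative log
    fusion probability by its accumulated error number (historical error
    number plus the current error number against the standard text)."""
    losses = []
    for log_prob, pred in zip(fusion_log_probs, predicted_texts):
        current_errors = edit_distance(pred.split(), standard_text.split())
        accumulated = historical_errors + current_errors
        losses.append(-log_prob * accumulated)
    return torch.stack(losses).mean()
```

Under this reading, candidates whose accumulated error number is larger contribute more to the loss, so training the speech recognition model on this loss pushes probability mass away from error-prone predictions.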