US 12,094,469 B2
Voice recognition method and device
Li Fu, Beijing (CN); and Xiaoxiao Li, Beijing (CN)
Assigned to JINGDONG TECHNOLOGY HOLDING CO., LTD., Beijing (CN)
Appl. No. 17/603,690
Filed by JINGDONG TECHNOLOGY HOLDING CO., LTD., Beijing (CN)
PCT Filed Mar. 3, 2020, PCT No. PCT/CN2020/077590 § 371(c)(1), (2) Date Oct. 14, 2021, PCT Pub. No. WO2020/220824, PCT Pub. Date Nov. 5, 2020.
Claims priority of application No. 201910354527.7 (CN), filed on Apr. 29, 2019.
Prior Publication US 2022/0238098 A1, Jul. 28, 2022
Int. Cl. G10L 15/26 (2006.01); G06F 17/14 (2006.01); G10L 15/06 (2013.01); G10L 25/30 (2013.01)

CPC G10L 15/26 (2013.01) [G06F 17/14 (2013.01); G10L 15/063 (2013.01); G10L 25/30 (2013.01)]

9 Claims

1. A method for recognizing speech, comprising:

by one or more processors, respectively setting initial values of a Chinese character coefficient and a Pinyin coefficient, generating a Chinese character mapping function according to the initial value of the Chinese character coefficient, and generating a Pinyin mapping function according to the initial value of the Pinyin coefficient;

by the one or more processors, training the Chinese character mapping function and the Pinyin mapping function using a plurality of preset training samples, calculating training results as parameters of a joint loss function, and generating a target mapping function according to calculation results; and

by the one or more processors, recognizing, according to the target mapping function, speech to be recognized, so as to obtain a Chinese character recognition result and a Pinyin recognition result of the speech to be recognized,

wherein the training the Chinese character mapping function and the Pinyin mapping function using a plurality of preset training samples, calculating training results as parameters of a joint loss function, and generating a target mapping function according to calculation results comprise:

acquiring a Chinese character loss value and a Pinyin loss value of each training sample according to the Chinese character mapping function, the Pinyin mapping function and the plurality of preset training samples;

calculating the Chinese character loss value and the Pinyin loss value of each training sample as parameters of the joint loss function, so as to obtain a joint loss value of each training sample; and

using a back-propagation algorithm for calculation according to the joint loss value of each training sample, to obtain target values of the Chinese character coefficient and the Pinyin coefficient, and generating a target mapping function according to the target values.
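
The training steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions that the claim does not state: each "coefficient" is modeled as a weight matrix of a linear softmax classifier, each per-sample loss is cross-entropy, and the joint loss is a weighted sum alpha * hanzi_loss + (1 - alpha) * pinyin_loss. The names softmax, mapping, init_coeff, sgd_step, train, recognize, alpha and SAMPLES are all hypothetical, and the three-sample data set is a toy stand-in for real acoustic features. Because the two heads here share no layers, back-propagation reduces to one scaled gradient step per head; a realistic model would likely share an encoder between them.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mapping(coeff, features):
    """Assumed mapping function: linear scores from a coefficient matrix, then softmax."""
    return softmax([sum(w * f for w, f in zip(row, features)) for row in coeff])


def init_coeff(n_classes, n_features, rng):
    """Set the initial value of a coefficient (small random weights, an assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(n_classes)]


def sgd_step(coeff, probs, label, features, scale, lr):
    """Back-propagate scale * cross-entropy: d/dW = scale * (p - onehot) * x."""
    for k, row in enumerate(coeff):
        grad_logit = scale * (probs[k] - (1.0 if k == label else 0.0))
        for j, f in enumerate(features):
            row[j] -= lr * grad_logit * f


def train(samples, n_hanzi, n_pinyin, alpha=0.5, lr=0.5, epochs=200, seed=0):
    """Return target coefficient values and the mean joint loss per epoch."""
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    hanzi_coeff = init_coeff(n_hanzi, n_features, rng)
    pinyin_coeff = init_coeff(n_pinyin, n_features, rng)
    history = []
    for _ in range(epochs):
        total = 0.0
        for features, hanzi_label, pinyin_label in samples:
            p_h = mapping(hanzi_coeff, features)
            p_p = mapping(pinyin_coeff, features)
            # Per-sample Chinese character loss and Pinyin loss.
            hanzi_loss = -math.log(p_h[hanzi_label])
            pinyin_loss = -math.log(p_p[pinyin_label])
            # Joint loss of the sample (weighted sum is an assumption).
            total += alpha * hanzi_loss + (1.0 - alpha) * pinyin_loss
            # Gradient of the joint loss with respect to each coefficient.
            sgd_step(hanzi_coeff, p_h, hanzi_label, features, alpha, lr)
            sgd_step(pinyin_coeff, p_p, pinyin_label, features, 1.0 - alpha, lr)
        history.append(total / len(samples))
    return hanzi_coeff, pinyin_coeff, history


def recognize(hanzi_coeff, pinyin_coeff, features):
    """Target mapping function: returns (hanzi class index, pinyin class index)."""
    p_h = mapping(hanzi_coeff, features)
    p_p = mapping(pinyin_coeff, features)
    return p_h.index(max(p_h)), p_p.index(max(p_p))


# Toy data: (feature vector, hanzi label, pinyin label).
# Samples 0 and 2 are different characters sharing one pinyin (homophones).
SAMPLES = [
    ([1.0, 0.0, 0.0], 0, 0),
    ([0.0, 1.0, 0.0], 1, 1),
    ([0.0, 0.0, 1.0], 2, 0),
]
hanzi_coeff, pinyin_coeff, history = train(SAMPLES, n_hanzi=3, n_pinyin=2)
```

Calling recognize(hanzi_coeff, pinyin_coeff, features) on one feature vector yields both recognition results at once, mirroring the final recognizing step of the claim. The weight alpha controls how strongly each loss contributes to the joint loss; 0.5 is an arbitrary choice for the sketch.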