US 12,033,621 B2
Method for speech recognition based on language adaptivity and related apparatus
Dan Su, Shenzhen (CN); Tianxiao Fu, Shenzhen (CN); Min Luo, Shenzhen (CN); Qi Chen, Shenzhen (CN); Yulu Zhang, Shenzhen (CN); and Lin Luo, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on Apr. 15, 2021, as Appl. No. 17/231,945.
Application 17/231,945 is a continuation of application No. PCT/CN2020/078806, filed on Mar. 11, 2020.
Claims priority of application No. 201910182266.5 (CN), filed on Mar. 11, 2019.
Prior Publication US 2021/0233521 A1, Jul. 29, 2021
Int. Cl. G10L 15/187 (2013.01); G10L 15/00 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01)
CPC G10L 15/187 (2013.01) [G10L 15/005 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G10L 2015/025 (2013.01)] 17 Claims
OG exemplary drawing
 
1. A method for speech recognition based on language adaptivity, comprising:
at a computing device that includes one or more processors and memory, the computing device being communicatively connected to an electronic device via a network communication link:
obtaining, from the electronic device via the network communication link, voice data of a user of the electronic device;
extracting, based on the obtained voice data, a phoneme feature representing pronunciation phoneme information;
inputting the extracted phoneme feature to a multi-language discrimination model that is pre-trained based on a multilingual corpus corresponding to multiple languages;
determining, using the multi-language discrimination model, a language discrimination result corresponding to the extracted phoneme feature, the language discrimination result identifying a language to which the voice data belongs;
selecting, from a plurality of language acoustic models corresponding to different languages, a first language acoustic model based on the language discrimination result, the first language acoustic model corresponding to the language to which the voice data belongs;
inputting the extracted phoneme feature into the first language acoustic model to obtain a speech recognition result of the voice data based on the first language acoustic model; and
transmitting, via the network communication link to the electronic device, the obtained speech recognition result, wherein the electronic device is configured to present, to the user of the electronic device, interaction content in the language of the voice data of the user, the interaction content being responsive to the obtained voice data of the user and generated according to the obtained speech recognition result.
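
The claimed method amounts to a four-step control flow: extract a phoneme feature from the voice data, run a multi-language discrimination model to identify the language, select the acoustic model for that language, and recognize the speech with the selected model. The Python sketch below illustrates only this control flow; every class, function name, and the keyword-overlap "discriminator" are hypothetical placeholders for illustration and are not the pre-trained models described in the patent.

```python
# Minimal sketch of the control flow in claim 1. All names and the toy
# phoneme-overlap scoring are hypothetical placeholders, not the patented models.
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PhonemeFeature:
    """Pronunciation phoneme information extracted from voice data (placeholder)."""
    phonemes: List[str]


def extract_phoneme_feature(voice_data: bytes) -> PhonemeFeature:
    # Placeholder front end: a real system would run acoustic feature extraction
    # followed by a phoneme recognizer over the audio samples.
    return PhonemeFeature(phonemes=voice_data.decode("utf-8", errors="ignore").split())


class MultiLanguageDiscriminator:
    """Stand-in for the pre-trained multi-language discrimination model."""

    def __init__(self, language_phoneme_sets: Dict[str, Set[str]]):
        self.language_phoneme_sets = language_phoneme_sets

    def discriminate(self, feature: PhonemeFeature) -> str:
        # Toy scoring: choose the language whose phoneme inventory overlaps most
        # with the extracted phonemes. A real model would be a trained classifier.
        scores = {
            lang: len(set(feature.phonemes) & inventory)
            for lang, inventory in self.language_phoneme_sets.items()
        }
        return max(scores, key=scores.get)


class LanguageAcousticModel:
    """Stand-in for a per-language acoustic model producing a recognition result."""

    def __init__(self, language: str):
        self.language = language

    def recognize(self, feature: PhonemeFeature) -> str:
        return f"[{self.language}] " + " ".join(feature.phonemes)


def recognize_speech(voice_data: bytes,
                     discriminator: MultiLanguageDiscriminator,
                     acoustic_models: Dict[str, LanguageAcousticModel]) -> str:
    feature = extract_phoneme_feature(voice_data)    # extract phoneme feature
    language = discriminator.discriminate(feature)   # language discrimination result
    model = acoustic_models[language]                # select matching acoustic model
    return model.recognize(feature)                  # recognize with selected model


if __name__ == "__main__":
    discriminator = MultiLanguageDiscriminator({
        "en": {"DH", "AH", "K", "AE", "T"},
        "zh": {"zh", "ong", "g", "uo"},
    })
    models = {lang: LanguageAcousticModel(lang) for lang in ("en", "zh")}
    print(recognize_speech(b"zh ong g uo", discriminator, models))  # -> "[zh] zh ong g uo"
```

In this sketch the selection step is a plain dictionary lookup keyed by the discrimination result; the claim itself only requires that the first language acoustic model be chosen from the plurality of models based on that result, so any routing mechanism with equivalent behavior would fit the described flow.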