US 12,230,275 B2
Speech instruction recognition method, electronic device, and non-transient computer readable storage medium
Shaoxun Su, Beijing (CN)
Assigned to BOE Technology Group Co., Ltd., Beijing (CN)
Appl. No. 17/611,436
Filed by BOE Technology Group Co., Ltd., Beijing (CN)
PCT Filed Jan. 6, 2021, PCT No. PCT/CN2021/070516
§ 371(c)(1), (2) Date Nov. 15, 2021,
PCT Pub. No. WO2022/147692, PCT Pub. Date Jul. 14, 2022.
Prior Publication US 2023/0386470 A1, Nov. 30, 2023
Int. Cl. G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01); G10L 15/26 (2006.01); G10L 25/24 (2013.01)
CPC G10L 15/26 (2013.01) [G10L 15/063 (2013.01); G10L 25/24 (2013.01)] 18 Claims
 
1. A speech instruction recognition method, comprising:
acquiring a target speech;
processing the target speech to obtain a target speech vector corresponding to the target speech;
performing speech recognition on the target speech to obtain a target speech text of the target speech, and processing the target speech text to obtain a target text vector corresponding to the target speech text;
generating a to-be-trained instruction recognition model;
obtaining a pre-trained instruction recognition model by performing an iterative training on the to-be-trained instruction recognition model with sample speeches;
inputting the target speech vector and the target text vector to the pre-trained instruction recognition model to obtain an instruction category corresponding to the target speech, so that a corresponding operation is performed according to the obtained instruction category,
wherein the step of inputting the target speech vector and the target text vector to the pre-trained instruction recognition model to obtain an instruction category corresponding to the target speech comprises:
performing concat on the target speech vector and the target text vector to obtain a concat vector; and
inputting the concat vector to the pre-trained instruction recognition model to obtain the instruction category corresponding to the target speech;
performing a response or operation using a smart device or an IoT (Internet of Things) system according to the instruction category.
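
The data flow recited in claim 1 (concatenating the speech vector with the text vector, classifying the concat vector with an iteratively trained model, and acting on the resulting instruction category) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the class `LinearInstructionModel`, the perceptron-style update standing in for the unspecified training procedure, the toy 2-dimensional vectors, the category names `light_on` and `light_off`, and the `HANDLERS` table standing in for a smart device or IoT system.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def concat(speech_vec: Sequence[float], text_vec: Sequence[float]) -> Vector:
    """Join the speech vector and the text vector end to end."""
    return list(speech_vec) + list(text_vec)


class LinearInstructionModel:
    """Hypothetical stand-in for the instruction recognition model.

    One weight row per instruction category; the category with the
    highest dot-product score wins. The claim does not specify the
    model architecture, so this is only an assumed placeholder.
    """

    def __init__(self, categories: Sequence[str], dim: int) -> None:
        self.categories = list(categories)
        self.weights = [[0.0] * dim for _ in self.categories]

    def predict(self, x: Sequence[float]) -> str:
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        return self.categories[scores.index(max(scores))]

    def train(self, samples: Sequence[Tuple[Vector, str]], epochs: int = 5) -> None:
        """Iterative training with multiclass perceptron updates (an assumption)."""
        for _ in range(epochs):
            for x, label in samples:
                predicted = self.predict(x)
                if predicted == label:
                    continue
                good = self.categories.index(label)
                bad = self.categories.index(predicted)
                for i, v in enumerate(x):
                    self.weights[good][i] += v  # pull the correct category closer
                    self.weights[bad][i] -= v   # push the wrong category away


# Toy "sample speeches", already reduced to speech and text vectors.
SAMPLES = [
    (concat([1.0, -1.0], [1.0, -1.0]), "light_on"),
    (concat([-1.0, 1.0], [-1.0, 1.0]), "light_off"),
]

# Generate the to-be-trained model, then train it iteratively.
model = LinearInstructionModel(["light_on", "light_off"], dim=4)
model.train(SAMPLES)

# Hypothetical device responses keyed by instruction category.
HANDLERS: Dict[str, Callable[[], str]] = {
    "light_on": lambda: "light switched on",
    "light_off": lambda: "light switched off",
}


def recognize_and_respond(
    trained: LinearInstructionModel,
    speech_vec: Sequence[float],
    text_vec: Sequence[float],
) -> str:
    """Concat the two vectors, classify the result, and run the matching handler."""
    category = trained.predict(concat(speech_vec, text_vec))
    return HANDLERS[category]()
```

In this sketch, calling `recognize_and_respond(model, speech_vec, text_vec)` with vectors derived from a target speech returns the response of whichever handler matches the predicted category. Feeding the model a single concat vector means the acoustic and textual evidence is weighed jointly in one classification step rather than by two separate classifiers.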