US 12,125,473 B2
Speech recognition method, apparatus, and device, and storage medium
Tao Li, Shenzhen (CN)
Assigned to Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on Mar. 4, 2021, as Appl. No. 17/192,316.
Application 17/192,316 is a continuation of application No. PCT/CN2019/120558, filed on Nov. 25, 2019.
Claims priority of application No. 201811508402.7 (CN), filed on Dec. 11, 2018.
Prior Publication US 2021/0193121 A1, Jun. 24, 2021
Int. Cl. G10L 15/06 (2013.01); G10L 15/02 (2006.01); G10L 15/10 (2006.01)
CPC G10L 15/063 (2013.01) [G10L 15/02 (2013.01); G10L 15/10 (2013.01)]
16 Claims
 
1. A method for speech recognition, performed by a computing device, comprising:
extracting a reference edge from a first state diagram, and searching a second state diagram for a target edge having a same label and a same prefix path as the reference edge, wherein: the first state diagram is a state diagram of a key phrase language model of a textual segment; the second state diagram is a state diagram of a basic language model; the first state diagram includes a plurality of nodes and at least one path for transitioning from one node to another node; and the first state diagram is used as a reference to modify the second state diagram;
adjusting, by using a weight of the reference edge that represents a relationship between at least one pair of elements in the first state diagram, a weight of the target edge;
mapping an incentive weight of an edge in a language recognition model corresponding to the target edge in the second state diagram to the adjusted weight of the target edge, the language recognition model being a language model obtained after the basic language model is pruned;
inputting a to-be-recognized speech into a speech recognition model comprising the language recognition model;
obtaining word sequence paths outputted by the speech recognition model for the to-be-recognized speech;
determining that a first edge in each of the word sequence paths has a mapped incentive weight at an edge level;
determining that a second edge in each of the word sequence paths does not have a mapped incentive weight at the edge level;
calculating a score of each of the word sequence paths based on the mapped incentive weight of the first edge and an initial weight of the second edge; and
selecting, from the word sequence paths, a target path having a highest score based on the calculated score of each of the word sequence paths, to obtain a speech recognition result.
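
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions that the claim does not state: each edge is identified by a (prefix path, label) pair, weights are log-probabilities, the adjustment rule keeps the larger of the reference weight and the target weight, and a path score is the sum of its edge weights. All function and variable names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: an edge is keyed by (labels on the path leading to it, its own label).
EdgeKey = Tuple[Tuple[str, ...], str]
# A word sequence path: list of (edge key, initial weight in the recognition model).
Path = List[Tuple[EdgeKey, float]]


def adjust_target_weights(first: Dict[EdgeKey, float],
                          second: Dict[EdgeKey, float]) -> Dict[EdgeKey, float]:
    """For each reference edge in the first diagram, find the target edge in the
    second diagram with the same label and prefix path, and adjust its weight."""
    adjusted = {}
    for key, ref_weight in first.items():
        if key in second:  # same prefix path and same label
            # Assumed rule: keep the larger log-probability weight.
            adjusted[key] = max(second[key], ref_weight)
    return adjusted


def map_incentive_weights(adjusted: Dict[EdgeKey, float],
                          recognition_edges: Set[EdgeKey]) -> Dict[EdgeKey, float]:
    """Map adjusted target-edge weights onto the corresponding edges of the
    pruned language recognition model as incentive weights."""
    return {k: w for k, w in adjusted.items() if k in recognition_edges}


def score_path(path: Path, incentive: Dict[EdgeKey, float]) -> float:
    """Use the incentive weight where an edge has one, else its initial weight.
    Summing the weights is an assumption of this sketch."""
    return sum(incentive.get(key, initial) for key, initial in path)


def select_target_path(paths: List[Path], incentive: Dict[EdgeKey, float]) -> Path:
    """Pick the word sequence path with the highest score."""
    return max(paths, key=lambda p: score_path(p, incentive))


# Toy data (hypothetical): key phrase "play jazz" versus the basic model.
first_diagram = {((), "play"): -0.2, (("play",), "jazz"): -0.3}
second_diagram = {((), "play"): -1.0, (("play",), "jazz"): -2.5,
                  (("play",), "chess"): -1.5}

adjusted = adjust_target_weights(first_diagram, second_diagram)
incentive = map_incentive_weights(adjusted, set(second_diagram))

path_jazz: Path = [(((), "play"), -1.0), ((("play",), "jazz"), -2.5)]
path_chess: Path = [(((), "play"), -1.0), ((("play",), "chess"), -1.5)]
best = select_target_path([path_jazz, path_chess], incentive)
```

In this toy data, the path ending in "chess" scores higher under the initial weights alone, while the path ending in "jazz" scores higher once the incentive weights are applied. That is the effect the claim describes: edges matching the key phrase are favored without retraining the pruned model.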