| CPC G10L 15/063 (2013.01) [G10L 15/02 (2013.01)] | 18 Claims |

1. A method implemented by a computing device, comprising:
extracting and encoding speech acoustic features of a received voice stream in units of frames;
performing block processing on encoded frames, and predicting a number of activation points included in a same block that need to be encoded and outputted; and
determining a position of at least one activation point that needs to be decoded and outputted according to a prediction result, to allow a decoder to perform decoding at the position of the at least one activation point and output a recognition result, wherein determining the position of the at least one activation point that needs to be decoded and outputted according to the prediction result comprises:
comparing Attention coefficients of each frame in the same block and sorting the Attention coefficients in order of magnitude, the Attention coefficients describing probabilities that respective frames need to be decoded and outputted; and
determining, according to the number of activation points included in the same block, positions of frames associated with that number of highest Attention coefficients, among encoding results of each frame included in the same block, as the position of the at least one activation point.
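The selection step of claim 1 can be illustrated with a minimal Python sketch. It assumes the Attention coefficients of each encoded frame are already available as floats and that blocks are fixed-size runs of consecutive frames. The claim does not say how the number of activation points is predicted, so the sketch substitutes a hypothetical predictor that rounds the sum of a block's coefficients. All function names are illustrative, not taken from the source.

```python
import math
from typing import List, Sequence


def predict_activation_count(block_attention: Sequence[float]) -> int:
    # HYPOTHETICAL predictor: the claim does not state how the count is predicted.
    # Here the sum of the block's coefficients is rounded to the nearest integer
    # (floor(x + 0.5), so halves round up) and capped at the block length.
    count = math.floor(sum(block_attention) + 0.5)
    return max(0, min(count, len(block_attention)))


def select_activation_positions(block_attention: Sequence[float], num_activations: int) -> List[int]:
    # Sort frame indices by Attention coefficient, highest first.
    # Python's sort stays stable with reverse=True, so ties keep the earlier frame first.
    ranked = sorted(range(len(block_attention)),
                    key=lambda i: block_attention[i], reverse=True)
    # Keep the top `num_activations` frames and return them in temporal order.
    return sorted(ranked[:max(0, num_activations)])


def activation_positions(frame_attention: Sequence[float], block_size: int) -> List[int]:
    # Split the per-frame coefficients into consecutive blocks of `block_size` frames
    # (the last block may be shorter) and collect global frame indices of activation points.
    positions: List[int] = []
    for start in range(0, len(frame_attention), block_size):
        block = frame_attention[start:start + block_size]
        k = predict_activation_count(block)
        positions.extend(start + i for i in select_activation_positions(block, k))
    return positions


if __name__ == "__main__":
    attention = [0.1, 0.7, 0.05, 0.9, 0.2, 0.3, 0.6, 0.1]
    print(activation_positions(attention, block_size=5))  # [1, 3, 6]
```

With a block size of 5, the first block's coefficients sum to about 1.95, so two activation points are predicted and frames 1 and 3 (coefficients 0.9 and 0.7) are selected. The three-frame second block predicts one point, at global frame 6. A decoder would then run only at these positions.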
17. A system comprising:
one or more processors; and
memory storing executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
extracting and encoding speech acoustic features of a received voice stream in units of frames;
performing block processing on encoded frames, and predicting a number of activation points included in a same block that need to be encoded and outputted; and
determining a position of at least one activation point that needs to be decoded and outputted according to a prediction result, to allow a decoder to perform decoding at the position of the at least one activation point and output a recognition result, wherein determining the position of the at least one activation point that needs to be decoded and outputted according to the prediction result comprises:
comparing Attention coefficients of each frame in the same block and sorting the Attention coefficients in order of magnitude, the Attention coefficients describing probabilities that respective frames need to be decoded and outputted; and
determining, according to the number of activation points included in the same block, positions of frames associated with that number of highest Attention coefficients, among encoding results of each frame included in the same block, as the position of the at least one activation point.