CPC G10L 15/16 (2013.01) [G06N 3/08 (2013.01); G10L 15/063 (2013.01); G10L 15/197 (2013.01); G10L 15/22 (2013.01); G10L 15/32 (2013.01); G10L 25/18 (2013.01); G10L 15/20 (2013.01); G10L 2015/0631 (2013.01)] | 21 Claims |
1. A method for recognizing a voice, the method comprising:
inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, wherein the at least one recognition network comprises an omnidirectional network and a plurality of directional networks, each of the plurality of directional networks being obtained by training using a voice sample in a different direction interval, wherein inputting the target voice into the pre-trained voice recognition model to obtain the initial text output by the at least one recognition network in the voice recognition model comprises:
inputting a transformed voice obtained from the target voice into the omnidirectional network to obtain a given voice feature output by a complex linear transformation layer of the omnidirectional network, and
inputting the given voice feature into each of the plurality of directional networks to obtain an initial sub-text output by each directional network, wherein each of the plurality of directional networks comprises a long short-term memory network layer and a streaming multi-layer truncated attention layer; and
determining a voice recognition result of the target voice, based on the initial text comprising the initial sub-text output by each directional network.
|
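The data flow recited in claim 1 can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the names `complex_linear_layer`, `recognize`, and `make_stub_network` are hypothetical, the directional networks are replaced by stub callables standing in for the long short-term memory and streaming multi-layer truncated attention layers, taking the magnitude of the complex projection is an assumption, and the majority vote used to determine the final result is an assumption, since the claim only states that the result is based on the initial sub-texts.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Frame = List[complex]        # one frame of the transformed voice (assumed: complex spectrum bins)
Feature = List[List[float]]  # one real-valued feature vector per frame


def complex_linear_layer(frames: Sequence[Frame],
                         weights: Sequence[Sequence[complex]]) -> Feature:
    """Apply a complex-valued linear map to every frame.

    Assumption: the magnitude of each projected value is used as the
    shared voice feature; the claim does not specify this step.
    """
    feature = []
    for frame in frames:
        projected = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        feature.append([abs(z) for z in projected])
    return feature


def make_stub_network(text: str) -> Callable[[Feature], str]:
    """Hypothetical stand-in for one directional network (LSTM + SMLTA)."""
    def network(feature: Feature) -> str:
        # A trained network would decode the feature; the stub ignores it.
        return text
    return network


def recognize(frames: Sequence[Frame],
              weights: Sequence[Sequence[complex]],
              directional_networks: Sequence[Callable[[Feature], str]]
              ) -> Tuple[List[str], str]:
    """Run the shared layer once, then every directional network on its output."""
    feature = complex_linear_layer(frames, weights)
    sub_texts = [net(feature) for net in directional_networks]
    # Assumption: majority vote over the initial sub-texts.
    result = Counter(sub_texts).most_common(1)[0][0]
    return sub_texts, result


if __name__ == "__main__":
    frames = [[1 + 1j, 2 - 1j], [0.5j, 1 + 0j]]
    weights = [[1 + 0j, 0 + 1j], [0 - 1j, 1 + 0j]]
    networks = [make_stub_network("turn on the light"),
                make_stub_network("turn on the light"),
                make_stub_network("turn on the night")]
    sub_texts, result = recognize(frames, weights, networks)
    print(sub_texts)
    print(result)
```

The point of the sketch is the sharing pattern: the complex linear transformation is computed once in the omnidirectional network, and the same feature is fed to every directional network, so adding a direction interval adds one decoder rather than a full front end.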