US 11,942,078 B2
Chunking and overlap decoding strategy for streaming RNN transducers for speech recognition
George Andrei Saon, Stamford, CT (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Feb. 26, 2021, as Appl. No. 17/186,167.
Prior Publication US 2022/0277734 A1, Sep. 1, 2022
Int. Cl. G10L 15/20 (2006.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G10L 15/16 (2006.01); G10L 15/26 (2006.01); G10L 15/28 (2013.01)
CPC G10L 15/16 (2013.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G10L 15/20 (2013.01); G10L 15/26 (2013.01); G10L 15/28 (2013.01)]
20 Claims
 
1. A computer-implemented method for improving accuracy recognition of digital speech, comprising:
receiving the digital speech;
splitting the digital speech into overlapping chunks;
computing a bidirectional encoder embedding of each of the overlapping chunks to obtain bidirectional encoder embeddings;
combining the bidirectional encoder embeddings; and
interpreting, by a speech recognition system, the digital speech using the combined bidirectional encoder embeddings.
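The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not state the chunk length, the amount of overlap, the encoder architecture, or how the embeddings are combined, so everything below is an assumption made for illustration: the function names are hypothetical, `toy_bidirectional_encoder` is a stand-in that uses running means instead of a trained bidirectional recurrent network, and the combining step simply averages the embeddings of frames covered by more than one chunk.

```python
from typing import Callable, List, Tuple

Embedding = List[float]


def split_into_overlapping_chunks(
    num_frames: int, chunk_size: int, overlap: int
) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans; consecutive spans share `overlap` frames."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if num_frames <= 0:
        return []
    step = chunk_size - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        spans.append((start, end))
        if end == num_frames:
            break
        start += step
    return spans


def toy_bidirectional_encoder(chunk: List[float]) -> List[Embedding]:
    """Hypothetical stand-in for a bidirectional encoder (assumption).

    Each frame gets a 2-dimensional embedding: a left-to-right running mean
    and a right-to-left running mean, both computed inside the chunk only.
    """
    n = len(chunk)
    forward, total = [], 0.0
    for i, x in enumerate(chunk):
        total += x
        forward.append(total / (i + 1))
    backward, total = [0.0] * n, 0.0
    for i in range(n - 1, -1, -1):
        total += chunk[i]
        backward[i] = total / (n - i)
    return [[f, b] for f, b in zip(forward, backward)]


def encode_with_overlap(
    frames: List[float],
    chunk_size: int,
    overlap: int,
    encoder: Callable[[List[float]], List[Embedding]] = toy_bidirectional_encoder,
) -> List[Embedding]:
    """Encode each chunk independently, then combine per-frame embeddings.

    Combination rule (assumption): frames covered by several chunks receive
    the average of the embeddings produced for them.
    """
    sums: List[Embedding] = [[] for _ in frames]
    counts = [0] * len(frames)
    for start, end in split_into_overlapping_chunks(len(frames), chunk_size, overlap):
        for offset, vec in enumerate(encoder(frames[start:end])):
            t = start + offset
            if not sums[t]:
                sums[t] = [0.0] * len(vec)
            for d, value in enumerate(vec):
                sums[t][d] += value
            counts[t] += 1
    return [[v / counts[t] for v in sums[t]] for t in range(len(frames))]


if __name__ == "__main__":
    speech = [float(i) for i in range(10)]  # stand-in for acoustic feature frames
    combined = encode_with_overlap(speech, chunk_size=4, overlap=2)
    print(len(combined), "combined frame embeddings")
```

In this sketch the combined embeddings cover every input frame exactly once, so they could be passed to the remainder of a recognizer in place of a full-utterance encoder output. The final "interpreting" step of the claim is not modeled here.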