US 12,394,430 B2
Method and apparatus with decoding in neural network for speech recognition
Jinwoo Park, Seoul (KR); Min-Joong Lee, Suwon-si (KR); Jihyun Lee, Suwon-si (KR); and Hoshik Lee, Seongnam-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Oct. 27, 2021, as Appl. No. 17/511,900.
Claims priority of application No. 10-2021-0035353 (KR), filed on Mar. 18, 2021.
Prior Publication US 2022/0301578 A1, Sep. 22, 2022
Int. Cl. G06N 3/04 (2023.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01); G10L 19/04 (2013.01); G10L 21/04 (2013.01); G10L 25/30 (2013.01)
CPC G10L 21/04 (2013.01) [G06N 3/04 (2013.01); G10L 15/22 (2013.01); G10L 19/04 (2013.01); G10L 25/30 (2013.01)]
22 Claims
 
1. A decoding method, the method comprising:
receiving an input sequence corresponding to an input speech at a current time; and
in a neural network (NN) for speech recognition,
generating an encoded vector sequence by encoding the input sequence,
determining reuse tokens by determining whether substrings included in candidate beams of two or more previous times match, wherein the candidate beams of two or more previous times comprise different numbers of tokens, and
decoding one or more tokens subsequent to the reuse tokens based on the reuse tokens and the encoded vector sequence,
wherein the determining of the reuse tokens comprises:
determining a reuse time of tokens at a current time n, being the current time, subsequent to a previous time n−1 subsequent to a previous time n−2 based on a comparison result between a candidate beam of the previous time n−2 and a candidate beam of the previous time n−1, wherein n is a natural number greater than or equal to “3”; and
determining candidate beams accumulated up to the reuse time to be the reuse tokens.
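The reuse-token step of claim 1 can be illustrated with a minimal Python sketch. It rests on assumptions the claim does not fix: each candidate beam is a list of integer token IDs, only the best beam from each previous time is compared, and "match" is read as agreement on a common prefix, whose length serves as the reuse time. The decoder step function `toy_step`, the `EOS` id, and the plain list standing in for the encoded vector sequence are hypothetical stand-ins for illustration, not the patent's neural network.

```python
from typing import Callable, List, Sequence

Token = int
# Hypothetical decoder step: (tokens decoded so far, encoded sequence) -> next token.
StepFn = Callable[[List[Token], Sequence[int]], Token]


def reuse_time(beam_n2: Sequence[Token], beam_n1: Sequence[Token]) -> int:
    """Length of the common prefix of the best beams from times n-2 and n-1.

    Assumption: "match" means prefix agreement. The two beams may hold
    different numbers of tokens, so zip() stops at the shorter one.
    """
    t = 0
    for a, b in zip(beam_n2, beam_n1):
        if a != b:
            break
        t += 1
    return t


def determine_reuse_tokens(beam_n2: Sequence[Token], beam_n1: Sequence[Token]) -> List[Token]:
    """Tokens accumulated up to the reuse time, treated as stable at time n."""
    return list(beam_n1[: reuse_time(beam_n2, beam_n1)])


def decode_after_reuse(reused: List[Token], encoded: Sequence[int], step: StepFn,
                       eos: Token, max_len: int) -> List[Token]:
    """Greedy decoding that resumes after the reused prefix instead of from scratch."""
    tokens = list(reused)
    while len(tokens) < max_len:
        nxt = step(tokens, encoded)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens


# --- Usage with toy data (all values are hypothetical) ---
EOS = 0
call_log: List[int] = []  # records the output position of every decoder call


def toy_step(tokens: List[Token], encoded: Sequence[int]) -> Token:
    # Toy stand-in for a decoder: emits the label stored at the next position.
    call_log.append(len(tokens))
    return encoded[len(tokens)] if len(tokens) < len(encoded) else EOS


beam_n2 = [5, 8, 2]        # best candidate beam at time n-2
beam_n1 = [5, 8, 3, 9]     # best candidate beam at time n-1 (more tokens)
reused = determine_reuse_tokens(beam_n2, beam_n1)
encoded = [5, 8, 3, 9, 4]  # toy "encoded sequence" for the longer input at time n
result = decode_after_reuse(reused, encoded, toy_step, EOS, max_len=10)
print(reused, result, call_log)
```

Because the beams at times n−2 and n−1 agree on their first two tokens, those tokens are reused at time n and the decoder is invoked only for the positions after them: four calls in this toy run, versus six when decoding starts from an empty prefix.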