US 12,482,483 B2
Length perturbation techniques for improving generalization of deep neural network acoustic models
Xiaodong Cui, Chappaqua, NY (US); Brian E. D. Kingsbury, Cortlandt Manor, NY (US); and George Andrei Saon, Stamford, CT (US)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 22, 2022, as Appl. No. 18/057,967.
Prior Publication US 2024/0170005 A1, May 23, 2024
Int. Cl. G10L 21/04 (2013.01)
CPC G10L 21/04 (2013.01)
20 Claims
1. A computer-implemented system, comprising:
a memory that stores computer executable components; and
a processor that executes at least one of the computer executable components that:
performs length perturbation of an acoustic utterance comprising a group of frames in a sequence, wherein the performing the length perturbation comprises:
randomly sampling a first defined percentage of frames from the group of frames, resulting in a first subset of drop frames;
for each drop frame, removing a first defined quantity of consecutive frames from the group of frames starting with the drop frame;
randomly sampling a second defined percentage of frames from the group of frames, resulting in a second subset of insert frames; and
for each insert frame, inserting a second defined quantity of replacement frames into the acoustic utterance after the insert frame.
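The drop-then-insert procedure recited in claim 1 can be illustrated with a short Python sketch. This is a hypothetical illustration, not the patented implementation. It rests on several assumptions that the claim does not specify. Frames are plain lists of floats, and the "defined percentages" are fractions in [0, 1]. Insert frames are sampled from the frames that survive the drop step. Replacement frames are zero vectors. The function name `length_perturb` and all of its parameter names are invented for this example.

```python
import random
from typing import List, Optional, Sequence

Frame = List[float]  # one acoustic feature vector (assumed representation)


def length_perturb(
    frames: Sequence[Frame],
    drop_pct: float = 0.1,    # hypothetical: fraction of frames sampled as drop frames
    drop_len: int = 2,        # hypothetical: consecutive frames removed per drop frame
    insert_pct: float = 0.1,  # hypothetical: fraction of frames sampled as insert frames
    insert_len: int = 2,      # hypothetical: replacement frames inserted per insert frame
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Sketch of length perturbation: random frame dropping, then random insertion."""
    rng = rng or random.Random()
    n = len(frames)

    # Randomly sample a subset of distinct frame indices as drop frames.
    drop_starts = rng.sample(range(n), int(round(drop_pct * n)))

    # For each drop frame, remove drop_len consecutive frames starting at it
    # (overlapping runs are merged; runs are clipped at the utterance end).
    removed = set()
    for start in drop_starts:
        removed.update(range(start, min(start + drop_len, n)))
    kept = [f for i, f in enumerate(frames) if i not in removed]

    # Randomly sample insert frames from the surviving frames (an assumption:
    # the claim does not say whether sampling precedes or follows dropping).
    m = len(kept)
    insert_after = set(rng.sample(range(m), int(round(insert_pct * m))))

    dim = len(frames[0]) if n else 0
    out: List[Frame] = []
    for i, frame in enumerate(kept):
        out.append(list(frame))  # copy so the input is never mutated
        if i in insert_after:
            # Insert insert_len replacement frames after the insert frame;
            # zero vectors are an assumed choice of replacement content.
            out.extend([0.0] * dim for _ in range(insert_len))
    return out
```

For a 100-frame utterance, `length_perturb(frames, 0.0, 2, 0.1, 2, random.Random(0))` adds 10 × 2 = 20 zero frames, which yields 120 frames. Dropping alone with `drop_pct=0.1` and `drop_len=3` removes between 10 and 30 frames, depending on how the sampled runs overlap. Because the two steps are applied in sequence, the output can be shorter or longer than the input, and this variation in utterance length is what the perturbation aims to introduce.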