US 12,451,124 B2
Domain adaptive speech recognition using artificial intelligence
Tohru Nagano, Tokyo (JP); and Gakuto Kurata, Tokyo (JP)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 13, 2022, as Appl. No. 17/965,226.
Prior Publication US 2024/0127801 A1, Apr. 18, 2024
Int. Cl. G10L 15/16 (2006.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/30 (2013.01)

CPC G10L 15/16 (2013.01) [G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/30 (2013.01); G10L 2015/022 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

generating a set of language data candidates, each language data candidate comprising one or more graphemes, by processing a sequence of phonemes related to user-provided domain-specific input speech data using an artificial intelligence-based data conversion model comprising a neural network model sharing a prediction network from a recurrent neural network transducer;

determining, for a target pair of one or more phonemes and one or more graphemes, a subset of graphemes from the set of language data candidates;

training at least one biasing language model using at least a portion of the subset of graphemes;

generating a first speech recognition output by processing the at least a portion of the subset of graphemes using the at least one biasing language model and an artificial intelligence-based speech recognition model comprising the recurrent neural network transducer, including the prediction network shared by the artificial intelligence-based data conversion model;

generating a second speech recognition output by replacing at least a portion of the subset of graphemes in the first speech recognition output with at least one of the one or more graphemes from the target pair; and

performing one or more automated actions based at least in part on the second speech recognition output;

wherein the method is carried out by at least one computing device.
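
The steps of claim 1 can be illustrated with a small sketch. This is only one possible reading of the claim, not the patented implementation: the neural phoneme-to-grapheme conversion model and the recurrent neural network transducer are replaced by toy lookup tables and hand-written scores, and every name below (TOY_P2G, generate_candidates, select_subset, train_biasing_lm, recognize, replace_with_target), the example phoneme string, the hypotheses, and their scores are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the neural phoneme-to-grapheme conversion model.
# A real system would decode candidates with a network that shares the
# RNN transducer's prediction network; here it is a fixed lookup table.
TOY_P2G = {
    ("K", "UW", "B", "ER", "N", "EH", "T", "IY", "Z"): [
        "cooper netties",
        "kuber nettys",
        "kubernetes",
    ],
}


def generate_candidates(phonemes):
    """Step 1: return grapheme candidates for a phoneme sequence."""
    return list(TOY_P2G.get(tuple(phonemes), []))


def select_subset(candidates, target_graphemes):
    """Step 2 (assumed reading): keep spellings that differ from the target."""
    return [c for c in candidates if c != target_graphemes]


def train_biasing_lm(subset):
    """Step 3: a toy 'biasing language model' as relative phrase frequencies."""
    counts = Counter(subset)
    total = sum(counts.values())
    return {phrase: n / total for phrase, n in counts.items()}


def recognize(hypotheses, biasing_lm, weight=1.0):
    """Step 4: pick the hypothesis with the best base score plus bias bonus.

    `hypotheses` is a list of (text, base_score) pairs standing in for the
    output of the speech recognition model.
    """
    def score(hypothesis):
        text, base = hypothesis
        bonus = sum(p for phrase, p in biasing_lm.items() if phrase in text)
        return base + weight * bonus

    return max(hypotheses, key=score)[0]


def replace_with_target(text, subset, target_graphemes):
    """Step 5: replace biased spellings in the output with the target spelling."""
    # Longest phrases first, so a shorter candidate cannot split a longer one.
    for phrase in sorted(subset, key=len, reverse=True):
        text = text.replace(phrase, target_graphemes)
    return text


# Target pair: a phoneme sequence and the desired domain-specific spelling.
target_phonemes = ["K", "UW", "B", "ER", "N", "EH", "T", "IY", "Z"]
target_graphemes = "Kubernetes"

candidates = generate_candidates(target_phonemes)
subset = select_subset(candidates, target_graphemes)
biasing_lm = train_biasing_lm(subset)

# Hypothetical recognizer hypotheses with made-up base scores.
hypotheses = [
    ("deploy it on cupboard eddies", 0.45),
    ("deploy it on cooper netties", 0.40),
]

first = recognize(hypotheses, biasing_lm)
second = replace_with_target(first, subset, target_graphemes)
```

In this sketch the bias bonus lifts the hypothesis containing a known candidate spelling above the otherwise higher-scoring one, so `first` contains "cooper netties" and `second` carries the target spelling in its place. The final step of the claim, performing an automated action, would consume `second` (for example, passing the corrected transcript to a downstream application) and is left out here.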