US 12,354,002 B1
Customized machine learning models
Frederick Weber, Bangalore (IN)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 14, 2022, as Appl. No. 18/080,957.
Int. Cl. G10L 15/00 (2013.01); G06N 3/045 (2023.01); G06N 3/0499 (2023.01); G06N 3/08 (2023.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01)

CPC G06N 3/08 (2013.01) [G06N 3/045 (2023.01); G06N 3/0499 (2023.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving first input data representing first acoustic feature data received from a first neural network of an automatic speech recognition (ASR) component;

processing the first input data using a first classifier component to determine a first plurality of softmax values representing respective likelihoods that the first input data corresponds to a first plurality of categories;

selecting, from a first plurality of feedforward neural networks and using the first plurality of softmax values, a first feedforward neural network to be used to process the first input data, the first plurality of feedforward neural networks including at least a second feedforward neural network different from the first feedforward neural network;

processing the first input data using the first feedforward neural network to generate first processed data;

processing the first input data using a second classifier component to determine a second plurality of softmax values representing respective likelihoods that the first input data corresponds to a second plurality of categories;

selecting, from a second plurality of feedforward neural networks and using the second plurality of softmax values, a third feedforward neural network to be used to process the first input data, the third feedforward neural network performing a null transformation, wherein the second plurality of feedforward neural networks includes at least a fourth feedforward neural network different from the third feedforward neural network and a fifth feedforward neural network different from the fourth feedforward neural network and the third feedforward neural network;

processing the first input data using the third feedforward neural network to generate second processed data;

determining first output data by summing, using the first plurality of softmax values and the second plurality of softmax values, the first processed data, the second processed data, and the first input data, the first output data representing first adapted acoustic feature data; and

processing the first output data using at least a second neural network of the ASR component to determine ASR output data.
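
The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: each classifier is a single linear layer followed by softmax, "selecting" means taking the feedforward network with the highest softmax value, the "null transformation" contributes an all-zero vector, and the sum weights each processed result by the softmax value of the selected network before adding the unmodified input. All names (softmax, make_ffn, null_ffn, route, adapt) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def make_ffn(rng, dim, hidden):
    """Build a small two-layer feedforward network with random (untrained) weights."""
    w1 = rng.normal(scale=0.1, size=(dim, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, dim))

    def ffn(x):
        # Linear layer, ReLU, linear layer back to the input dimension.
        return np.maximum(x @ w1, 0.0) @ w2

    return ffn


def null_ffn(x):
    """Assumption: the 'null transformation' adds nothing to the features."""
    return np.zeros_like(x)


def route(x, classifier_weights, experts):
    """Classify x, pick one feedforward network by highest softmax value, apply it."""
    probs = softmax(x @ classifier_weights)
    index = int(np.argmax(probs))  # assumption: selection is an argmax
    return probs, index, experts[index](x)


def adapt(x, groups):
    """Sum the input with each group's selected output, weighted by its softmax value."""
    output = x.copy()
    for classifier_weights, experts in groups:
        probs, index, processed = route(x, classifier_weights, experts)
        output = output + probs[index] * processed
    return output


rng = np.random.default_rng(0)
dim, hidden = 8, 16

# Stand-in for acoustic feature data produced by the first ASR neural network.
x = rng.normal(size=dim)

# First group: classifier plus two feedforward networks.
group_a = (rng.normal(size=(dim, 2)),
           [make_ffn(rng, dim, hidden) for _ in range(2)])

# Second group: classifier plus three networks, one of which is the null transformation.
group_b = (rng.normal(size=(dim, 3)),
           [null_ffn, make_ffn(rng, dim, hidden), make_ffn(rng, dim, hidden)])

# Adapted acoustic features that would be passed to the second ASR neural network.
adapted = adapt(x, [group_a, group_b])
print(adapted.shape)
```

In this sketch, a group whose classifier selects the null network leaves the features unchanged for that group, so the residual term (the unmodified input) carries the original acoustic features through to the rest of the ASR component.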