US 12,148,419 B2
Reducing exposure bias in machine learning training of sequence-to-sequence transducers
Xiaodong Cui, Chappaqua, NY (US); Brian E. D. Kingsbury, Cortlandt Manor, NY (US); George Andrei Saon, Stamford, CT (US); David Haws, New York City, NY (US); and Zoltan Tueske, White Plains, NY (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Dec. 13, 2021, as Appl. No. 17/549,006.
Prior Publication US 2023/0186903 A1, Jun. 15, 2023
Int. Cl. G10L 15/16 (2006.01); G06N 3/045 (2023.01); G06N 5/04 (2023.01)

CPC G10L 15/16 (2013.01) [G06N 3/045 (2023.01); G06N 5/04 (2013.01)]

20 Claims

1. A method, in a data processing system, for performing machine learning training of a computer model, the method comprising: generating, by a perturbation generator, a modified training data comprising perturbations injected into original training data by the perturbation generator, where the original training data comprises data samples with corresponding original ground truth labels, and where the perturbations cause a data corruption of the original training data at least by modifying one or more of the original ground truth labels, associated with one or more selected data samples of the original training data to generate one or more perturbed ground truth labels, wherein the modified training data comprises the data samples and corresponding ground truth labels, wherein the corresponding ground truth labels comprise the one or more perturbed ground truth labels for the one or more selected data samples, and the original ground truth labels for data samples that are not in the one or more selected data samples; inputting the modified training data into a prediction network of the computer model; processing the modified training data through the prediction network to generate a prediction output, wherein the prediction network processes, for the one or more selected data samples, the one or more perturbed ground truth labels when generating the prediction output; executing machine learning training of the prediction network based on the prediction output and the original training data comprising the original ground truth labels to generate a trained prediction network of a trained computer model, wherein a loss function implemented in the machine learning training operates based on the original ground truth labels; and deploying the trained computer model to an artificial intelligence computing system for performance of an inference operation.
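
The training flow recited in claim 1 can be illustrated with a minimal sketch, which is not the patented implementation. It assumes a toy bigram table standing in for the prediction network, random label substitution as the perturbation, and a softmax cross-entropy loss. The names `perturb_labels`, `train_step`, `weights`, and `rate` are hypothetical and do not appear in the patent. The point shown is that the network consumes the perturbed labels as input while the loss is computed against the original ground truth labels.

```python
import math
import random


def perturb_labels(labels, vocab, rate, rng):
    """Hypothetical perturbation generator: swap each label for a different
    vocabulary entry with probability `rate`. Returns the perturbed copy and
    the indices that were selected; the input list is left unchanged."""
    perturbed = list(labels)
    selected = []
    for i, tok in enumerate(labels):
        if rng.random() < rate:
            perturbed[i] = rng.choice([v for v in vocab if v != tok])
            selected.append(i)
    return perturbed, selected


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights, history, targets, lr):
    """One SGD pass over a toy bigram 'prediction network'.
    `history` (possibly perturbed labels) is the network input;
    `targets` (original labels) are the only labels the loss sees."""
    loss = 0.0
    for prev, target in zip(history, targets):
        probs = softmax(weights[prev])
        loss -= math.log(probs[target])
        for k in range(len(probs)):
            grad = probs[k] - (1.0 if k == target else 0.0)
            weights[prev][k] -= lr * grad
    return loss / len(targets)


rng = random.Random(0)
vocab = [0, 1, 2, 3]
original = [0, 1, 2, 3, 0, 1, 2, 3]  # original ground truth label sequence
weights = [[0.0] * len(vocab) for _ in vocab]

for epoch in range(50):
    # Modified training data: same sample, some labels perturbed.
    perturbed, _ = perturb_labels(original, vocab, rate=0.2, rng=rng)
    # Input is the perturbed label history; the loss targets stay original.
    train_step(weights, perturbed[:-1], original[1:], lr=0.5)
```

In this sketch the perturbation is redrawn every epoch, so the toy network sees a mix of clean and corrupted label histories while always being scored against the uncorrupted targets. Whether the perturbation is redrawn per epoch, per batch, or fixed once is a design choice the sketch makes for illustration only.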

11. A computer program product comprising a non-transitory computer readable medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: generate, by a perturbation generator, a modified training data comprising perturbations injected into original training data by the perturbation generator, where the original training data comprises data samples with corresponding original ground truth labels, and where the perturbations cause a data corruption of the original training data at least by modifying one or more of the original ground truth labels, associated with one or more selected data samples of the original training data to generate one or more perturbed ground truth labels, wherein the modified training data comprises the data samples and corresponding ground truth labels, wherein the corresponding ground truth labels comprise the one or more perturbed ground truth labels for the one or more selected data samples, and the original ground truth labels for data samples that are not in the one or more selected data samples; input the modified training data into a prediction network of the computer model; process the modified training data through the prediction network to generate a prediction output, wherein the prediction network processes, for the one or more selected data samples, the one or more perturbed ground truth labels when generating the prediction output; execute machine learning training of the prediction network based on the prediction output and the original training data comprising the original ground truth labels to generate a trained prediction network of a trained computer model, wherein a loss function implemented in the machine learning training operates based on the original ground truth labels; and deploy the trained computer model to an artificial intelligence computing system for performance of an inference operation.

20. An apparatus comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: generate, by a perturbation generator, a modified training data comprising perturbations injected into original training data by the perturbation generator, where the original training data comprises data samples with corresponding original ground truth labels, and where the perturbations cause a data corruption of the original training data at least by modifying one or more of the original ground truth labels, associated with one or more selected data samples of the original training data to generate one or more perturbed ground truth labels, wherein the modified training data comprises the data samples and corresponding ground truth labels, wherein the corresponding ground truth labels comprise the one or more perturbed ground truth labels for the one or more selected data samples, and the original ground truth labels for data samples that are not in the one or more selected data samples; input the modified training data into a prediction network of the computer model; process the modified training data through the prediction network to generate a prediction output, wherein the prediction network processes, for the one or more selected data samples, the one or more perturbed ground truth labels when generating the prediction output; execute machine learning training of the prediction network based on the prediction output and the original training data comprising the original ground truth labels to generate a trained prediction network of a trained computer model, wherein a loss function implemented in the machine learning training operates based on the original ground truth labels; and deploy the trained computer model to an artificial intelligence computing system for performance of an inference operation.