US 12,014,722 B2
System and method for data augmentation of feature-based voice data
Dushyant Sharma, Woburn, MA (US); Patrick A. Naylor, Reading (GB); and James W. Fosburgh, Winchester, MA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 10, 2021, as Appl. No. 17/197,587.
Claims priority of provisional application 62/988,337, filed on Mar. 11, 2020.
Prior Publication US 2021/0287660 A1, Sep. 16, 2021
Int. Cl. G10L 13/02 (2013.01); G06F 3/16 (2006.01); G06N 5/02 (2023.01); G06N 20/00 (2019.01); G10K 15/08 (2006.01); G10L 13/033 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/065 (2013.01); G10L 21/0224 (2013.01); G10L 25/03 (2013.01); H04S 7/00 (2006.01)

CPC G10L 13/02 (2013.01) [G06F 3/165 (2013.01); G06N 5/02 (2013.01); G06N 20/00 (2019.01); G10K 15/08 (2013.01); G10L 13/033 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/065 (2013.01); G10L 21/0224 (2013.01); G10L 25/03 (2013.01); H04S 7/30 (2013.01); H04S 7/302 (2013.01); H04S 7/303 (2013.01)]

20 Claims

1. A computer-implemented method, executed on a computing device, comprising:

receiving feature-based voice data associated with a first acoustic domain, wherein the feature-based voice data is converted from a signal in the first acoustic domain to a feature domain;

performing one or more gain-based augmentations on at least a portion of the feature-based voice data converted from the signal in the first acoustic domain to the feature domain, thus defining gain-augmented feature-based voice data;

receiving a selection of a target acoustic domain;

determining that a distribution of gain levels from training data associated with the target acoustic domain varies over time for one or more of particular frequencies and particular frequency bands;

mapping the gain-augmented feature-based voice data from the first acoustic domain to the target acoustic domain; and

training a speech processing system for the target acoustic domain based on the gain-augmented feature-based voice data mapped from the first acoustic domain to the target acoustic domain.
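
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the feature domain is a list of time frames, each holding per-band levels in dB, so that a gain is an additive offset. It also assumes the mapping is a per-band mean and standard-deviation match against the target-domain statistics. The names Features, gain_augment, band_gain_stats, and map_to_target are hypothetical, and the training step is omitted.

```python
import random
import statistics

# Hypothetical representation (an assumption, not from the patent):
# a list of time frames, each a list of per-band levels in dB.
Features = list[list[float]]


def gain_augment(feats: Features, gains_db: list[float]) -> Features:
    """Apply one gain per frame; in a dB feature domain a gain is an offset."""
    return [[level + g for level in frame] for frame, g in zip(feats, gains_db)]


def band_gain_stats(feats: Features) -> list[tuple[float, float]]:
    """Per-band (mean, standard deviation) of levels across time frames."""
    bands = zip(*feats)  # transpose: one sequence of levels per band
    return [(statistics.fmean(b), statistics.pstdev(b)) for b in map(list, bands)]


def map_to_target(feats: Features, target_stats: list[tuple[float, float]]) -> Features:
    """Shift and scale each band so its statistics over time match the target's."""
    source_stats = band_gain_stats(feats)
    mapped = []
    for frame in feats:
        row = []
        for level, (s_mean, s_std), (t_mean, t_std) in zip(frame, source_stats, target_stats):
            scale = t_std / s_std if s_std > 0 else 1.0
            row.append((level - s_mean) * scale + t_mean)
        mapped.append(row)
    return mapped


# Synthetic stand-ins for source-domain and target-domain feature data.
rng = random.Random(0)
source = [[rng.gauss(-30.0, 3.0) for _ in range(4)] for _ in range(200)]
target = [[rng.gauss(-45.0, 6.0) for _ in range(4)] for _ in range(200)]

# Time-varying gain augmentation, then mapping toward the target domain.
gains = [rng.uniform(-6.0, 6.0) for _ in source]
augmented = gain_augment(source, gains)
mapped = map_to_target(augmented, band_gain_stats(target))
```

In this sketch the mapped data would then serve as training input for a speech processing system intended for the target domain. Matching only the first two moments per band is a simplification chosen for brevity; a fuller treatment could model the complete gain distribution over time.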