US 12,154,541 B2
System and method for data augmentation of feature-based voice data
Dushyant Sharma, Mountain House, CA (US); Patrick A. Naylor, Reading (GB); James W. Fosburgh, Baltimore, MD (US); and Do Yeong Kim, Lexington, MA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Mar. 10, 2021, as Appl. No. 17/197,717.
Claims priority of provisional application 62/988,337, filed on Mar. 11, 2020.
Prior Publication US 2021/0287653 A1, Sep. 16, 2021
Int. Cl. G10L 13/02 (2013.01); G06F 3/16 (2006.01); G06N 5/02 (2023.01); G06N 20/00 (2019.01); G10K 15/08 (2006.01); G10L 13/033 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/065 (2013.01); G10L 21/0224 (2013.01); G10L 25/03 (2013.01); H04S 7/00 (2006.01)
CPC G10L 13/02 (2013.01) [G06F 3/165 (2013.01); G06N 5/02 (2013.01); G06N 20/00 (2019.01); G10K 15/08 (2013.01); G10L 13/033 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/065 (2013.01); G10L 21/0224 (2013.01); G10L 25/03 (2013.01); H04S 7/30 (2013.01); H04S 7/302 (2013.01); H04S 7/303 (2013.01)]
17 Claims
 
1. A computer-implemented method, executed on a computing device, comprising:
receiving feature-based voice data associated with a first acoustic domain, wherein the feature-based voice data is converted from an audio signal in the first acoustic domain to a feature domain;
extracting acoustic metadata from the audio signal before the audio signal is converted from the first acoustic domain to the feature domain;
processing the feature-based voice data associated with the audio signal based upon, at least in part, the acoustic metadata by performing one or more gain-based augmentations on at least a portion of the feature-based voice data associated with the audio signal in the first acoustic domain based upon, at least in part, the acoustic metadata;
receiving selections of various characteristics of the acoustic domain to define a target acoustic domain from a library of predefined acoustic domains;
determining a distribution of one or more reverberation levels associated with the target acoustic domain; and
performing one or more reverberation-based augmentations on at least a portion of the feature-based voice data converted from the audio signal in the first acoustic domain to the feature domain based upon, at least in part, the distribution of the one or more reverberation levels associated with the target acoustic domain, thus defining reverberation-augmented feature-based voice data.
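
The steps recited in claim 1 can be illustrated with a minimal sketch. Nothing below is taken from the patent: the log-energy features, the RMS level used as acoustic metadata, the Gaussian T60 distribution, the exponential-decay reverberation model, and every function and parameter name are assumptions chosen only to make the flow concrete (metadata extracted from the audio first, then gain and reverberation applied in the feature domain).

```python
import math
import random


def extract_level_db(samples):
    """Hypothetical acoustic metadata: RMS level of the raw audio in dB."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def to_log_energy_features(samples, frame_len=4):
    """Toy feature extraction: one log-energy value per non-overlapping frame."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.log(max(sum(s * s for s in f), 1e-12)) for f in frames]


def gain_augment(features, gain_db):
    """Gain in the log-energy domain is an additive offset of gain_db/10 * ln(10)."""
    offset = gain_db / 10.0 * math.log(10.0)
    return [f + offset for f in features]


def sample_t60(rng, mean_s, std_s, floor_s=0.05):
    """Draw a reverberation time from an assumed Gaussian for the target domain."""
    return max(floor_s, rng.gauss(mean_s, std_s))


def reverb_augment(features, t60_s, frame_shift_s=0.01):
    """Smear frame energy forward in time with an exponential decay.

    Energy is assumed to fall by 60 dB over t60_s seconds, so the per-frame
    decay factor is 10 ** (-6 * frame_shift_s / t60_s).
    """
    decay = 10.0 ** (-6.0 * frame_shift_s / t60_s)
    out, tail = [], 0.0
    for f in features:
        tail = math.exp(f) + decay * tail  # current energy plus decayed history
        out.append(math.log(tail))
    return out


# Usage: metadata is taken from the audio before conversion to features.
rng = random.Random(0)
audio = [0.1, -0.2, 0.3, -0.1, 0.05, 0.0, -0.05, 0.02]
level_db = extract_level_db(audio)
feats = to_log_energy_features(audio)
gained = gain_augment(feats, gain_db=-20.0 - level_db)  # -20 dB target is assumed
reverbed = reverb_augment(gained, sample_t60(rng, mean_s=0.5, std_s=0.1))
```

In this sketch the reverberation step only ever adds energy to later frames, so each augmented value is at least as large as its input and the first frame is unchanged. A real system would more likely operate on multi-band features and a measured distribution of reverberation levels for the chosen target domain.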