US 12,456,456 B2
Data augmentation system and method for multi-microphone systems
Dushyant Sharma, Mountain House, CA (US); Ljubomir Milanovic, Vienna (AT); Philipp Salletmayr, Austria (AT); Rong Gong, Vienna (AT); and Patrick A. Naylor, Reading (GB)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 20, 2022, as Appl. No. 17/579,806.
Prior Publication US 2023/0230582 A1, Jul. 20, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/08 (2006.01)

CPC G10L 15/08 (2013.01)

20 Claims

1. A computer-implemented method, executed on a computing device, comprising:

obtaining a first set of speech signals from a first device, thus defining one or more first device speech signals;

obtaining a second set of speech signals from a second device, thus defining one or more second device speech signals;

selecting an acoustic relative transfer function from a plurality of acoustic relative transfer functions based upon, at least in part, speaker location information associated with at least one of the one or more first device speech signals and the one or more second device speech signals, wherein selecting the acoustic relative transfer function includes:

processing speaker location information associated with the one or more first device speech signals;

processing speaker location information associated with the one or more second device speech signals; and

comparing the speaker location information associated with the one or more first device speech signals and the speaker location information associated with the one or more second device speech signals to speaker location information associated with the plurality of acoustic relative transfer functions from an acoustic relative transfer function codebook; and

selecting the acoustic relative transfer function with speaker location information that is within at least a predefined similarity threshold of the speaker location information associated with the one or more first device speech signals and the speaker location information associated with the one or more second device speech signals; and

augmenting, at run-time, the one or more second device speech signals to match reverberation properties of the one or more first device speech signals based upon, at least in part, the acoustic relative transfer function.
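
The selecting and augmenting steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions the claim does not state: speaker locations are 2-D Cartesian coordinates in meters, the similarity threshold is expressed as a maximum Euclidean distance, each codebook entry stores its acoustic relative transfer function as a short time-domain impulse response, and augmentation is performed by convolution. The names `select_rtf`, `augment`, `max_distance`, and the codebook field names are hypothetical.

```python
import numpy as np


def select_rtf(first_loc, second_loc, codebook, max_distance):
    """Pick the codebook entry whose stored speaker locations are closest to
    the observed ones, provided the distance is within max_distance.

    Assumption: similarity is measured as Euclidean distance over the
    concatenated first-device and second-device location vectors.
    Returns None when no entry satisfies the threshold.
    """
    query = np.concatenate([np.asarray(first_loc, float),
                            np.asarray(second_loc, float)])
    best_entry, best_dist = None, np.inf
    for entry in codebook:
        ref = np.concatenate([np.asarray(entry["first_loc"], float),
                              np.asarray(entry["second_loc"], float)])
        dist = float(np.linalg.norm(query - ref))
        if dist <= max_distance and dist < best_dist:
            best_entry, best_dist = entry, dist
    return best_entry


def augment(second_signal, rtf_impulse_response):
    """Filter the second-device signal with the selected transfer function.

    Assumption: the relative transfer function is a time-domain impulse
    response applied by convolution; output is trimmed to the input length.
    """
    signal = np.asarray(second_signal, float)
    filtered = np.convolve(signal, np.asarray(rtf_impulse_response, float))
    return filtered[:len(signal)]


# Hypothetical two-entry codebook with illustrative values only.
codebook = [
    {"name": "near", "first_loc": (1.0, 0.0), "second_loc": (0.3, 0.0),
     "rtf": [1.0, 0.5]},
    {"name": "far", "first_loc": (4.0, 2.0), "second_loc": (0.3, 0.0),
     "rtf": [1.0, 0.8, 0.4]},
]

# Observed speaker locations for the first and second device signals.
chosen = select_rtf((1.1, 0.0), (0.3, 0.1), codebook, max_distance=0.5)

# Apply the selected transfer function to a unit impulse from the second device.
augmented = augment([1.0, 0.0, 0.0, 0.0], chosen["rtf"])
```

In this sketch a query that lies outside the threshold for every entry yields `None`, so a caller would need to decide how to handle signals for which no codebook entry qualifies; the claim itself does not address that case.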