US 12,272,371 B1
Real-time target speaker audio enhancement
Ritwik Giri, Sunnyvale, CA (US); Shrikant Venkataramani, Champaign, IL (US); Jean-Marc Valin, Montreal (CA); Mehmet Umut Isik, Menlo Park, CA (US); and Arvindh Krishnaswamy, Palo Alto, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 30, 2021, as Appl. No. 17/364,805.
Int. Cl. G06F 17/00 (2019.01); G06N 20/00 (2019.01); G10L 21/013 (2013.01); G10L 21/0364 (2013.01); G10L 21/038 (2013.01)
CPC G10L 21/0364 (2013.01) [G06N 20/00 (2019.01); G10L 21/013 (2013.01); G10L 21/038 (2013.01)]
20 Claims
1. A system, comprising:
at least one processor; and
a memory, storing program instructions that when executed by the at least one processor, cause the at least one processor to implement an audio enhancement system, the audio enhancement system configured to:
receive audio data comprising speaker audio data via an interface for the audio enhancement system;
determine a plurality of input features for the audio data based on a representation of the audio data in an equivalent rectangular bandwidth scale;
obtain an embedding for a speaker generated from input features of an audio data sample for the speaker determined based on a representation of the audio data sample in the equivalent rectangular bandwidth scale;
apply a machine learning model trained to provide one or more modifications to enhance the speaker audio data within the audio data, wherein the machine learning model concatenates the embedding for the speaker with input features of the audio data to determine respective gain values for respective bands of the representation of the audio data in the equivalent rectangular bandwidth scale;
apply the one or more modifications to the audio data to generate an enhanced version of the audio data according to the respective gain values for the respective bands of the representation of the audio data in the equivalent rectangular bandwidth scale; and
send, via the interface of the audio enhancement system, the enhanced version of the audio data to a destination.
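
The processing steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, only one plausible reading under stated assumptions: the equivalent rectangular bandwidth (ERB) mapping uses the commonly cited Glasberg-Moore ERB-rate formula, the input features are log band energies, and `toy_gain_model` is a hypothetical stand-in for the trained machine learning model (a real model would be learned, not hand-written). All function names, the band count, and the embedding values are assumptions for illustration.

```python
import math

def hz_to_erb_rate(freq_hz):
    """Glasberg-Moore ERB-rate scale (assumed here; the claim names no formula)."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)

def bin_to_band_map(n_bins, sample_rate, n_bands):
    """Assign each spectrum bin (0 Hz..Nyquist) to one of n_bands bands
    that have equal width on the ERB-rate scale."""
    nyquist = sample_rate / 2.0
    max_erb = hz_to_erb_rate(nyquist)
    mapping = []
    for k in range(n_bins):
        freq = k * nyquist / (n_bins - 1)
        band = int(hz_to_erb_rate(freq) / max_erb * n_bands)
        mapping.append(min(band, n_bands - 1))  # clamp the Nyquist bin
    return mapping

def band_features(spectrum, mapping, n_bands):
    """One input feature per ERB band: log of the summed bin power."""
    energy = [0.0] * n_bands
    for s, b in zip(spectrum, mapping):
        energy[b] += abs(s) ** 2
    return [math.log10(e + 1e-9) for e in energy]

def toy_gain_model(features, embedding):
    """Hypothetical stand-in for the trained model. It concatenates the
    speaker embedding with the input features and returns one gain in
    [0, 1] per band via a sigmoid. The arithmetic is arbitrary."""
    x = features + embedding          # concatenation of the two vectors
    bias = sum(x) / len(x)
    return [1.0 / (1.0 + math.exp(-(f - bias))) for f in features]

def apply_band_gains(spectrum, gains, mapping):
    """Scale every bin by the gain of the ERB band it belongs to."""
    return [s * gains[b] for s, b in zip(spectrum, mapping)]

def enhance_frame(spectrum, embedding, sample_rate=16000, n_bands=24):
    """Features -> concatenate with embedding -> band gains -> apply."""
    mapping = bin_to_band_map(len(spectrum), sample_rate, n_bands)
    features = band_features(spectrum, mapping, n_bands)
    gains = toy_gain_model(features, embedding)
    return apply_band_gains(spectrum, gains, mapping), gains

# Usage: one synthetic complex spectrum frame and a made-up speaker embedding.
frame = [complex(1.0 / (1 + k), 0.0) for k in range(257)]
speaker_embedding = [0.1, -0.2, 0.3, 0.0]
enhanced, gains = enhance_frame(frame, speaker_embedding)
```

In this sketch the gains are computed per band and then broadcast to the bins of that band, so the model output has 24 values instead of 257. The speaker embedding would, per the claim, be generated separately from an audio sample of the speaker using the same ERB-scale representation; here it is simply supplied as a fixed vector.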