US 12,380,909 B2
Context-data based speech enhancement
Kyungguen Byun, San Diego, CA (US); Shuhua Zhang, San Diego, CA (US); Lae-Hoon Kim, San Diego, CA (US); Erik Visser, San Diego, CA (US); Sunkuk Moon, San Diego, CA (US); and Vahid Montazeri, Newport Beach, CA (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Jun. 14, 2023, as Appl. No. 18/334,641.
Application 18/334,641 is a continuation of application No. 17/209,621, filed on Mar. 23, 2021, now Pat. No. 11,715,480.
Prior Publication US 2023/0326477 A1, Oct. 12, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 21/0232 (2013.01); G06N 3/0455 (2023.01); G10L 21/02 (2013.01); G10L 21/038 (2013.01)
CPC G10L 21/0232 (2013.01) [G06N 3/0455 (2023.01); G10L 21/02 (2013.01); G10L 21/038 (2013.01)]
30 Claims
1. A device to perform speech enhancement, the device comprising:
one or more processors configured to:
process image data to detect at least one of an emotion, a speaker identification, or a noise type;
generate context data that represents the at least one of the emotion, the speaker identification, or the noise type;
obtain input spectral data based on an input signal that corresponds to the image data, the input signal representing sound that includes speech;
provide the input spectral data to a first encoder of a multi-encoder transformer to generate first encoded data;
provide the context data to at least a second encoder of the multi-encoder transformer to generate second encoded data;
provide the first encoded data and the second encoded data to a decoder of the multi-encoder transformer to generate output spectral data that represents a speech enhanced version of the input signal; and
perform speech synthesis on the output spectral data to generate an output waveform corresponding to an enhanced version of the speech.
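Claim 1 describes a data flow in which spectral frames go to one encoder, context data goes to a second encoder, and a decoder combines both encodings into enhanced spectral data. The following is a minimal, untrained, pure-Python sketch of that data flow only. It is not the patented implementation. The label vocabularies, the one-hot context format, the single-layer encoders, and the choice to have the decoder cross-attend over the concatenation of both encodings are all illustrative assumptions. The names `EMOTIONS`, `NOISE_TYPES`, `ToyEncoder`, `ToyDecoder`, `make_context_data`, and `enhance` are hypothetical. The image-analysis step that detects the attributes and the speech-synthesis (vocoder) step are omitted.

```python
import math
import random

# Hypothetical label vocabularies; the claim does not define any.
EMOTIONS = ["neutral", "happy", "angry"]
NOISE_TYPES = ["none", "babble", "traffic"]
NUM_SPEAKERS = 4


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    scores = [[sum(qi[t] * kj[t] for t in range(d)) / math.sqrt(d) for kj in k]
              for qi in q]
    return matmul([softmax(r) for r in scores], v)


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyEncoder:
    """Hypothetical encoder: linear projection + one self-attention layer + residual."""

    def __init__(self, in_dim, model_dim, rng):
        self.proj = rand_matrix(in_dim, model_dim, rng)

    def __call__(self, frames):
        x = matmul(frames, self.proj)
        att = attention(x, x, x)
        return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, att)]


class ToyDecoder:
    """Hypothetical decoder: spectral encoding queries both encoders' outputs."""

    def __init__(self, model_dim, out_dim, rng):
        self.out = rand_matrix(model_dim, out_dim, rng)

    def __call__(self, first_encoded, second_encoded):
        # Assumption: combine encoders by concatenating along the sequence axis.
        memory = first_encoded + second_encoded
        ctx = attention(first_encoded, memory, memory)
        return matmul(ctx, self.out)  # one output frame per input frame


def make_context_data(emotion, speaker_id, noise_type):
    """Encode detected attributes as a single one-hot token (assumed format)."""
    vec = [0.0] * (len(EMOTIONS) + NUM_SPEAKERS + len(NOISE_TYPES))
    vec[EMOTIONS.index(emotion)] = 1.0
    vec[len(EMOTIONS) + speaker_id] = 1.0
    vec[len(EMOTIONS) + NUM_SPEAKERS + NOISE_TYPES.index(noise_type)] = 1.0
    return [vec]


def enhance(input_spectral, context_data, model_dim=8, seed=0):
    """Run the untrained multi-encoder sketch; returns frames x bins."""
    rng = random.Random(seed)  # fixed seed so weights are reproducible
    num_bins = len(input_spectral[0])
    first_encoder = ToyEncoder(num_bins, model_dim, rng)
    second_encoder = ToyEncoder(len(context_data[0]), model_dim, rng)
    decoder = ToyDecoder(model_dim, num_bins, rng)
    first_encoded = first_encoder(input_spectral)
    second_encoded = second_encoder(context_data)
    return decoder(first_encoded, second_encoded)


if __name__ == "__main__":
    # 5 frames x 16 frequency bins of synthetic magnitude data.
    spec = [[abs(math.sin(0.3 * f + 0.7 * b)) for b in range(16)] for f in range(5)]
    ctx = make_context_data("angry", 2, "babble")
    out = enhance(spec, ctx)
    print(len(out), len(out[0]))
```

Because the weights are random, the output is not actually enhanced speech. The sketch only shows how the two inputs flow through separate encoders into a shared decoder whose output has the same frames-by-bins shape as the input, ready for a vocoder.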