US 12,087,297 B2
Voice filtering other speakers from calls and audio messages
Matthew Sharifi, Kilchberg (CH); and Victor Carbune, Zürich (CH)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Sep. 9, 2022, as Appl. No. 17/930,822.
Application 17/930,822 is a continuation of application No. 17/086,296, filed on Oct. 30, 2020, granted, now 11,462,219.
Prior Publication US 2023/0005480 A1, Jan. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/00 (2013.01); G10L 15/02 (2006.01); G10L 15/22 (2006.01); G10L 21/0208 (2013.01); G10L 21/0272 (2013.01); G10L 25/78 (2013.01); G10L 25/87 (2013.01)

CPC G10L 15/22 (2013.01) [G10L 15/02 (2013.01); G10L 21/0208 (2013.01); G10L 21/0272 (2013.01); G10L 25/78 (2013.01); G10L 25/87 (2013.01)]

20 Claims

1. A computer-implemented method executed on data processing hardware that causes the data processing hardware to perform operations comprising:

receiving raw audio data corresponding to an utterance of audible contents spoken by a user and captured by an assistant-enabled device, the raw audio data capturing one or more additional sounds that are not spoken by the user;

receiving, from an image capture device in communication with the data processing hardware, image data capturing the user while speaking the utterance of the audible contents;

extracting, from the image data, a facial image for the user;

extracting, from the raw audio data, audio features synchronized with lips of the user moving in the extracted facial image; and

processing, using the extracted audio features, the raw audio data to generate enhanced audio data that isolates the utterance of the audible contents spoken by the user and excludes at least a portion of the one or more additional sounds that are not spoken by the user.
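Claim 1 does not specify how the audio features are synchronized with the lip movement or how the enhanced audio is produced. The Python sketch below is a rough illustration only. It substitutes a deliberately simple technique, visual voice-activity gating: an audio frame is kept when the user's lips move in the matching video frame and is silenced otherwise. The names `lip_motion`, `synchronized_gains`, and `enhance`, the per-frame mouth-openness values (assumed to have been measured from each extracted facial image beforehand), the one-video-frame-per-audio-frame alignment, and the motion threshold are all hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def lip_motion(openness: Sequence[float]) -> List[float]:
    """Absolute frame-to-frame change in mouth openness; the first frame gets 0.0.

    `openness` is a hypothetical per-video-frame measure (e.g. lip gap) assumed
    to be computed from each extracted facial image before this step.
    """
    return [0.0] + [abs(b - a) for a, b in zip(openness, openness[1:])]


def synchronized_gains(
    num_samples: int, openness: Sequence[float], frame_len: int, threshold: float
) -> List[float]:
    """One gain per audio frame: 1.0 if the lips move in the aligned video frame, else 0.0.

    Hypothetical alignment: video frame i covers audio samples
    [i * frame_len, (i + 1) * frame_len). Only frames present in both streams are used.
    """
    motion = lip_motion(openness)
    n_frames = min(num_samples // frame_len, len(motion))
    return [1.0 if motion[f] >= threshold else 0.0 for f in range(n_frames)]


def enhance(samples: Sequence[float], gains: Sequence[float], frame_len: int) -> List[float]:
    """Scale each aligned audio frame by its gain.

    Samples beyond the last aligned frame have no lip evidence and are silenced.
    """
    out = [0.0] * len(samples)
    for f, g in enumerate(gains):
        for i in range(f * frame_len, (f + 1) * frame_len):
            out[i] = g * samples[i]
    return out


if __name__ == "__main__":
    # Frames 0-1: a background sound while the lips are still; frames 2-3: the user speaks.
    raw = [0.5, 0.5, 0.9, -0.9, 0.25, 0.25, 0.75, -0.75]
    gains = synchronized_gains(len(raw), [0.0, 0.0, 0.5, 0.25], frame_len=2, threshold=0.2)
    print(gains)                       # [0.0, 0.0, 1.0, 1.0]
    print(enhance(raw, gains, 2))
```

A gate like this removes only sounds that occur while the user's lips are still. It cannot separate another sound that overlaps the user's speech in time, so it should be read as a toy stand-in for the "processing" step rather than as the claimed method.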