CPC G06F 21/6245 (2013.01) [G06T 5/70 (2024.01); G06T 7/194 (2017.01); G06V 20/63 (2022.01); G06V 20/70 (2022.01); G06V 40/171 (2022.01); G10L 15/02 (2013.01); G10L 21/007 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/30196 (2013.01)]

18 Claims

1. A method comprising:
receiving, by a computing device, one or more data samples from a capture module, the one or more data samples including an audio data sample;
sanitizing, by the computing device, personally identifiable information (PII) from the audio data sample, resulting in a sanitized version of the audio data sample, the sanitizing comprising:
identifying speech regions in the audio data sample uttered by a speaker;
providing the speech regions as input to a first machine learning (ML) model, wherein the first ML model is trained by receiving training data from a plurality of different speakers speaking various sentences or words and learning to output audio samples of the various sentences or words as spoken by a default speaker;
in response to the providing, receiving from the first ML model one or more audio samples of the speech regions as spoken by the default speaker, wherein the first ML model determines the one or more audio samples without extracting speech features from the speech regions; and
combining the one or more audio samples to generate the sanitized version of the audio data sample; and
forwarding, by the computing device, the sanitized version of the audio data sample to an ML training and inference system comprising a second ML model different from the first ML model, wherein the second ML model operates on identity-neutral information, and wherein the ML training and inference system uses the sanitized version of the audio data sample to train the second ML model in a manner that does not rely on the PII.
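
As a non-limiting illustration only, the following Python sketch outlines the pipeline recited in claim 1: detecting speech regions, replacing each region with the same content rendered by a default speaker, and recombining the result into a sanitized waveform. The energy-based voice-activity detection and the convert_to_default_speaker stub are assumptions introduced here for illustration; the patent does not disclose or require these particular implementations, and the stub merely stands in for the first ML model.

import numpy as np

def identify_speech_regions(audio, sample_rate, frame_ms=30, threshold=0.01):
    """Return (start, end) sample indices of frames whose RMS energy
    exceeds a threshold, as a stand-in for the claimed region detector."""
    frame_len = int(sample_rate * frame_ms / 1000)
    regions, start = [], None
    for i in range(0, len(audio) - frame_len, frame_len):
        frame = audio[i:i + frame_len]
        active = np.sqrt(np.mean(frame ** 2)) > threshold
        if active and start is None:
            start = i                      # speech region begins
        elif not active and start is not None:
            regions.append((start, i))     # speech region ends
            start = None
    if start is not None:
        regions.append((start, len(audio)))
    return regions

def convert_to_default_speaker(segment, sample_rate):
    """Hypothetical placeholder for the first ML model: maps a speech
    segment to the same words rendered in a default speaker's voice,
    end to end, without an explicit feature-extraction step."""
    return segment  # identity stand-in for illustration only

def sanitize_audio(audio, sample_rate):
    """Replace each detected speech region with its default-speaker
    rendering and return the recombined, sanitized waveform, which
    would then be forwarded for identity-neutral training."""
    sanitized = audio.copy()
    for start, end in identify_speech_regions(audio, sample_rate):
        sanitized[start:end] = convert_to_default_speaker(
            audio[start:end], sample_rate)
    return sanitized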