US 12,242,648 B2
Privacy-aware meeting room transcription from audio-visual stream
Oliver Siohan, Mountain View, CA (US); Takaki Makino, Mountain View, CA (US); Richard Rose, Mountain View, CA (US); Otavio Braga, Mountain View, CA (US); Hank Liao, New York, NY (US); and Basilio Garcia Castillo, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 11, 2023, as Appl. No. 18/535,214.
Application 18/535,214 is a continuation of application No. 17/755,892, granted, now 12,118,123, previously published as PCT/US2019/062024, filed on Nov. 18, 2019.
Prior Publication US 2024/0104247 A1, Mar. 28, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/62 (2013.01); G10L 17/02 (2013.01); H04L 12/18 (2006.01)
CPC G06F 21/6254 (2013.01) [G10L 17/02 (2013.01); H04L 12/1831 (2013.01)]
20 Claims
 
1. A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations comprising:
receiving an audio signal comprising audio data, the audio data comprising a plurality of speech utterances in a speech environment;
receiving a privacy request indicating a privacy condition, the privacy condition comprising a content-specific condition indicating a type of content to exclude from a transcript;
prior to generating any transcript of the audio data that corresponds to the type of content:
processing the audio data to identify one or more first speech utterances of the plurality of speech utterances that do not correspond to the type of content; and
processing the audio data to identify one or more second speech utterances of the plurality of speech utterances that correspond to the type of content; and
generating, based on the audio data, the transcript, the transcript comprising the one or more first speech utterances of the plurality of speech utterances that do not correspond to the type of content and excluding the one or more second speech utterances of the plurality of speech utterances that correspond to the type of content.
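
The ordering recited in claim 1 (identify the utterances first, then transcribe only those that do not match the excluded content type) can be illustrated with a minimal sketch. This is an illustration only, not the patented implementation. The names `Utterance`, `partition_utterances`, `generate_transcript`, `toy_classifier`, and `toy_transcriber` are hypothetical and do not appear in the patent. The sketch also assumes the audio has already been segmented into utterances, and the classifier and transcriber are toy stand-ins for real audio models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for one speech utterance in the audio data."""
    start_s: float
    end_s: float
    audio: bytes  # stand-in for the raw audio of this segment


def partition_utterances(
    utterances: Sequence[Utterance],
    matches_excluded_type: Callable[[Utterance], bool],
) -> Tuple[List[Utterance], List[Utterance]]:
    """Split utterances into (first, second) groups before any transcription.

    first:  utterances that do NOT correspond to the excluded content type
    second: utterances that DO correspond to the excluded content type
    """
    first: List[Utterance] = []
    second: List[Utterance] = []
    for utterance in utterances:
        if matches_excluded_type(utterance):
            second.append(utterance)
        else:
            first.append(utterance)
    return first, second


def generate_transcript(
    utterances: Sequence[Utterance],
    matches_excluded_type: Callable[[Utterance], bool],
    transcribe: Callable[[Utterance], str],
) -> List[str]:
    """Transcribe only the first group; the second group is never transcribed."""
    first, _second = partition_utterances(utterances, matches_excluded_type)
    return [transcribe(utterance) for utterance in first]


# Toy stand-ins (assumptions): the "audio" is UTF-8 text and the excluded
# content type is flagged with a "[private]" marker.
def toy_classifier(utterance: Utterance) -> bool:
    return utterance.audio.startswith(b"[private]")


transcribed_segments: List[Utterance] = []


def toy_transcriber(utterance: Utterance) -> str:
    transcribed_segments.append(utterance)  # record what was transcribed
    return utterance.audio.decode("utf-8")


utterances = [
    Utterance(0.0, 1.5, b"Welcome to the meeting."),
    Utterance(1.5, 3.0, b"[private] Details to keep off the record."),
    Utterance(3.0, 4.0, b"Next agenda item."),
]

transcript = generate_transcript(utterances, toy_classifier, toy_transcriber)
```

In this sketch the transcriber is invoked only on the first group, so no text is ever produced for the excluded utterance. That mirrors the claim's requirement that identification happens prior to generating any transcript of the matching content.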