| CPC G10L 15/22 (2013.01) [G06F 3/013 (2013.01); G06V 20/59 (2022.01); G06V 40/166 (2022.01); G06V 40/19 (2022.01); G06V 40/20 (2022.01); G10L 15/20 (2013.01); G10L 15/25 (2013.01); G10L 15/26 (2013.01); G10L 21/0208 (2013.01); G06V 40/18 (2022.01); G10L 2015/223 (2013.01); G10L 2015/227 (2013.01); G10L 15/24 (2013.01); G10L 2021/02166 (2013.01); G10L 25/78 (2013.01); H04R 2430/20 (2013.01); H04R 2460/07 (2013.01); H04R 2499/11 (2013.01)] | 20 Claims |

|
11. A system comprising:
data processing hardware; and
memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
obtaining image data comprising a representation of a first user and a second user;
obtaining audio data comprising:
a first voice data corresponding to the first user speaking; and
a second voice data corresponding to the second user speaking;
associating, based on the image data, the first voice data to a first voice-recognition database of the first user speaking and the second voice data to a second voice-recognition database of the second user speaking;
generating, using speech-to-text conversion, a transcription of the audio data;
annotating, based on the first voice data associated with the first voice-recognition database, a first portion of the transcription corresponding to the first voice data with a first annotation identifying the first user; and
annotating, based on the second voice data associated with the second voice-recognition database, a second portion of the transcription corresponding to the second voice data with a second annotation identifying the second user.
|
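
The operations recited in claim 11 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the image data has already been reduced to time intervals during which each user is visibly speaking, and that speech-to-text has already produced text for each voice segment. The names `VoiceSegment`, `VisualSpeakingInterval`, `associate`, and `annotate` are hypothetical, and a plain dictionary stands in for the per-user voice-recognition databases.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VoiceSegment:
    start: float  # seconds
    end: float    # seconds
    text: str     # stand-in for speech-to-text output (assumed given)


@dataclass
class VisualSpeakingInterval:
    user: str     # identity derived from the image data (assumed given)
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def associate(
    segments: List[VoiceSegment],
    visual: List[VisualSpeakingInterval],
) -> Dict[str, List[int]]:
    """Assign each voice segment to the user who is visibly speaking
    for the largest share of that segment.

    Returns a hypothetical per-user "database": user -> segment indices.
    Segments with no visual overlap are left unassigned.
    """
    databases: Dict[str, List[int]] = {}
    for index, segment in enumerate(segments):
        totals: Dict[str, float] = {}
        for interval in visual:
            shared = overlap(segment.start, segment.end, interval.start, interval.end)
            totals[interval.user] = totals.get(interval.user, 0.0) + shared
        if not totals:
            continue
        best_user = max(totals, key=lambda user: totals[user])
        if totals[best_user] > 0.0:
            databases.setdefault(best_user, []).append(index)
    return databases


def annotate(
    segments: List[VoiceSegment],
    databases: Dict[str, List[int]],
) -> List[str]:
    """Prefix each transcript portion with an annotation naming its speaker."""
    speaker_of = {i: user for user, indices in databases.items() for i in indices}
    return [
        f"[{speaker_of.get(i, 'unknown')}] {segment.text}"
        for i, segment in enumerate(segments)
    ]


segments = [
    VoiceSegment(0.0, 2.0, "Hello"),
    VoiceSegment(2.0, 4.0, "Hi there"),
]
visual = [
    VisualSpeakingInterval("first user", 0.0, 2.1),
    VisualSpeakingInterval("second user", 2.1, 4.0),
]
databases = associate(segments, visual)
transcript = annotate(segments, databases)
```

Here the association step relies only on temporal overlap between audio segments and visual speaking activity. A real system would likely combine this with acoustic speaker features stored in each user's database, which the sketch leaves out.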