US 12,033,656 B2
Diarisation augmented reality aide
Mauro Marzorati, Lutz, FL (US); Raghuram Srinivasan, Aurora, IL (US); Sarbajit K. Rakshit, Kolkata (IN); and Jeremy R. Fox, Georgetown, TX (US)
Assigned to KYNDRYL, INC., New York, NY (US)
Filed by KYNDRYL, INC., New York, NY (US)
Filed on Jun. 19, 2021, as Appl. No. 17/352,284.
Prior Publication US 2022/0406327 A1, Dec. 22, 2022
Int. Cl. G10L 21/00 (2013.01); G02B 27/00 (2006.01); G02B 27/01 (2006.01); G06N 3/08 (2023.01); G06T 15/20 (2011.01); G06T 19/00 (2011.01); G10L 15/18 (2013.01); G10L 21/10 (2013.01)
CPC G10L 21/10 (2013.01) [G02B 27/0093 (2013.01); G02B 27/0101 (2013.01); G02B 27/017 (2013.01); G06N 3/08 (2013.01); G06T 15/205 (2013.01); G06T 19/006 (2013.01); G10L 15/1822 (2013.01); G02B 2027/0138 (2013.01); G06T 2215/16 (2013.01)] 20 Claims
 
15. A system, the system comprising:
a memory, the memory containing one or more instructions; and
a processor, the processor communicatively coupled to the memory, the processor, in response to reading the one or more instructions, configured to:
receive, from an image capture device, an image of a real-world environment that includes one or more users;
determine, by a processor and based on the image, a mask status of a first user of the one or more users;
capture, from one or more audio transceivers, a stream of audio that includes speech from the one or more users;
identify, by the processor and based on the stream of audio, a first user speech from the stream of audio;
parse, by the processor and based on the first user speech and based on an audio processing technique, the stream of audio to create a first user speech element; and
generate, based on the first user speech and based on the mask status, an augmented view for an augmented reality device, wherein the augmented view includes the first user speech element, wherein the augmented view is generated for displaying by the augmented reality device.
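The following is a minimal Python sketch of the claim 15 pipeline. It is an illustration under stated assumptions, not the patented implementation. Every name in it (`AudioSegment`, `AugmentedView`, `determine_mask_status`, `identify_user_speech`, `build_view`) is hypothetical. It also makes three assumptions. First, an upstream mask classifier has already turned the captured image into a per-user probability. Second, the audio stream has already been cut into segments, each carrying a speaker embedding and a transcript. Third, the first user's speech is picked out by matching each segment to the nearest enrolled voiceprint by cosine similarity, a common diarisation-style technique that the claim does not specify. How the mask status shapes the augmented view is also an assumption. Here it only selects where the caption is anchored.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AudioSegment:
    """Hypothetical diarisation output: one span of audio with a speaker embedding."""
    start: float            # seconds from the start of the stream
    end: float
    embedding: list[float]  # voiceprint from an assumed speaker encoder
    text: str               # transcript from an assumed speech recogniser


@dataclass
class AugmentedView:
    """Hypothetical overlay description handed to an AR renderer."""
    user_id: str
    elements: list[str]
    anchor: str  # where the caption is placed relative to the speaker


def determine_mask_status(detections: dict[str, float], user_id: str,
                          threshold: float = 0.5) -> bool:
    """Assumed input: user_id -> probability from an upstream mask classifier."""
    return detections.get(user_id, 0.0) >= threshold


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_user_speech(segments: list[AudioSegment],
                         enrolled: dict[str, list[float]],
                         user_id: str) -> list[AudioSegment]:
    """Keep segments whose most similar enrolled voiceprint belongs to user_id."""
    picked = [
        seg for seg in segments
        if max(enrolled, key=lambda uid: cosine(seg.embedding, enrolled[uid])) == user_id
    ]
    return sorted(picked, key=lambda seg: seg.start)  # restore spoken order


def parse_speech_element(segments: list[AudioSegment]) -> str:
    """Join the user's segments into a single caption string."""
    return " ".join(s.text.strip() for s in segments if s.text.strip())


def generate_view(user_id: str, element: str, masked: bool) -> AugmentedView:
    # Assumption: a masked speaker's lips cannot be read, so the caption is drawn
    # beside the speaker. Otherwise it goes to the bottom of the view, leaving the
    # mouth visible.
    anchor = "beside_speaker" if masked else "bottom_of_view"
    return AugmentedView(user_id, [element] if element else [], anchor)


def build_view(detections: dict[str, float], segments: list[AudioSegment],
               enrolled: dict[str, list[float]], user_id: str) -> AugmentedView:
    masked = determine_mask_status(detections, user_id)
    speech = identify_user_speech(segments, enrolled, user_id)
    return generate_view(user_id, parse_speech_element(speech), masked)
```

For example, with voiceprints `{"alice": [1.0, 0.0], "bob": [0.0, 1.0]}`, segments whose embeddings lie closest to Alice's are collected in time order. The joined caption is then anchored beside her when the mask probability for her reaches the threshold.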
 
18. A computer program product, the computer program product comprising:
one or more computer readable storage media; and
program instructions collectively stored on the one or more computer readable storage media, the program instructions configured to:
receive, from an image capture device, an image of a real-world environment that includes one or more users;
determine, by a processor and based on the image, a mask status of a first user of the one or more users;
capture, from one or more audio transceivers, a stream of audio that includes speech from the one or more users;
identify, by the processor and based on the stream of audio, a first user speech from the stream of audio;
parse, by the processor and based on the first user speech and based on an audio processing technique, the stream of audio to create a plurality of user speech elements; and
generate, based on the first user speech and based on the mask status, a plurality of augmented views each including a respective one of the plurality of user speech elements, wherein each of the plurality of user speech elements has a different shape to convey a different part of the first user speech, wherein the plurality of augmented views are generated for an augmented reality device.
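Claim 18 differs from claim 15 in one respect: the parsed speech yields several speech elements, each shown in its own augmented view with a distinct shape. A minimal Python sketch of that step follows. It is illustrative only, and every name in it (`SpeechElement`, `split_speech`, `make_elements`, `make_views`) is hypothetical. The sketch makes two assumptions. First, a "part" of the speech is a sentence-like span ending in `.`, `!` or `?`. Second, shape is encoded as a polygon side count (3, 4, 5, ...), which guarantees that no two elements of one utterance share a shape. Mask-status detection and diarisation are outside the scope of this sketch.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechElement:
    text: str
    sides: int  # hypothetical shape encoding: side count of the bubble polygon


@dataclass(frozen=True)
class AugmentedView:
    user_id: str
    element: SpeechElement  # one speech element per augmented view


# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_speech(text: str) -> list[str]:
    """Split a transcript into sentence-like parts, dropping empty pieces."""
    return [p.strip() for p in _SENTENCE_END.split(text.strip()) if p.strip()]


def make_elements(text: str) -> list[SpeechElement]:
    # Assumption: the i-th part gets a (3 + i)-sided bubble, so every element
    # produced from the same speech has a different shape.
    return [SpeechElement(part, 3 + i) for i, part in enumerate(split_speech(text))]


def make_views(user_id: str, text: str) -> list[AugmentedView]:
    return [AugmentedView(user_id, el) for el in make_elements(text)]
```

For the input `"Hi there. How are you? Great!"`, this produces three views whose bubbles have 3, 4 and 5 sides. A real renderer could equally map parts to named bubble styles. The side-count encoding was chosen here only because it keeps shapes distinct for any number of parts.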