US 12,136,433 B2
Eyewear including diarization
Jonathan Geddes, Saratoga Springs, UT (US); Jennica Pounds, Bellevue, WA (US); Ryan Pruden, Seattle, WA (US); Jonathan M. Rodriguez, II, La Habra, CA (US); and Andrei Rybin, Lehi, UT (US)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Jonathan Geddes, Saratoga Springs, UT (US); Jennica Pounds, Bellevue, WA (US); Ryan Pruden, Seattle, WA (US); Jonathan M. Rodriguez, II, La Habra, CA (US); and Andrei Rybin, Lehi, UT (US)
Filed on May 28, 2020, as Appl. No. 16/885,606.
Prior Publication US 2021/0375301 A1, Dec. 2, 2021
Int. Cl. G10L 21/0272 (2013.01); G09G 5/32 (2006.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/18 (2013.01)
CPC G10L 21/0272 (2013.01) [G09G 5/32 (2013.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/18 (2013.01); G09G 2354/00 (2013.01)]
14 Claims
1. Eyewear, comprising:
a frame;
a display supported by the frame;
a microphone coupled to the frame;
a camera configured to generate an image including an object; and
an electronic processor configured to:
receive speech from a plurality of human speakers via the microphone;
identify the plurality of human speakers;
perform diarization on the received speech to segment spoken language into different speakers;
display text associated with each speaker on the display;
display a user-created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker;
process pitch and intonation of the received speech;
establish a color for the received speech based on the pitch and intonation;
display the text in the established color based on the pitch and intonation;
adjust font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold;
determine the object in the image; and
generate speech indicative of the object responsive to a speech command.
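Claim 1 recites identifying the speakers, diarizing the received speech, and showing each speaker's text next to a user-created depiction of that speaker, but it does not say how these steps are carried out. The Python sketch below shows one simplified way the attribution and grouping could work, assuming an upstream voice model has already split the audio into transcribed segments and produced a speaker embedding for each. The names `Segment`, `Caption`, `attribute_speakers`, `render_captions`, the enrolled `profiles`, the `avatars` mapping, and the 0.7 similarity cutoff are all hypothetical. This is nearest-profile speaker attribution by cosine similarity, not a complete diarization system.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Segment:
    """One transcribed stretch of speech (hypothetical upstream output)."""
    start_s: float
    end_s: float
    text: str
    embedding: Sequence[float]  # speaker embedding from an assumed voice model


@dataclass
class Caption:
    """Consecutive segments attributed to the same speaker."""
    speaker: str
    start_s: float
    end_s: float
    text: str


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attribute_speakers(segments: List[Segment],
                       profiles: Dict[str, Sequence[float]],
                       min_similarity: float = 0.7) -> List[Caption]:
    """Label each segment with its closest enrolled speaker and merge speaker turns."""
    captions: List[Caption] = []
    for seg in segments:
        # Pick the enrolled profile whose embedding is closest to this segment.
        name, sim = max(
            ((n, cosine(seg.embedding, ref)) for n, ref in profiles.items()),
            key=lambda pair: pair[1],
            default=("Unknown", 0.0),
        )
        speaker = name if sim >= min_similarity else "Unknown"
        if captions and captions[-1].speaker == speaker:
            # Same speaker as the previous segment: extend that caption.
            captions[-1].end_s = seg.end_s
            captions[-1].text += " " + seg.text
        else:
            captions.append(Caption(speaker, seg.start_s, seg.end_s, seg.text))
    return captions


def render_captions(captions: List[Caption], avatars: Dict[str, str]) -> List[str]:
    """Place each speaker's avatar beside their text so the wearer can match the two."""
    return [f"[{avatars.get(c.speaker, 'generic')}] {c.speaker}: {c.text}"
            for c in captions]
```

Merging consecutive segments from the same speaker keeps one caption per speaker turn. This reduces how often the display has to redraw avatars while someone is still talking.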
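The claim states only that a color is established from the pitch and intonation of the received speech. One possible mapping is sketched below in Python. It assumes a pitch tracker already supplies per-frame fundamental-frequency estimates, with 0 marking unvoiced frames. Pitch is summarized as the mean over voiced frames and intonation as the least-squares slope of that contour. Mean pitch is then mapped to hue and slope magnitude to saturation. The function names, the 80 to 300 Hz range, the slope scale, and the hue and saturation choices are illustrative assumptions and do not come from the patent.

```python
import colorsys
from typing import Sequence, Tuple

# Hypothetical pitch range used to normalize the hue mapping.
PITCH_LOW_HZ, PITCH_HIGH_HZ = 80.0, 300.0


def pitch_features(f0_hz: Sequence[float]) -> Tuple[float, float]:
    """Return (mean pitch in Hz, least-squares slope in Hz per frame) over voiced frames."""
    voiced = [f for f in f0_hz if f > 0]  # 0 marks an unvoiced frame
    if len(voiced) < 2:
        return (voiced[0] if voiced else 0.0), 0.0
    n = len(voiced)
    mean_x = (n - 1) / 2
    mean_f = sum(voiced) / n
    num = sum((i - mean_x) * (f - mean_f) for i, f in enumerate(voiced))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return mean_f, num / den


def speech_color(f0_hz: Sequence[float], max_slope: float = 5.0) -> str:
    """Map mean pitch to hue and intonation (slope magnitude) to saturation; return #rrggbb."""
    mean_f, slope = pitch_features(f0_hz)
    t = (mean_f - PITCH_LOW_HZ) / (PITCH_HIGH_HZ - PITCH_LOW_HZ)
    t = min(1.0, max(0.0, t))
    hue = (1.0 - t) * (240.0 / 360.0)  # low pitch -> blue (240 deg), high pitch -> red (0 deg)
    sat = 0.3 + 0.7 * min(1.0, abs(slope) / max_slope)  # flat -> muted, rising/falling -> vivid
    r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```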
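The claim recites two thresholds for resizing the displayed text. Above a first decibel threshold the font attribute is increased, and below a second threshold it is decreased. The Python sketch below assumes the level is measured as RMS in dBFS from normalized microphone samples, since true sound-pressure level would need a calibrated microphone. Because the claim gives no numbers, the threshold, step, and clamp values and the names `level_dbfs` and `adjust_font_size` are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical tuning values; the claim names two thresholds but gives no numbers.
LOUD_THRESHOLD_DB = -10.0   # "first threshold": grow the font above this level
QUIET_THRESHOLD_DB = -30.0  # "second threshold": shrink the font below this level
FONT_STEP_PT = 2
MIN_FONT_PT, MAX_FONT_PT = 10, 32


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of normalized samples (-1.0..1.0) in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def adjust_font_size(current_pt: int, level_db: float) -> int:
    """Increase the font above the first threshold, decrease it below the second."""
    if level_db > LOUD_THRESHOLD_DB:
        current_pt += FONT_STEP_PT
    elif level_db < QUIET_THRESHOLD_DB:
        current_pt -= FONT_STEP_PT
    # Between the two thresholds the size is left unchanged (a dead band).
    return max(MIN_FONT_PT, min(MAX_FONT_PT, current_pt))
```

Leaving the size unchanged between the two thresholds means ordinary conversational levels do not make the text size flicker. The clamp keeps very loud or very quiet speech from producing unreadable text.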