US 12,136,433 B2
Eyewear including diarization
Jonathan Geddes, Saratoga Springs, UT (US); Jennica Pounds, Bellevue, WA (US); Ryan Pruden, Seattle, WA (US); Jonathan M. Rodriguez, II, La Habra, CA (US); and Andrei Rybin, Lehi, UT (US)
Assigned to Snap Inc., Santa Monica, CA (US)
Filed by Jonathan Geddes, Saratoga Springs, UT (US); Jennica Pounds, Bellevue, WA (US); Ryan Pruden, Seattle, WA (US); Jonathan M. Rodriguez, II, La Habra, CA (US); and Andrei Rybin, Lehi, UT (US)
Filed on May 28, 2020, as Appl. No. 16/885,606.
Prior Publication US 2021/0375301 A1, Dec. 2, 2021
Int. Cl. G10L 21/0272 (2013.01); G09G 5/32 (2006.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/18 (2013.01)

CPC G10L 21/0272 (2013.01) [G09G 5/32 (2013.01); G10L 17/00 (2013.01); G10L 17/02 (2013.01); G10L 17/18 (2013.01); G09G 2354/00 (2013.01)]

14 Claims

1. Eyewear, comprising:

a frame;

a display supported by the frame;

a microphone coupled to the frame; and

a camera configured to generate an image including an object; and

an electronic processor configured to:

receive speech from a plurality of human speakers via the microphone;

identify the plurality of human speakers;

perform diarization on the received speech to segment spoken language into different speakers;

display text associated with each speaker on the display;

display a user created graphical depiction of a person associated with and indicative of the identified speaker proximate the text of the associated speaker such that an eyewear user can visually associate the text to the respective speaker;

process pitch and intonation of the received speech;

establish a color for received speech based on the pitch and intonation;

display the text in the established color based on the pitch and intonation;

adjust font size of the text by increasing a font attribute based on a decibel level of the received speech above a first threshold and decreasing the font attribute based on a decibel level of the received speech below a second threshold;

determine the object in the image; and

generate speech indicative of the object responsive to a speech command.
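
Illustrative sketch (not part of the claims): the text-styling steps recited above, namely establishing a color from pitch and intonation and adjusting font size against two decibel thresholds, might be approximated as follows. The claim does not specify any numeric values or any particular color mapping, so the thresholds, point sizes, pitch range, the `Segment` fields, and the function names below are all hypothetical and chosen only for illustration. The sketch assumes that diarization and transcription have already produced per-speaker segments.

```python
from dataclasses import dataclass
import colorsys


@dataclass
class Segment:
    """One diarized stretch of speech (all fields are illustrative)."""
    speaker: str        # speaker label produced by diarization
    text: str           # transcribed words for this segment
    pitch_hz: float     # mean fundamental frequency of the segment
    pitch_slope: float  # crude intonation measure: > 0 rising, < 0 falling
    level_db: float     # loudness of the segment in decibels


# Hypothetical tuning values; the claim gives no numbers.
LOUD_DB = 70.0   # stands in for the "first threshold"
QUIET_DB = 50.0  # stands in for the "second threshold"
BASE_PT = 14     # default font size in points
STEP_PT = 4      # amount to grow or shrink the font


def font_size(level_db: float) -> int:
    """Grow text above the first threshold, shrink it below the second."""
    if level_db > LOUD_DB:
        return BASE_PT + STEP_PT
    if level_db < QUIET_DB:
        return BASE_PT - STEP_PT
    return BASE_PT


def text_color(pitch_hz: float, pitch_slope: float) -> str:
    """Assumed mapping: pitch selects the hue, intonation selects brightness."""
    clamped = min(max(pitch_hz, 80.0), 400.0)   # assumed speech pitch range
    hue = (clamped - 80.0) / 320.0 * 0.66       # low pitch red, high pitch blue
    value = 1.0 if pitch_slope >= 0 else 0.7    # falling intonation is dimmer
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


def style_segment(seg: Segment) -> dict:
    """Bundle what a display layer would need to draw one caption."""
    return {
        "speaker": seg.speaker,
        "text": seg.text,
        "color": text_color(seg.pitch_hz, seg.pitch_slope),
        "font_pt": font_size(seg.level_db),
    }
```

Under these assumptions, `style_segment(Segment("A", "Hello", 80.0, 1.0, 75.0))` yields 18-point text, because 75 dB exceeds the assumed 70 dB threshold, while a 40 dB segment would be drawn at 10 points. Using two separate thresholds leaves a neutral band in which the font size is unchanged, which matches the claim's distinction between a first and a second threshold.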