US 11,657,822 B2
Systems and methods for processing and presenting conversations
Yun Fu, Cupertino, CA (US); Simon Lau, San Jose, CA (US); Fuchun Peng, Cupertino, CA (US); Kaisuke Nakajima, Sunnyvale, CA (US); Julius Cheng, Cupertino, CA (US); Gelei Chen, Mountain View, CA (US); and Sam Song Liang, Palo Alto, CA (US)
Assigned to Otter.ai, Inc., Los Altos, CA (US)
Filed by Otter.ai, Inc., Los Altos, CA (US)
Filed on Mar. 8, 2021, as Appl. No. 17/195,202.
Application 17/195,202 is a continuation of application No. 16/027,511, filed on Jul. 5, 2018, granted, now 10,978,073.
Claims priority of provisional application 62/530,227, filed on Jul. 9, 2017.
Prior Publication US 2021/0217420 A1, Jul. 15, 2021
Int. Cl. G10L 17/00 (2013.01); G10L 15/26 (2006.01); G06F 16/34 (2019.01); G06F 16/35 (2019.01); G10L 15/08 (2006.01); H04N 7/15 (2006.01)
CPC G10L 15/26 (2013.01) [G06F 16/34 (2019.01); G06F 16/35 (2019.01); G10L 15/08 (2013.01); G10L 17/00 (2013.01); H04N 7/15 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A system for processing and presenting a conversation, the system comprising:
a sensor configured to, upon receipt of a user instruction, capture a live audio-form conversation;
a processor configured to, upon receiving the live audio-form conversation when the live audio-form conversation is being captured:
automatically transcribe, in real time or in near-real time with the live audio-form conversation, the live audio-form conversation into a live synchronized text, the live synchronized text being synchronized with the live audio-form conversation;
automatically generate, in real time or in near-real time with the live audio-form conversation, one or more segments of the live audio-form conversation and one or more segments of the live synchronized text by at least:
identifying when a speaker change occurs during the live audio-form conversation;
in response to the identifying when a speaker change occurs during the live audio-form conversation, automatically segmenting the live audio-form conversation and the live synchronized text such that each segment of the one or more segments of the live audio-form conversation is spoken by only one speaker;
identifying when a natural pause occurs during the live audio-form conversation;
in response to the identifying when a natural pause occurs during the live audio-form conversation, automatically segmenting the live audio-form conversation and the live synchronized text such that each segment of the one or more segments of the live audio-form conversation is synchronized with only one segment of the one or more segments of the live synchronized text; and
automatically assign, in real time or in near-real time with the live audio-form conversation, only one speaker label to each segment of the one or more segments of the live synchronized text, each one speaker label representing one speaker; and
a presenter configured to present, in real time or in near-real time with the live audio-form conversation, the labeled live synchronized text and the live audio-form conversation;
wherein near-real time is a time delay of less than one minute.
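
The segmentation behavior recited above (split on speaker changes so each segment has exactly one speaker, and also split at natural pauses) can be sketched in ordinary code. The following is a minimal illustration only, not Otter.ai's claimed implementation; the event format, the `Segment` type, and the `pause_threshold` parameter are all hypothetical assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    # One speaker label per segment, as the claim requires.
    speaker: str
    words: list = field(default_factory=list)

def segment_transcript(events, pause_threshold=2.0):
    """Split a stream of transcribed words into segments.

    `events` is an iterable of (word, speaker, start_sec, end_sec)
    tuples, assumed to arrive in time order from a live transcriber.
    A new segment begins whenever the speaker changes, or whenever
    the gap since the previous word exceeds `pause_threshold`
    seconds (a stand-in for "natural pause" detection).
    """
    segments = []
    current = None
    last_end = None
    for word, speaker, start, end in events:
        speaker_changed = current is None or speaker != current.speaker
        paused = last_end is not None and (start - last_end) > pause_threshold
        if speaker_changed or paused:
            current = Segment(speaker=speaker)
            segments.append(current)
        current.words.append(word)
        last_end = end
    return segments
```

Because each word carries its own timestamps, every emitted segment remains synchronized with the corresponding span of audio, and each segment carries exactly one speaker label, mirroring the one-speaker-per-segment and one-label-per-segment limitations of the claim.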