US 12,456,465 B2
Systems and methods for processing and presenting conversations
Yun Fu, Cupertino, CA (US); Simon Lau, San Jose, CA (US); Fuchun Peng, Cupertino, CA (US); Kaisuke Nakajima, Sunnyvale, CA (US); Julius Cheng, Cupertino, CA (US); Gelei Chen, Mountain View, CA (US); and Sam Song Liang, Palo Alto, CA (US)
Assigned to Otter.ai, Inc., Mountain View, CA (US)
Filed by Otter.ai, Inc., Los Altos, CA (US)
Filed on Apr. 7, 2023, as Appl. No. 18/131,982.
Application 18/131,982 is a continuation of application No. 17/195,202, filed on Mar. 8, 2021.
Application 17/195,202 is a continuation of application No. 16/027,511, filed on Jul. 5, 2018, granted, now Pat. No. 10,978,073, issued on Apr. 13, 2021.
Claims priority of provisional application 62/530,227, filed on Jul. 9, 2017.
Prior Publication US 2023/0245660 A1, Aug. 3, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 15/04 (2013.01); G06F 16/34 (2019.01); G06F 16/35 (2019.01); G10L 15/26 (2006.01); G10L 15/08 (2006.01); G10L 17/00 (2013.01); G10L 25/78 (2013.01); H04N 7/15 (2006.01)
CPC G10L 15/26 (2013.01) [G06F 16/34 (2019.01); G06F 16/35 (2019.01); G10L 15/04 (2013.01); G10L 15/08 (2013.01); G10L 17/00 (2013.01); G10L 25/78 (2013.01); H04N 7/15 (2013.01)]
13 Claims
OG exemplary drawing
 
1. A system for processing and presenting a conversation, the system comprising:
    a processor configured to:
        receive a live audio-form conversation involving one or more speakers;
        automatically transcribe the received live audio-form conversation into a live synchronized text, the live synchronized text being synchronized with the live audio-form conversation; and
        after the live audio-form conversation has been transcribed into the live synchronized text,
            automatically generate a plurality of segments of the live audio-form conversation, the plurality of segments of the live audio-form conversation including a first segment of the live audio-form conversation and a second segment of the live audio-form conversation; and
            automatically generate a plurality of segments of the live synchronized text, the plurality of segments of the live synchronized text including a first segment of the live synchronized text and a second segment of the live synchronized text; and
    a presenter configured to present the plurality of segments of the live synchronized text;
    wherein the processor is further configured to:
        after the live audio-form conversation has been transcribed into the live synchronized text,
            identify an occurrence of a natural pause during the live audio-form conversation in which there is an absence of speech from the one or more speakers; and
            in response to the identified occurrence of the natural pause,
                automatically segment the live audio-form conversation into the first segment of the live audio-form conversation and the second segment of the live audio-form conversation; and
                automatically segment the live synchronized text into the first segment of the live synchronized text and the second segment of the live synchronized text;
    wherein the first segment of the live audio-form conversation and the second segment of the live audio-form conversation are next to each other and are spoken by a same speaker;
    wherein the processor is further configured to:
        automatically assign a first speaker label to the first segment of the live synchronized text; and
        automatically assign a second speaker label to the second segment of the live synchronized text;
    wherein the presenter is further configured to:
        present the first speaker label together with the first segment of the live synchronized text; and
        present the second speaker label together with the second segment of the live synchronized text;
    wherein:
        the first segment of the live synchronized text and the second segment of the live synchronized text are next to each other;
        the first speaker label represents the same speaker;
        the second speaker label represents the same speaker; and
        the first speaker label and the second speaker label are the same.
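Stripped of claim language, the recited pipeline reduces to three steps: detect a natural pause (a gap in which no speaker is talking), split both the audio and its time-aligned transcript at that pause, and present each text segment together with a speaker label, so that two adjacent segments spoken by the same speaker carry the same label. The following Python sketch illustrates only that segmentation-and-labeling logic under simplifying assumptions: the Word and Segment classes, the PAUSE_THRESHOLD value, and the upstream diarization that supplies per-word speaker labels are all hypothetical, and nothing here reflects Otter.ai's actual implementation.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One transcribed word, time-aligned to the audio (the 'synchronized text')."""
    text: str
    start: float   # seconds from the start of the conversation
    end: float
    speaker: str   # label from an upstream diarization step (assumed given)


@dataclass
class Segment:
    words: List[Word]

    @property
    def speaker(self) -> str:
        return self.words[0].speaker

    @property
    def text(self) -> str:
        return " ".join(w.text for w in self.words)

    @property
    def span(self) -> Tuple[float, float]:
        return (self.words[0].start, self.words[-1].end)


# Hypothetical threshold: an inter-word gap at least this long (seconds)
# is treated as a "natural pause" in which there is an absence of speech.
PAUSE_THRESHOLD = 1.0


def segment_on_pauses(words: List[Word]) -> List[Segment]:
    """Split the time-aligned transcript at every natural pause.

    Because the split is pause-driven rather than speaker-driven, two
    adjacent segments can be spoken by the same speaker; each segment
    still receives its own (identical) speaker label.
    """
    if not words:
        return []
    segments: List[Segment] = []
    current: List[Word] = [words[0]]
    for prev, word in zip(words, words[1:]):
        if word.start - prev.end >= PAUSE_THRESHOLD:
            segments.append(Segment(current))
            current = []
        current.append(word)
    segments.append(Segment(current))
    return segments


def present(segments: List[Segment]) -> None:
    """Present each text segment together with its speaker label and timing."""
    for seg in segments:
        start, end = seg.span
        print(f"[{start:6.2f}s-{end:6.2f}s] {seg.speaker}: {seg.text}")


if __name__ == "__main__":
    # Toy input: one speaker with a 1.5 s pause mid-utterance, so the two
    # resulting segments are adjacent and carry the same speaker label.
    words = [
        Word("Let's", 0.00, 0.30, "Speaker 1"),
        Word("get", 0.35, 0.50, "Speaker 1"),
        Word("started.", 0.55, 1.00, "Speaker 1"),
        Word("First", 2.50, 2.80, "Speaker 1"),  # 1.5 s gap -> new segment
        Word("item.", 2.85, 3.20, "Speaker 1"),
    ]
    present(segment_on_pauses(words))

Run as-is, the sketch prints two adjacent segments that both carry the label Speaker 1, which is exactly the situation described by the claim's final wherein clauses: adjacent segments, same speaker, identical labels.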