| CPC G10L 15/26 (2013.01) [G06F 16/34 (2019.01); G06F 16/35 (2019.01); G10L 15/04 (2013.01); G10L 15/08 (2013.01); G10L 17/00 (2013.01); G10L 25/78 (2013.01); H04N 7/15 (2013.01)] | 13 Claims |

|
1. A system for processing and presenting a conversation, the system comprising:
a processor configured to:
receive a live audio-form conversation involving one or more speakers;
automatically transcribe the received live audio-form conversation into a live synchronized text, the live synchronized text being synchronized with the live audio-form conversation; and
after the live audio-form conversation has been transcribed into the live synchronized text,
automatically generate a plurality of segments of the live audio-form conversation, the plurality of segments of the live audio-form conversation including a first segment of the live audio-form conversation and a second segment of the live audio-form conversation; and
automatically generate a plurality of segments of the live synchronized text, the plurality of segments of the live synchronized text including a first segment of the live synchronized text and a second segment of the live synchronized text; and
a presenter configured to present the plurality of segments of the live synchronized text;
wherein the processor is further configured to:
after the live audio-form conversation has been transcribed into the live synchronized text,
identify an occurrence of a natural pause during the live audio-form conversation in which there is an absence of speech from the one or more speakers; and
in response to the identified occurrence of the natural pause,
automatically segment the live audio-form conversation into the first segment of the live audio-form conversation and the second segment of the live audio-form conversation; and
automatically segment the live synchronized text into the first segment of the live synchronized text and the second segment of the live synchronized text;
wherein the first segment of the live audio-form conversation and the second segment of the live audio-form conversation are next to each other and are spoken by a same speaker;
wherein the processor is further configured to:
automatically assign a first speaker label to the first segment of the live synchronized text; and
automatically assign a second speaker label to the second segment of the live synchronized text;
wherein the presenter is further configured to:
present the first speaker label together with the first segment of the live synchronized text; and
present the second speaker label together with the second segment of the live synchronized text;
wherein:
the first segment of the live synchronized text and the second segment of the live synchronized text are next to each other;
the first speaker label represents the same speaker;
the second speaker label represents the same speaker; and
the first speaker label and the second speaker label are the same.
|
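The pause-based segmentation and speaker labeling recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the transcription step has already produced word-level timestamps and a speaker identifier for each word. The names `Word`, `Segment`, `segment_on_pauses`, and the 0.8-second `min_pause` threshold are hypothetical and chosen only for illustration. Splitting on a change of speaker is a further assumption not recited in the claim. Each segment's time span identifies the matching audio segment, so the text and audio segments stay synchronized.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the audio
    end: float     # seconds from the start of the audio
    speaker: str   # speaker identifier (assumed to come from an upstream step)


@dataclass
class Segment:
    speaker_label: str
    start: float   # start of the corresponding audio segment
    end: float     # end of the corresponding audio segment
    text: str      # synchronized text for that audio span


def _close(words: List[Word]) -> Segment:
    """Build one segment from a non-empty run of consecutive words."""
    return Segment(
        speaker_label=words[0].speaker,
        start=words[0].start,
        end=words[-1].end,
        text=" ".join(w.text for w in words),
    )


def segment_on_pauses(words: List[Word], min_pause: float = 0.8) -> List[Segment]:
    """Split a transcript wherever the silence between words reaches min_pause.

    A change of speaker also starts a new segment (an assumption of this
    sketch, not something recited in the claim).
    """
    segments: List[Segment] = []
    current: List[Word] = []
    for word in words:
        if current:
            prev = current[-1]
            pause = word.start - prev.end  # absence of speech between two words
            if pause >= min_pause or word.speaker != prev.speaker:
                segments.append(_close(current))
                current = []
        current.append(word)
    if current:
        segments.append(_close(current))
    return segments


# One speaker pauses for 1.3 s mid-turn, so the turn becomes two adjacent
# segments that carry the same speaker label.
words = [
    Word("Let's", 0.0, 0.3, "Speaker 1"),
    Word("begin.", 0.35, 0.7, "Speaker 1"),
    Word("First", 2.0, 2.3, "Speaker 1"),
    Word("item.", 2.35, 2.7, "Speaker 1"),
]
segments = segment_on_pauses(words)
for seg in segments:
    print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.speaker_label}: {seg.text}")
```

In this example the two resulting segments are adjacent, are spoken by the same speaker, and are presented with identical labels, which mirrors the final "wherein" clauses of the claim.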