US 12,073,851 B2
Determining conversation analysis indicators for a multiparty conversation
Andrew Reece, San Francisco, CA (US); Peter Bull, San Francisco, CA (US); Gus Cooney, San Francisco, CA (US); Casey Fitzpatrick, San Francisco, CA (US); Gabriella Rosen Kellerman, San Francisco, CA (US); and Ryan Sonnek, New Hope, MN (US)
Assigned to BetterUp, Inc., San Francisco, CA (US)
Filed by BetterUp, Inc., San Francisco, CA (US)
Filed on Jul. 11, 2022, as Appl. No. 17/811,868.
Application 17/811,868 is a continuation of application No. 16/798,242, filed on Feb. 21, 2020, granted, now 11,417,330.
Prior Publication US 2022/0343911 A1, Oct. 27, 2022
Int. Cl. G10L 25/00 (2013.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01); G10L 15/04 (2013.01); G10L 15/16 (2006.01); G10L 15/22 (2006.01); G10L 15/24 (2013.01); G10L 25/48 (2013.01); G10L 25/63 (2013.01); G06V 40/16 (2022.01); G06V 40/18 (2022.01)
CPC G10L 25/48 (2013.01) [G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01); G10L 15/04 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01); G10L 15/24 (2013.01); G10L 25/63 (2013.01); G06V 40/174 (2022.01); G06V 40/18 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method to generate a conversation analysis, the method comprising:
receiving multiple utterance representations,
wherein each utterance representation represents a portion of a conversation performed by at least two users, wherein one utterance representation represents a particular verbalized statement from one user, and
wherein each utterance representation is associated with one or more of: video data, acoustic data, and text data; and
generating a first utterance output by applying a first utterance representation, that is associated with a first user and that is of the multiple utterance representations, to a machine learning system, wherein generating the first utterance output includes one or more of:
applying video data of the first utterance representation to a first video processing part of the machine learning system to generate first video-based output;
applying acoustic data of the first utterance representation to a first acoustic processing part of the machine learning system to generate first acoustic-based output; and
applying text data of the first utterance representation to a first textual processing part of the machine learning system to generate first text-based output;
wherein the machine learning system includes memory functionality integration such that an internal state of the machine learning system computationally tracks utterances.
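The structure recited in claim 1 — per-modality "processing parts" whose outputs feed a system whose internal state tracks utterances over the conversation — can be illustrated with a toy sketch. This is a hypothetical illustration only, not the patented implementation: the class and method names (`ConversationAnalyzer`, `UtteranceRepresentation`, the three `_*_part` methods) and the trivial per-modality features are invented for exposition, and a real system would use trained models (e.g., recurrent or attention-based networks) in place of the simple summaries shown here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class UtteranceRepresentation:
    """One utterance from one speaker, optionally carrying each modality
    (mirrors the claim's 'one or more of: video data, acoustic data, and
    text data'). Feature vectors here are plain lists of floats."""
    speaker: str
    video: Optional[List[float]] = None
    audio: Optional[List[float]] = None
    text: Optional[str] = None

class ConversationAnalyzer:
    """Sketch of the claimed structure: one processing part per modality,
    plus an internal state that accumulates per-utterance outputs (a stand-in
    for the claim's 'memory functionality integration')."""

    def __init__(self) -> None:
        # Internal state tracking utterances across the conversation.
        self.state: List[dict] = []

    def _video_part(self, video: List[float]) -> dict:
        # Placeholder for a video processing part (e.g., facial-expression model).
        return {"video_activity": sum(video) / len(video)}

    def _acoustic_part(self, audio: List[float]) -> dict:
        # Placeholder for an acoustic processing part (e.g., prosody model).
        return {"mean_energy": sum(audio) / len(audio)}

    def _text_part(self, text: str) -> dict:
        # Placeholder for a textual processing part (e.g., language model).
        return {"word_count": len(text.split())}

    def process(self, utt: UtteranceRepresentation) -> dict:
        """Generate an utterance output from whichever modalities are present,
        then update the internal state so later analysis can use history."""
        out = {"speaker": utt.speaker}
        if utt.video is not None:
            out.update(self._video_part(utt.video))
        if utt.audio is not None:
            out.update(self._acoustic_part(utt.audio))
        if utt.text is not None:
            out.update(self._text_part(utt.text))
        self.state.append(out)  # memory update: state now reflects this utterance
        return out
```

In this sketch, each call to `process` corresponds to generating an "utterance output" for one utterance representation, and `self.state` plays the role of the internal state that computationally tracks utterances across the multiparty conversation.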