US 11,715,487 B2
Utilizing machine learning models to provide cognitive speaker fractionalization with empathy recognition
Mohit Chawla, New Delhi (IN); Balaji Janarthanam, Chennai (IN); Dinesh Vijayakumar, Bengaluru (IN); Sanjay Tiwari, Bengaluru (IN); Ashwini Purushothaman, Chennai (IN); Bhavika Sehgal, Ludhiana (IN); Rajesh Gala, Mumbai (IN); Saran Prasad, New Delhi (IN); Vinu Varghese, Bangalore (IN); Mohit Mahajan, Gurugram (IN); and Badarayan Panigrahi, Bangalore (IN)
Assigned to Accenture Global Solutions Limited, Dublin (IE)
Filed by Accenture Global Solutions Limited, Dublin (IE)
Filed on Mar. 31, 2021, as Appl. No. 17/218,952.
Prior Publication US 2022/0319535 A1, Oct. 6, 2022
Int. Cl. G10L 25/63 (2013.01); G06F 40/30 (2020.01); G10L 17/16 (2013.01); G10L 21/0272 (2013.01); G10L 17/02 (2013.01); G10L 25/24 (2013.01)

CPC G10L 25/63 (2013.01) [G06F 40/30 (2020.01); G10L 17/02 (2013.01); G10L 17/16 (2013.01); G10L 21/0272 (2013.01); G10L 25/24 (2013.01)]

20 Claims

8. A device, comprising:

one or more memories; and

one or more processors, communicatively coupled to the one or more memories, configured to:

receive audio data identifying a conversation including a plurality of speakers;

process the audio data, with a plurality of clustering models, to identify a plurality of speaker segments,

wherein the plurality of clustering models includes a k-means clustering model, a spectral clustering model, and an agglomerative clustering model;

determine a plurality of diarization error rates for the plurality of speaker segments;

identify a plurality of errors in the plurality of speaker segments based on comparing each of the plurality of diarization error rates to a threshold;

select a rectification model to rectify each of the plurality of errors based on a cause of a corresponding one of the plurality of errors and based on features of a corresponding one of the plurality of speaker segments;

re-segment the audio data with the rectification model to generate re-segmented audio data;

determine a plurality of modified diarization error rates for the plurality of speaker segments based on the re-segmented audio data;

select one of the plurality of speaker segments based on the plurality of modified diarization error rates;

calculate an empathy score based on the one of the plurality of speaker segments and based on an emotion score, an intent score, and a sentiment score determined based on the re-segmented audio data;

retrain the rectification model based on calculating the empathy score and utilizing the empathy score as additional training data; and

perform one or more actions based on the empathy score.
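
The error-rate and scoring steps recited in claim 8 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The diarization error rate uses the standard definition (missed speech plus false alarm plus speaker confusion, divided by total reference speech time). Everything else is an assumption made for illustration: the `SegmentErrors` structure, the rule that a segment is flagged when its rate exceeds the threshold, the choice of the lowest modified rate when selecting a segment, and the weighted mean used to combine the emotion, intent, and sentiment scores. The claim does not specify any of these details.

```python
from dataclasses import dataclass


@dataclass
class SegmentErrors:
    """Per-segment error durations in seconds (hypothetical structure)."""
    segment_id: str
    missed: float        # reference speech that was not detected
    false_alarm: float   # non-speech labeled as speech
    confusion: float     # speech attributed to the wrong speaker
    total_speech: float  # total reference speech duration


def diarization_error_rate(e: SegmentErrors) -> float:
    """Standard DER: (missed + false alarm + confusion) / total speech."""
    if e.total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (e.missed + e.false_alarm + e.confusion) / e.total_speech


def flag_errors(segments, threshold):
    """Return ids of segments whose DER exceeds the threshold.

    Assumption: "exceeds" is the comparison; the claim only says the
    rates are compared to a threshold.
    """
    return [s.segment_id for s in segments
            if diarization_error_rate(s) > threshold]


def select_segment(segments):
    """Assumption: select the segment with the lowest (modified) DER."""
    return min(segments, key=diarization_error_rate)


def empathy_score(emotion, intent, sentiment, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumption: empathy is a weighted mean of three scores in [0, 1]."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scores = (emotion, intent, sentiment)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # Made-up durations, purely for demonstration.
    segments = [
        SegmentErrors("a", missed=1.0, false_alarm=0.5, confusion=0.5,
                      total_speech=10.0),
        SegmentErrors("b", missed=0.2, false_alarm=0.1, confusion=0.2,
                      total_speech=10.0),
    ]
    print(flag_errors(segments, threshold=0.1))
    print(select_segment(segments).segment_id)
    print(empathy_score(0.8, 0.4, 0.6))
```

In this sketch the flagged segment ids would be handed to whichever rectification model is selected, and the same functions would be re-run on the re-segmented audio to obtain the modified error rates.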