CPC G10L 15/063 (2013.01) [G06F 40/169 (2020.01); G06F 40/20 (2020.01); G06N 20/00 (2019.01); G10L 15/02 (2013.01); G10L 15/16 (2013.01); G10L 15/22 (2013.01); G06T 13/205 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01)] 17 Claims

1. A system comprising:
a non-transitory memory storing instructions executable to construct a machine-learning network to quantify a trust score and to automate trust delivery with a digital avatar by generating a trustworthy voice for the digital avatar; and
a processor in communication with the non-transitory memory, wherein the processor executes the instructions to cause the system to:
obtain a set of vocal features and a set of text features for each sample in a set of audio samples;
obtain a trust score for each sample;
perform preprocessing on the set of vocal features and the set of text features to obtain a set of input features for each sample;
determine a type of machine-learning algorithm for the machine-learning network based on a training result of the machine-learning network;
tune a set of hyperparameters for the machine-learning network based on cross-validation of the machine-learning network;
generate a predicted trust score by the machine-learning network from the set of input features for each sample;
train the machine-learning network based on the predicted trust score and the trust score for each sample to obtain the training result;
generate a set of trust components for a user by the machine-learning network;
concatenate the set of trust components with a user profile of the user to obtain an expanded user profile;
train a second machine-learning network by inputting the expanded user profile to recommend features for improving trust scores; and
generate a list of recommended features for the user by the trained second machine-learning network based on the expanded user profile,
wherein generating the trustworthy voice for the digital avatar comprises:
receiving an input text and a reference trustworthy tone sample;
collecting a sequence of phonemes and a Mel spectrogram from the input text using a text-to-speech module;
encoding the Mel spectrogram with an input encoder to generate an input embedding;
encoding the reference trustworthy tone sample with a trust encoder and concatenating the result with the input embedding to generate an output;
processing the output of the concatenation through a location-sensitive attention layer using cumulative attention weights to generate an encoded input sequence;
predicting a Mel spectrogram with a decoder from the encoded input sequence; and
generating the trustworthy voice for the digital avatar from the Mel spectrogram using a vocoder, wherein the digital avatar is configured to replace the user in a conversation.
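
The trust-scoring and recommendation steps recited above can be illustrated with a minimal sketch, assuming scikit-learn; the synthetic feature arrays, the hyperparameter grid, the trust components, and the catalog of recommendable features below are hypothetical placeholders, not elements disclosed by the claim.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Placeholder features; the claim does not specify the vocal/text
# feature extraction, so synthetic arrays stand in for it here.
vocal_feats = rng.random((n, 40))   # e.g., pitch, energy, spectral stats
text_feats = rng.random((n, 20))    # e.g., lexical / sentiment features
trust_scores = rng.random(n)        # labeled trust score per sample

# "perform preprocessing ... to obtain a set of input features":
# modeled here as per-sample concatenation of the two feature sets.
X = np.hstack([vocal_feats, text_feats])

# "tune a set of hyperparameters ... based on cross-validation":
# a small, hypothetical grid searched with 5-fold CV.
model = make_pipeline(StandardScaler(), MLPRegressor(max_iter=2000, random_state=0))
grid = {
    "mlpregressor__hidden_layer_sizes": [(64,), (128, 64)],
    "mlpregressor__alpha": [1e-4, 1e-3],
}
search = GridSearchCV(model, grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, trust_scores)                 # trained against the labeled scores
predicted_trust = search.predict(X)         # the predicted trust score per sample

# "concatenate the set of trust components with a user profile":
# hypothetical per-user trust components and profile fields.
trust_components = rng.random((n, 5))
user_profiles = rng.random((n, 10))
expanded_profiles = np.hstack([user_profiles, trust_components])

# Second machine-learning network: a multi-label classifier over a
# hypothetical catalog of features recommendable for improving trust.
feature_labels = (rng.random((n, 4)) > 0.5).astype(int)   # placeholder targets
recommender = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
recommender.fit(expanded_profiles, feature_labels)
recommended = recommender.predict(expanded_profiles[:1])  # feature list for one user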
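The voice-generation steps recited above follow a Tacotron-2-style layout. The sketch below, in PyTorch, shows one way to wire a trust encoder, its concatenation with the input embedding, and location-sensitive attention driven by cumulative attention weights; all module choices, names, and dimensions are illustrative assumptions rather than disclosures from the claim.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TrustEncoder(nn.Module):
    # Encodes the reference trustworthy tone sample (a Mel spectrogram)
    # into a fixed-length trust embedding; the GRU is an assumed stand-in
    # for whatever reference encoder the system actually uses.
    def __init__(self, n_mels=80, dim=128):
        super().__init__()
        self.gru = nn.GRU(n_mels, dim, batch_first=True)

    def forward(self, ref_mel):              # ref_mel: (B, T_ref, n_mels)
        _, h = self.gru(ref_mel)
        return h[-1]                         # (B, dim)

class LocationSensitiveAttention(nn.Module):
    # Attention whose energies depend on the cumulative attention weights
    # of earlier decoder steps, as the claim recites.
    def __init__(self, enc_dim, query_dim, attn_dim=128, loc_filters=32):
        super().__init__()
        self.q = nn.Linear(query_dim, attn_dim, bias=False)
        self.m = nn.Linear(enc_dim, attn_dim, bias=False)
        self.loc_conv = nn.Conv1d(1, loc_filters, kernel_size=31, padding=15)
        self.loc = nn.Linear(loc_filters, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, query, memory, cum_weights):
        # query: (B, query_dim); memory: (B, T, enc_dim); cum_weights: (B, T)
        loc = self.loc(self.loc_conv(cum_weights.unsqueeze(1)).transpose(1, 2))
        e = self.v(torch.tanh(self.q(query).unsqueeze(1) + self.m(memory) + loc))
        w = F.softmax(e.squeeze(-1), dim=1)                   # attention weights
        context = torch.bmm(w.unsqueeze(1), memory).squeeze(1)
        return context, w

# Hypothetical wiring for one attention step.
B, T, n_mels = 2, 100, 80
input_encoder = nn.GRU(n_mels, 256, batch_first=True)        # input encoder
trust_encoder = TrustEncoder(n_mels=n_mels, dim=128)
attention = LocationSensitiveAttention(enc_dim=256 + 128, query_dim=256)

memory, _ = input_encoder(torch.randn(B, T, n_mels))         # input embedding
trust = trust_encoder(torch.randn(B, 60, n_mels))            # trust embedding
# "concatenating the result with the input embedding to generate an output":
memory = torch.cat([memory, trust.unsqueeze(1).expand(-1, T, -1)], dim=-1)

cum_weights = torch.zeros(B, T)
context, w = attention(torch.randn(B, 256), memory, cum_weights)
cum_weights = cum_weights + w    # accumulated for the next decoder step
# A decoder would predict the next Mel frame from `context`, and a neural
# vocoder (e.g., WaveGlow or HiFi-GAN) would then render the waveform,
# corresponding to the claim's final two steps.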