US 12,413,671 B2
Computer-based systems configured to monitor a communication session and generate a training session and methods of use thereof
Joshua Edwards, Philadelphia, PA (US); Alexander Lin, San Francisco, CA (US); Mia Rodriguez, Broomfield, CO (US); Guadalupe Bonilla, Hyattsville, MD (US); Aysu Ezen Can, Cary, NC (US); Michael Mossoba, Great Falls, VA (US); Feng Qiu, West Chester, PA (US); Tyler Maiman, Melville, NY (US); and Meredith L. Critzer, Midlothian, VA (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Aug. 16, 2023, as Appl. No. 18/450,886.
Prior Publication US 2025/0063118 A1, Feb. 20, 2025
Int. Cl. H04M 3/00 (2024.01); G10L 15/18 (2013.01); G10L 15/22 (2006.01); H04M 3/51 (2006.01)

CPC H04M 3/5175 (2013.01) [G10L 15/1815 (2013.01); G10L 15/22 (2013.01); H04M 2201/39 (2013.01); H04M 2201/40 (2013.01); H04M 2203/403 (2013.01)]

21 Claims

1. A computer-implemented method comprising:

retrieving, by at least one processor, a predefined call script from a predefined call script library, the predefined call script having predefined intent mappings encoding predefined text representing predefined correct dialogue associated with a sample topic;

utilizing, by the at least one processor, a computer-based monitoring module to detect an error in a call script conversation based at least in part on the predefined call script;

wherein the computer-based monitoring module is configured to: utilize at least one speech-to-text deep machine learning model to transcribe a call script comprising text representative of the call script conversation;

utilizing, by the at least one processor, at least one natural language processing deep machine learning model to generate a plurality of call intent mappings associated with the call script conversation, wherein the at least one natural language processing deep machine learning model comprises a plurality of parameters configured to encode the text of the call script into the plurality of call intent mappings to produce semantic encodings indicative of the call script;

utilizing, by the at least one processor, at least one similarity measurement model to determine a semantic similarity between the predefined intent mappings and the plurality of call intent mappings based at least in part on at least one similarity measure;

determining, based on the semantic similarity, at least one error in at least one call intent mapping of the plurality of call intent mappings;

determining, by the at least one processor, a user training need based at least in part on the at least one error;

selecting, by the at least one processor, training data for a training call based at least in part on the at least one error and the call script conversation;

selecting, by the at least one processor, a training call voice for the training call based at least in part on the training data;

initiating, by the at least one processor, the training call by calling a user and loading the training data in a user dashboard of a user computing device associated with the user;

utilizing, by the at least one processor, a call generation module to automatically generate caller speech for the training call based at least in part on the training data and the training call voice;

wherein the call generation module is configured to: receive user speech data representative of speech performed by a user during the training call in response to the generated caller speech;

utilizing, by the at least one processor, the at least one speech-to-text deep machine learning model to transcribe a user speech script representative of the user speech data;

utilizing, by the at least one processor, the at least one natural language processing deep machine learning model to generate at least one user speech intent mapping associated with the user speech data;

detecting, by the at least one processor, a new error in the user speech script based at least in part on the call script and the user speech script;

determining, by the at least one processor, a user training need based at least in part on the at least one new error; and

determining, by the at least one processor, a training session initiation based at least in part on the user training need and the at least one new error.
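
Illustrative sketch (not part of the claims): the similarity and error-determination steps of claim 1 compare the predefined intent mappings against the call intent mappings and flag those that diverge. The claim recites only "at least one similarity measure", so the cosine similarity, the 0.8 threshold, the per-step dictionary layout, and the function names below are all assumptions chosen for illustration. The short numeric vectors stand in for the semantic encodings that a natural language processing model would produce.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero encoding as dissimilar
    return dot / (norm_a * norm_b)


def find_intent_errors(
    predefined: Dict[str, Sequence[float]],
    observed: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the patent
) -> List[str]:
    """Return the script steps whose observed intent diverges from the script.

    Keys are hypothetical script-step names; values are intent encodings.
    A step missing from the observed call is also reported as an error.
    """
    errors = []
    for step, expected in predefined.items():
        actual = observed.get(step)
        score = 0.0 if actual is None else cosine_similarity(expected, actual)
        if score < threshold:
            errors.append(step)
    return errors


# Toy encodings: the agent greets correctly but departs from the disclosure step.
predefined_mappings = {"greeting": [1.0, 0.0], "disclosure": [0.0, 1.0]}
call_mappings = {"greeting": [0.9, 0.1], "disclosure": [1.0, 0.0]}
flagged_steps = find_intent_errors(predefined_mappings, call_mappings)
```

In this sketch the flagged steps would feed the later steps of the claim, such as determining the user training need and selecting training data. How those steps consume the errors is not specified here.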