US 12,315,515 B2
Method and system for user voice identification using ensembled deep learning algorithms
Shanshan Tuo, San Jose, CA (US); Divya Beeram, Fremont, CA (US); Meng Chen, Sunnyvale, CA (US); Neo Yuchen, Arcadia, CA (US); Wan Yu Zhang, Milpitas, CA (US); Nivethitha Kumar, Cupertino, CA (US); Kavita Sundar, Redwood City, CA (US); and Tomer Tal, Cupertino, CA (US)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by INTUIT INC., Mountain View, CA (US)
Filed on Jan. 30, 2024, as Appl. No. 18/426,488.
Application 18/426,488 is a continuation of application No. 17/183,006, filed on Feb. 23, 2021, granted, now 11,929,078.
Prior Publication US 2024/0169994 A1, May 23, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 17/04 (2013.01); G06F 21/32 (2013.01); G06N 20/20 (2019.01); G10L 17/18 (2013.01); G10L 17/26 (2013.01); G10L 21/0208 (2013.01)

CPC G10L 17/04 (2013.01) [G06F 21/32 (2013.01); G06N 20/20 (2019.01); G10L 17/18 (2013.01); G10L 17/26 (2013.01); G10L 21/0208 (2013.01)]

20 Claims

1. A method for training a user detection model to identify a user of a software application based on voice recognition, comprising:

receiving a data set including a plurality of voice recordings;

generating, for each respective recording in the data set, a spectrogram representation based on the respective recording;

training one or more voice recognition models, wherein each model of the one or more voice recognition models is trained based on the spectrogram representation for each of the plurality of voice recordings in the data set;

selecting, for a selected speaker of a plurality of speakers, an evaluation set of recordings;

identifying a similar speaker to the selected speaker by:

providing inputs based on the evaluation set of recordings to the one or more voice recognition models, and

receiving an output from the one or more voice recognition models identifying the similar speaker as the selected speaker;

re-training the one or more voice recognition models based on a mapping of the selected speaker to the identified similar speaker; and

deploying the one or more voice recognition models to an interactive voice response system.
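
The spectrogram-generation and similar-speaker steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a plain short-time Fourier transform for the spectrogram, and it assumes each trained model can be treated as a callable that returns a speaker label. It also adopts one possible reading of the claim, in which the "similar speaker" is the label most often returned in place of the selected speaker on that speaker's evaluation recordings. The names `spectrogram`, `find_similar_speaker`, `build_confusion_mapping`, `frame_len`, and `hop` are hypothetical, and the training, re-training, and deployment steps are omitted.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    Assumption: the claim does not specify the transform; STFT is used here.
    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


def find_similar_speaker(predict_fns, eval_specs, selected_speaker):
    """Return the label most often predicted instead of selected_speaker.

    predict_fns: hypothetical stand-ins for trained models (spectrogram -> label).
    Returns None when every model labels every recording correctly.
    """
    votes = {}
    for spec in eval_specs:
        for predict in predict_fns:
            label = predict(spec)
            if label != selected_speaker:
                votes[label] = votes.get(label, 0) + 1
    if not votes:
        return None
    return max(votes, key=votes.get)


def build_confusion_mapping(predict_fns, eval_sets):
    """Map each selected speaker to its identified similar speaker, if any."""
    mapping = {}
    for speaker, specs in eval_sets.items():
        similar = find_similar_speaker(predict_fns, specs, speaker)
        if similar is not None:
            mapping[speaker] = similar
    return mapping


# Toy usage: a pure tone stands in for a recording, and two stub "models"
# stand in for trained networks (both are illustrative assumptions).
tone = np.sin(2 * np.pi * 32 * np.arange(1024) / 256)
spec = spectrogram(tone)
models = [lambda s: "speaker_b", lambda s: "speaker_a"]
mapping = build_confusion_mapping(models, {"speaker_a": [spec, spec]})
```

In this sketch the resulting mapping is the kind of selected-speaker to similar-speaker pairing that the re-training step could consume, for example to add more contrastive examples for confusable pairs. How the mapping is actually used during re-training is not stated in the claim.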