| CPC G10L 15/063 (2013.01) [G06N 3/045 (2023.01); G06N 20/00 (2019.01); G10L 15/16 (2013.01); G10L 25/27 (2013.01)] | 20 Claims |

|
1. A computer-implemented method for authenticating audio signals using deep phoneprint (DP) embedding vectors, the method comprising:
executing, by the computer, a plurality of task-specific machine learning models using a plurality of features of speech and non-speech portions of an enrollment audio signal having one or more enrollment speaker-independent characteristics as an input to extract a plurality of enrollment speaker-independent embeddings for the enrollment audio signal using one or more embedding extraction layers of each of the plurality of task-specific machine learning models, the plurality of features of the enrollment audio signal including at least one of a spectro-temporal feature of the enrollment audio signal and metadata associated with the enrollment audio signal;
extracting, by the computer, an enrollment DP vector for the enrollment audio signal based upon the plurality of enrollment speaker-independent embeddings extracted for the enrollment audio signal;
executing, by the computer, the plurality of task-specific machine learning models using a plurality of features of speech and non-speech portions of an inbound audio signal having one or more inbound speaker-independent characteristics as the input to extract a plurality of inbound speaker-independent embeddings for the inbound audio signal using one or more embedding extraction layers of each of the plurality of task-specific machine learning models, the plurality of features of the inbound audio signal including at least one of a spectro-temporal feature of the inbound audio signal and metadata associated with the inbound audio signal;
extracting, by the computer, an inbound DP vector for the inbound audio signal based upon the plurality of inbound speaker-independent embeddings extracted for the inbound audio signal; and
generating, by the computer, one or more similarity scores for the inbound audio signal using the inbound DP vector and the enrollment DP vector for the enrolled audio signal.
|
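The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the two extractor functions are hypothetical stand-ins for the embedding extraction layers of trained task-specific models, concatenation is only one assumed way to derive a DP vector "based upon" the embeddings, and cosine similarity is an assumed choice of similarity score. The feature names `spectro_temporal` and `metadata` and all values are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Features = Dict[str, Sequence[float]]
Extractor = Callable[[Features], List[float]]


def spectral_embedding(features: Features) -> List[float]:
    """Hypothetical stand-in for one task-specific model's embedding layers."""
    spec = features["spectro_temporal"]
    mean = sum(spec) / len(spec)
    var = sum((x - mean) ** 2 for x in spec) / len(spec)
    return [mean, math.sqrt(var)]


def metadata_embedding(features: Features) -> List[float]:
    """Hypothetical stand-in for a second task-specific model."""
    return [float(x) for x in features["metadata"]]


def extract_dp_vector(features: Features,
                      extractors: Sequence[Extractor]) -> List[float]:
    """Concatenate per-task embeddings into one DP vector (assumed rule)."""
    dp: List[float] = []
    for extract in extractors:
        dp.extend(extract(features))
    return dp


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity score between two DP vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# The same set of extractors is applied to both signals, as in the claim.
EXTRACTORS = [spectral_embedding, metadata_embedding]

# Made-up features for an enrollment signal and an inbound signal.
enrollment = {"spectro_temporal": [0.2, 0.4, 0.6], "metadata": [1.0, 0.0]}
inbound = {"spectro_temporal": [0.9, 0.1, 0.5], "metadata": [0.0, 1.0]}

enrollment_dp = extract_dp_vector(enrollment, EXTRACTORS)
inbound_dp = extract_dp_vector(inbound, EXTRACTORS)
score = cosine_similarity(inbound_dp, enrollment_dp)
print(f"similarity score: {score:.3f}")
```

In a deployed system the score would presumably be compared against a threshold to decide whether the inbound signal matches the enrollment; the claim itself recites only generating the score.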