CPC G10L 25/60 (2013.01) [G06N 20/20 (2019.01); G10L 15/02 (2013.01)] | 17 Claims
1. A computer-implemented method comprising:
extracting from an inbound audio signal for an inbound speaker, by a computer, a feature vector for one or more acoustic features;
generating, by the computer, one or more quality measures and an overall quality measure for the inbound audio signal, by applying a first machine-learning architecture to the feature vector for the one or more acoustic features, the one or more quality measures corresponding to a similarity between one or more expected quality descriptors and one or more quality descriptors for the call audio of the inbound audio signal;
extracting, by the computer, an inbound speaker embedding for the inbound speaker from the one or more acoustic features for the inbound audio signal, by applying a second machine-learning architecture to the feature vector for the one or more acoustic features of the inbound audio signal;
generating, by the computer, a first similarity score for the inbound speaker based upon the inbound speaker embedding and an enrolled voiceprint for an enrolled speaker, by applying the second machine-learning architecture;
generating, by the computer, a second similarity score for verifying the inbound speaker, the second similarity score generated based upon the one or more quality measures and the first similarity score; and
verifying, by the computer, the inbound speaker as the enrolled speaker based upon comparing the second similarity score against a verification threshold.