US 12,354,607 B2
Secure automatic speaker verification system
Hafiz Malik, Ann Arbor, MI (US); Syed Irtaza, Ann Arbor, MI (US); and Muteb Aljasem, Dearborn, MI (US)
Assigned to The Regents of The University of Michigan, Ann Arbor, MI (US)
Appl. No. 17/792,283
Filed by THE REGENTS OF THE UNIVERSITY OF MICHIGAN, Ann Arbor, MI (US)
PCT Filed Jan. 12, 2021, PCT No. PCT/US2021/013131 § 371(c)(1), (2) Date Jul. 12, 2022, PCT Pub. No. WO2021/146214, PCT Pub. Date Jul. 22, 2021.
Claims priority of provisional application 62/960,356, filed on Jan. 13, 2020.
Prior Publication US 2023/0073364 A1, Mar. 9, 2023
Int. Cl. G10L 17/04 (2013.01); G10L 17/06 (2013.01); G10L 25/24 (2013.01)

CPC G10L 17/04 (2013.01) [G10L 17/06 (2013.01); G10L 25/24 (2013.01)]

18 Claims

9. A computer-implemented method for speaker verification, comprising:

receiving, by a signal processor, an audio signal from an unknown speaker;

extracting, by the signal processor, a first feature from the audio signal, where the first feature is indicative of variability of the audio signal, wherein the first feature is derived by grouping data samples of the audio signal into frames and, for each frame, quantizing each data sample in a given frame in accordance with a difference in magnitude of a data sample with magnitude of a reference data sample in the given frame, thereby creating a pattern of values indicative of the variability of the audio signal;

extracting, by the signal processor, additional features from the audio signal, where the additional features represent the power spectrum of the audio signal;

constructing, by the signal processor, a feature vector by concatenating the first feature with the additional features;

classifying, by a first classifier, the audio signal using the feature vector, where the first classifier is trained to identify recorded audio signals;

classifying, by a second classifier, the audio signal using the feature vector, where the second classifier is trained to identify computer generated audio signals;

classifying, by a third classifier, the audio signal using the feature vector, where the third classifier is trained to identify authentic audio signals; and

labeling the audio signal as one of authentic, recorded or computer generated based on output from the first classifier, the second classifier and the third classifier.
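
The steps recited in claim 9 can be illustrated with a minimal sketch. It is not the patented implementation, and every name, parameter and choice in it is an assumption made for illustration: the reference data sample is taken to be the centre sample of each frame, the quantization is three-level (-1, 0, +1) with a hypothetical tolerance `tol`, the pattern is summarized as a normalized histogram, the power-spectrum features are supplied by the caller rather than computed, and the final label is simply the class whose classifier reports the highest score.

```python
def ternary_patterns(samples, frame_len=5, tol=0.1):
    """Group samples into non-overlapping frames and quantize each sample
    against the frame's reference sample (assumed here: the centre sample)."""
    mid = frame_len // 2
    patterns = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        ref = frame[mid]
        pattern = []
        for i, x in enumerate(frame):
            if i == mid:
                continue  # the reference sample is not compared with itself
            diff = x - ref
            # Assumed three-level quantization of the difference.
            pattern.append(1 if diff > tol else -1 if diff < -tol else 0)
        patterns.append(tuple(pattern))
    return patterns


def pattern_histogram(patterns):
    """Summarize the patterns as the fraction of -1, 0 and +1 codes."""
    counts = {-1: 0, 0: 0, 1: 0}
    total = 0
    for pattern in patterns:
        for value in pattern:
            counts[value] += 1
            total += 1
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[-1] / total, counts[0] / total, counts[1] / total]


def build_feature_vector(first_feature, additional_features):
    """Concatenate the variability feature with the spectral features."""
    return list(first_feature) + list(additional_features)


def label_signal(recorded_score, generated_score, authentic_score):
    """Assumed fusion rule: pick the class with the highest classifier score."""
    scores = {
        "recorded": recorded_score,
        "computer generated": generated_score,
        "authentic": authentic_score,
    }
    return max(scores, key=scores.get)


# Toy usage with made-up numbers; the spectral features are placeholders.
samples = [0.0, 0.5, 0.2, 0.2, -0.3, 1.0, 1.0, 1.0, 1.0, 1.0]
patterns = ternary_patterns(samples)
first_feature = pattern_histogram(patterns)
feature_vector = build_feature_vector(first_feature, [1.5, 2.5])
label = label_signal(0.1, 0.2, 0.7)
```

In this toy run the first frame, with reference sample 0.2, yields the pattern (-1, 1, 0, -1), and the constant second frame yields (0, 0, 0, 0). In a real system the three scores would come from trained classifiers applied to the same feature vector; the claim does not specify how their outputs are combined, so the highest-score rule above is only one possible choice.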