| CPC G10L 17/26 (2013.01) [G10L 17/18 (2013.01); G10L 25/57 (2013.01)] | 22 Claims |

1. A method for generating a visual representation of manipulations in an audio signal, comprising:
inputting the audio signal into a trained machine-learning model, wherein the machine-learning model is trained by:
generating, based on a training bona fide audio signal, a training bona fide time-frequency representation;
generating, based on a training spoofed audio signal, a training spoofed time-frequency representation, wherein the training spoofed audio signal is a manipulated version of the training bona fide audio signal;
generating a training visual representation of manipulations in the training spoofed audio signal based at least on a difference between the training bona fide time-frequency representation and the training spoofed time-frequency representation; and
training the machine-learning model based on the training visual representation of the manipulations in the training spoofed audio signal; and
generating, by the machine-learning model, the visual representation of the manipulations in the audio signal.
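The training-data generation recited in the claim (spectrograms of paired bona fide and spoofed signals, with the manipulation map derived from their difference) can be sketched as follows. This is a minimal NumPy illustration, not the patented method: the magnitude-spectrogram parameters, the thresholded-difference rule, and all function names (`time_frequency_representation`, `manipulation_map`) are illustrative assumptions.

```python
import numpy as np

def time_frequency_representation(signal, frame_len=256, hop=128):
    # Illustrative time-frequency representation: a Hann-windowed
    # magnitude spectrogram computed with a short-time FFT.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

def manipulation_map(bona_fide, spoofed, threshold=0.1):
    # Visual representation of manipulations, here assumed to be a binary
    # mask marking time-frequency cells where the spoofed spectrogram
    # deviates from the bona fide one by more than a fraction of the
    # largest observed difference.
    tf_bona = time_frequency_representation(bona_fide)
    tf_spoof = time_frequency_representation(spoofed)
    diff = np.abs(tf_spoof - tf_bona)
    return (diff > threshold * diff.max()).astype(np.float32)

# Toy pair: the "spoofed" signal is the bona fide signal with a
# 1200 Hz tone injected into its second half.
sr = 8000
t = np.arange(sr) / sr
bona = np.sin(2 * np.pi * 440 * t)
spoof = bona.copy()
spoof[sr // 2:] += 0.5 * np.sin(2 * np.pi * 1200 * t[sr // 2:])
mask = manipulation_map(bona, spoof)
```

In this sketch `mask` would serve as the per-example training target for the detection model; the unmanipulated first half of the signal yields an all-zero region, while frames covering the injected tone are flagged.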