US 12,411,910 B2
Audio spoof detection using attention-based contrastive learning
Gaurav Bharaj, Los Angeles, CA (US); Chirag Goel, Montreal (CA); Surya Koppisetti, Coquitlam (CA); Ben Colman, New York, NY (US); and Ali Shahriyari, Las Vegas, NV (US)
Assigned to Reality Defender, Inc., New York, NY (US)
Filed by Reality Defender, Inc., New York, NY (US)
Filed on Nov. 20, 2024, as Appl. No. 18/954,182.
Application 18/954,182 is a continuation of application No. 18/426,016, filed on Jan. 29, 2024, granted, now 12,189,712.
Prior Publication US 2025/0245296 A1, Jul. 31, 2025
This patent is subject to a terminal disclaimer.
Int. Cl. G10L 25/30 (2013.01); G06F 18/2132 (2023.01); G06F 18/2415 (2023.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/51 (2013.01)
CPC G06F 18/2132 (2023.01) [G06F 18/2415 (2023.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01)] 17 Claims
1. A method for detecting fake audios, comprising:
converting audio data into an image representation of the audio data;
providing the image representation of the audio data to a trained machine-learning model, wherein the trained machine-learning model comprises a trained patch split component for splitting the image representation of the audio data into a sequence of image patches, the machine learning model:
generating, using a trained self-attention branch, one or more representation embeddings based on the image representation of the audio data; and
receiving, using a trained classifier component, the one or more representation embeddings and outputting a classification result; and
providing the classification result.
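
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a log-magnitude STFT spectrogram as the image representation, a fixed 16x16 patch grid with a linear projection standing in for the trained patch split component, a single self-attention head, mean pooling to obtain one representation embedding, and a two-class softmax classifier. All function names, dimensions, and the randomly initialized weights (which stand in for trained parameters) are hypothetical.

```python
import numpy as np

def audio_to_spectrogram(wave, n_fft=256, hop=128):
    """Log-magnitude STFT: a 2-D (frequency x time) image of the waveform."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)
    return np.log1p(spec)

def split_into_patches(image, patch=16):
    """Crop to a multiple of `patch`, then flatten each patch into a row."""
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    grid = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return grid.swapaxes(1, 2).reshape(-1, patch * patch)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the patch tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]))
    return weights @ v

def classify(embedding, w, b):
    """Linear layer plus softmax; index 0 = real, 1 = fake (assumed order)."""
    return softmax(embedding @ w + b)

# Random weights stand in for trained parameters (assumption).
rng = np.random.default_rng(0)
dim = 32
sample_rate = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)  # 1 s tone

image = audio_to_spectrogram(wave)            # image representation
patches = split_into_patches(image)           # sequence of image patches
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], dim))
wq, wk, wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))

tokens = patches @ w_embed                    # patch embeddings
attended = self_attention(tokens, wq, wk, wv)
embedding = attended.mean(axis=0)             # one representation embedding
probs = classify(embedding, rng.normal(scale=0.1, size=(dim, 2)), np.zeros(2))

print("image:", image.shape, "patches:", patches.shape, "probs:", probs)
```

With untrained weights the two output probabilities carry no meaning. The sketch shows only how data flows from waveform to image, to patch sequence, to embedding, to classification result.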