CPC G06F 18/2132 (2023.01) [G06F 18/2415 (2023.01); G10L 21/14 (2013.01); G10L 25/18 (2013.01); G10L 25/30 (2013.01); G10L 25/51 (2013.01)]; 17 Claims

1. A method for detecting fake audio, comprising:
converting audio data into an image representation of the audio data;
providing the image representation of the audio data to a trained machine-learning model, wherein the trained machine-learning model comprises a trained patch split component for splitting the image representation of the audio data into a sequence of image patches, the trained machine-learning model:
generating, using a trained self-attention branch, one or more representation embeddings based on the image representation of the audio data; and
receiving, using a trained classifier component, the one or more representation embeddings and outputting a classification result; and
providing the classification result.
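As a rough, non-authoritative illustration, the steps of claim 1 can be sketched in Python with NumPy. Every concrete choice below is an assumption that the claim does not state: the image representation is taken to be a log-magnitude spectrogram, the patch split uses non-overlapping square tiles, the self-attention branch is a single head whose output is mean-pooled into one representation embedding, and the classifier is a single logistic unit. All names (`audio_to_spectrogram`, `split_into_patches`, `TinyFakeAudioDetector`) are hypothetical. The weights are random placeholders standing in for trained parameters, so the sketch shows data flow only.

```python
import numpy as np


def audio_to_spectrogram(audio: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Assumed 'image representation': log-magnitude STFT, rows = frequency bins."""
    if len(audio) < n_fft:
        raise ValueError("audio is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    return np.log1p(magnitude).T  # transpose to (freq, time), like an image


def split_into_patches(image: np.ndarray, patch: int) -> np.ndarray:
    """Hypothetical patch split: non-overlapping patch x patch tiles, each flattened.
    Edges that do not fill a whole tile are cropped."""
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    tiles = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)  # row-major tile order


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class TinyFakeAudioDetector:
    """Patch split -> single-head self-attention -> logistic classifier.
    ASSUMPTION: random weights below are placeholders, not trained parameters."""

    def __init__(self, patch: int = 8, dim: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        self.w_embed = rng.normal(0, 1 / patch, (patch * patch, dim))  # patch -> token
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0, 1 / np.sqrt(dim), (dim, dim)) for _ in range(3)
        )
        self.w_cls = rng.normal(0, 1 / np.sqrt(dim), dim)
        self.b_cls = 0.0

    def embed(self, image: np.ndarray) -> np.ndarray:
        tokens = split_into_patches(image, self.patch) @ self.w_embed  # (n, dim)
        q, k, v = tokens @ self.w_q, tokens @ self.w_k, tokens @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (n, n), each row sums to 1
        return (attn @ v).mean(axis=0)  # pool tokens into one representation embedding

    def classify(self, embedding: np.ndarray) -> tuple[str, float]:
        p_fake = 1.0 / (1.0 + np.exp(-(embedding @ self.w_cls + self.b_cls)))
        return ("fake" if p_fake >= 0.5 else "real"), float(p_fake)


if __name__ == "__main__":
    sr = 16000
    audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s synthetic tone
    image = audio_to_spectrogram(audio)
    model = TinyFakeAudioDetector()
    label, prob = model.classify(model.embed(image))
    print(image.shape, label, round(prob, 3))
```

In a working detector, `w_embed`, the query/key/value projections, and the classifier weights would be learned from labeled genuine and fake recordings; with the random weights here the printed label carries no meaning. Mean pooling is just one simple way to reduce the attended patch tokens to a single embedding; a learned class token is a common alternative.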