| CPC G06N 3/084 (2013.01) [G06F 16/68 (2019.01); G06N 3/045 (2023.01); G06N 20/00 (2019.01); G10L 15/16 (2013.01)] | 15 Claims |

|
1. A method for training a Sound Effect Recommendation Network, comprising:
a) generating a positive audio embedding from a positive audio signal wherein the positive audio signal is related to a reference image;
b) generating a negative audio embedding from a negative audio signal; and
c) using a machine learning algorithm with the reference image, the positive audio embedding and the negative audio embedding as inputs to train a visual-to-audio correlation neural network to output a smaller distance between the positive audio embedding and the reference image than between the negative audio embedding and the reference image, wherein the visual-to-audio correlation neural network is trained to identify one or more visual elements in the reference image and map the one or more visual elements to one or more sound categories or subcategories within an audio database,
wherein a reference audio signal is part of the audio database and wherein the reference audio signal is part of an audio visual sequence having the reference image, and wherein the positive audio signal is the reference audio signal.
|
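
The distance relationship recited in step c) resembles a triplet-style objective, and the following minimal sketch illustrates one way it could be expressed. The claim does not name a loss function, a distance metric, or a margin, so the hinge form, the Euclidean distance, the `margin` parameter, and the toy embedding values are all assumptions made for illustration. The reference image is assumed to have already been mapped into the same embedding space as the audio.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(image_emb, positive_audio_emb, negative_audio_emb, margin=1.0):
    """Hinge-style triplet loss (an assumed objective, not taken from the claim).

    The loss is zero once the positive audio embedding is closer to the
    image embedding than the negative audio embedding by at least `margin`.
    """
    d_pos = euclidean_distance(image_emb, positive_audio_emb)
    d_neg = euclidean_distance(image_emb, negative_audio_emb)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical toy embeddings, chosen only to exercise the function.
image = [1.0, 0.0]
positive = [1.0, 0.0]   # audio taken from the same audio visual sequence
negative = [4.0, 4.0]   # unrelated audio

loss_good = triplet_loss(image, positive, negative)  # desired ordering
loss_bad = triplet_loss(image, negative, positive)   # ordering reversed
```

With these values the desired ordering gives zero loss, while swapping the positive and negative embeddings gives a positive loss, which is the signal a training procedure would minimize.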