US 12,277,501 B2
Training a sound effect recommendation network
Sudha Krishnamurthy, Foster City, CA (US)
Assigned to Sony Interactive Entertainment Inc., Tokyo (JP)
Filed by Sony Interactive Entertainment Inc., Tokyo (JP)
Filed on Jul. 3, 2023, as Appl. No. 18/217,745.
Application 18/217,745 is a continuation of application No. 16/848,484, filed on Apr. 14, 2020, granted, now 11,694,084.
Prior Publication US 2023/0385646 A1, Nov. 30, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/084 (2023.01); G06F 16/68 (2019.01); G06N 3/045 (2023.01); G06N 20/00 (2019.01); G10L 15/16 (2006.01)

CPC G06N 3/084 (2013.01) [G06F 16/68 (2019.01); G06N 3/045 (2023.01); G06N 20/00 (2019.01); G10L 15/16 (2013.01)]

15 Claims

1. A method for training a Sound Effect Recommendation Network, comprising:

a) generating a positive audio embedding from a positive audio signal wherein the positive audio signal is related to a reference image;

b) generating a negative audio embedding from a negative audio signal; and

c) using a machine learning algorithm with the reference image, the positive audio embedding and the negative audio embedding as inputs to train a visual-to-audio correlation neural network to output a smaller distance between the positive audio embedding and the reference image than the negative audio embedding and the reference image, wherein the visual-to-audio correlation neural network is trained to identify one or more visual elements in the reference image and map the one or more visual elements to one or more sound categories or subcategories within an audio database,

wherein a reference audio signal is part of the audio database and wherein the reference audio signal is part of an audio visual sequence having the reference image, and wherein the positive audio signal is the reference audio signal.
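
The training objective in step c), a smaller distance for the positive pair than for the negative pair, can be illustrated with a triplet margin loss. This is only a minimal sketch under stated assumptions: the claim does not name a particular loss function, and the Euclidean distance, the margin value, the function names, and the toy two-dimensional embeddings below are all hypothetical choices made for illustration. It also assumes the reference image has already been mapped to an embedding in the same space as the audio embeddings.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(image_emb, pos_audio_emb, neg_audio_emb, margin=1.0):
    """Hinge-style triplet loss (an assumed objective, not taken from the claim).

    The loss is zero once the positive audio embedding is closer to the
    image embedding than the negative audio embedding by at least `margin`.
    """
    d_pos = euclidean_distance(image_emb, pos_audio_emb)
    d_neg = euclidean_distance(image_emb, neg_audio_emb)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values chosen so the distances are exact).
image_emb = [0.0, 0.0]
far_audio = [3.0, 4.0]   # distance 5 from the image embedding
near_audio = [0.0, 1.0]  # distance 1 from the image embedding

# Positive is farther than negative: the ordering is violated, loss is positive.
loss_violated = triplet_margin_loss(image_emb, far_audio, near_audio)

# Positive is closer than negative by more than the margin: loss is zero.
loss_satisfied = triplet_margin_loss(image_emb, near_audio, far_audio)
```

In a real training loop, the gradient of such a loss with respect to the network weights would be backpropagated so that matched image and audio pairs move closer together and mismatched pairs move apart.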