US 12,080,068 B2
Deep learning system for determining audio recommendations based on video content
Karunakar Gautam, New Delhi (IN); Rahul Gandhi, New Delhi (IN); and Anandita Chopra, Patiala (IN)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Jun. 28, 2021, as Appl. No. 17/361,014.
Prior Publication US 2022/0414381 A1, Dec. 29, 2022
Int. Cl. G06V 20/40 (2022.01); G06F 16/64 (2019.01); G06F 18/21 (2023.01); G06F 18/214 (2023.01); G06F 18/25 (2023.01); G06N 3/08 (2023.01); G10L 19/02 (2013.01)

CPC G06V 20/46 (2022.01) [G06F 16/64 (2019.01); G06F 18/214 (2023.01); G06F 18/217 (2023.01); G06F 18/253 (2023.01); G06N 3/08 (2013.01); G06V 20/41 (2022.01); G10L 19/02 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

receiving an input including a video sequence and a request to determine an audio sequence recommendation for the video sequence;

analyzing the video sequence to determine frame level video features and video level video features of the video sequence;

sending the frame level video features and the video level video features to an audio reasoning module to generate an audio vector for the video sequence; and

determining the audio sequence recommendation for the video sequence by comparing the generated audio vector for the video sequence with stored audio vectors for a plurality of stored audio sequences.
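The final step of claim 1, comparing the generated audio vector with stored audio vectors, can be illustrated with a minimal sketch. The claim does not name a similarity measure, so cosine similarity is an assumption here, and the function names `cosine_similarity` and `recommend_audio`, the `top_k` parameter, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric, not from the claim)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity to anything
    return dot / (norm_a * norm_b)


def recommend_audio(
    generated: Sequence[float],
    stored: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank stored audio sequences by similarity to the generated audio vector."""
    scored = [(name, cosine_similarity(generated, vec)) for name, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]


# Hypothetical catalog: identifiers mapped to stored audio vectors.
stored_vectors = {
    "calm": [1.0, 0.0],
    "upbeat": [0.0, 1.0],
    "mixed": [1.0, 1.0],
}
ranking = recommend_audio([0.9, 0.1], stored_vectors, top_k=2)
```

With these toy values the ranking lists "calm" first and "mixed" second. A production system would more likely use an approximate nearest-neighbor index over a large catalog, but that choice is outside what the claim states.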

8. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to:

receive an input including a video sequence and a request to determine an audio sequence recommendation for the video sequence;

analyze the video sequence to determine frame level video features and video level video features of the video sequence;

send the frame level video features and the video level video features to an audio reasoning module to generate an audio vector for the video sequence; and

determine the audio sequence recommendation for the video sequence by comparing the generated audio vector for the video sequence with stored audio vectors for a plurality of stored audio sequences.

15. A computer-implemented method comprising:

receiving, by a machine-learning backed service, a request to determine an audio sequence recommendation for a video sequence;

processing, by an audio reasoning module, frame level video features and video level video features of the video sequence to generate an audio vector representation of the video sequence;

determining the audio sequence recommendation for the video sequence using the generated audio vector representation of the video sequence; and

returning the determined audio sequence recommendation.
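The request-to-response flow of claim 15 can be sketched as follows. The claim does not disclose how the audio reasoning module combines its inputs, so `toy_audio_reasoning` below is a deliberately simple stand-in (mean-pooling the frame level features and concatenating the video level features) rather than the patented model. The `Request` type, `handle_request`, the squared Euclidean distance, and the catalog contents are likewise assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Vector = List[float]


@dataclass
class Request:
    """Hypothetical request payload carrying precomputed video features."""
    frame_features: List[Vector]  # one feature vector per sampled frame
    video_features: Vector        # one feature vector for the whole video


def mean_pool(frames: List[Vector]) -> Vector:
    """Average the frame vectors element-wise."""
    if not frames:
        raise ValueError("at least one frame vector is required")
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def toy_audio_reasoning(frame_features: List[Vector], video_features: Vector) -> Vector:
    """Stand-in for the audio reasoning module: pooled frames + video-level features."""
    return mean_pool(frame_features) + list(video_features)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (assumed comparison, not from the claim)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def handle_request(
    request: Request,
    catalog: Dict[str, Vector],
    reasoning: Callable[[List[Vector], Vector], Vector] = toy_audio_reasoning,
) -> str:
    """Generate an audio vector for the video and return the closest catalog entry."""
    audio_vector = reasoning(request.frame_features, request.video_features)
    return min(catalog, key=lambda name: squared_distance(audio_vector, catalog[name]))


# Hypothetical usage with toy values.
request = Request(frame_features=[[0.0, 1.0], [1.0, 1.0]], video_features=[0.5])
catalog = {"track_a": [0.5, 1.0, 0.4], "track_b": [0.0, 0.0, 0.0]}
recommendation = handle_request(request, catalog)
```

Passing the reasoning step as a callable keeps the service logic separate from the model, so a trained network could replace the stand-in without changing `handle_request`.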