CPC G06V 20/46 (2022.01) [G06F 16/64 (2019.01); G06F 18/214 (2023.01); G06F 18/217 (2023.01); G06F 18/253 (2023.01); G06N 3/08 (2013.01); G06V 20/41 (2022.01); G10L 19/02 (2013.01)]
20 Claims
1. A computer-implemented method comprising:
receiving an input including a video sequence and a request to determine an audio sequence recommendation for the video sequence;
analyzing the video sequence to determine frame level video features and video level video features of the video sequence;
sending the frame level video features and the video level video features to an audio reasoning module to generate an audio vector for the video sequence; and
determining the audio sequence recommendation for the video sequence by comparing the generated audio vector for the video sequence with stored audio vectors for a plurality of stored audio sequences.
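Claim 1 does not specify how the frame-level and video-level features are computed, what the audio reasoning module does internally, or how audio vectors are compared. The following is a minimal, non-authoritative sketch under stated assumptions. Frame-level features are taken as precomputed per-frame vectors. Video-level features are assumed to come from mean pooling. The "audio reasoning module" is a hypothetical linear projection over max-pooled frame features concatenated with the video-level features. The comparison is assumed to be cosine similarity. All function names are illustrative and do not come from the patent.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def video_level_features(frame_features: Sequence[Vector]) -> Vector:
    """Assumption: mean-pool per-frame features into one video-level vector."""
    n = len(frame_features)
    return [sum(col) / n for col in zip(*frame_features)]


def audio_reasoning(frame_features: Sequence[Vector],
                    video_features: Vector,
                    weights: Sequence[Vector]) -> Vector:
    """Hypothetical audio reasoning module.

    Max-pools the frame-level features over time, concatenates the result
    with the video-level features, and applies a linear projection
    (one weight row per output dimension) to produce an audio vector.
    """
    peak = [max(col) for col in zip(*frame_features)]
    fused = peak + list(video_features)
    return [sum(w * x for w, x in zip(row, fused)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_audio(frame_features: Sequence[Vector],
                    weights: Sequence[Vector],
                    stored_audio: Dict[str, Vector]) -> str:
    """Return the id of the stored audio sequence most similar to the video."""
    video_features = video_level_features(frame_features)
    audio_vec = audio_reasoning(frame_features, video_features, weights)
    return max(stored_audio, key=lambda k: cosine(audio_vec, stored_audio[k]))


# Usage with toy two-dimensional features and two stored audio sequences.
frames = [[1.0, 0.0], [0.5, 0.0]]
weights = [[1, 0, 1, 0], [0, 1, 0, 1]]
catalog = {"calm_piano": [1.0, 0.1], "drum_loop": [0.0, 1.0]}
print(recommend_audio(frames, weights, catalog))  # calm_piano
```

In a trained system the projection weights would be learned so that video vectors land near the vectors of well-matched audio. The sketch only shows the data flow: frame-level and video-level features in, one audio vector out, then a nearest-neighbor lookup over stored audio vectors.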
8. A non-transitory computer-readable storage medium including instructions stored thereon which, when executed by at least one processor, cause the at least one processor to:
receive an input including a video sequence and a request to determine an audio sequence recommendation for the video sequence;
analyze the video sequence to determine frame level video features and video level video features of the video sequence;
send the frame level video features and the video level video features to an audio reasoning module to generate an audio vector for the video sequence; and
determine the audio sequence recommendation for the video sequence by comparing the generated audio vector for the video sequence with stored audio vectors for a plurality of stored audio sequences.
15. A computer-implemented method comprising:
receiving, by a machine-learning backed service, a request to determine an audio sequence recommendation for a video sequence;
processing, by an audio reasoning module, frame level video features and video level video features of the video sequence to generate an audio vector representation of the video sequence;
determining the audio sequence recommendation for the video sequence using the generated audio vector representation of the video sequence; and
returning the determined audio sequence recommendation.