| CPC H04N 21/4307 (2013.01) [H04N 19/60 (2014.11)] | 24 Claims |

|
1. A system comprising:
a hardware processor; and
a memory storing a video/audio (V/A) synchronizer including a video encoder and an audio encoder;
the hardware processor configured to execute the V/A synchronizer to:
receive raw video and raw audio extracted from media content;
partition the raw video into a plurality of video frame patches;
partition the raw audio into a plurality of audio samples;
pre-process the plurality of video frame patches for encoding to provide a plurality of pre-processed video frame patches;
pre-process the plurality of audio samples for encoding to provide a plurality of pre-processed audio samples;
encode, using the video encoder, the plurality of pre-processed video frame patches to provide a plurality of pre-processed and encoded video frame patches;
encode, using the audio encoder, the plurality of pre-processed audio samples to provide a plurality of pre-processed and encoded audio samples;
provide, using one or more of the plurality of pre-processed and encoded video frame patches, a latent representation of the raw video;
provide, using the plurality of pre-processed and encoded audio samples, a latent representation of the raw audio; and
synchronize, using the latent representation of the raw video and the latent representation of the raw audio, the raw audio with the raw video.
|
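The steps recited in claim 1 can be illustrated with a minimal sketch. The claim does not specify the encoder architectures, the pre-processing, or the synchronization method, so everything below is an assumption made for illustration only: the "encoders" are toy hand-written functions (mean patch brightness and RMS audio energy) standing in for the video encoder and audio encoder, the latent representations are one scalar per frame, and synchronization is done by searching for the lag that maximizes the Pearson correlation between the two latent sequences. All function names and parameters (`partition_frame`, `estimate_offset`, `patch`, `chunk`, `max_lag`) are hypothetical.

```python
import math
import random

def partition_frame(frame, patch):
    """Split one H x W frame (list of rows) into flattened patch x patch tiles."""
    h, w = len(frame), len(frame[0])
    return [[frame[r + i][c + j] for i in range(patch) for j in range(patch)]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]

def partition_audio(audio, chunk):
    """Split a 1-D signal into consecutive groups of `chunk` samples."""
    return [audio[i:i + chunk] for i in range(0, len(audio) - chunk + 1, chunk)]

def preprocess_patch(p):
    """Assumed pre-processing: scale 8-bit pixel values to [0, 1]."""
    return [v / 255.0 for v in p]

def preprocess_chunk(c):
    """Assumed pre-processing: clip audio samples to [-1, 1]."""
    return [max(-1.0, min(1.0, v)) for v in c]

def encode_patch(p):
    """Toy stand-in for the video encoder: mean brightness of a patch."""
    return sum(p) / len(p)

def encode_chunk(c):
    """Toy stand-in for the audio encoder: RMS energy of a chunk."""
    return math.sqrt(sum(v * v for v in c) / len(c))

def video_latent(frames, patch=4):
    """One scalar per frame: the average of that frame's encoded patches."""
    latent = []
    for f in frames:
        codes = [encode_patch(preprocess_patch(p)) for p in partition_frame(f, patch)]
        latent.append(sum(codes) / len(codes))
    return latent

def audio_latent(audio, chunk):
    """One scalar per chunk of audio samples."""
    return [encode_chunk(preprocess_chunk(c)) for c in partition_audio(audio, chunk)]

def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def estimate_offset(v_lat, a_lat, max_lag):
    """Return the lag (in frames) at which a_lat[t + lag] best matches v_lat[t]."""
    best_lag, best_score = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(v_lat[t], a_lat[t + lag]) for t in range(len(v_lat))
                 if 0 <= t + lag < len(a_lat)]
        if len(pairs) < 2:
            continue
        score = _pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic demo: audio loudness follows frame brightness, delayed by 3 frames.
random.seed(0)
activity = [random.random() for _ in range(40)]
frames = [[[255.0 * a] * 8 for _ in range(8)] for a in activity]  # 8x8 frames
delay, chunk = 3, 16  # 16 audio samples per video frame (assumed)
audio = []
for t in range(40):
    amp = activity[t - delay] if t >= delay else 0.0
    audio.extend([amp if i % 2 == 0 else -amp for i in range(chunk)])

lag = estimate_offset(video_latent(frames), audio_latent(audio, chunk), max_lag=5)
print("estimated audio delay (frames):", lag)
```

In this synthetic case the estimated lag equals the 3-frame delay that was injected, and shifting the audio by that many chunks would realign it with the video. A system of the kind claimed would presumably use learned encoders producing higher-dimensional latent representations; the scalar features here only make the data flow from partitioning through synchronization easy to follow.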