CPC H04N 21/2353 (2013.01) [G06N 20/00 (2019.01); G06V 20/47 (2022.01); H04N 21/26603 (2013.01)] | 16 Claims |
12. A system for annotating media content, including:
a memory; and
one or more processors coupled to the memory and configured to:
obtain media content;
generate, using one or more machine learning models, a metadata file for at least a portion of the media content, the metadata file including one or more metadata descriptions;
obtain a plurality of template sentences, each template sentence of the plurality of template sentences including one or more placeholder metadata tags;
determine a subset of metadata descriptions from the one or more metadata descriptions having confidence scores greater than a confidence threshold;
generate a plurality of sentences at least in part by replacing placeholder metadata tags of the plurality of template sentences with one or more metadata descriptions of the subset of metadata descriptions;
determine, using a machine learning model, a subset of sentences from the plurality of sentences to generate a scene description; and
annotate the media content using the scene description.
|
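The steps recited in claim 12 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `MetadataDescription` type, the `<tag>` placeholder syntax, the function names, the sample data, and the 0.5 threshold are all assumptions made for illustration. The machine learning models are replaced by stand-ins, with hard-coded descriptions in place of model output and sentence length in place of a learned sentence scorer.

```python
import re
from dataclasses import dataclass


@dataclass
class MetadataDescription:
    """Hypothetical record for one entry of the metadata file."""
    tag: str           # placeholder name the description can fill
    value: str         # text substituted into a template
    confidence: float  # confidence score attached to the description


# Assumed placeholder syntax: a tag name in angle brackets, e.g. "<actor>".
PLACEHOLDER = re.compile(r"<(\w+)>")


def filter_by_confidence(descriptions, threshold):
    """Keep descriptions whose confidence is strictly greater than the threshold."""
    return [d for d in descriptions if d.confidence > threshold]


def fill_templates(templates, descriptions):
    """Replace placeholder tags with description values.

    A template is skipped when any of its placeholders has no matching
    description. If two descriptions share a tag, the later one wins.
    """
    values = {d.tag: d.value for d in descriptions}
    sentences = []
    for template in templates:
        tags = PLACEHOLDER.findall(template)
        if all(tag in values for tag in tags):
            sentences.append(
                PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
            )
    return sentences


def select_sentences(sentences, score, k):
    """Join the k highest-scoring sentences into a scene description.

    `score` is a stand-in for the machine learning model of the claim.
    """
    ranked = sorted(sentences, key=score, reverse=True)
    return " ".join(ranked[:k])


# Hard-coded stand-in for the metadata file produced by the models.
descriptions = [
    MetadataDescription("actor", "a woman", 0.92),
    MetadataDescription("action", "riding a bicycle", 0.81),
    MetadataDescription("location", "on a beach", 0.35),
]
templates = [
    "<actor> is <action>.",
    "The scene takes place <location>.",
    "<actor> is visible.",
]

kept = filter_by_confidence(descriptions, threshold=0.5)
sentences = fill_templates(templates, kept)
scene = select_sentences(sentences, score=len, k=1)

# Annotation is modeled here as attaching the description to a record.
annotated = {"media_id": "example-clip", "scene_description": scene}
print(annotated)
```

In this sketch the low-confidence location description is dropped before substitution, so the template that depends on it yields no sentence. A real system would substitute a trained ranking or classification model for the `score` argument.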