CPC G06V 10/82 (2022.01) [G06F 18/2155 (2023.01); G06F 40/169 (2020.01); G06N 3/08 (2013.01); G06V 20/10 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/47 (2022.01); G06V 20/70 (2022.01); G06V 30/194 (2022.01); G06V 30/19173 (2022.01)] | 20 Claims |
1. An apparatus, comprising:
at least one memory; and
logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to:
receive a source video comprising a plurality of frames;
determine a plurality of regions for the plurality of frames;
generate at least one region-sequence connecting the determined plurality of regions based on at least one selection criterion, the at least one selection criterion comprises a coherency selection criterion configured to maximize a cosine similarity between the plurality of regions of the at least one region-sequence; and
apply a language model to the at least one region-sequence to generate description information comprising a description of at least a portion of content of the source video.
|
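The coherency selection criterion recited in claim 1 can be illustrated with a minimal sketch. The claim does not state how regions are represented or how the maximization is carried out, so the following are assumptions made only for illustration: each region is a plain feature vector, coherency is scored as the sum of cosine similarities between regions in consecutive frames, and the best sequence is found by dynamic programming. The names `cosine_similarity` and `most_coherent_region_sequence` are hypothetical and do not come from the patent.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # assumed region representation: a feature vector


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_coherent_region_sequence(
    frames: List[List[Vector]],
) -> Tuple[List[int], float]:
    """Pick one region per frame so that the summed cosine similarity
    between consecutive picks is maximal (assumed scoring rule).

    Returns the chosen region index for each frame and the total score.
    """
    if not frames or any(len(regions) == 0 for regions in frames):
        raise ValueError("every frame needs at least one region")

    # best[j]: best score of any sequence ending at region j of the current frame
    best = [0.0] * len(frames[0])
    back_pointers: List[List[int]] = []

    for prev_regions, cur_regions in zip(frames, frames[1:]):
        new_best = []
        pointers = []
        for region in cur_regions:
            scores = [
                best[i] + cosine_similarity(prev, region)
                for i, prev in enumerate(prev_regions)
            ]
            i_best = max(range(len(scores)), key=scores.__getitem__)
            new_best.append(scores[i_best])
            pointers.append(i_best)
        best = new_best
        back_pointers.append(pointers)

    # Trace back from the best final region to recover the whole sequence.
    end = max(range(len(best)), key=best.__getitem__)
    total = best[end]
    path = [end]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, total
```

Under these assumptions, the function returns one region index per frame, which together form a single region-sequence. A later stage, not sketched here, would pass that sequence to a language model to produce the description information.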