CPC G06F 40/40 (2020.01) [G06V 10/774 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G06V 20/70 (2022.01)]
20 Claims
1. A computer-implemented method, comprising:
processing a segment of a video to generate a plurality of embedding vectors, each embedding vector of the plurality of embedding vectors corresponding to at least a portion of a subject matter represented in the segment of the video;
aggregating the plurality of embedding vectors to generate a feature embedding for the segment;
determining, based at least in part on a distance in a vector space between the feature embedding and a prior feature embedding generated for a prior segment of the video, that the segment is different than the prior segment;
in response to determining that the segment is different than the prior segment, generating, based at least in part on the feature embedding, a descriptive text indicative of the feature embedding;
generating, based at least in part on the descriptive text, a natural language description of the subject matter represented in the segment; and
presenting the natural language description.
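For illustration only, the steps recited in claim 1 can be sketched as a short Python routine. This is a minimal sketch under stated assumptions, not the patented implementation: mean pooling as the aggregation, cosine distance as the vector-space distance, the fixed `threshold` value, and skipping the first segment (which has no prior segment to compare against) are all choices made here for the example. The callables `embed`, `to_text`, and `to_description` are hypothetical stand-ins for the embedding model, the descriptive-text generator, and the natural language generator, none of which the claim names.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def aggregate(embeddings: Sequence[Vector]) -> Vector:
    """Mean-pool the embedding vectors into one feature embedding (assumed aggregation)."""
    if not embeddings:
        raise ValueError("a segment must yield at least one embedding vector")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance, used here as the assumed vector-space distance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def describe_changed_segments(
    segments: Sequence[object],
    embed: Callable[[object], Sequence[Vector]],   # hypothetical embedding model
    to_text: Callable[[Vector], str],              # hypothetical descriptive-text step
    to_description: Callable[[str], str],          # hypothetical language-generation step
    threshold: float = 0.2,                        # assumed value, not from the claim
) -> List[Tuple[int, str]]:
    """Return (segment index, description) for each segment that differs from its prior."""
    results: List[Tuple[int, str]] = []
    prior: Optional[Vector] = None
    for index, segment in enumerate(segments):
        feature = aggregate(embed(segment))
        # Generate text only when the segment is far enough from the prior segment.
        if prior is not None and cosine_distance(feature, prior) > threshold:
            results.append((index, to_description(to_text(feature))))
        prior = feature
    return results


# Toy usage: each "segment" is already a list of per-frame vectors.
toy_segments = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.0, 1.0]],
]
toy_result = describe_changed_segments(
    toy_segments,
    embed=lambda seg: seg,
    to_text=lambda f: f"dominant axis {f.index(max(f))}",
    to_description=lambda t: f"Segment shows: {t}",
)
print(toy_result)
```

In the toy data only the third segment is far from its predecessor, so it is the only one passed to the text-generation steps. "Presenting" the description is reduced to printing the returned list.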