US 12,141,541 B1
Video to narration
Pragyana K. Mishra, Seattle, WA (US)
Assigned to Armada Systems, Inc., San Francisco, CA (US)
Filed by Armada Systems, Inc., San Francisco, CA (US)
Filed on Oct. 6, 2023, as Appl. No. 18/482,823.
Int. Cl. G06F 40/40 (2020.01); G06V 10/774 (2022.01); G06V 20/40 (2022.01); G06V 20/70 (2022.01)

CPC G06F 40/40 (2020.01) [G06V 10/774 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G06V 20/70 (2022.01)]

20 Claims

1. A computer-implemented method, comprising:

processing a segment of a video to generate a plurality of embedding vectors, each embedding vector of the plurality of embedding vectors corresponding to at least a portion of a subject matter represented in the segment of the video;

aggregating the plurality of embedding vectors to generate a feature embedding for the segment;

determining, based at least in part on a distance in a vector space between the feature embedding and a prior feature embedding generated for a prior segment of the video, that the segment is different than the prior segment;

in response to determining that the segment is different than the prior segment, generating, based at least in part on the feature embedding, a descriptive text indicative of the feature embedding;

generating, based at least in part on the descriptive text, a natural language description of the subject matter represented in the segment; and

presenting the natural language description.
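
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not specify any of the choices made here. Mean pooling as the aggregation, cosine distance as the vector-space distance, the `threshold` value, and treating the first segment as different because it has no prior segment are all assumptions. The callables `embed`, `describe`, and `to_narration` are hypothetical placeholders for the embedding model, the descriptive-text generator, and the natural language generator.

```python
import math
from typing import Callable, List, Optional, Sequence, TypeVar

Vector = List[float]
Segment = TypeVar("Segment")


def aggregate(embeddings: Sequence[Vector]) -> Vector:
    """Combine per-subject embedding vectors into one feature embedding.

    Assumption: element-wise mean; the claim only says "aggregating".
    """
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Distance in the vector space (assumption: 1 - cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def narrate(
    segments: Sequence[Segment],
    embed: Callable[[Segment], Sequence[Vector]],   # hypothetical embedding model
    describe: Callable[[Vector], str],              # hypothetical descriptive-text step
    to_narration: Callable[[str], str],             # hypothetical language generator
    threshold: float = 0.2,                         # assumed value, not from the claim
) -> List[str]:
    """Return one narration per segment that differs from its prior segment."""
    narrations: List[str] = []
    prior: Optional[Vector] = None
    for segment in segments:
        feature = aggregate(embed(segment))
        # Assumption: the first segment has no prior and is always narrated.
        if prior is None or cosine_distance(feature, prior) > threshold:
            descriptive_text = describe(feature)
            narrations.append(to_narration(descriptive_text))
        prior = feature  # compare each segment with the one just before it
    return narrations
```

In this sketch, each segment is compared only with the segment immediately before it, so a slow drift across many segments may never exceed the threshold. Comparing against the last narrated segment instead would be an equally plausible reading of the claim language.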