| CPC G11B 27/036 (2013.01) [G10L 15/26 (2013.01); H04N 21/8106 (2013.01); H04N 21/8549 (2013.01)] | 20 Claims |

|
1. A computer-implemented method, comprising:
determining a video stream of an event, wherein the video stream comprises a composite audio stream that includes audio commentary and background noise;
determining, based at least in part on the video stream and using a speech transcription service, transcribed commentary of the audio commentary;
obtaining a plurality of metadata messages for the event;
determining, based at least in part on the video stream and using a clip selection system, a plurality of highlight clips; and
determining narrated highlights for each of the plurality of highlight clips by at least:
determining a first time window for a highlight clip of the plurality of highlight clips;
determining one or more metadata messages from the plurality of metadata messages from the first time window;
determining a portion of the transcribed commentary from a second time window that includes the first time window;
determining a prompt based at least in part on the one or more metadata messages and the portion of the transcribed commentary;
determining a generative narration based at least in part on the prompt and using a generative model;
validating factual consistency of the generative narration based at least in part on the one or more metadata messages and the portion of the transcribed commentary;
extracting the background noise for the highlight clip; and
producing a narrated highlight by at least replacing audio of the highlight clip with the generative narration and the background noise.
|
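The per-clip steps of claim 1 (selecting metadata messages from the first time window, taking commentary from a second window that includes the first, building a prompt, and checking factual consistency) can be illustrated with a minimal sketch. Nothing below comes from the patent itself: the names `MetadataMessage`, `TranscriptSegment`, `expand_window`, `messages_in_window`, `commentary_in_window`, `build_prompt`, and `numbers_supported` are hypothetical, the symmetric padding of the window is an assumption, and the number-matching check is only a simple stand-in for whatever validation an actual system might use. The generative model call, the background-noise extraction, and the audio replacement are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class MetadataMessage:
    timestamp: float  # seconds from stream start (assumed unit)
    text: str


@dataclass
class TranscriptSegment:
    start: float
    end: float
    text: str


def expand_window(start, end, pad):
    """Return a second window that includes the first (assumed symmetric padding)."""
    return max(0.0, start - pad), end + pad


def messages_in_window(messages, start, end):
    """Keep metadata messages whose timestamp falls inside the window."""
    return [m for m in messages if start <= m.timestamp <= end]


def commentary_in_window(segments, start, end):
    """Join the text of transcript segments that overlap the window."""
    return " ".join(s.text for s in segments if s.end >= start and s.start <= end)


def build_prompt(messages, commentary):
    """Assemble a prompt from the selected messages and commentary."""
    events = "\n".join(f"- {m.text}" for m in messages)
    return (
        "Narrate this highlight using only the facts below.\n"
        f"Events:\n{events}\n"
        f"Commentary:\n{commentary}\n"
    )


def numbers_supported(narration, messages, commentary):
    """Toy consistency check: every number in the narration must appear in the sources."""
    source = " ".join(m.text for m in messages) + " " + commentary
    source_numbers = set(re.findall(r"\d+", source))
    return all(n in source_numbers for n in re.findall(r"\d+", narration))


# Hypothetical sample data for one highlight clip.
messages = [
    MetadataMessage(95.0, "Goal by player 10 at minute 2"),
    MetadataMessage(300.0, "Yellow card for player 4"),
]
segments = [
    TranscriptSegment(80.0, 92.0, "What a run down the left side"),
    TranscriptSegment(93.0, 99.0, "And player 10 scores"),
    TranscriptSegment(200.0, 210.0, "Play resumes after the break"),
]

first = (90.0, 100.0)                      # first time window (the clip)
second = expand_window(*first, pad=10.0)   # second window includes the first
selected = messages_in_window(messages, *first)
commentary = commentary_in_window(segments, *second)
prompt = build_prompt(selected, commentary)
```

In this sketch the metadata is filtered by the narrower clip window while the commentary is taken from the wider one, mirroring the claim's distinction between the first and second time windows. A narration returned by a model would then be passed to `numbers_supported` before the audio of the clip is replaced.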