US 12,283,291 B1
Factually consistent generative narrations
Noah Lirone Sarfati, Tel Aviv (IL); Ido Yerushalmy, Tel Aviv (IL); Michael Chertok, Raanana (IL); and Ianir Ideses, Raanana (IL)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Aug. 16, 2023, as Appl. No. 18/450,695.
Int. Cl. G11B 27/036 (2006.01); G10L 15/26 (2006.01); H04N 21/81 (2011.01); H04N 21/8549 (2011.01)

CPC G11B 27/036 (2013.01) [G10L 15/26 (2013.01); H04N 21/8106 (2013.01); H04N 21/8549 (2013.01)]

20 Claims

1. A computer-implemented method, comprising:

determining a video stream of an event, wherein the video stream comprises a composite audio stream that includes audio commentary and background noise;

determining, based at least in part on the video stream and using a speech transcription service, transcribed commentary of the audio commentary;

obtaining a plurality of metadata messages for the event;

determining, based at least in part on the video stream and using a clip selection system, a plurality of highlight clips; and

determining narrated highlights for each of the plurality of highlight clips by at least:

determining a first time window for a highlight clip of the plurality of highlight clips;

determining one or more metadata messages from the plurality of metadata messages from the first time window;

determining a portion of the transcribed commentary from a second time window that includes the first time window;

determining a prompt based at least in part on the one or more metadata messages and the portion of the transcribed commentary;

determining a generative narration based at least in part on the prompt and using a generative model;

validating factual consistency of the generative narration based at least in part on the one or more metadata messages and the portion of the transcribed commentary;

extracting the background noise for the highlight clip; and

producing a narrated highlight by at least replacing audio of the highlight clip with the generative narration and the background noise.
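
As an illustration only, the per-clip text steps of claim 1 (selecting metadata messages from the first time window, selecting transcribed commentary from a wider second time window, building a prompt, generating a narration, and validating it) might be sketched as follows. Everything named here is an assumption made for the example and is not taken from the patent: the `Timed` record, the `pad` parameter used to widen the second window, the prompt wording, and the `generate` callable that stands in for the generative model. The validation shown is a deliberately crude stand-in (every number in the narration must appear in the source text), not the claimed validation technique. The audio steps (extracting background noise and replacing the clip audio) are omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Timed:
    """Hypothetical record: a text item stamped with seconds from stream start."""
    t: float
    text: str


def in_window(items: List[Timed], start: float, end: float) -> List[Timed]:
    """Return the items whose timestamp lies in the closed interval [start, end]."""
    return [item for item in items if start <= item.t <= end]


def build_prompt(metadata: List[Timed], commentary: List[Timed]) -> str:
    """Assemble a prompt from metadata messages and commentary (wording is assumed)."""
    lines = ["Narrate this highlight using only the facts below.", "Metadata:"]
    lines += [f"- {m.text}" for m in metadata]
    lines.append("Commentary:")
    lines += [f"- {c.text}" for c in commentary]
    return "\n".join(lines)


def numbers_supported(narration: str, sources: List[str]) -> bool:
    """Crude consistency check: each number in the narration must occur in the sources."""
    allowed = set(re.findall(r"\d+", " ".join(sources)))
    return all(n in allowed for n in re.findall(r"\d+", narration))


def narrate_clip(
    clip_window: Tuple[float, float],
    pad: float,
    metadata: List[Timed],
    transcript: List[Timed],
    generate: Callable[[str], str],
) -> Optional[str]:
    """Return a narration for one clip, or None if the consistency check fails."""
    start, end = clip_window                                # first time window
    meta = in_window(metadata, start, end)
    talk = in_window(transcript, start - pad, end + pad)    # second window contains the first
    narration = generate(build_prompt(meta, talk))
    sources = [m.text for m in meta] + [c.text for c in talk]
    return narration if numbers_supported(narration, sources) else None
```

A caller would pass its own model wrapper as `generate`; in this sketch a rejected narration is returned as `None`, so the caller decides whether to retry or fall back to another narration.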