US 12,260,883 B1
System and method to enhance audio and video media using generative artificial intelligence
Faisal Shariff, Middlesex (GB); Richard Marsh, Warwickshire (GB); Chandrakanta Rana, Ontario (CA); Salvatore Restivo, Port Washington, NY (US); Aniroodh Suddhapalli, Telangana (IN); and Prashant Jha, Maharashtra (IN)
Assigned to Morgan Stanley Services Group Inc., New York, NY (US)
Filed by Morgan Stanley Services Group Inc., New York, NY (US)
Filed on Aug. 19, 2024, as Appl. No. 18/808,386.
Int. Cl. G11B 27/34 (2006.01); G06F 3/0484 (2022.01); G06F 40/166 (2020.01); G06F 40/58 (2020.01)

CPC G11B 27/34 (2013.01) [G06F 3/0484 (2013.01); G06F 40/166 (2020.01); G06F 40/58 (2020.01)]

17 Claims

1. A system, comprising:

a media source configured to provide an original media including first audio;

a media enhancement system, including:

a hardware-based processor;

a memory configured to store instructions and configured to provide the instructions to the hardware-based processor;

an input/output device configured to display a graphic user interface (GUI) with a media player; and

a set of modules configured to implement the instructions provided to the hardware-based processor, the set of modules including:

a transcoding media-to-text module, including a first media conversion module, executed by the hardware-based processor to automatically generate text corresponding to the first audio;

a summarizing module, including a first large language model, executed by the hardware-based processor to automatically generate a summary of the generated text; and

a chapterizing module, including a second large language model, executed by the hardware-based processor to automatically generate a plurality of chapter headings with each chapter heading corresponding to a respective portion of the generated text;

wherein the GUI outputs an enhanced media including the original media, the summary, and the plurality of chapter headings, with the media player configured to play the original media to a user,

wherein the GUI includes a display region displaying the summary and the plurality of chapter headings to the user,

wherein the original media is in a first language,

wherein the transcoding media-to-text module is executed by the hardware-based processor to generate the generated text in the first language,

wherein the GUI is configured to receive a language selection selectable from at least one second language from the user, and

wherein the set of modules includes:

a translating module, including a third large language model, executed by the hardware-based processor to automatically convert the generated text in the first language to a translated text in the language selection,

the summarizing module executed by the hardware-based processor to automatically generate a summary of the translated text, and

the chapterizing module executed by the hardware-based processor to automatically generate a plurality of chapter headings of the translated text.
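
The data flow recited in claim 1 (media-to-text, optional translation into the user's language selection, then summarizing and chapterizing) can be illustrated with a minimal sketch. This is not the patented implementation: every name below (`EnhancedMedia`, `enhance_media`, and the `stub_*` functions) is hypothetical, and the stubs are plain string functions standing in for the media conversion module and the large language models, which the claim does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EnhancedMedia:
    """Hypothetical container for what the GUI would present."""
    original_media: str          # reference to the original media
    language: str                # language of the summary and headings
    summary: str
    chapter_headings: List[str]


def enhance_media(
    original_media: str,
    first_language: str,
    transcribe: Callable[[str], str],
    summarize: Callable[[str], str],
    chapterize: Callable[[str], List[str]],
    translate: Optional[Callable[[str, str, str], str]] = None,
    language_selection: Optional[str] = None,
) -> EnhancedMedia:
    """Run the claimed stages in order; each stage is a pluggable callable."""
    # Transcoding media-to-text: text in the first language.
    text = transcribe(original_media)
    language = first_language

    # Translating: only when the user selected a different (second) language.
    if language_selection and language_selection != first_language:
        if translate is None:
            raise ValueError("a translate callable is required for a language selection")
        text = translate(text, first_language, language_selection)
        language = language_selection

    # Summarizing and chapterizing operate on whichever text resulted.
    return EnhancedMedia(original_media, language, summarize(text), chapterize(text))


# --- Stand-in stages (assumptions for illustration only, no real models) ---

def stub_transcribe(media: str) -> str:
    return (
        "Welcome to the quarterly update.\n\n"
        "Revenue grew this quarter.\n\n"
        "Questions and answers follow."
    )


def stub_translate(text: str, source: str, target: str) -> str:
    # Tags each paragraph with the target language instead of translating.
    return "\n\n".join(f"[{target}] {p}" for p in text.split("\n\n"))


def stub_summarize(text: str) -> str:
    # Uses the first paragraph as the "summary".
    return text.split("\n\n")[0]


def stub_chapterize(text: str) -> List[str]:
    # One heading per paragraph: its first four words.
    return [" ".join(p.split()[:4]) for p in text.split("\n\n") if p.strip()]


if __name__ == "__main__":
    result = enhance_media(
        "talk.mp4", "en",
        stub_transcribe, stub_summarize, stub_chapterize,
        translate=stub_translate, language_selection="fr",
    )
    print(result.language, result.summary, result.chapter_headings, sep="\n")
```

Passing the stages as callables mirrors the claim's separation into distinct modules, each of which may be backed by a different model; in a real system each stub would be replaced by a call into the corresponding module.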