US 12,033,619 B2
Intelligent media transcription
Clement Decrop, Arlington, VA (US); Tushar Agrawal, West Fargo, ND (US); Jeremy R. Fox, Georgetown, TX (US); and Sarbajit K. Rakshit, Kolkata (IN)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Nov. 12, 2020, as Appl. No. 17/095,797.
Prior Publication US 2022/0148583 A1, May 12, 2022
Int. Cl. G10L 15/183 (2013.01); G06N 20/00 (2019.01); G09B 5/02 (2006.01); G09B 19/00 (2006.01); G10L 15/01 (2013.01); G10L 15/02 (2006.01); G10L 15/06 (2013.01); G10L 15/22 (2006.01)
CPC G10L 15/183 (2013.01) [G06N 20/00 (2019.01); G10L 15/01 (2013.01); G10L 15/02 (2013.01); G10L 15/063 (2013.01); G10L 15/22 (2013.01); G09B 5/02 (2013.01); G09B 19/003 (2013.01); G10L 2015/225 (2013.01)]
13 Claims
 
1. A computer-implemented method for transcribing media, the method comprising:
collecting media of a user, wherein the media comprises content of a presentation given by the user;
extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features;
determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format;
transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features, and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam;
notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and
receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.
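
The weighting, highlighting, and approval steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation: the style names, feature names, perceptron-style update rule, `**==...==**` markup, and the 0.5 importance threshold are all hypothetical choices made for illustration, and the CNN/LSTM feature extractor is assumed to have already produced the numeric features.

```python
from dataclasses import dataclass, field

# Hypothetical transcription styles; the claim does not enumerate any.
STYLES = ("outline", "verbatim")


@dataclass
class StyleModel:
    """Linear scorer over speech features, one weight table per style."""
    weights: dict = field(default_factory=dict)  # style -> {feature: weight}
    lr: float = 0.1                              # assumed learning rate

    def score(self, style, features):
        w = self.weights.get(style, {})
        return sum(w.get(name, 0.0) * value for name, value in features.items())

    def predict(self, features):
        # Ties resolve to the first style listed in STYLES.
        return max(STYLES, key=lambda s: self.score(s, features))

    def feedback(self, features, correct_style):
        """Feedback loop: on a wrong prediction, raise the weights of the
        observed features for the correct style and lower them for the
        wrongly predicted one, so features that correlate with a style
        end up weighted more heavily for it."""
        predicted = self.predict(features)
        if predicted == correct_style:
            return
        good = self.weights.setdefault(correct_style, {})
        bad = self.weights.setdefault(predicted, {})
        for name, value in features.items():
            good[name] = good.get(name, 0.0) + self.lr * value
            bad[name] = bad.get(name, 0.0) - self.lr * value


def render(segments, threshold=0.5):
    """Join (text, importance) pairs, marking important text as bold and
    highlighted. The markup and threshold are assumptions."""
    parts = []
    for text, importance in segments:
        parts.append(f"**=={text}==**" if importance >= threshold else text)
    return " ".join(parts)


def publish(transcript, approved_by_presenter, recipients):
    """Release the transcript to other users only after the presenter has
    confirmed its accuracy and approved it."""
    if not approved_by_presenter:
        return []
    return [(recipient, transcript) for recipient in recipients]
```

In this sketch, calling `feedback` with a presenter's correction shifts later predictions toward the corrected style, and `publish` returns no notifications until approval is given, mirroring the order of the final two claim steps.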