US 12,431,112 B2
Systems and methods for transforming digital audio content
John Ivers, Winnetka, IL (US); Theo Rosendorf, Tucker, GA (US); Kevin Carlson, Woodstock, GA (US); Michael Kakoyiannis, New York, NY (US); and Sherry Mills, Setauket, NY (US)
Assigned to TREE GOAT MEDIA, INC., New York, NY (US)
Filed by TREE GOAT MEDIA, INC., New York, NY (US)
Filed on Feb. 15, 2022, as Appl. No. 17/672,154.
Application 17/672,154 is a continuation-in-part of application No. 17/172,201, filed on Feb. 10, 2021, granted, now 11,749,241.
Application 17/172,201 is a continuation of application No. 16/506,231, filed on Jul. 9, 2019, granted, now 10,971,121, issued on Apr. 6, 2021.
Claims priority of provisional application 63/149,891, filed on Feb. 16, 2021.
Claims priority of provisional application 62/814,018, filed on Mar. 5, 2019.
Claims priority of provisional application 62/695,439, filed on Jul. 9, 2018.
Prior Publication US 2022/0208155 A1, Jun. 30, 2022
Int. Cl. G10H 1/00 (2006.01); G06F 16/68 (2019.01); G06F 16/683 (2019.01)
CPC G10H 1/0008 (2013.01) [G06F 16/685 (2019.01); G06F 16/686 (2019.01); G10H 2220/106 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A system for creating multimedia moments from audio data comprising:
(a) a server comprising one or more processors;
(b) a model database configured to store a plurality of moment models, wherein each moment model of the plurality of moment models is configured to identify a unique moment type, wherein the plurality of moment models comprises a base moment model;
(c) a transcript database configured to store a plurality of transcript datasets, wherein each transcript dataset of the plurality of transcript datasets comprises text derived from corresponding audio data and is time indexed to the corresponding audio data;
wherein the one or more processors are configured to:
(i) receive an episode audio dataset;
(ii) create a transcript dataset based on the episode audio dataset, and add the transcript dataset to the plurality of transcript datasets;
(iii) determine whether the plurality of moment models comprises a focused moment model for the episode audio dataset and, where the focused moment model is within the plurality of moment models, use the focused moment model as a selected moment model;
(iv) where the focused moment model is not within the plurality of moment models, use the base moment model as the selected moment model;
(v) analyze the transcript dataset using the selected moment model to identify a plurality of moments within the transcript dataset, wherein the plurality of moments comprises a set of positive moments that are of high relevance to the unique moment type;
(vi) for at least one positive moment of the set of positive moments, create a multimedia moment based on that positive moment, wherein the multimedia moment comprises a transcript text from the transcript dataset that corresponds to that positive moment, an audio segment from the episode audio dataset that corresponds to the transcript text, and a moment type that describes the unique moment type associated with that positive moment; and
(vii) cause a user interface that is based on the multimedia moment to display on a user device, wherein the user interface is configured to
(1) present the transcript text in synchronized alignment with the audio segment,
(2) accept user feedback regarding relevance of the multimedia moment to the unique moment type, and
(3) update a training dataset associated with the selected moment model based on the user feedback.
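For readers mapping the claim onto an implementation, the following Python sketch illustrates steps (iii) through (vii)(3): model selection with a base-model fallback, identification of positive moments over a time-indexed transcript, assembly of multimedia moments, and capture of user feedback into a training dataset. It is a minimal illustration only; every name in it (MomentModel, select_model, extract_moments, the 0.5 relevance cutoff) is hypothetical and is not disclosed by the patent.

from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    text: str
    start_s: float   # time index into the episode audio dataset
    end_s: float


@dataclass
class MultimediaMoment:
    transcript_text: str
    audio_start_s: float   # bounds of the corresponding audio segment
    audio_end_s: float
    moment_type: str       # the unique moment type of the selected model


class MomentModel:
    """Hypothetical classifier for one unique moment type."""

    def __init__(self, moment_type: str):
        self.moment_type = moment_type
        self.training_examples: list[tuple[str, bool]] = []

    def score(self, segment: TranscriptSegment) -> float:
        # Placeholder relevance score; a real model would be learned.
        return 1.0 if self.moment_type.lower() in segment.text.lower() else 0.0

    def add_feedback(self, text: str, is_relevant: bool) -> None:
        # Step (vii)(3): fold user feedback into the training dataset.
        self.training_examples.append((text, is_relevant))


def select_model(models: dict[str, MomentModel], episode_topic: str) -> MomentModel:
    # Steps (iii)-(iv): prefer a focused model for the episode;
    # fall back to the base moment model when none exists.
    return models.get(episode_topic, models["base"])


def extract_moments(transcript: list[TranscriptSegment],
                    model: MomentModel,
                    threshold: float = 0.5) -> list[MultimediaMoment]:
    # Steps (v)-(vi): keep only positive moments of high relevance and
    # pair each with its time-indexed audio span and moment type.
    return [
        MultimediaMoment(seg.text, seg.start_s, seg.end_s, model.moment_type)
        for seg in transcript
        if model.score(seg) >= threshold
    ]


if __name__ == "__main__":
    models = {"base": MomentModel("highlight"), "sports": MomentModel("goal")}
    transcript = [TranscriptSegment("An incredible goal in minute 12", 700.0, 706.5)]
    model = select_model(models, "sports")
    for moment in extract_moments(transcript, model):
        print(moment)
        model.add_feedback(moment.transcript_text, is_relevant=True)  # step (vii)(2)-(3)

Using a dictionary lookup with a guaranteed "base" entry mirrors the claim's (iii)/(iv) structure: a focused moment model is preferred when one exists for the episode, and the system degrades gracefully to the general-purpose base moment model otherwise.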