US 12,069,345 B2
Characterizing content for audio-video dubbing and other transformations
Grant L. Duncan, Burbank, CA (US); Elizabeth McKenzie, Sherman Oaks, CA (US); Iris Grethmann, London (GB); and Jonathan Whiles, London (GB)
Assigned to WARNER BROS. ENTERTAINMENT INC., Burbank, CA (US)
Filed by WARNER BROS. ENTERTAINMENT INC., Burbank, CA (US)
Filed on Apr. 17, 2021, as Appl. No. 17/233,443.
Application 17/233,443 is a continuation of application No. PCT/US2019/056828, filed on Oct. 17, 2019.
Claims priority of provisional application 62/747,634, filed on Oct. 18, 2018.
Prior Publication US 2021/0352380 A1, Nov. 11, 2021
Int. Cl. H04N 21/81 (2011.01); H04N 21/439 (2011.01); H04N 21/44 (2011.01); H04N 21/488 (2011.01); H04N 21/845 (2011.01)

CPC H04N 21/8106 (2013.01) [H04N 21/4394 (2013.01); H04N 21/44008 (2013.01); H04N 21/4884 (2013.01); H04N 21/8456 (2013.01)]

20 Claims

1. A computer-implemented method for creating a scripting data for dubbed audio of a media data, the method comprising:

receiving, by one or more processors, the media data comprising one or more recorded vocal instances;

extracting, by the one or more processors, the one or more recorded vocal instances from an audio portion of the media data;

determining, by the one or more processors, a semantic encoding for a human-perceivable message encoded in the media data based on an analysis of a scene or a character represented in the one or more recorded vocal instances;

assigning, by the one or more processors, a time code to each of the extracted vocal instances and to the determined semantic encoding, correlated to a specific frame of the media data;

converting, by the one or more processors, the extracted recorded vocal instances into a text data;

generating, by the one or more processors, a dubbing list comprising the text data and the time code;

displaying, by the one or more processors, a selected portion of the dubbing list corresponding to the one or more vocal instances;

assigning, by the one or more processors, a set of annotations corresponding to the one or more vocal instances based on the determined semantic encoding; and

generating, by the one or more processors, the scripting data comprising the dubbing list and the set of annotations.
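
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the vocal instances have already been extracted, transcribed to text, and given a semantic label by upstream components that are not shown. All names (VocalInstance, DubbingEntry, ScriptingData, frame_to_timecode, build_scripting_data), the non-drop-frame HH:MM:SS:FF time code format, and the default rate of 24 frames per second are hypothetical choices made for illustration; the claim does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def frame_to_timecode(frame: int, fps: int = 24) -> str:
    """Format a zero-based frame index as non-drop-frame HH:MM:SS:FF."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


@dataclass
class VocalInstance:
    """One extracted vocal instance (hypothetical upstream output)."""
    frame: int     # frame of the media data where the instance starts
    text: str      # text data converted from the recorded vocal instance
    semantic: str  # assumed semantic encoding label, e.g. "sarcastic"


@dataclass
class DubbingEntry:
    """One row of the dubbing list: text data plus its time code."""
    time_code: str
    text: str


@dataclass
class ScriptingData:
    """Scripting data: the dubbing list plus the set of annotations."""
    dubbing_list: List[DubbingEntry] = field(default_factory=list)
    annotations: Dict[str, List[str]] = field(default_factory=dict)


def build_scripting_data(instances: List[VocalInstance], fps: int = 24) -> ScriptingData:
    """Assign time codes, build the dubbing list, and attach annotations."""
    script = ScriptingData()
    for inst in sorted(instances, key=lambda i: i.frame):
        tc = frame_to_timecode(inst.frame, fps)
        script.dubbing_list.append(DubbingEntry(tc, inst.text))
        # Annotations are keyed by time code so they stay tied to a frame.
        script.annotations.setdefault(tc, []).append(f"tone: {inst.semantic}")
    return script


if __name__ == "__main__":
    demo = build_scripting_data([
        VocalInstance(frame=48, text="Hello", semantic="warm"),
        VocalInstance(frame=0, text="Hi", semantic="sarcastic"),
    ])
    for entry in demo.dubbing_list:
        print(entry.time_code, entry.text, demo.annotations[entry.time_code])
```

Keying both the dubbing list and the annotations by the same time code is one simple way to keep each annotation correlated to a specific frame. A production system would more likely carry a frame range per instance and a structured annotation type.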