US 12,114,048 B2
Automated voice translation dubbing for prerecorded videos
Terrance Paul McCartney, Jr., Allison Park, PA (US); Brian Colonna, Pittsburgh, PA (US); and Michael Nechyba, Pittsburgh, PA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Feb. 13, 2023, as Appl. No. 18/109,243.
Application 18/109,243 is a continuation of application No. 16/975,696, granted, now Pat. No. 11,582,527, previously published as PCT/US2018/019779, filed on Feb. 26, 2018.
Prior Publication US 2023/0199264 A1, Jun. 22, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. H04N 7/10 (2006.01); G06F 40/30 (2020.01); G06F 40/58 (2020.01); H04N 21/43 (2011.01); H04N 21/488 (2011.01)
CPC H04N 21/4884 (2013.01) [G06F 40/30 (2020.01); G06F 40/58 (2020.01); H04N 21/43074 (2020.08)] 20 Claims
OG exemplary drawing
 
1. A method comprising:
identifying, by a processing device, original caption data for a video having audio that includes speech recorded in an original language, wherein the original caption data comprises a plurality of caption character strings in the original language and associated with an audio portion of the video;
identifying, by the processing device, translated language caption data for the video, wherein the translated language caption data comprises a plurality of translated character strings associated with the audio portion of the video;
generating, by the processing device, a set of caption sentence fragments from the plurality of caption character strings and a set of translated sentence fragments from the plurality of translated character strings;
mapping, by the processing device, caption sentence fragments of the set of caption sentence fragments to corresponding translated sentence fragments of the set of translated sentence fragments based on timing associated with the original caption data and the translated language caption data;
estimating, by the processing device, time intervals for individual caption sentence fragments of the set of caption sentence fragments using timing information corresponding to individual caption character strings;
assigning, by the processing device, time intervals to individual translated sentence fragments of the set of translated sentence fragments based on estimated time intervals of the individual caption sentence fragments;
generating, by the processing device, a set of translated sentences using consecutive translated sentence fragments of the set of translated sentence fragments; and
aligning, by the processing device, the set of translated sentences with the audio portion of the video using assigned time intervals of individual translated sentence fragments from corresponding translated sentences.
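Read as an algorithm, claim 1 describes a caption-alignment pipeline: split both the original and translated captions into sentence fragments, estimate a time interval for each caption fragment from the caption timing, map translated fragments to caption fragments by timing, carry the caption fragments' intervals over to the translated fragments, and join consecutive translated fragments into sentences aligned to the audio portion. The following is a minimal Python sketch of that flow, not the implementation disclosed in the specification; the Caption and Fragment classes, the length-proportional interval estimate, and the overlap-based mapping are illustrative assumptions.

# Hypothetical sketch of the claimed alignment pipeline; class names, the
# proportional interval estimate, and the overlap-based mapping are
# illustrative assumptions, not the patent's disclosed implementation.
import re
from dataclasses import dataclass

@dataclass
class Caption:
    text: str     # one caption character string
    start: float  # seconds into the audio portion of the video
    end: float

@dataclass
class Fragment:
    text: str
    start: float
    end: float

_FRAGMENT_SPLIT = re.compile(r'(?<=[.!?])\s+')

def split_fragments(captions):
    """Generate sentence fragments from caption character strings and estimate
    a time interval for each by dividing the caption's interval in proportion
    to fragment length (character count)."""
    fragments = []
    for cap in captions:
        parts = [p.strip() for p in _FRAGMENT_SPLIT.split(cap.text) if p.strip()]
        total = sum(len(p) for p in parts) or 1
        cursor = cap.start
        for part in parts:
            share = (cap.end - cap.start) * len(part) / total
            fragments.append(Fragment(part, cursor, cursor + share))
            cursor += share
    return fragments

def _overlap(a, b):
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))

def map_and_assign(caption_frags, translated_frags):
    """Map each translated fragment to the caption fragment it overlaps most
    in time, then assign that caption fragment's estimated interval to it."""
    for tf in translated_frags:
        best = max(caption_frags, key=lambda cf: _overlap(cf, tf), default=None)
        if best is not None:
            tf.start, tf.end = best.start, best.end
    return translated_frags

def build_sentences(translated_frags):
    """Join consecutive translated fragments into sentences; each sentence is
    aligned from its first fragment's start to its last fragment's end."""
    sentences, current = [], []
    for frag in translated_frags:
        current.append(frag)
        if frag.text.rstrip().endswith(('.', '!', '?')):
            sentences.append((' '.join(f.text for f in current),
                              current[0].start, current[-1].end))
            current = []
    if current:  # trailing fragments without sentence-ending punctuation
        sentences.append((' '.join(f.text for f in current),
                          current[0].start, current[-1].end))
    return sentences

# Example with hypothetical English captions and a Spanish translation.
if __name__ == '__main__':
    original = [Caption("Hello there. Welcome back", 0.0, 2.5),
                Caption("to the channel.", 2.5, 4.0)]
    translated = [Caption("Hola. Bienvenido de nuevo", 0.0, 2.5),
                  Caption("al canal.", 2.5, 4.0)]
    caption_frags = split_fragments(original)
    translated_frags = map_and_assign(caption_frags, split_fragments(translated))
    for text, start, end in build_sentences(translated_frags):
        print(f"{start:4.2f}-{end:4.2f}  {text}")

Splitting a caption's interval by character count is a crude stand-in for whatever per-word timing a real captioning pipeline would supply; the sketch is meant only to trace the data flow the claim recites, from caption character strings to mapped fragment intervals to timed translated sentences.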