US 12,316,827 B2
Enhanced audiovisual synchronization using synthesized natural signals
Andrew Collins, Eversley (GB); Alexander Charles Mackin, Stonehouse (GB); Benoit Quentin Arthur Vallade, London (GB); David William Higham, Sheffield (GB); and Erdem Durgut, London (GB)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Dec. 12, 2022, as Appl. No. 18/064,680.
Prior Publication US 2024/0195949 A1, Jun. 13, 2024
Int. Cl. H04N 17/00 (2006.01); H04N 5/272 (2006.01); H04N 5/28 (2006.01); H04N 21/218 (2011.01); H04N 21/242 (2011.01)
CPC H04N 17/00 (2013.01) [H04N 5/272 (2013.01); H04N 5/28 (2013.01); H04N 21/21805 (2013.01); H04N 21/242 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for using synthesized content to detect audiovisual synchronization in a video pipeline, the method comprising:
receiving, by at least one processor of a first device, a first video feed from a first camera at a first televised event, the first video feed comprising first synthesized content captured by the first camera from a second device and preceding the first televised event, the first synthesized content comprising a first virtual representation of a first person performing a first action having a first sound, wherein the first synthesized content is generated based on content of the first televised event;
receiving, by the at least one processor, a second video feed from a second camera at the first televised event, the second video feed comprising second synthesized content captured by the second camera from a third device and preceding the first televised event, the second synthesized content comprising a second virtual representation of a second person performing a second action having a second sound, wherein the second synthesized content is generated based on the content of the first televised event;
detecting, by the at least one processor, a first delay time between first audio associated with the first sound and first video associated with the first action of the first synthesized content in the first video feed;
detecting, by the at least one processor, a second delay time between second audio associated with the second sound and second video associated with the second action of the second synthesized content in the second video feed;
generating, by the at least one processor, based on the first delay time and the second delay time, time-synchronized audio and video content comprising a first portion of the first video feed for a first time period and a second portion of the second video feed for a second time period; and
sending, by the at least one processor, the time-synchronized audio and video content to a fourth device for presentation of the first televised event.
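
The delay-detection and alignment steps of the claim can be illustrated with a minimal Python sketch, which is not taken from the patent. It assumes the synthesized pre-event content contains one sharp audiovisual event (for example, a virtual person clapping) that appears as a peak both in a feed's audio envelope and in a per-frame video activity score; the offset between the two peaks is that feed's delay time. The function names, the 48 kHz audio rate, the 50 fps video rate, and the peak-based detection method are all illustrative assumptions.

# Minimal sketch, not the patented implementation: per-feed audio-video
# delay is estimated from one sharp synthesized event and then cancelled
# by shifting the audio track. All names and rates are assumptions.
import numpy as np

AUDIO_HZ = 48_000   # assumed audio sample rate
VIDEO_HZ = 50.0     # assumed video frame rate


def detect_av_delay(audio_envelope, video_activity):
    """Estimate one feed's audio-video delay in seconds.

    A positive result means the audio event arrives after the video event.
    """
    audio_peak_s = np.argmax(audio_envelope) / AUDIO_HZ
    video_peak_s = np.argmax(video_activity) / VIDEO_HZ
    return audio_peak_s - video_peak_s


def align_audio(audio, delay_s):
    """Shift an audio track to cancel the measured delay (zero-padded shift)."""
    shift = int(round(delay_s * AUDIO_HZ))
    if shift > 0:    # audio lags video: advance it
        return np.concatenate([audio[shift:], np.zeros(shift, audio.dtype)])
    if shift < 0:    # audio leads video: delay it
        return np.concatenate([np.zeros(-shift, audio.dtype), audio[:shift]])
    return audio


if __name__ == "__main__":
    # Synthetic feed: the virtual clap occurs at t = 1.0 s in the video,
    # but the audio path adds roughly 120 ms of latency.
    audio_env = np.zeros(3 * AUDIO_HZ)
    audio_env[int(1.12 * AUDIO_HZ)] = 1.0
    video_act = np.zeros(int(3 * VIDEO_HZ))
    video_act[int(1.0 * VIDEO_HZ)] = 1.0

    delay = detect_av_delay(audio_env, video_act)   # about 0.12 s
    corrected = align_audio(audio_env, delay)
    print(f"measured delay: {delay * 1000:.0f} ms")

In the same spirit, the generating step of the claim would apply each feed's measured delay before splicing the first portion of the first video feed and the second portion of the second video feed into the time-synchronized output; the sketch covers only the per-feed measurement and correction.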