US 12,417,070 B2
Video-informed spatial audio expansion
Marcin Gorzel, Dublin (IE); and Balineedu Adsumilli, Sunnyvale, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by GOOGLE LLC, Mountain View, CA (US)
Filed on Jun. 1, 2023, as Appl. No. 18/327,134.
Application 18/327,134 is a continuation of application No. 16/779,921, filed on Feb. 3, 2020, granted, now 11,704,087.
Prior Publication US 2023/0305800 A1, Sep. 28, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/16 (2006.01); G06V 20/40 (2022.01); G10L 25/51 (2013.01); H04S 5/00 (2006.01)

CPC G06F 3/165 (2013.01) [G06V 20/41 (2022.01); G10L 25/51 (2013.01); H04S 5/005 (2013.01)]

20 Claims

1. A method of assigning spatial information to audio segments, comprising:

receiving first video frames that include a visual object;

receiving a first audio segment, wherein the first audio segment includes an auditory event associated with the visual object, and wherein the first audio segment is non-spatialized;

upon determining that second video frames do not include the visual object and that a first time difference between the first video frames and the second video frames is not longer than a certain time, using a motion vector of the visual object to assign a spatial location to the auditory event in at least one of the second video frames;

receiving a second audio segment, wherein the second audio segment includes the auditory event;

receiving third video frames, wherein the third video frames do not include the visual object;

upon determining that the third video frames do not include the visual object and that a second time difference between the first video frames and the third video frames is longer than the certain time, assigning the auditory event to a diffuse sound field; and

generating an audio output that includes spatial locations of the visual object to a listener.
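
The branching recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The names `Placement`, `place_auditory_event`, `max_gap_s`, the 2-D normalized coordinates, and the linear extrapolation along the motion vector are all assumptions made for illustration; the claim only requires that a motion vector be used while the object has been out of frame for no longer than "a certain time", and that the event fall back to a diffuse sound field after that.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec2 = Tuple[float, float]  # hypothetical normalized (x, y) frame coordinates


@dataclass
class Placement:
    """Where an auditory event is rendered (illustrative type, not from the patent)."""
    kind: str                       # "point" or "diffuse"
    location: Optional[Vec2] = None


def place_auditory_event(
    current_pos: Optional[Vec2],   # object position in the current frames, None if absent
    last_seen_pos: Vec2,           # position in the last frames that showed the object
    motion_vector: Vec2,           # assumed units: coordinate change per second
    elapsed_s: float,              # time since the object was last visible
    max_gap_s: float,              # the "certain time" threshold (assumed in seconds)
) -> Placement:
    # Object is visible: place the sound at the object's position.
    if current_pos is not None:
        return Placement("point", current_pos)

    # Object left the frame recently: estimate its position from the motion
    # vector. Linear extrapolation is an assumption of this sketch.
    if elapsed_s <= max_gap_s:
        x = last_seen_pos[0] + motion_vector[0] * elapsed_s
        y = last_seen_pos[1] + motion_vector[1] * elapsed_s
        return Placement("point", (x, y))

    # Object has been absent longer than the threshold: no reliable location,
    # so the event is assigned to the diffuse sound field.
    return Placement("diffuse")
```

For example, an object last seen at (0.5, 0.5) and moving at (0.25, 0.0) per second would be placed at (1.0, 0.5) after 2 seconds with a 3-second threshold, and assigned to the diffuse field after 4 seconds.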