US 12,273,697 B2
Systems and methods for upmixing audiovisual data
Aren Jansen, Mountain View, CA (US); Manoj Plakal, New York, NY (US); Dan Ellis, New York, NY (US); Shawn Hershey, Kirkland, WA (US); and Richard Channing Moore, III, Brooklyn, NY (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 18/042,258
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Aug. 26, 2020, PCT No. PCT/US2020/047930 § 371(c)(1), (2) Date Feb. 20, 2023, PCT Pub. No. WO2022/046045, PCT Pub. Date Mar. 3, 2022.
Prior Publication US 2023/0308823 A1, Sep. 28, 2023
Int. Cl. H04S 7/00 (2006.01); H04S 3/00 (2006.01)

CPC H04S 7/301 (2013.01) [H04S 2400/01 (2013.01)]

20 Claims

1. A computer-implemented method for upmixing audiovisual data, the computer-implemented method comprising:

obtaining, by a computing system comprising one or more computing devices, audiovisual data comprising input audio data and video data accompanying the input audio data, wherein each frame of the video data depicts only a portion of a larger scene, and wherein the input audio data has a first number of audio channels;

providing, by the computing system, the audiovisual data as input to a machine-learned audiovisual upmixing model, the audiovisual upmixing model comprising a sequence-to-sequence model configured to model a respective location of one or more audio sources within the larger scene over multiple frames of the video data; and

receiving, by the computing system, upmixed audio data from the audiovisual upmixing model, the upmixed audio data having a second number of audio channels, the second number of audio channels greater than the first number of audio channels.
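The data flow recited in claim 1 can be illustrated with a small, non-learned stand-in. This is only a sketch under stated assumptions and not the claimed machine-learned model. The input has N channels and the output has num_out > N channels. A per-frame location estimate is carried across frames, so a source that leaves the portion of the scene shown in a frame keeps its last known position. The functions `track_source`, `pan_gains`, and `upmix` are hypothetical names. The exponential smoothing and constant-power panning are swapped-in textbook techniques standing in for the sequence-to-sequence model. The per-frame horizontal positions are assumed to be supplied externally, for example by some detector.

```python
import math
from typing import List, Optional, Sequence


def track_source(observations: Sequence[Optional[float]], alpha: float = 0.5) -> List[float]:
    """Carry a per-frame location estimate across frames (toy recurrent state).

    Hypothetical input: observations[t] is the source's horizontal position in
    [0, 1] across the larger scene, or None when frame t does not show it.
    """
    estimate = 0.5  # assumption: start at the scene centre
    track = []
    for obs in observations:
        if obs is not None:
            # Exponential smoothing toward the newest visible observation.
            estimate = (1 - alpha) * estimate + alpha * obs
        # If obs is None the source is off-screen; keep the last estimate.
        track.append(estimate)
    return track


def pan_gains(position: float, num_out: int) -> List[float]:
    """Constant-power pan of a position in [0, 1] across num_out channels in a row."""
    if num_out == 1:
        return [1.0]
    position = min(max(position, 0.0), 1.0)  # clamp to the valid range
    scaled = position * (num_out - 1)
    left = min(int(scaled), num_out - 2)  # index of the lower neighbouring channel
    frac = scaled - left
    gains = [0.0] * num_out
    gains[left] = math.cos(frac * math.pi / 2)
    gains[left + 1] = math.sin(frac * math.pi / 2)
    return gains


def upmix(audio: List[List[float]], observations: Sequence[Optional[float]],
          num_out: int, samples_per_frame: int) -> List[List[float]]:
    """Map audio[channel][sample] with N channels to num_out > N channels."""
    num_in = len(audio)
    if num_out <= num_in:
        raise ValueError("output must have more channels than input")
    # Toy simplification: collapse the input to mono before re-panning it.
    mono = [sum(ch[i] for ch in audio) / num_in for i in range(len(audio[0]))]
    track = track_source(observations)
    out = [[0.0] * len(mono) for _ in range(num_out)]
    for i, sample in enumerate(mono):
        # Pick the video frame this audio sample belongs to.
        frame = min(i // samples_per_frame, len(track) - 1)
        for c, g in enumerate(pan_gains(track[frame], num_out)):
            out[c][i] = g * sample
    return out
```

For example, `upmix(stereo, [0.0, 1.0], num_out=5, samples_per_frame=2)` turns a two-channel signal into five channels. Each output sample preserves the mono sample's power, because the two active gains satisfy cos² + sin² = 1. Smoothing the location over frames is a crude analogue of the temporal context that claim 1 attributes to the sequence-to-sequence model.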