US 12,079,731 B2
Audiovisual source separation and localization using generative adversarial networks
Chuang Gan, Cambridge, MA (US); and Yang Zhang, Cambridge, MA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 20, 2022, as Appl. No. 17/969,848.
Application 17/969,848 is a continuation of application No. 16/394,261, filed on Apr. 25, 2019, granted, now 11,501,532.
Prior Publication US 2023/0044635 A1, Feb. 9, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 20/40 (2022.01); G06N 3/045 (2023.01); G06N 3/088 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01)
CPC G06N 3/088 (2013.01) [G06N 3/045 (2023.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06V 20/48 (2022.01)]
24 Claims
 
1. A method for audiovisual source separation processing, the method comprising:
receiving video data including images of a plurality of sound sources;
receiving an optical flow data of the video data, the optical flow data indicating motions of pixels between frames of the video data; and
encoding, by a generative adversarial network (GAN) system, the received video data into video localization data comprising information associating pixels in the frames of video data with different channels of sound; and
encoding, by the GAN system, the received optical flow data into video separation data.
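The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the array sizes, the function names encode_localization and encode_separation, and the random weight matrices standing in for trained GAN generator parameters are all hypothetical, chosen only to show what shapes the two encodings might take. Localization data is modeled here as a per-pixel probability over sound channels, and separation data as a feature vector per pair of consecutive frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T frames, H x W pixels, K sound channels, D feature width.
T, H, W, K, D = 4, 8, 8, 2, 16

# Stand-in inputs (random data, not real video).
video = rng.random((T, H, W, 3))           # RGB frames showing the sound sources
flow = rng.normal(size=(T - 1, H, W, 2))   # per-pixel (dx, dy) motion between consecutive frames


def encode_localization(frames, weights):
    """Map each pixel to a probability distribution over K sound channels."""
    logits = frames @ weights                              # (T, H, W, K)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def encode_separation(flow_fields, weights):
    """Project each flow vector to D features, then average over pixels."""
    features = np.tanh(flow_fields @ weights)              # (T-1, H, W, D)
    return features.mean(axis=(1, 2))                      # (T-1, D)


# Assumption: random matrices stand in for trained generator parameters.
w_loc = rng.normal(size=(3, K))
w_sep = rng.normal(size=(2, D))

localization = encode_localization(video, w_loc)
separation = encode_separation(flow, w_sep)

print(localization.shape)  # (4, 8, 8, 2)
print(separation.shape)    # (3, 16)
```

In this sketch each pixel's K values sum to 1, which is one simple way to express "information associating pixels in the frames of video data with different channels of sound"; the claim itself does not prescribe this representation.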