US 11,887,313 B2
Computing platform using machine learning for foreground mask estimation
Hugh Ross Sanderson, Shenton Park (AU); Henrik Levring, Manila (PH); and Julien Charles Flack, Swanbourne (AU)
Assigned to SplitmediaLabs Limited, Kwun Tong (HK)
Appl. No. 18/247,308
Filed by SplitmediaLabs Limited, Kwun Tong (HK)
PCT Filed Sep. 26, 2021, PCT No. PCT/CN2021/120724
§ 371(c)(1), (2) Date Mar. 30, 2023,
PCT Pub. No. WO2022/068735, PCT Pub. Date Apr. 7, 2022.
Claims priority of provisional application 63/085,285, filed on Sep. 30, 2020.
Prior Publication US 2023/0267620 A1, Aug. 24, 2023
Int. Cl. G06T 7/194 (2017.01); G06V 20/40 (2022.01); G06V 10/82 (2022.01); G06V 10/774 (2022.01)
CPC G06T 7/194 (2017.01) [G06V 10/774 (2022.01); G06V 10/82 (2022.01); G06V 20/49 (2022.01); G06T 2200/24 (2013.01); G06T 2207/10016 (2013.01); G06T 2207/10024 (2013.01); G06T 2207/20076 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01)]
20 Claims
1. A computing platform comprising:
at least one processor;
a communication interface communicatively coupled to the at least one processor; and
memory storing computer-readable instructions that, when executed by the at least one processor, cause the computing platform to:
receive a set of images and corresponding ground truth foreground masks;
train, using the set of images and corresponding ground truth foreground masks, a first neural network to distinguish between image foregrounds and backgrounds, wherein training the first neural network results in a first set of foreground masks each corresponding to an image of the set of images, wherein distinguishing between the image foregrounds and the backgrounds comprises separating an identified subject of a video from a corresponding background, and wherein training the first neural network comprises training the first neural network using a first subset of the set of images corresponding to a first resolution and a first component configuration and a second subset of the set of images corresponding to a second resolution and a second component configuration;
estimate, for each image of the set of images and based on the foreground mask inferred by the first neural network, and images from a sequence temporally related to the image, a first background clean plate;
train, using the set of images, the first background clean plates, and a set of corresponding ground truth mask images, a second neural network, wherein training the second neural network configures the second neural network to output foreground masks based on video input information; and
deploy, to an implementation computing device, the second neural network.
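
The clean plate estimation step recited above can be illustrated with a minimal sketch. The claim does not state how the clean plate is computed from the inferred masks and the temporally related images, so the per-pixel temporal median used here is an assumption chosen for illustration, not the claimed method. The function name `estimate_clean_plate`, the `threshold` parameter, and the nested-list grayscale frame representation are likewise hypothetical.

```python
from statistics import median


def estimate_clean_plate(frames, masks, threshold=0.5):
    """Estimate a background clean plate from temporally related frames.

    ASSUMPTION: the plate is taken as the per-pixel median of samples that
    the inferred mask labels as background. The claim does not specify this.

    frames: list of T grayscale frames, each an H x W nested list of numbers.
    masks:  list of T foreground masks of the same shape, values in [0, 1],
            where values at or above `threshold` mean foreground.
    Returns an H x W nested list; a pixel that is never seen as background
    in any frame is left as None.
    """
    height, width = len(frames[0]), len(frames[0][0])
    plate = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Keep only the samples where this pixel is labelled background.
            samples = [
                frame[y][x]
                for frame, mask in zip(frames, masks)
                if mask[y][x] < threshold
            ]
            if samples:
                plate[y][x] = median(samples)
    return plate


# Toy sequence: a bright subject (200) moves across a 1 x 3 background.
frames = [[[200, 20, 30]], [[10, 200, 30]], [[10, 20, 200]]]
masks = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
plate = estimate_clean_plate(frames, masks)
print(plate)  # [[10.0, 20.0, 30.0]]
```

In this sketch each pixel is uncovered in at least one frame, so the subject is removed entirely from the plate. A pixel covered by the subject in every frame stays `None`, and a practical system would need some fill strategy for such pixels, which is outside what the claim describes.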