US 11,853,892 B2
Learning to segment via cut-and-paste
Matthew Alun Brown, Seattle, WA (US); Jonathan Chung-Kuan Huang, Seattle, WA (US); and Tal Remez, Tel-Aviv (IL)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Appl. No. 17/252,663
Filed by Google LLC, Mountain View, CA (US)
PCT Filed Jul. 10, 2019, PCT No. PCT/US2019/041103
§ 371(c)(1), (2) Date Dec. 15, 2020,
PCT Pub. No. WO2020/014294, PCT Pub. Date Jan. 16, 2020.
Claims priority of provisional application 62/696,447, filed on Jul. 11, 2018.
Prior Publication US 2021/0256707 A1, Aug. 19, 2021
Int. Cl. G06T 7/194 (2017.01); G06T 7/11 (2017.01); G06T 11/20 (2006.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06N 3/084 (2023.01); G06N 3/045 (2023.01)
CPC G06N 3/084 (2013.01) [G06N 3/045 (2023.01); G06T 7/11 (2017.01); G06T 7/194 (2017.01); G06T 11/20 (2013.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2210/12 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computing system, comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store:
a generative adversarial network that comprises a generator model and a discriminator model; and
instructions that, when executed by the one or more processors, cause the computing system to perform operations to train the generative adversarial network for object segmentation, the operations comprising:
obtaining a first image that depicts an object, the first image comprising a plurality of pixels;
predicting, by the generator model, a segmentation mask for the object, wherein the segmentation mask identifies a subset of the plurality of pixels that correspond to the object;
extracting a first portion of the first image based at least in part on the segmentation mask, wherein the first portion comprises the subset of the plurality of pixels;
generating a second image by pasting the first portion of the first image onto a background image portion;
providing, by the discriminator model, a discrimination output that indicates a judgment by the discriminator model that the second image is authentic or inauthentic; and
modifying one or more parameters of the generator model based at least in part on the discrimination output provided by the discriminator model.
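
For illustration only, the following is a minimal sketch of how one generator-update step of the cut-and-paste training scheme recited in claim 1 might look in PyTorch. The module architectures (MaskGenerator, Discriminator), the soft alpha-compositing used for the pasting step, the binary cross-entropy objective, and all tensor shapes are assumptions chosen for brevity; they are not taken from the patent specification.

# Sketch of one generator-update step for the cut-and-paste training
# scheme of claim 1. Architectures, shapes, and losses are illustrative
# assumptions, not the patented implementation.
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    """Predicts a soft per-pixel object mask from an image (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1, H, W) mask in [0, 1]

class Discriminator(nn.Module):
    """Scores whether a composite image looks authentic (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x)  # (B, 1) authenticity logit

def generator_step(generator, discriminator, g_opt, first_image, background):
    """One update of the generator's parameters based on the discrimination output."""
    # Predict a segmentation mask identifying the object pixels in the first image.
    mask = generator(first_image)

    # Extract the masked portion and paste it onto the background image portion
    # (soft alpha compositing stands in for the extract-and-paste steps).
    second_image = mask * first_image + (1.0 - mask) * background

    # Discriminator judges whether the composited second image looks authentic.
    logits = discriminator(second_image)

    # Generator is rewarded when the composite fools the discriminator.
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

    g_opt.zero_grad()
    loss.backward()
    g_opt.step()
    return loss.item()

# Illustrative usage with random tensors standing in for image data.
gen, disc = MaskGenerator(), Discriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
img = torch.rand(4, 3, 64, 64)   # first images depicting an object
bg = torch.rand(4, 3, 64, 64)    # background image portions
generator_step(gen, disc, g_opt, img, bg)

In this sketch, the composite second_image corresponds to the claimed "second image," and the optimizer update driven by the discriminator's logits corresponds to the final claimed step of modifying the generator model's parameters based on the discrimination output; the discriminator itself would be trained in a separate, alternating step not shown here.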