US 12,217,472 B2
Segmenting and removing objects from visual media items
Orly Liba, Mountain View, CA (US); Nikhil Karnad, Mountain View, CA (US); Nori Kanazawa, Mountain View, CA (US); Yael Pritch Knaan, Mountain View, CA (US); Huizhong Chen, Mountain View, CA (US); and Longqi Cai, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 18, 2022, as Appl. No. 17/968,634.
Claims priority of provisional application 63/257,114, filed on Oct. 18, 2021.
Prior Publication US 2023/0118460 A1, Apr. 20, 2023
Int. Cl. G06V 10/26 (2022.01); G06T 5/20 (2006.01); G06T 5/77 (2024.01); G06T 5/94 (2024.01); G06T 11/00 (2006.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01); G06V 20/20 (2022.01)

CPC G06V 10/273 (2022.01) [G06T 5/20 (2013.01); G06T 5/77 (2024.01); G06T 5/94 (2024.01); G06T 11/00 (2013.01); G06V 10/764 (2022.01); G06V 10/774 (2022.01); G06V 20/20 (2022.01); G06T 2200/24 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

generating training data that includes a first set of media items and a second set of media items, wherein the first set of media items include distracting objects and the second set of media items include manual segmentations of the distracting objects;

identifying one or more original media items in the first set of media items that include one or more broken powerlines;

generating one or more corrected media items that correct the one or more broken powerlines;

generating one or more augmented media items for the training data by blending portions of the one or more corrected media items with portions of respective one or more original media items to increase a randomness of augmentation; and

training a segmentation machine-learning model based on the training data to receive a media item with one or more distracting objects and to output a segmentation mask for one or more segmented objects that correspond to the one or more distracting objects.
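Claim 1 does not specify how portions of a corrected media item are blended with portions of its original. The sketch below is a minimal illustration under stated assumptions, not the patented method. It assumes media items are equal-sized grayscale grids (nested Python lists) and that a "portion" is one randomly placed rectangle: pixels inside the rectangle come from the corrected item and the rest come from the original. The helper names `random_rect` and `blend_rect` and the rectangle strategy are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]
Rect = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open bounds


def random_rect(height: int, width: int, rng: random.Random) -> Rect:
    """Draw a random, non-empty rectangle inside a height x width grid."""
    top = rng.randrange(height)
    bottom = rng.randrange(top + 1, height + 1)  # guarantees bottom > top
    left = rng.randrange(width)
    right = rng.randrange(left + 1, width + 1)  # guarantees right > left
    return top, bottom, left, right


def blend_rect(original: Image, corrected: Image, rect: Rect) -> Image:
    """Return a copy of `original` whose pixels inside `rect` come from `corrected`."""
    if len(original) != len(corrected) or any(
        len(a) != len(b) for a, b in zip(original, corrected)
    ):
        raise ValueError("original and corrected must have the same shape")
    top, bottom, left, right = rect
    out = [row[:] for row in original]  # copy so the input is left untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = corrected[y][x]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    original = [[0.0] * 4 for _ in range(3)]
    corrected = [[1.0] * 4 for _ in range(3)]
    rect = random_rect(3, 4, rng)
    augmented = blend_rect(original, corrected, rect)
    for row in augmented:
        print(row)
```

Because a new rectangle is drawn on each call, one original/corrected pair can yield many distinct augmented items. Keeping the rectangle separate from the blend means the same `rect` could also be applied to a paired segmentation mask so that image and label stay aligned. This pairing is an assumption about how such data would be used, not something the claim states.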

9. A computer-implemented method to remove a distracting object from a media item, the method comprising:

receiving a media item from a user;

identifying one or more distracting objects in the media item;

providing the media item to a trained segmentation machine-learning model;

outputting, with the trained segmentation machine-learning model, a segmentation mask for the one or more distracting objects in the media item; and

inpainting a portion of the media item that matches the segmentation mask to obtain an output media item, wherein the one or more distracting objects are absent from the output media item;

wherein the trained segmentation machine-learning model is trained by generating training data by:

identifying one or more original media items in a first set of media items that include one or more broken powerlines;

generating one or more corrected media items that correct the one or more broken powerlines; and