US 12,437,358 B2
Performing segmentation of objects in media items based on user input
Orly Liba, Mountain View, CA (US); Navin Sarma, Mountain View, CA (US); Yael Pritch Knaan, Mountain View, CA (US); Alexander Schiffhauer, Mountain View, CA (US); Longqi Cai, Mountain View, CA (US); David Jacobs, Mountain View, CA (US); Huizhong Chen, Mountain View, CA (US); Siyang Li, Mountain View, CA (US); and Bryan Feldman, Mountain View, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 18, 2022, as Appl. No. 17/968,645.
Claims priority of provisional application 63/257,111, filed on Oct. 18, 2021.
Prior Publication US 2023/0118361 A1, Apr. 20, 2023
Int. Cl. G06T 7/11 (2017.01); G06T 3/40 (2006.01); G06T 5/77 (2024.01)
CPC G06T 3/40 (2013.01) [G06T 5/77 (2024.01); G06T 7/11 (2017.01); G06T 2200/24 (2013.01); G06T 2207/20081 (2013.01); G06T 2210/12 (2013.01); G06T 2210/22 (2013.01)]
20 Claims
 
1. A computer-implemented method comprising:
receiving user input that indicates one or more objects to be erased from a media item;
translating the user input to a bounding box;
providing a crop of the media item based on the bounding box to a segmentation machine-learning model;
outputting, with the segmentation machine-learning model, a segmentation mask for one or more segmented objects in the crop of the media item and a corresponding segmentation score that indicates a quality of the segmentation mask;
determining that the segmentation mask is invalid based on one or more of: the corresponding segmentation score failing to meet a threshold score, a number of valid mask pixels falling below a threshold number of pixels, a segmentation mask size falling below a threshold size, or the segmentation mask being greater than a threshold distance from a region indicated by the user input; and
responsive to determining that the segmentation mask is invalid, generating a different mask based on a region within the user input.
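
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (stroke_to_box, crop, mask_is_valid, fallback_mask, select_mask), the padding and radius values, and every threshold are hypothetical, and the segmentation machine-learning model itself is omitted, so its mask and score are simply passed in. The sketch also assumes that the user input is a list of stroke points, that the mask is expressed in full-image coordinates, and it checks only three of the four recited invalidity conditions (score, pixel count, distance from the user input).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]          # (x, y) pixel coordinate
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
Mask = List[List[bool]]          # mask[y][x], same size as the media item


def stroke_to_box(stroke: Sequence[Point], width: int, height: int,
                  padding: int = 2) -> Box:
    """Translate user input (stroke points) to a padded, clamped bounding box."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return (max(min(xs) - padding, 0),
            max(min(ys) - padding, 0),
            min(max(xs) + 1 + padding, width),
            min(max(ys) + 1 + padding, height))


def crop(image: Sequence[Sequence], box: Box) -> list:
    """Cut the region that would be handed to the segmentation model."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def mask_is_valid(mask: Mask, score: float, stroke: Sequence[Point],
                  min_score: float = 0.5, min_pixels: int = 4,
                  max_distance: float = 3.0) -> bool:
    """Apply three of the claimed checks; all thresholds are assumed values."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, on in enumerate(row) if on]
    if score < min_score or len(pixels) < min_pixels:
        return False
    # Distance from the mask to the user input: closest pixel-to-stroke pair.
    nearest = min(math.dist(p, s) for p in pixels for s in stroke)
    return nearest <= max_distance


def fallback_mask(stroke: Sequence[Point], width: int, height: int,
                  radius: int = 1) -> Mask:
    """Generate a different mask from a region within the user input."""
    mask = [[False] * width for _ in range(height)]
    for sx, sy in stroke:
        for y in range(max(sy - radius, 0), min(sy + radius + 1, height)):
            for x in range(max(sx - radius, 0), min(sx + radius + 1, width)):
                mask[y][x] = True
    return mask


def select_mask(model_mask: Mask, score: float, stroke: Sequence[Point],
                width: int, height: int) -> Mask:
    """Keep the model's mask if valid; otherwise fall back to the stroke."""
    if mask_is_valid(model_mask, score, stroke):
        return model_mask
    return fallback_mask(stroke, width, height)
```

In this sketch the fallback simply thickens the stroke by a fixed radius, which is one possible reading of "a region within the user input"; the claim does not prescribe how the different mask is generated.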