US 12,266,181 B2
Text-based framework for video object selection
Shivam Nalin Patel, Mountain View, CA (US); Kshitiz Garg, Santa Clara, CA (US); Han Guo, San Jose, CA (US); Ali Aminian, San Jose, CA (US); and Aashish Misraa, San Jose, CA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by Adobe Inc., San Jose, CA (US)
Filed on Nov. 19, 2021, as Appl. No. 17/531,568.
Prior Publication US 2023/0162502 A1, May 25, 2023
Int. Cl. G06V 20/40 (2022.01); G06F 18/214 (2023.01); G06F 18/23 (2023.01); G06F 18/25 (2023.01); G06F 40/205 (2020.01)
CPC G06V 20/49 (2022.01) [G06F 18/214 (2023.01); G06F 18/23 (2023.01); G06F 18/251 (2023.01); G06F 40/205 (2020.01); G06V 20/46 (2022.01)]
20 Claims
1. A computer-implemented method comprising:
receiving a user input and an input video comprising a plurality of frames;
generating a plurality of segmentation masks for the plurality of frames;
determining a set of reference masks corresponding to the user input and an object;
generating a set of fusion masks by combining the plurality of segmentation masks and the set of reference masks;
propagating the set of fusion masks between the plurality of segmentation masks; and
outputting a final set of masks for the input video.
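
The steps recited in claim 1 can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not say how masks are combined or propagated, so the overlap-based (IoU) fusion rule, the nearest-neighbour propagation rule, the `min_iou` threshold, and every function name below are hypothetical. Masks are modelled as sets of (row, col) pixels, the per-frame segmentation masks are assumed to come from some segmentation model, and the reference masks are assumed to have been derived already from the user input.

```python
from typing import Dict, FrozenSet, List, Tuple

Mask = FrozenSet[Tuple[int, int]]  # pixels (row, col) that belong to the mask


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(candidates: List[Mask], target: Mask) -> Mask:
    """Return the candidate that overlaps the target most (the target if none exist)."""
    if not candidates:
        return target
    return max(candidates, key=lambda m: iou(m, target))


def fuse(seg_masks: List[List[Mask]], ref_masks: Dict[int, Mask],
         min_iou: float = 0.5) -> Dict[int, Mask]:
    """Assumed fusion rule: keep the best-overlapping segmentation mask when it
    agrees with the reference mask, otherwise fall back to the reference mask."""
    fused: Dict[int, Mask] = {}
    for t, ref in ref_masks.items():
        candidate = best_match(seg_masks[t], ref)
        fused[t] = candidate if iou(candidate, ref) >= min_iou else ref
    return fused


def propagate(seg_masks: List[List[Mask]], fused: Dict[int, Mask]) -> List[Mask]:
    """Assumed propagation rule: sweep forward, then backward, and in each frame
    without a fusion mask pick the segmentation mask closest to its neighbour's."""
    final = dict(fused)
    n = len(seg_masks)
    for t in range(1, n):  # forward sweep
        if t not in final and t - 1 in final:
            final[t] = best_match(seg_masks[t], final[t - 1])
    for t in range(n - 2, -1, -1):  # backward sweep
        if t not in final and t + 1 in final:
            final[t] = best_match(seg_masks[t], final[t + 1])
    # Frames that could not be reached get an empty mask.
    return [final.get(t, frozenset()) for t in range(n)]


# Toy input: a two-pixel object drifts right by one pixel per frame,
# next to a static distractor region.
distractor = frozenset({(2, 2), (2, 3)})
obj = [frozenset({(0, c), (0, c + 1)}) for c in range(3)]
seg_masks = [[obj[t], distractor] for t in range(3)]

# Hypothetical reference mask for frame 0 only (a rough version of the object).
ref_masks = {0: frozenset({(0, 0), (0, 1), (1, 0)})}

final_masks = propagate(seg_masks, fuse(seg_masks, ref_masks))
```

In this toy run the reference mask on frame 0 selects the object's segmentation mask there, and propagation carries that choice to frames 1 and 2, so `final_masks` follows the object rather than the distractor.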