US 12,315,031 B2
High fidelity interactive segmentation for video data with deep convolutional tessellations and context aware skip connections
Anthony Rhodes, Portland, OR (US); and Manan Goel, Portland, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by INTEL CORPORATION, Santa Clara, CA (US)
Filed on Sep. 28, 2023, as Appl. No. 18/374,508.
Application 18/374,508 is a division of application No. 16/773,715, filed on Jan. 27, 2020, granted, now Pat. No. 11,928,753.
Prior Publication US 2024/0029193 A1, Jan. 25, 2024
Int. Cl. G06T 1/20 (2006.01); G06F 18/241 (2023.01); G06T 3/4046 (2024.01); G06T 7/11 (2017.01); G06T 7/174 (2017.01); G06T 9/00 (2006.01); G06V 10/26 (2022.01); G06V 10/764 (2022.01); G06V 20/40 (2022.01)

CPC G06T 1/20 (2013.01) [G06F 18/241 (2023.01); G06T 3/4046 (2013.01); G06T 7/11 (2017.01); G06T 7/174 (2017.01); G06T 9/002 (2013.01); G06V 10/26 (2022.01); G06V 10/764 (2022.01); G06V 20/46 (2022.01); G06V 20/49 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20221 (2013.01)]

15 Claims

1. A system for providing segmentation in video, the system comprising:

a memory to store a current video frame;

machine-readable instructions; and

at least one processor circuit to be programmed based on the machine-readable instructions to:

resize the current video frame to a resized current video frame, the resized current video frame including a plurality of sub-images having dimensions corresponding to dimensions of an object classification convolutional neural network;

apply the object classification convolutional neural network to the sub-images and retrieve, for each pixel of each of the sub-images, a plurality of feature values each from one of a plurality of layers of the object classification convolutional neural network to generate an object classification output volume;

resize the object classification output volume to dimensions of the current video frame;

combine a feature volume with a plurality of feature frames each including features compressed from the resized object classification output volume to generate an input volume, the feature volume including at least the current video frame, a temporally previous video frame, a temporally previous segmentation frame, and an object of interest indicator frame, the object of interest indicator frame including one or more indicators of an object of interest in the current video frame; and

apply a segmentation convolutional neural network to the input volume to generate a current segmentation frame for the current video frame.
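
The data flow recited in claim 1 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: `stub_classifier_layers` is a hypothetical stand-in for the object classification convolutional neural network, nearest-neighbour resizing is assumed because the claim names no interpolation method, and a PCA projection is swapped in for the compression step because the claim does not say how features are compressed. The tile size of 32, the choice of 4 feature frames, and all function names are illustrative assumptions. The segmentation convolutional neural network itself is not modelled; the sketch stops at the input volume it would receive.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W, C) array (assumed interpolation)."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def tessellate(frame, tile):
    """Resize the frame to a whole number of tiles and split it into sub-images."""
    h, w, c = frame.shape
    ny, nx = -(-h // tile), -(-w // tile)  # ceiling division
    resized = resize_nearest(frame, ny * tile, nx * tile)
    # Result shape: (ny, nx, tile, tile, c)
    return resized.reshape(ny, tile, nx, tile, c).swapaxes(1, 2)

def stub_classifier_layers(tile_img):
    """Hypothetical stand-in for a classification CNN: three activation maps
    at decreasing spatial resolution with 4, 8 and 16 channels."""
    gray = tile_img.mean(axis=-1, keepdims=True)
    layers = []
    for stride, channels in ((1, 4), (2, 8), (4, 16)):
        act = gray[::stride, ::stride]
        layers.append(np.repeat(act, channels, axis=-1) * np.arange(1, channels + 1))
    return layers

def per_pixel_features(tile_img):
    """For each pixel of a sub-image, gather feature values from every layer."""
    t = tile_img.shape[0]
    upsampled = [resize_nearest(a, t, t) for a in stub_classifier_layers(tile_img)]
    return np.concatenate(upsampled, axis=-1)

def classification_volume(frame, tile=32):
    """Tessellate, extract per-pixel features, and resize back to frame size."""
    h, w = frame.shape[:2]
    tiles = tessellate(frame, tile)
    ny, nx = tiles.shape[:2]
    feats = np.array([[per_pixel_features(tiles[i, j]) for j in range(nx)]
                      for i in range(ny)])  # (ny, nx, tile, tile, F)
    vol = feats.swapaxes(1, 2).reshape(ny * tile, nx * tile, feats.shape[-1])
    return resize_nearest(vol, h, w)

def compress(volume, k):
    """Reduce an (H, W, F) volume to k feature frames via PCA (assumed method)."""
    h, w, f = volume.shape
    flat = volume.reshape(-1, f)
    centered = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return (centered @ vt[:k].T).reshape(h, w, k)

def build_input_volume(cur, prev, prev_seg, indicator, tile=32, k=4):
    """Stack the feature volume with the compressed feature frames."""
    feature_volume = np.concatenate([cur, prev, prev_seg, indicator], axis=-1)
    feature_frames = compress(classification_volume(cur, tile), k)
    return np.concatenate([feature_volume, feature_frames], axis=-1)

rng = np.random.default_rng(0)
cur = rng.random((48, 70, 3))        # current video frame
prev = rng.random((48, 70, 3))       # temporally previous video frame
prev_seg = np.zeros((48, 70, 1))     # temporally previous segmentation frame
indicator = np.zeros((48, 70, 1))    # object of interest indicator frame
indicator[20, 30, 0] = 1.0           # e.g. a single user click on the object
input_volume = build_input_volume(cur, prev, prev_seg, indicator)
```

In this sketch the input volume has 12 channels per pixel: 3 + 3 + 1 + 1 from the feature volume and 4 compressed feature frames. A segmentation network would take this volume and produce the current segmentation frame.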