| CPC G06T 7/248 (2017.01) [G06T 7/74 (2017.01); G06T 2207/20084 (2013.01)] | 14 Claims |

1. An information processing apparatus comprising:
at least one processor; and
at least one memory storing instructions which, when executed by the processor, cause the information processing apparatus to:
obtain a plurality of frame images in time-sequential order;
generate a reference image that is a partial image including a target object image to be tracked from a first frame image included in the plurality of frame images;
determine a plurality of positions in the reference image to be used for tracking processing of the target object image;
generate a plurality of first features corresponding to the plurality of positions by inputting the reference image to a feature extraction neural network;
generate a search image that is a target of the tracking processing of the target object image and is a partial image of a second frame image that is included in the plurality of frame images and follows the first frame image;
generate a second feature by inputting the search image to the feature extraction neural network;
identify a position of the target object image included in the search image based on a result of a correlation operation between each of the plurality of first features and the second feature;
generate a plurality of likelihood maps each having a likelihood value representing a position likelihood of the target object image included in the search image based on the result of the correlation operation between each of the plurality of first features and the second feature;
identify the position of the target object image included in the search image based on the plurality of likelihood maps; and
generate a size map of the target object image included in the search image based on the result of the correlation operation between each of the plurality of first features and the second feature, and further identify a size of the target object image based on the size map and the position of the target object image,
wherein the search image is cut out from the second frame image based on the position and/or the size of the target object image in a frame image preceding the second frame image.
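The tracking pipeline recited in the claim can be illustrated with a minimal numpy sketch. This is not the claimed implementation: the feature extraction network, feature dimensions, positions, and size-map layout below are all hypothetical stand-ins, and the correlation is modeled as a per-position 1x1 cross-correlation between each first feature and the search-image feature map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the second feature: a C x H x W feature map
# produced by the feature extraction neural network from the search image.
C, H, W = 16, 32, 32
search_feat = rng.standard_normal((C, H, W))

# Plurality of first features: one C-dim vector per determined position
# in the reference image (positions and count are illustrative).
positions = [(8, 8), (8, 24), (24, 8), (24, 24)]
first_feats = [rng.standard_normal(C) for _ in positions]

def correlate(template_vec, feat_map):
    """1x1 cross-correlation of a C-dim template vector with a C x H x W
    feature map, yielding an H x W response (likelihood) map."""
    return np.einsum('c,chw->hw', template_vec, feat_map)

# One likelihood map per first feature, as in the claim.
likelihood_maps = [correlate(f, search_feat) for f in first_feats]

# Identify the target position from the plurality of likelihood maps
# (here, by averaging the maps and taking the peak response).
combined = np.mean(likelihood_maps, axis=0)
y, x = np.unravel_index(np.argmax(combined), combined.shape)

# A size map (here 2 channels: width and height per location, an assumed
# layout) is read out at the identified position to obtain the target size.
size_map = np.abs(rng.standard_normal((2, H, W)))
w_est, h_est = size_map[:, y, x]
```

The identified position `(x, y)` and size `(w_est, h_est)` would then define the crop of the next frame used to generate the following search image, matching the final "wherein" clause.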