| CPC G06V 40/161 (2022.01) [G06V 40/171 (2022.01); G06V 2201/07 (2022.01)] | 20 Claims |

|
1. A method of tracking a target face across successive input frames of an input media, the method comprising the steps of:
acquiring a plurality of input frames corresponding to the input media;
detecting a presence of the target face within at least a subset of input frames from the plurality of input frames;
determining a location of the target face in each of the input frames in which the target face was detected;
determining an interruption in detection of target faces in one or more intermediate input frames in the plurality of input frames and determining if the interruption meets a predetermined threshold value;
responsive to determining that the interruption meets the predetermined threshold:
identifying a location of a target bounding box on a predetermined number of input frames preceding the interruption, wherein each target bounding box has an area determined by the target face and at least partially encloses the target face;
identifying a location of one or more facial landmarks associated with the target face for each of the preceding input frames;
identifying a location of a target bounding box on a predetermined number of input frames succeeding the interruption, wherein each target bounding box has an area determined by the target face and at least partially encloses the target face;
identifying a location of one or more facial landmarks associated with the target face for each of the succeeding input frames;
calculating a predicted location of a predicted bounding box on each of the intermediate input frames on which there was an interruption in the detection of the target face, wherein the predicted location is based on the identified locations of the target bounding boxes on the preceding input frames and the succeeding input frames; and
calculating a predicted location of one or more facial landmarks of the target faces on each of the intermediate input frames on which there was the interruption in the detection of the target face, wherein the predicted location is based on the identified locations of the one or more facial landmarks of the target faces on the preceding input frames and the succeeding input frames.
|
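The interruption test and the two prediction steps of claim 1 can be illustrated with a minimal sketch. The claim does not name a prediction method or say how the threshold is compared, so everything below is an assumption made for illustration: the names `find_gaps` and `fill_gaps`, the parameters `max_gap` and `context`, the `(x, y, w, h)` box layout, and the use of linear interpolation are all hypothetical. An interruption is treated as meeting the threshold when it spans at most `max_gap` frames. Anchor values are taken as the mean over up to `context` detected frames on each side of the gap.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]      # assumed layout: x, y, width, height
Point = Tuple[float, float]                  # one facial landmark (x, y)
Detection = Tuple[Box, List[Point]]          # per-frame result; None = face not detected


def find_gaps(detections: List[Optional[Detection]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of runs of missed frames
    that have a detected frame on both sides."""
    gaps, i, n = [], 0, len(detections)
    while i < n:
        if detections[i] is None:
            j = i
            while j < n and detections[j] is None:
                j += 1
            if i > 0 and j < n:              # skip runs touching either end
                gaps.append((i, j - 1))
            i = j
        else:
            i += 1
    return gaps


def _flatten(det: Detection) -> List[float]:
    box, landmarks = det
    return list(box) + [c for p in landmarks for c in p]


def _unflatten(values: List[float]) -> Detection:
    box = (values[0], values[1], values[2], values[3])
    landmarks = [(values[k], values[k + 1]) for k in range(4, len(values), 2)]
    return box, landmarks


def _mean(vectors: List[List[float]]) -> List[float]:
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(len(vectors[0]))]


def fill_gaps(
    detections: List[Optional[Detection]],
    max_gap: int = 5,        # assumed reading of the "predetermined threshold"
    context: int = 1,        # assumed "predetermined number" of frames per side
) -> List[Optional[Detection]]:
    """Predict boxes and landmarks on missed frames by linear interpolation
    between the preceding and succeeding detected frames (an assumption;
    the claim only requires that the prediction be based on both sides)."""
    filled = list(detections)
    for start, end in find_gaps(detections):
        if end - start + 1 > max_gap:
            continue                          # interruption too long: leave unfilled
        before = [d for d in detections[max(0, start - context):start] if d is not None]
        after = [d for d in detections[end + 1:end + 1 + context] if d is not None]
        a = _mean([_flatten(d) for d in before])
        b = _mean([_flatten(d) for d in after])
        span = (end + 1) - (start - 1)        # frame distance between the two anchors
        for t in range(start, end + 1):
            alpha = (t - (start - 1)) / span
            filled[t] = _unflatten([u + alpha * (v - u) for u, v in zip(a, b)])
    return filled
```

The sketch assumes every detected frame carries the same number of landmarks in the same order. Calling `fill_gaps(detections, max_gap=5)` on a per-frame list returns a copy in which each qualifying missed frame holds a predicted box and predicted landmarks, while longer interruptions remain `None`.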