CPC G06T 7/11 (2017.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06V 10/454 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 40/10 (2022.01); G06V 40/103 (2022.01); G06V 40/171 (2022.01)] | 15 Claims |
1. A system for extracting human pose information from an image, the system comprising:
a feature extractor for extracting features related to human joints and shapes from the image, the feature extractor being connectable to a database comprising a dataset of reference images and provided with a first convolutional neural network (CNN) architecture including a first plurality of CNN layers,
wherein each CNN layer of the first plurality of CNN layers applies a nonlinear activation function to its input data using trained kernel weights, and
wherein each CNN layer of the first plurality of CNN layers produces an output in the form of a tensor, such that the first CNN architecture outputs a plurality of tensors that are from different CNN layers and that collectively preserve the features;
a two-dimensional (2D) body skeleton detector to which the plurality of tensors are provided, as input, to obtain, for each of multiple joints, a heat map that visually indicates a location of that joint,
wherein the 2D body skeleton detector is provided with a second CNN architecture including a second plurality of CNN layers, and wherein the second CNN architecture accepts, as input, the plurality of tensors rather than the image; and
a facial keypoints detector to which at least one of the multiple heat maps produced by the 2D body skeleton detector for the multiple joints is provided, as input, to obtain locations of one or more facial keypoints.
|
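The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not specify layer counts, kernel sizes, tensor shapes, or how the tensors are combined. The sketch assumes single-channel maps, 3x3 kernels with hand-chosen (untrained) weights, and averaging as the way the second architecture fuses the tensors. All function names, kernels, and joint names are hypothetical. The facial keypoints step is only a placeholder that places keypoints at fixed offsets from a heat-map peak; it shows that this stage consumes a heat map rather than the image.

```python
from typing import Dict, List, Tuple

Tensor = List[List[float]]  # assumption: one H x W channel; real tensors carry more dims

IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
SMOOTH = [[1.0 / 9.0] * 3 for _ in range(3)]
SHIFT_DOWN = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # copies the pixel above


def conv_relu(x: Tensor, kernel: Tensor) -> Tensor:
    """One toy 'CNN layer': zero-padded 3x3 cross-correlation followed by ReLU."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = max(acc, 0.0)  # nonlinear activation
    return out


def feature_extractor(image: Tensor, kernels: List[Tensor]) -> List[Tensor]:
    """First CNN: returns one tensor per layer, not only the last layer's output."""
    tensors, x = [], image
    for kernel in kernels:
        x = conv_relu(x, kernel)
        tensors.append(x)
    return tensors


def skeleton_detector(tensors: List[Tensor],
                      joint_kernels: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Second CNN: takes the tensors (never the image) and emits one heat map per joint."""
    h, w = len(tensors[0]), len(tensors[0][0])
    # Assumption: tensors are fused by averaging; the claim leaves this open.
    fused = [[sum(t[i][j] for t in tensors) / len(tensors) for j in range(w)]
             for i in range(h)]
    hidden = conv_relu(fused, SMOOTH)                       # shared layer
    return {joint: conv_relu(hidden, k) for joint, k in joint_kernels.items()}


def peak(heat_map: Tensor) -> Tuple[int, int]:
    """(row, col) of the largest heat-map value, i.e. the indicated joint location."""
    best = (0, 0)
    for i, row in enumerate(heat_map):
        for j, value in enumerate(row):
            if value > heat_map[best[0]][best[1]]:
                best = (i, j)
    return best


def facial_keypoints_detector(heat_maps: List[Tensor]) -> Dict[str, Tuple[int, int]]:
    """Placeholder third stage: fixed offsets from the first heat map's peak."""
    r, c = peak(heat_maps[0])
    offsets = {"left_eye": (-1, -1), "right_eye": (-1, 1), "mouth": (1, 0)}
    return {name: (r + dr, c + dc) for name, (dr, dc) in offsets.items()}


# Toy input: a 7x7 image with a single bright pixel standing in for a person.
image = [[0.0] * 7 for _ in range(7)]
image[3][4] = 1.0

tensors = feature_extractor(image, [IDENTITY, SMOOTH])
heat_maps = skeleton_detector(tensors, {"nose": IDENTITY, "neck": SHIFT_DOWN})
face = facial_keypoints_detector([heat_maps["nose"]])

print({joint: peak(hm) for joint, hm in heat_maps.items()})
print(face)
```

With these toy kernels the "nose" heat map peaks on the bright pixel and the "neck" heat map peaks one row below it. A trained system would learn the kernel weights from the dataset of reference images and would replace the fixed-offset placeholder with a learned facial keypoints stage.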