US 12,249,074 B2
Human pose analysis system and method
Dongwook Cho, Montreal (CA); Maggie Zhang, Montreal (CA); and Paul Kruszewski, Montreal (CA)
Assigned to Hinge Health, Inc., San Francisco, CA (US)
Appl. No. 17/256,307
Filed by Hinge Health, Inc., San Francisco, CA (US)
PCT Filed Jun. 27, 2019, PCT No. PCT/CA2019/050887 § 371(c)(1), (2) Date Dec. 28, 2020, PCT Pub. No. WO2020/000096, PCT Pub. Date Jan. 2, 2020.
Claims priority of provisional application 62/691,818, filed on Jun. 29, 2018.
Prior Publication US 2021/0264144 A1, Aug. 26, 2021
Int. Cl. G06T 7/11 (2017.01); G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06V 10/44 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 40/10 (2022.01); G06V 40/16 (2022.01)

CPC G06T 7/11 (2017.01) [G06F 18/214 (2023.01); G06N 3/045 (2023.01); G06V 10/454 (2022.01); G06V 10/764 (2022.01); G06V 10/82 (2022.01); G06V 40/10 (2022.01); G06V 40/103 (2022.01); G06V 40/171 (2022.01)]

15 Claims

1. A system for extracting human pose information from an image, the system comprising:

a feature extractor for extracting features related to human joints and shapes from the image, the feature extractor being connectable to a database comprising a dataset of reference images and provided with a first convolutional neural network (CNN) architecture including a first plurality of CNN layers,

wherein each CNN layer of the first plurality of CNN layers applies a nonlinear activation function to its input data using trained kernel weights, and

wherein each CNN layer of the first plurality of CNN layers produces an output in the form of a tensor, such that the first CNN architecture outputs a plurality of tensors that are from different CNN layers and that collectively preserve the features;

a two-dimensional (2D) body skeleton detector to which the plurality of tensors are provided, as input, to obtain, for each of multiple joints, a heat map that visually indicates a location of that joint,

wherein the 2D body skeleton detector is provided with a second CNN architecture including a second plurality of CNN layers, and wherein the second CNN architecture accepts, as input, the plurality of tensors rather than the image; and

a facial keypoints detector to which at least one of the multiple heat maps produced by the 2D body skeleton detector for the multiple joints is provided, as input, to obtain locations of one or more facial keypoints.
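
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, only an illustration under stated assumptions: every function name, kernel, and weight below is hypothetical, the "CNN layers" are single-channel 3x3 filters with ReLU, the second CNN is replaced by a per-joint weighted sum of the tensors, and the facial keypoints stage is a placeholder that only reports the heat-map peak. What it does show is the claimed wiring: the feature extractor keeps the output tensor of every layer, the skeleton detector consumes those tensors rather than the image, and the facial stage consumes a joint heat map.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]  # a single-channel 2D tensor, rows x cols


def conv_relu(grid: Grid, kernel: Grid) -> Grid:
    """Zero-padded 3x3 cross-correlation followed by ReLU (the nonlinearity)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += grid[rr][cc] * kernel[i][j]
            out[r][c] = max(acc, 0.0)
    return out


def extract_features(image: Grid, kernels: List[Grid]) -> List[Grid]:
    """Run the layers in sequence and keep every layer's output tensor."""
    tensors, current = [], image
    for kernel in kernels:
        current = conv_relu(current, kernel)
        tensors.append(current)
    return tensors


def detect_skeleton(tensors: List[Grid],
                    joint_weights: Dict[str, List[float]]) -> Dict[str, Grid]:
    """Map the tensors (not the image) to one heat map per joint.

    Assumption: a per-joint weighted sum stands in for the second CNN.
    """
    h, w = len(tensors[0]), len(tensors[0][0])
    return {
        joint: [[sum(wt * t[r][c] for wt, t in zip(weights, tensors))
                 for c in range(w)] for r in range(h)]
        for joint, weights in joint_weights.items()
    }


def peak(heat_map: Grid) -> Tuple[int, int]:
    """Return (row, col) of the strongest response in a heat map."""
    cells = ((r, c) for r in range(len(heat_map))
             for c in range(len(heat_map[0])))
    return max(cells, key=lambda rc: heat_map[rc[0]][rc[1]])


def detect_facial_keypoints(head_heat_map: Grid) -> Dict[str, Tuple[int, int]]:
    """Placeholder third stage: takes a joint heat map as its input.

    Assumption: it only reports the peak as a coarse face anchor.
    """
    return {"face_anchor": peak(head_heat_map)}


# Toy input: a 5x5 "image" with a single bright pixel at row 1, column 3.
image = [[0.0] * 5 for _ in range(5)]
image[1][3] = 1.0

# Hypothetical "trained" kernels: an identity filter, then a 3x3 box filter.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box = [[1.0] * 3 for _ in range(3)]

tensors = extract_features(image, [identity, box])
heat_maps = detect_skeleton(tensors, {"head": [1.0, 0.5]})
keypoints = detect_facial_keypoints(heat_maps["head"])
```

In this toy run the "head" heat map peaks at the bright pixel, and the placeholder facial stage returns that location. A real system would replace each stand-in with a trained multi-channel network; the sketch is meant only to make the tensor-in, heat-map-out interfaces concrete.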