US 12,148,097 B2
Methods and systems for 3D hand pose estimation from RGB images
Yannick Verdie, Toronto (CA); Zihao Yang, Richmond Hill (CA); Deepak Sridhar, San Diego, CA (US); Steven George McDonagh, London (GB); and Juwei Lu, North York (CA)
Assigned to HUAWEI TECHNOLOGIES CO., LTD., Shenzhen (CN)
Filed by Yannick Verdie, Toronto (CA); Zihao Yang, Richmond Hill (CA); Deepak Sridhar, San Diego, CA (US); Steven George McDonagh, London (GB); and Juwei Lu, North York (CA)
Filed on Dec. 9, 2022, as Appl. No. 18/078,832.
Prior Publication US 2024/0193866 A1, Jun. 13, 2024
Int. Cl. G06T 17/00 (2006.01); G06F 3/01 (2006.01); G06T 17/20 (2006.01); G06V 40/20 (2022.01)
CPC G06T 17/20 (2013.01) [G06F 3/017 (2013.01); G06V 40/28 (2022.01)]
20 Claims
 
1. A computing system comprising:
a processing unit configured to execute instructions to cause the computing system to estimate a set of 3D keypoints representing a 3D hand pose by:
processing a 2D image containing a detected hand using a U-net network to obtain a global feature vector and a heatmap for each of the keypoints;
concatenating information from the global feature vector and the heatmap to obtain a set of input tokens;
processing the input tokens using a transformer encoder to obtain a first set of 2D keypoints representing estimated 2D locations of the keypoints in a first 2D view;
inputting the first set of 2D keypoints as a query to a transformer decoder, with cross-attention from the transformer encoder, to obtain a second set of 2D keypoints representing estimated 2D locations of the keypoints in a second 2D view; and
aggregating the first and second sets of 2D keypoints to output the set of estimated 3D keypoints.
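
The two purely data-shaping steps of claim 1, forming the input tokens and aggregating the two 2D views, can be illustrated with a minimal sketch. It is not the patented implementation: the U-net, transformer encoder, and transformer decoder are omitted entirely, and the function names `build_input_tokens` and `aggregate_views` are hypothetical. The claim does not state how the two 2D views relate geometrically; the sketch assumes, purely for illustration, that the first view is a front (x, y) projection and the second is a side (z, y) projection sharing the y axis, and that the shared coordinate is averaged.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def build_input_tokens(
    global_feature: Sequence[float],
    heatmaps: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Form one token per keypoint (hypothetical helper).

    Each token is the global feature vector concatenated with that
    keypoint's heatmap flattened row by row. The flattening scheme is
    an assumption; the claim only says the information is concatenated.
    """
    tokens = []
    for heatmap in heatmaps:
        flat = [value for row in heatmap for value in row]
        tokens.append(list(global_feature) + flat)
    return tokens


def aggregate_views(
    front_xy: Sequence[Point2D],
    side_zy: Sequence[Point2D],
) -> List[Point3D]:
    """Combine two 2D keypoint sets into 3D keypoints (hypothetical helper).

    ASSUMPTION: front_xy holds (x, y) per keypoint and side_zy holds
    (z, y) per keypoint, with y shared between the views. The shared y
    is averaged; x and z are taken from the view that observes them.
    """
    if len(front_xy) != len(side_zy):
        raise ValueError("both views must contain the same number of keypoints")
    keypoints_3d = []
    for (x, y_front), (z, y_side) in zip(front_xy, side_zy):
        y = 0.5 * (y_front + y_side)
        keypoints_3d.append((x, y, z))
    return keypoints_3d


# Two keypoints, a 2-element global feature, and 2x2 heatmaps.
tokens = build_input_tokens(
    [0.5, -1.0],
    [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]],
)

# One keypoint seen at (x=0.5, y=0.25) in front and (z=0.125, y=0.75) from the side.
pose_3d = aggregate_views([(0.5, 0.25)], [(0.125, 0.75)])
```

In this sketch each token has length equal to the global feature size plus the number of heatmap cells, so every keypoint carries both image-wide context and its own spatial evidence. In the claimed system, the tokens would pass through the transformer encoder and decoder between these two steps.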