US 11,783,496 B2
Scalable real-time hand tracking
Valentin Bazarevsky, San Jose, CA (US); Fan Zhang, Sunnyvale, CA (US); Andrei Vakunov, Mountain View, CA (US); Andrei Tkachenka, Mountain View, CA (US); and Matthias Grundmann, San Jose, CA (US)
Assigned to GOOGLE LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Nov. 16, 2021, as Appl. No. 17/527,463.
Application 17/527,463 is a continuation of application No. 16/709,128, filed on Dec. 10, 2019, granted, now Pat. No. 11,182,909.
Prior Publication US 2022/0076433 A1, Mar. 10, 2022
Int. Cl. G06T 7/246 (2017.01); G06T 7/73 (2017.01); G06V 40/20 (2022.01)
CPC G06T 7/251 (2017.01) [G06T 7/75 (2017.01); G06V 40/28 (2022.01); G06T 2207/20081 (2013.01); G06T 2207/30196 (2013.01)]
19 Claims
 
1. A computing system for hand tracking, the system comprising:
one or more processors; and
one or more non-transitory computer-readable media that collectively store instructions that, when executed by the one or more processors, cause the one or more processors to perform operations, the operations comprising:
obtaining a first image frame, wherein the first image frame is descriptive of a hand comprising a palm;
processing the first image frame with a machine-learned palm detection model to generate one or more bounding boxes associated with a position of the palm, wherein the position of the palm is determined with the machine-learned palm detection model based on one or more features in the first image frame;
processing the one or more bounding boxes with a machine-learned hand landmark model to determine a first plurality of hand landmark positions within the first image frame based at least in part on the one or more bounding boxes;
obtaining ground truth data associated with ground truth hand landmark positions;
determining a loss function associated with the first plurality of hand landmark positions relative to the ground truth data; and
backpropagating the loss function associated with the first plurality of hand landmark positions to the machine-learned palm detection model to train the machine-learned palm detection model.
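
The training operations recited in claim 1 can be illustrated with a deliberately small sketch. It is not the patented implementation: the linear "palm detector", the fixed-offset "landmark model", the mean-squared-error loss, the learning rate, and every name and number below are assumptions chosen so that the gradient can be written out by hand. What it shows is only the data flow: a detector produces a box position, a landmark stage turns that position into landmark positions, a loss compares them with ground truth, and the gradient of that loss is propagated back to update the detector's parameters.

```python
# Toy sketch of the claimed training flow. All models, names, and values are
# hypothetical stand-ins; real systems would use neural networks and autodiff.

def palm_detector(params, feature):
    """Linear stand-in for the palm detection model: feature -> box center."""
    (wx, bx), (wy, by) = params
    fx, fy = feature
    return (wx * fx + bx, wy * fy + by)

def landmark_model(center, offsets):
    """Frozen stand-in for the landmark model: landmarks at fixed offsets
    from the box center, so each landmark is differentiable in the center."""
    cx, cy = center
    return [(cx + ox, cy + oy) for ox, oy in offsets]

def mse_loss(pred, truth):
    """Mean squared distance between predicted and ground truth landmarks."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, truth))
    return total / len(pred)

def train_step(params, feature, offsets, truth, lr=0.1):
    """One forward pass plus one manual backpropagation step to the detector."""
    center = palm_detector(params, feature)
    pred = landmark_model(center, offsets)
    loss = mse_loss(pred, truth)
    n = len(pred)
    # d(loss)/d(center): every landmark moves one-for-one with the center.
    g_cx = sum(2 * (px - tx) for (px, _), (tx, _) in zip(pred, truth)) / n
    g_cy = sum(2 * (py - ty) for (_, py), (_, ty) in zip(pred, truth)) / n
    # Chain rule through the linear detector, then a gradient descent update.
    (wx, bx), (wy, by) = params
    fx, fy = feature
    new_params = ((wx - lr * g_cx * fx, bx - lr * g_cx),
                  (wy - lr * g_cy * fy, by - lr * g_cy))
    return new_params, loss

# Hypothetical data: one image feature, three landmarks around center (0.5, 0.6).
params = ((0.0, 0.0), (0.0, 0.0))
feature = (1.0, 2.0)
offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, 0.2)]
truth = [(0.4, 0.6), (0.6, 0.6), (0.5, 0.8)]

losses = []
for _ in range(50):
    params, loss = train_step(params, feature, offsets, truth)
    losses.append(loss)

print(losses[0], losses[-1], palm_detector(params, feature))
```

In this sketch only the detector's parameters are updated, which mirrors the claim's wording that the loss on the landmark positions is backpropagated to the palm detection model. Whether the landmark model is also updated in practice is not stated in claim 1, so it is left frozen here.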