CPC G06V 40/23 (2022.01) [G06F 3/011 (2013.01); G06F 18/2133 (2023.01); G06T 7/246 (2017.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20076 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30196 (2013.01); G06V 2201/00 (2022.01)] | 21 Claims |
1. A computer-implemented method comprising:
instantiating a bi-directional spatial-temporal transformer neural network; and
training the bi-directional spatial-temporal transformer neural network to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames by:
obtaining a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints;
producing a spatially masked frame from the specific frame by masking the original coordinates of the skeletal joint;
providing the specific frame, the spatially masked frame, and at least one more of the plurality of frames to a coordinate prediction head of the bi-directional spatial-temporal transformer network;
obtaining, from the coordinate prediction head, a prediction of coordinates for the skeletal joint in the spatially masked frame; and
adjusting parameters of the bi-directional spatial-temporal transformer neural network until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
|
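The training steps recited in claim 1 (obtain frames, mask one joint in a specific frame, predict the masked coordinates from the other joints and the other frames, and adjust parameters until the mean-squared error converges) can be illustrated with a minimal sketch. This is not the claimed bi-directional spatial-temporal transformer: a small linear predictor stands in for the network and its coordinate prediction head so that the example stays self-contained. The names `MASK`, `mask_joint`, `build_context`, `predict`, and `train`, the zero mask value, the learning rate, the tolerance, and the toy skeleton data are all assumptions made for illustration. The claim also provides the unmasked specific frame to the prediction head; in this sketch that frame only supplies the training target.

```python
# Illustrative stand-in only: a linear predictor replaces the transformer.
MASK = (0.0, 0.0)  # assumed mask value; the claim does not specify one


def mask_joint(frame, joint):
    """Return a copy of `frame` with the coordinates of `joint` replaced by MASK."""
    masked = list(frame)
    masked[joint] = MASK
    return masked


def mse(pred, target):
    """Mean-squared error between predicted and original coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def build_context(frames, t, joint):
    """Spatial context (the masked frame t) plus temporal context
    (the same joint in every other frame)."""
    spatial = mask_joint(frames[t], joint)
    temporal = [f[joint] for i, f in enumerate(frames) if i != t]
    return spatial + temporal


def predict(weights, context):
    """Predict coordinates as a weighted sum of the context coordinates."""
    dims = len(context[0])
    return tuple(
        sum(w * c[d] for w, c in zip(weights, context)) for d in range(dims)
    )


def train(frames, t, joint, lr=0.1, tol=1e-9, max_steps=10_000):
    """Adjust the weights by gradient descent until the MSE stops changing."""
    target = frames[t][joint]          # original coordinates of the joint
    context = build_context(frames, t, joint)
    weights = [0.0] * len(context)
    prev = float("inf")
    for _ in range(max_steps):
        pred = predict(weights, context)
        loss = mse(pred, target)
        if abs(prev - loss) < tol:     # convergence test on the MSE
            break
        prev = loss
        err = [p - y for p, y in zip(pred, target)]
        for k, c in enumerate(context):
            # Gradient of the MSE with respect to weight k.
            grad = 2.0 / len(target) * sum(e * cd for e, cd in zip(err, c))
            weights[k] -= lr * grad
    return weights, loss


# Toy data: 3 frames, 3 joints each, 2-D coordinates (hypothetical values).
frames = [
    [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)],
    [(0.1, 1.0), (0.6, 0.6), (1.0, 0.2)],
    [(0.2, 1.0), (0.7, 0.7), (1.0, 0.4)],
]
weights, final_loss = train(frames, t=1, joint=1)
prediction = predict(weights, build_context(frames, t=1, joint=1))
```

In this sketch `prediction` approximates the original coordinates of joint 1 in frame 1 once the loss has converged. A real implementation would replace `predict` with the transformer's coordinate prediction head and would train over many frames and joints rather than a single masked example.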