US 11,854,305 B2
Skeleton-based action recognition using bi-directional spatial-temporal transformer
Bo Wu, Cambridge, MA (US); Chuang Gan, Cambridge, MA (US); Dakuo Wang, Cambridge, MA (US); and Kaizhi Qian, Champaign, IL (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on May 9, 2021, as Appl. No. 17/315,319.
Prior Publication US 2022/0374629 A1, Nov. 24, 2022
Int. Cl. G06V 40/20 (2022.01); G06T 7/246 (2017.01); G06F 3/01 (2006.01); G06V 20/40 (2022.01); G06F 18/2133 (2023.01)
CPC G06V 40/23 (2022.01) [G06F 3/011 (2013.01); G06F 18/2133 (2023.01); G06T 7/246 (2017.01); G06V 20/46 (2022.01); G06T 2207/10016 (2013.01); G06T 2207/20076 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30196 (2013.01); G06V 2201/00 (2022.01)]
21 Claims
 
1. A computer-implemented method comprising:
instantiating a bi-directional spatial-temporal transformer neural network; and
training the bi-directional spatial-temporal transformer neural network to predict original coordinates of a skeletal joint in a specific frame through relative relationships of the skeletal joint to other joints and to the state of the skeletal joint in other frames by:
obtaining a plurality of frames comprising coordinates of the skeletal joint and coordinates of other joints;
producing a spatially masked frame from the specific frame by masking the original coordinates of the skeletal joint;
providing the specific frame, the spatially masked frame, and at least one more of the plurality of frames to a coordinate prediction head of the bi-directional spatial-temporal transformer network;
obtaining, from the coordinate prediction head, a prediction of coordinates for the skeletal joint in the spatially masked frame; and
adjusting parameters of the bi-directional spatial-temporal transformer neural network until a mean-squared error, between the prediction of coordinates for the skeletal joint and the original coordinates of the skeletal joint, converges.
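The training steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed bi-directional spatial-temporal transformer: a small linear model stands in for the network and its coordinate prediction head, and here the specific frame serves only as the source of the regression target. The names mask_joint, features, predict and train, the zero-valued MASK, the learning rate, the tolerance and the sample coordinates are all assumptions made for illustration; none of them comes from the patent.

```python
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) coordinates of one joint
Frame = List[Joint]                  # all joints of one skeleton frame

MASK: Joint = (0.0, 0.0, 0.0)  # assumed mask value; the claim does not specify one


def mask_joint(frame: Sequence[Joint], joint_idx: int) -> Frame:
    """Return a copy of the frame with one joint's coordinates masked."""
    masked = list(frame)
    masked[joint_idx] = MASK
    return masked


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean-squared error between predicted and original coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def features(masked: Sequence[Joint], others: Sequence[Sequence[Joint]],
             joint_idx: int) -> List[Joint]:
    """Same joint in the other frames (temporal) plus the other joints
    in the masked frame (spatial)."""
    temporal = [frame[joint_idx] for frame in others]
    spatial = [j for i, j in enumerate(masked) if i != joint_idx]
    return temporal + spatial


def predict(weights: Sequence[float], feats: Sequence[Joint]) -> List[float]:
    """Stand-in predictor: a weighted sum of the feature joints."""
    return [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(3)]


def train(frames: Sequence[Frame], frame_idx: int, joint_idx: int,
          lr: float = 0.5, tol: float = 1e-12, max_steps: int = 10_000):
    """Adjust the weights by gradient descent until the MSE stops changing."""
    target = frames[frame_idx][joint_idx]             # original coordinates
    masked = mask_joint(frames[frame_idx], joint_idx)  # spatially masked frame
    others = [f for i, f in enumerate(frames) if i != frame_idx]
    feats = features(masked, others, joint_idx)
    weights = [0.0] * len(feats)
    prev_loss = float("inf")
    for _ in range(max_steps):
        pred = predict(weights, feats)
        loss = mse(pred, target)
        if abs(prev_loss - loss) < tol:  # treat a stalled loss as convergence
            break
        prev_loss = loss
        for i, f in enumerate(feats):
            # Gradient of the MSE with respect to weight i.
            grad = 2.0 / 3.0 * sum((pred[c] - target[c]) * f[c] for c in range(3))
            weights[i] -= lr * grad
    final_pred = predict(weights, feats)
    return weights, final_pred, mse(final_pred, target)


# Three frames of a three-joint skeleton (made-up sample data).
frames: List[Frame] = [
    [(0.0, 1.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)],
    [(0.1, 1.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.6, 0.0)],
    [(0.2, 1.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.7, 0.0)],
]

weights, prediction, final_loss = train(frames, frame_idx=1, joint_idx=0)
print(prediction, final_loss)
```

In this sketch the convergence test compares successive loss values against a fixed tolerance, which is only one possible reading of the claim's requirement that the mean-squared error "converges".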