US 11,783,615 B2
Systems and methods for language driven gesture understanding
Sandeep Gupta, Tempe, AZ (US); and Ayan Banerjee, Tempe, AZ (US)
Assigned to Arizona Board of Regents on Behalf of Arizona State University, Tempe, AZ (US)
Filed by Sandeep Gupta, Tempe, AZ (US); and Ayan Banerjee, Tempe, AZ (US)
Filed on Jun. 21, 2021, as Appl. No. 17/353,312.
Claims priority of provisional application 63/041,746, filed on Jun. 19, 2020.
Prior Publication US 2021/0397266 A1, Dec. 23, 2021
Int. Cl. G06V 40/10 (2022.01); H04W 4/02 (2018.01); G06N 20/10 (2019.01); G06F 1/16 (2006.01); H04W 4/80 (2018.01); G06V 20/40 (2022.01); G06V 40/20 (2022.01); G06F 3/01 (2006.01); G06N 3/08 (2023.01)
CPC G06V 40/107 (2022.01) [G06F 1/163 (2013.01); G06F 3/017 (2013.01); G06N 3/08 (2013.01); G06N 20/10 (2019.01); G06V 20/46 (2022.01); G06V 40/28 (2022.01); H04W 4/027 (2013.01); H04W 4/80 (2018.02)]
18 Claims
 
1. A system, comprising:
a sensor operable to capture sensor data indicative of a gesture; and
a processor in communication with a memory and the sensor, the processor configured to execute instructions stored in the memory, which, when executed, cause the processor to:
receive sensor data indicative of a gesture, the sensor data including a plurality of frames, each frame of the plurality of frames including data indicative of a hand performing the gesture;
decompose the gesture into a canonical gesture form, the canonical gesture form defining a string of gesture components arranged in a spatio-temporal order;
store the canonical gesture form for the gesture as a single example of a plurality of examples associated with the gesture; and
train a neural network to recognize a gesture component in the canonical gesture form using the plurality of examples associated with the gesture,
wherein the processor is further configured to:
extract, for a frame of the plurality of frames, a second gesture component associated with a physical movement of the palm of a hand relative to a body;
identify a location of a wrist associated with the hand relative to the body with respect to at least three reference points of the body for a middle grouping of frames of the plurality of frames; and
generate a plurality of movement attributes indicative of the physical movement of the palm using the locations of the wrist of the middle grouping of frames of the plurality of frames.
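
The wrist-location and movement-attribute steps recited at the end of claim 1 can be illustrated with a minimal sketch. This is not part of the claim and is not the patented implementation. It assumes 2D keypoints, treats the central half of the frames as the "middle grouping", and uses net wrist displacement relative to each body reference point as the movement attribute. The names `Frame`, `middle_frames`, `wrist_relative`, and `movement_attributes`, and the reference-point labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Frame:
    """One frame of sensor data (hypothetical layout)."""
    wrist: Point                   # wrist keypoint for the gesturing hand
    references: Dict[str, Point]   # body reference points, keyed by label


def middle_frames(frames: List[Frame], fraction: float = 0.5) -> List[Frame]:
    """Return the central `fraction` of the sequence (assumed middle grouping)."""
    n = len(frames)
    keep = max(1, int(n * fraction))
    start = (n - keep) // 2
    return frames[start:start + keep]


def wrist_relative(frame: Frame) -> Dict[str, Point]:
    """Locate the wrist as an offset from each body reference point."""
    if len(frame.references) < 3:
        raise ValueError("at least three body reference points are required")
    wx, wy = frame.wrist
    return {name: (wx - rx, wy - ry)
            for name, (rx, ry) in frame.references.items()}


def movement_attributes(frames: List[Frame]) -> Dict[str, Point]:
    """Net change in relative wrist location across the middle grouping.

    One (dx, dy) attribute is produced per reference point; this choice of
    attribute is an assumption made for illustration.
    """
    mid = middle_frames(frames)
    first = wrist_relative(mid[0])
    last = wrist_relative(mid[-1])
    return {name: (last[name][0] - first[name][0],
                   last[name][1] - first[name][1])
            for name in first}


if __name__ == "__main__":
    refs = {"nose": (0.0, 10.0),
            "left_shoulder": (-5.0, 5.0),
            "right_shoulder": (5.0, 5.0)}
    # Wrist moves one unit to the right per frame over ten frames.
    frames = [Frame(wrist=(float(i), 0.0), references=refs) for i in range(10)]
    print(movement_attributes(frames))
```

Restricting the computation to the middle frames reflects the claim's separation of the movement component from the frames at the start and end of the gesture. The fraction used here is an arbitrary placeholder.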