US 12,002,236 B2
Automated gesture identification using neural networks
Trevor Chandler, Thornton, CO (US); Dallas Nash, Frisco, TX (US); and Michael Menefee, Richardson, TX (US)
Assigned to AVODAH, INC., Wilmington, DE (US)
Filed by AVODAH, INC., Wilmington, DE (US)
Filed on Aug. 9, 2021, as Appl. No. 17/397,523.
Application 17/397,523 is a continuation of application No. 16/421,158, filed on May 23, 2019, granted, now Pat. No. 11,087,488.
Application 16/421,158 is a continuation of application No. 16/258,514, filed on Jan. 25, 2019, granted, now Pat. No. 10,304,208, issued on May 28, 2019.
Claims priority of provisional application 62/693,821, filed on Jul. 3, 2018.
Claims priority of provisional application 62/629,398, filed on Feb. 12, 2018.
Prior Publication US 2022/0026992 A1, Jan. 27, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 3/01 (2006.01); G06F 3/03 (2006.01); G06F 18/25 (2023.01); G06N 3/045 (2023.01); G06T 7/20 (2017.01); G06T 7/73 (2017.01); G06T 7/90 (2017.01); G06V 40/16 (2022.01); G06V 40/20 (2022.01)
CPC G06T 7/73 (2017.01) [G06F 3/017 (2013.01); G06F 3/0304 (2013.01); G06F 18/256 (2023.01); G06N 3/045 (2023.01); G06T 7/20 (2013.01); G06T 7/90 (2017.01); G06V 40/165 (2022.01); G06V 40/23 (2022.01); G06V 40/28 (2022.01)] 16 Claims
OG exemplary drawing
 
1. A device for processing images associated with a gesture, comprising:
at least one camera; and
at least one processor configured to implement:
one or more three-dimensional convolution neural networks (3D CNNs), each of the 3D CNNs comprising:
an input to receive a plurality of input images from the at least one camera, and
an output to provide recognition information produced by each of the 3D CNNs, and
at least one recurrent neural network (RNN) comprising:
an input to receive a second type of recognition information, and
an output that is coupled to the input of the at least one RNN to provide a feedback connection,
wherein the at least one processor is configured to:
receive a plurality of captured images at a pre-processing module, perform pose estimation on each of the plurality of captured images, and overlay pose estimation pixels onto the plurality of captured images to generate the plurality of input images for consumption by the one or more 3D CNNs, and
receive the recognition information produced by each of the one or more 3D CNNs at a fusion module, and aggregate the received recognition information to generate the second type of recognition information for consumption by the at least one RNN,
wherein each of the one or more 3D CNNs is operable to produce the recognition information comprising at least one characteristic associated with the gesture in each of the plurality of input images, and provide the recognition information to the fusion module, the at least one characteristic comprising a pose, a color or a gesture type, and
wherein the at least one RNN is operable to determine whether the recognition information produced by the one or more 3D CNNs corresponds to a singular gesture across the plurality of input images.
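The claimed pipeline — pose-overlaid frames feeding one or more 3D CNNs, whose recognition outputs are aggregated by a fusion module and consumed by a recurrent network with a feedback connection — can be sketched in PyTorch. This is a hypothetical illustration of the claim's data flow only, not the patented implementation; all class names, dimensions, and the choice of a GRU (whose hidden-state recurrence stands in for the claimed output-to-input feedback coupling) are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class Gesture3DCNN(nn.Module):
    """One 3D CNN producing per-clip recognition features
    (stand-ins for the claimed pose / color / gesture-type characteristics)."""
    def __init__(self, in_channels=3, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # collapse the (T, H, W) volume
            nn.Flatten(),
            nn.Linear(16, feat_dim),
        )

    def forward(self, clips):          # clips: (N, C, T, H, W)
        return self.net(clips)

class GesturePipeline(nn.Module):
    def __init__(self, num_cnns=2, feat_dim=64, hidden=32):
        super().__init__()
        self.cnns = nn.ModuleList(
            Gesture3DCNN(feat_dim=feat_dim) for _ in range(num_cnns))
        # Fusion module: aggregate (here, concatenate and project)
        # the recognition information from each 3D CNN.
        self.fuse = nn.Linear(num_cnns * feat_dim, feat_dim)
        # Recurrent network; the GRU's hidden state recurs into its input,
        # approximating the claimed output-to-input feedback connection.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # "singular gesture?" score

    def overlay_pose(self, frames, pose_pixels):
        """Pre-processing: overlay pose-estimation pixels onto captured frames."""
        return torch.where(pose_pixels > 0, pose_pixels, frames)

    def forward(self, frames, pose_pixels):
        x = self.overlay_pose(frames, pose_pixels)           # (N, C, T, H, W)
        feats = torch.cat([cnn(x) for cnn in self.cnns], dim=-1)
        fused = self.fuse(feats).unsqueeze(1)                # one step per clip
        out, _ = self.rnn(fused)
        # Probability that the input images correspond to a singular gesture.
        return torch.sigmoid(self.head(out[:, -1]))

frames = torch.rand(2, 3, 8, 32, 32)   # 2 clips of 8 RGB frames, 32x32
pose = torch.zeros_like(frames)        # dummy pose-overlay pixels
score = GesturePipeline()(frames, pose)
print(score.shape)                     # torch.Size([2, 1])
```

In this sketch the fusion is a simple concatenate-and-project; the claim only requires that the CNNs' recognition information be aggregated into a second type of recognition information before the RNN, so any aggregation scheme would fit the same slot.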