US 11,055,521 B2
Real-time gesture recognition method and apparatus
Trevor Chandler, Thornton, CO (US); Dallas Nash, Frisco, TX (US); and Michael Menefee, Richardson, TX (US)
Assigned to AVODAH, INC., Wilmington, DE (US)
Filed by AVODAH, INC., Wilmington, DE (US)
Filed on Dec. 30, 2019, as Appl. No. 16/730,587.
Application 16/730,587 is a continuation of application No. 16/270,532, filed on Feb. 7, 2019, granted, now 10,521,928.
Application 16/270,532 is a continuation in part of application No. 16/258,524, filed on Jan. 25, 2019, granted, now 10,346,198, issued on Jul. 9, 2019.
Application 16/258,524 is a continuation in part of application No. 16/258,514, filed on Jan. 25, 2019, granted, now 10,304,208, issued on May 28, 2019.
Application 16/258,514 is a continuation in part of application No. 16/258,509, filed on Jan. 25, 2019, granted, now 10,489,639, issued on Nov. 26, 2019.
Application 16/258,509 is a continuation in part of application No. 16/258,531, filed on Jan. 25, 2019, granted, now 10,289,903, issued on May 14, 2019.
Claims priority of provisional application 62/693,841, filed on Jul. 3, 2018.
Claims priority of provisional application 62/660,739, filed on Apr. 20, 2018.
Claims priority of provisional application 62/629,398, filed on Feb. 12, 2018.
Claims priority of provisional application 62/693,821, filed on Jul. 3, 2018.
Claims priority of provisional application 62/664,883, filed on Apr. 30, 2018.
Claims priority of provisional application 62/654,174, filed on Apr. 6, 2018.
Prior Publication US 2020/0387697 A1, Dec. 10, 2020
Int. Cl. G06K 9/00 (2006.01); G06N 3/08 (2006.01); G06N 20/00 (2019.01); G06F 3/01 (2006.01); G09B 21/00 (2006.01); G06F 40/40 (2020.01); G06F 40/58 (2020.01); G10L 15/26 (2006.01); G06F 3/16 (2006.01); G10L 15/22 (2006.01); G10L 15/24 (2013.01); H04N 5/247 (2006.01); G06T 7/73 (2017.01); G06N 3/04 (2006.01); G06T 7/20 (2017.01); G10L 13/00 (2006.01); G06T 3/40 (2006.01); G06T 17/00 (2006.01)
CPC G06K 9/00355 (2013.01) [G06F 3/013 (2013.01); G06F 3/017 (2013.01); G06F 3/167 (2013.01); G06F 40/40 (2020.01); G06F 40/58 (2020.01); G06K 9/00248 (2013.01); G06K 9/00315 (2013.01); G06N 3/0454 (2013.01); G06N 3/08 (2013.01); G06T 7/20 (2013.01); G06T 7/73 (2017.01); G09B 21/00 (2013.01); G09B 21/009 (2013.01); G10L 15/22 (2013.01); G10L 15/24 (2013.01); G10L 15/26 (2013.01); H04N 5/247 (2013.01); G06K 9/00335 (2013.01); G06N 20/00 (2019.01); G06T 3/4046 (2013.01); G06T 17/00 (2013.01); G06T 2207/20084 (2013.01); G10L 13/00 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A method for real-time recognition, using one or more multi-threaded processors, of a gesture communicated by a subject, the method comprising:
receiving, by a first thread of the one or more multi-threaded processors, a first set of image frames associated with the gesture, the first set of image frames captured during a first time interval;
receiving, by a second thread of the one or more multi-threaded processors, a first set of non-visual data associated with the gesture, the first set of non-visual data captured during the first time interval;
storing information representative of the first set of image frames and information representative of the first set of non-visual data in a shared memory accessible to the one or more multi-threaded processors; and
performing, by a third thread of the one or more multi-threaded processors, a gesture recognition operation on the first set of image frames, the first set of non-visual data, and a second set of image frames associated with the gesture, the second set of image frames captured during a second time interval that is different from the first time interval,
wherein performing the gesture recognition operation comprises:
using a first processor of the one or more multi-threaded processors that implements a first three-dimensional convolutional neural network (3D CNN) to perform an optical flow operation on the information representative of the first set of image frames that is accessed from the shared memory, wherein the optical flow operation is enabled to recognize a motion associated with the gesture,
using a second processor of the one or more multi-threaded processors that implements a second 3D CNN to perform spatial and color processing operations on the information representative of the first set of image frames that is accessed from the shared memory,
using a third processor of the one or more multi-threaded processors that implements a third 3D CNN to generate contextual data based on the information representative of the first set of non-visual data that is accessed from the shared memory,
fusing results of the optical flow operation, results of the spatial and color processing operations, and the contextual data to produce an identification of the gesture; and
using a recurrent neural network (RNN) to determine that the identification corresponds to a singular gesture across at least the first and second sets of image frames.
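The data flow recited in claim 1 (capture threads writing to a shared memory, a third thread fusing a motion stream, an appearance stream, and a contextual stream) can be sketched in plain Python with the standard-library threading module. This is a minimal illustrative sketch only: the toy frame-difference, pass-through, and concatenation functions below stand in for the patented 3D CNNs, fusion step, and RNN, and all names in the code are assumptions of this sketch, not the patent's implementation.

```python
import threading

# Shared memory: a dict guarded by a lock, as a stand-in for the
# claim's shared memory accessible to the multi-threaded processors.
shared = {}
lock = threading.Lock()
done = threading.Event()

def capture(key, samples):
    """Capture thread: store one stream's data into shared memory."""
    with lock:
        shared[key] = list(samples)

def recognize():
    """Recognition thread: fuse toy features from the three streams.

    Toy stand-ins for the claim's three 3D-CNN streams:
      - motion: frame-to-frame differences (optical-flow-like)
      - appearance: raw frame values (spatial/color-like)
      - context: non-visual data passed through
    """
    with lock:
        frames = shared["frames"]
        nonvisual = shared["nonvisual"]
    motion = [b - a for a, b in zip(frames, frames[1:])]
    appearance = frames
    context = nonvisual
    # "Fusion": concatenate the per-stream features into one vector.
    with lock:
        shared["fused"] = motion + appearance + context
    done.set()

# First and second threads capture the two input streams;
# a third thread performs the recognition step on the shared data.
t1 = threading.Thread(target=capture, args=("frames", [1, 3, 6]))
t2 = threading.Thread(target=capture, args=("nonvisual", [10]))
t1.start(); t2.start(); t1.join(); t2.join()
t3 = threading.Thread(target=recognize)
t3.start(); t3.join()
done.wait()
print(shared["fused"])  # motion diffs [2, 3] + frames + non-visual
```

In a real system each stream would run continuously and the lock-guarded dict would be replaced by a ring buffer or interprocess shared memory, but the division of labor among the three threads mirrors the structure of the claim.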