US 12,340,627 B2
System and methods for gesture inference using computer vision
Dexter Ang, Boston, MA (US); David Cipoletta, Boston, MA (US); and Henry Valk, Boston, MA (US)
Assigned to Pison Technology, Inc., Boston, MA (US)
Filed by Pison Technology, Inc., Boston, MA (US)
Filed on Jan. 28, 2023, as Appl. No. 18/161,052.
Application 18/161,052 is a continuation-in-part of application No. 17/935,480, filed on Sep. 26, 2022, granted, now Pat. No. 11,914,791.
Prior Publication US 2024/0104961 A1, Mar. 28, 2024
Int. Cl. G06V 40/20 (2022.01); G06V 10/762 (2022.01)
CPC G06V 40/28 (2022.01) [G06V 10/762 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A system for gesture inference, the system comprising:
at least one camera configured to capture video having image(s) of an environment, the image(s) having image timestamps;
a wearable device configured to be worn on a portion of an arm of a user, the wearable device comprising:
a biopotential sensor, the biopotential sensor being configured to obtain biopotential data indicating electrical signals generated by nerves and muscles in the arm of the user; and
a motion sensor, the motion sensor being configured to obtain motion data relating to a motion of the portion of the arm of the user, the biopotential data and/or the motion data having sensor data timestamps;
a first machine learning model, the first machine learning model being configured to output a first gesture inference of the user's hand/arm based on a plurality of sets of key-point values determined based on the image(s) of the environment from the video, the first gesture inference indicating a gesture from a plurality of defined gestures; and
a second machine learning model, the second machine learning model being configured to output a second gesture inference of the user's hand/arm using a combination of at least the biopotential data and the motion data relating to the motion of the portion of the arm of the user;
wherein the system is configured to:
obtain the image(s) of the environment from the video;
determine a plurality of sets of key-point values, each set of key-point values indicating locations of portions of a hand of the user for an image of the image(s);
using the first machine learning model, process the plurality of sets of key-point values to obtain the first gesture inference;
based on the image timestamps, assign a first gesture inference timestamp to the first gesture inference;
select a subset of the biopotential data and the motion data having sensor data timestamps that overlap the first gesture inference timestamp;
using the second machine learning model, process the subset of the biopotential data and the motion data to generate the second gesture inference; and
based on at least a comparison between the first gesture inference and the second gesture inference, modify the second machine learning model.
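The claimed pipeline — key-point inference from video, timestamp assignment, selection of overlapping sensor data, a second inference from the wearable's signals, and comparison-driven modification of the second model — can be sketched in minimal Python. All names below (`Frame`, `SensorSample`, `SecondModel`, the gesture labels, the averaging heuristics, and the threshold update) are hypothetical illustrations for exposition only, not the patented implementation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set; the claim only requires "a plurality of defined gestures".
GESTURES = ["open_hand", "fist", "pinch"]

@dataclass
class Frame:
    timestamp: float          # image timestamp
    keypoints: List[float]    # one set of key-point values (hand locations) for this image

@dataclass
class SensorSample:
    timestamp: float          # sensor data timestamp
    biopotential: float       # electrical signal from nerves/muscles in the arm
    motion: float             # motion of the portion of the arm

def first_model(keypoint_sets):
    """Stand-in for the first (vision) model: maps key-point sets to a gesture label."""
    mean = sum(sum(k) for k in keypoint_sets) / sum(len(k) for k in keypoint_sets)
    return GESTURES[int(mean) % len(GESTURES)]

class SecondModel:
    """Stand-in for the second (wearable-side) model, with a modifiable parameter."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def infer(self, samples):
        score = sum(s.biopotential + s.motion for s in samples) / len(samples)
        return GESTURES[0] if score < self.threshold else GESTURES[1]

    def modify(self, delta):
        # Placeholder for the claimed "modify the second machine learning model" step.
        self.threshold += delta

def run_pipeline(frames, samples, second_model, window=0.05):
    # 1. Determine a set of key-point values per image.
    keypoint_sets = [f.keypoints for f in frames]
    # 2. First gesture inference from the key-point sets.
    g1 = first_model(keypoint_sets)
    # 3. Assign a first-gesture-inference timestamp based on the image timestamps.
    t1 = sum(f.timestamp for f in frames) / len(frames)
    # 4. Select the subset of sensor data whose timestamps overlap that timestamp.
    subset = [s for s in samples if abs(s.timestamp - t1) <= window]
    # 5. Second gesture inference from the biopotential and motion subset.
    g2 = second_model.infer(subset)
    # 6. Compare the two inferences and modify the second model on disagreement.
    if g1 != g2:
        second_model.modify(0.1)
    return g1, g2
```

In this sketch the vision-side inference serves as a supervisory signal: when the two inferences disagree, the wearable-side model's parameters are adjusted, which is one plausible reading of the comparison-based modification recited in the claim.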