US 12,326,979 B2
System and method for acoustic based gesture tracking and recognition using spiking neural network
Andrew Gigie, Bangalore (IN); Arun George, Bangalore (IN); Achanna Anil Kumar, Bangalore (IN); Sounak Dey, Kolkata (IN); Kuchibhotla Aditi, Bangalore (IN); and Arpan Pal, Kolkata (IN)
Assigned to Tata Consultancy Services Limited, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Jul. 19, 2022, as Appl. No. 17/813,376.
Claims priority of application No. 202121048453 (IN), filed on Oct. 24, 2021.
Prior Publication US 2023/0168743 A1, Jun. 1, 2023
Int. Cl. G06F 3/01 (2006.01); G01S 15/62 (2006.01); G01S 15/66 (2006.01)

CPC G06F 3/017 (2013.01) [G01S 15/62 (2013.01); G01S 15/66 (2013.01)]

16 Claims

1. A processor implemented method, comprising:

transmitting, via a waveform transmitter, a filtered signal having a band limited random waveform to a user;

receiving, via a plurality of microphones, a reflecting signal from the user, in response to the transmitted filtered signal;

pre-processing the reflecting signal to obtain a pre-processed signal having a real component and an imaginary component, wherein the pre-processed signal comprises a plurality of frames;

performing an autocorrelation of the transmitted filtered signal to obtain an autocorrelated signal;

applying a windowed filter on the autocorrelated signal to obtain a windowed autocorrelation output;

performing a cross correlation for every frame of the plurality of frames of the pre-processed signal with reference to the transmitted filtered signal to obtain a cross correlated signal for every frame of the plurality of frames;

estimating a difference in the cross correlated signal between consecutive frames of the pre-processed signal;

applying a shifted windowed filter on the difference, when the difference is above a pre-defined threshold, to remove unwanted signals and to obtain a windowed difference magnitude cross correlation output;

computing a delay corresponding to each of the plurality of microphones by applying a Fast Fourier Transformation on the windowed autocorrelation output and the windowed difference magnitude cross correlation output;

tracking a plurality of multi-coordinate finger positions based on (i) the delay corresponding to each of the plurality of microphones and (ii) a common intersection between one or more ellipses formed with the plurality of microphones and the waveform transmitter using the delay; and

recognizing, via a Spike Neural Network (SNN), a gesture performed by the user based on the plurality of multi-coordinate finger positions by:

converting the plurality of multi-coordinate finger positions to a spike-domain;

extracting one or more features of the spike-domain using one or more spiking neurons comprised in the SNN; and

recognizing the gesture performed by the user from the extracted one or more features by using the SNN,

wherein the SNN is obtained by:

training a Convolutional Neural Network (CNN) using training data further comprising a plurality of multi-coordinate mapped finger positions corresponding to one or more users to obtain a trained CNN;

quantizing the trained CNN to obtain a quantized CNN; and

converting the quantized CNN to the SNN.
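
The tracking step of claim 1 places the finger at the common intersection of ellipses, each having the waveform transmitter and one microphone as foci, with the delay fixing the total path length transmitter to finger to microphone. The following is only an illustrative sketch of that geometry, not the patented implementation. It assumes a 2-D layout with two microphones, a speed of sound of 343 m/s, and a Gauss-Newton solver chosen here for simplicity; the function names, coordinates, and initial guess are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the claim


def path_delays(finger, transmitter, microphones, c=SPEED_OF_SOUND):
    """Delay of the path transmitter -> finger -> microphone, one per microphone."""
    finger = np.asarray(finger, dtype=float)
    transmitter = np.asarray(transmitter, dtype=float)
    microphones = np.asarray(microphones, dtype=float)
    to_finger = np.linalg.norm(finger - transmitter)
    from_finger = np.linalg.norm(finger - microphones, axis=1)
    return (to_finger + from_finger) / c


def locate_finger(delays, transmitter, microphones, initial_guess,
                  c=SPEED_OF_SOUND, iterations=50, tol=1e-12):
    """Find the point where the delay-defined ellipses intersect.

    Each ellipse is |p - tx| + |p - mic_i| = c * delay_i. Gauss-Newton is an
    assumption of this sketch; the claim does not name a solver.
    """
    transmitter = np.asarray(transmitter, dtype=float)
    microphones = np.asarray(microphones, dtype=float)
    path_lengths = c * np.asarray(delays, dtype=float)
    p = np.asarray(initial_guess, dtype=float).copy()
    for _ in range(iterations):
        v_tx = p - transmitter                 # vector from transmitter to p
        v_mic = p - microphones                # vectors from each microphone to p
        d_tx = np.linalg.norm(v_tx)
        d_mic = np.linalg.norm(v_mic, axis=1)
        residual = d_tx + d_mic - path_lengths
        # Gradient of each ellipse equation: sum of the two unit vectors.
        jacobian = v_tx / d_tx + v_mic / d_mic[:, None]
        step, *_ = np.linalg.lstsq(jacobian, -residual, rcond=None)
        p += step
        if np.linalg.norm(step) < tol:
            break
    return p


# Hypothetical layout (metres): transmitter at the origin, microphones 5 cm either side.
transmitter = np.array([0.0, 0.0])
microphones = np.array([[-0.05, 0.0], [0.05, 0.0]])
true_finger = np.array([0.03, 0.20])

delays = path_delays(true_finger, transmitter, microphones)
estimate = locate_finger(delays, transmitter, microphones, initial_guess=[0.0, 0.1])
print("estimated finger position (m):", estimate)
```

With the transmitter and microphones on one line, the two ellipses also intersect at the mirror image of the finger behind the device; the initial guess in front of the device (positive y) is what selects the physically plausible solution in this sketch.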