US 12,326,979 B2
System and method for acoustic based gesture tracking and recognition using spiking neural network
Andrew Gigie, Bangalore (IN); Arun George, Bangalore (IN); Achanna Anil Kumar, Bangalore (IN); Sounak Dey, Kolkata (IN); Kuchibhotla Aditi, Bangalore (IN); and Arpan Pal, Kolkata (IN)
Assigned to Tata Consultancy Services Limited, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Jul. 19, 2022, as Appl. No. 17/813,376.
Claims priority of application No. 202121048453 (IN), filed on Oct. 24, 2021.
Prior Publication US 2023/0168743 A1, Jun. 1, 2023
Int. Cl. G06F 3/01 (2006.01); G01S 15/62 (2006.01); G01S 15/66 (2006.01)
CPC G06F 3/017 (2013.01) [G01S 15/62 (2013.01); G01S 15/66 (2013.01)]
16 Claims
 
1. A processor implemented method, comprising:
transmitting, via a waveform transmitter, a filtered signal having a band limited random waveform to a user;
receiving, via a plurality of microphones, a reflecting signal from the user, in response to the transmitted filtered signal;
pre-processing the reflecting signal to obtain a pre-processed signal having a real component and an imaginary component, wherein the pre-processed signal comprises a plurality of frames;
performing an autocorrelation of the transmitted filtered signal to obtain an autocorrelated signal;
applying a windowed filter on the autocorrelated signal to obtain a windowed autocorrelation output;
performing a cross correlation for every frame of the plurality of frames of the pre-processed signal with reference to the transmitted filtered signal to obtain a cross correlated signal for every frame of the plurality of frames;
estimating a difference in the cross correlated signal between consecutive frames of the pre-processed signal;
applying a shifted windowed filter on the difference, when the difference is above a pre-defined threshold, to remove unwanted signals and to obtain a windowed difference magnitude cross correlation output;
computing a delay corresponding to each of the plurality of microphones by applying a Fast Fourier Transformation on the windowed autocorrelation output and the windowed difference magnitude cross correlation output;
tracking a plurality of multi-coordinate finger positions based on (i) the delay corresponding to each of the plurality of microphones and (ii) a common intersection between one or more ellipses formed with the plurality of microphones and the waveform transmitter using the delay; and
recognizing, via a Spiking Neural Network (SNN), a gesture performed by the user based on the plurality of multi-coordinate finger positions by:
converting the plurality of multi-coordinate finger positions to a spike-domain;
extracting one or more features of the spike-domain using one or more spiking neurons comprised in the SNN; and
recognizing the gesture performed by the user from the extracted one or more features by using the SNN,
wherein the SNN is obtained by:
training a Convolutional Neural Network (CNN) using training data further comprising a plurality of multi-coordinate mapped finger positions corresponding to one or more users to obtain a trained CNN;
quantizing the trained CNN to obtain a quantized CNN; and
converting the quantized CNN to the SNN.
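
The cross-correlation, frame-difference and threshold steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes real-valued frames instead of the claimed real and imaginary components, a circularly shifted echo model, and integer-sample lags, and it omits the windowed filters and the FFT-based delay refinement. The function names, the 48 kHz sampling rate and the 16-20 kHz band are hypothetical choices made only for illustration.

```python
import numpy as np

def band_limited_noise(n, fs, f_lo, f_hi, seed=0):
    """Random waveform whose spectrum is zero outside [f_lo, f_hi] Hz."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0  # keep only the chosen band
    x = np.fft.irfft(spectrum, n=n)
    return x / np.max(np.abs(x))

def xcorr_fft(rx, tx):
    """Circular cross-correlation of a received frame against the reference."""
    n = len(tx)
    return np.fft.irfft(np.fft.rfft(rx, n) * np.conj(np.fft.rfft(tx, n)), n=n)

def moving_echo_lag(prev_frame, curr_frame, tx, threshold):
    """Lag (in samples) of the strongest change between consecutive frames.

    Static reflections cancel in the difference; returns None when the
    change stays at or below the threshold (no motion detected).
    """
    diff = np.abs(xcorr_fft(curr_frame, tx) - xcorr_fft(prev_frame, tx))
    peak = int(np.argmax(diff))
    return peak if diff[peak] > threshold else None

# Hypothetical scenario: a static reflector at lag 10, a finger echo at lag 40.
FS = 48_000
tx = band_limited_noise(4800, FS, 16_000, 20_000)
static_echo = 0.5 * np.roll(tx, 10)
prev_frame = static_echo
curr_frame = static_echo + 0.2 * np.roll(tx, 40)

lag = moving_echo_lag(prev_frame, curr_frame, tx, threshold=1.0)
delay_seconds = None if lag is None else lag / FS
```

In this toy scenario the static reflection cancels in the frame difference, so the recovered lag corresponds to the moving echo only. Passing the same frame twice yields no detection, which mirrors the role of the pre-defined threshold in the claim.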
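
The tracking step of claim 1 relies on ellipses whose foci are the waveform transmitter and one microphone: a measured delay fixes the total path length transmitter to finger to microphone, and the finger lies where the ellipses intersect. The sketch below is one possible way to compute that intersection, assuming a 2-D geometry, two microphones, a nominal speed of sound of 343 m/s, and a Gauss-Newton solver started from a guess in front of the device. None of these choices is stated in the claim, and all names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value

def path_lengths(p, tx, mics):
    """Length of the path tx -> p -> mic for every microphone."""
    return np.linalg.norm(p - tx) + np.linalg.norm(mics - p, axis=1)

def locate_finger(delays, tx, mics, p0, iters=50, tol=1e-12):
    """Intersect the delay-defined ellipses by Gauss-Newton iteration."""
    target = SPEED_OF_SOUND * np.asarray(delays, dtype=float)
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        residual = path_lengths(p, tx, mics) - target
        # Gradient of each path length: sum of unit vectors pointing
        # from the transmitter and from the microphone towards p.
        u_tx = (p - tx) / np.linalg.norm(p - tx)
        u_mic = (p - mics) / np.linalg.norm(p - mics, axis=1, keepdims=True)
        jacobian = u_mic + u_tx
        step, *_ = np.linalg.lstsq(jacobian, residual, rcond=None)
        p = p - step
        if np.linalg.norm(step) < tol:
            break
    return p

# Hypothetical layout (metres): transmitter between two microphones.
tx_pos = np.array([0.0, 0.0])
mic_pos = np.array([[-0.05, 0.0], [0.05, 0.0]])
true_finger = np.array([0.03, 0.20])

# Simulated delays for the true position, then recovery from those delays.
delays = path_lengths(true_finger, tx_pos, mic_pos) / SPEED_OF_SOUND
estimate = locate_finger(delays, tx_pos, mic_pos, p0=[0.0, 0.10])
```

With the transmitter and microphones on one line, the ellipses also intersect at the mirror-image point behind the device, so the initial guess `p0` selects the half-plane in which the finger is assumed to be.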
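
Claim 1 obtains the SNN by quantizing a trained CNN and then converting it, without specifying the quantization scheme or the neuron model. One commonly cited rationale for quantizing first is that a rate-coded integrate-and-fire neuron can only express a finite number of firing rates over a fixed time window. The sketch below illustrates that correspondence under assumed conditions: a constant input, a unit firing threshold, and reset by subtraction. The function names and the eight-step window are hypothetical.

```python
import numpy as np

def if_neuron_rate(activation, timesteps, threshold=1.0):
    """Firing rate of an integrate-and-fire neuron driven by a constant input.

    The membrane potential accumulates the input each step; on reaching the
    threshold the neuron emits one spike and the threshold is subtracted.
    """
    potential = 0.0
    spikes = 0
    for _ in range(timesteps):
        potential += activation
        if potential >= threshold:
            spikes += 1
            potential -= threshold
    return spikes / timesteps

def quantized_relu(activation, levels):
    """ReLU clipped to [0, 1] and rounded down to a grid of `levels` steps."""
    return np.floor(np.clip(activation, 0.0, 1.0) * levels) / levels

# Inputs chosen to be exactly representable in binary floating point.
T = 8
inputs = [-0.5, 0.25, 0.5, 0.75, 1.0]
rates = [if_neuron_rate(a, T) for a in inputs]
targets = [float(quantized_relu(a, T)) for a in inputs]
```

For these inputs the firing rates coincide with the quantized activations, which suggests why a CNN whose activations already sit on such a grid may lose little accuracy when its units are replaced by spiking neurons. Inputs that are not exact binary fractions can differ by one spike because of floating-point accumulation.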