US 12,347,159 B2
System and method for real-time radar-based action recognition using spiking neural network (SNN)
Sounak Dey, Kolkata (IN); Arijit Mukherjee, Kolkata (IN); Dighanchal Banerjee, Kolkata (IN); Smriti Rani, Kolkata (IN); Arun George, Bangalore (IN); Tapas Chakravarty, Kolkata (IN); Arijit Chowdhury, Kolkata (IN); and Arpan Pal, Kolkata (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Dec. 15, 2020, as Appl. No. 17/122,041.
Claims priority of application No. 202021021689 (IN), filed on May 22, 2020.
Prior Publication US 2021/0365778 A1, Nov. 25, 2021
Int. Cl. G06N 3/049 (2023.01); G01S 7/40 (2006.01); G01S 13/50 (2006.01); G06N 3/08 (2023.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06V 30/19 (2022.01); G06V 40/20 (2022.01)

CPC G06V 10/454 (2022.01) [G01S 7/4056 (2013.01); G01S 13/505 (2013.01); G06N 3/049 (2013.01); G06N 3/08 (2013.01); G06V 10/82 (2022.01); G06V 30/19173 (2022.01); G06V 40/20 (2022.01)]

9 Claims

1. A processor implemented method for real-time radar based recognition of an action performed by a target by employing a spiking neural network (SNN) model, the method comprising:

receiving, by a data preprocessing layer of a spiking neural network (SNN), radar data acquired by one or more radar sensors, wherein the radar data is indicative of one or more actions performed by the target, wherein the radar data comprises a plurality of Doppler frequencies reflected from the target upon motion of the target with respect to the one or more radar sensors, wherein the spiking neural network (SNN) model is employed for real-time radar based recognition of an action performed by a target, the SNN model comprising a data pre-processing layer, a plurality of Convolutional Spiking neural network (CSNN) layers and a classifier layer;

determining, by the data preprocessing layer, a first binarized matrix associated with the radar data;

extracting, by the plurality of CSNN layers pre-trained on training data, a set of features associated with the one or more actions of the target based on the first binarized matrix, the set of features comprising a first set of spatial features and a first set of temporal features, wherein each CSNN layer of the plurality of CSNN layers comprises a set of class-wise filter-blocks connected via a lateral inhibition mechanism, which forms a convolutional spiking layer that takes the preprocessed radar data as input, wherein each class-wise filter-block of the set of class-wise filter-blocks comprises a set of filters controlled by a switcher node to enable each CSNN layer of the plurality of CSNN layers to capture spatially collocated patterns within a spike frame of a single action class associated with the action, wherein the switcher node applies inhibition to force all but one filter in each class-wise filter-block to an inactive state for a predetermined inactivity duration, wherein the predetermined inactivity duration depends on strength of inhibition, wherein the spike frame is a 1-D binary image binarized radar spectrogram representing an action, wherein each pixel within the spike frame is connected to a single neuron of the class-wise filter-block;

extracting the first set of spatial features hierarchically by convolving over the plurality of CSNN layers, wherein convolving over the plurality of CSNN layers increases complexity in the first set of spatial features from an initial CSNN layer to a last CSNN layer of the plurality of CSNN layers;

wherein convolving over the plurality of CSNN layers comprises iteratively selecting a filter from amongst the set of filters to cause a spike upon an elapse of the predetermined inactivity duration thereby enabling the spatially collocated but temporally separable features to appear on distinct filters from the set of filters, wherein a filter from amongst the set of filters causing a maximum spike is selected, and wherein iteratively selecting the filter of the set of filters comprises:

activating a class-wise filter-block from amongst the set of class-wise filter-blocks at a time for a frame sequence associated with the action; and

applying a long-term inhibition to prevent more than one class-wise filter-block from amongst the set of class-wise filter-blocks from learning a redundant pattern, to enable lateral inhibition among the set of class-wise filter-blocks, and to allow the set of class-wise filter-blocks to compete for a plurality of action classes, wherein applying the long-term inhibition comprises:

initializing weights associated with the filters in the set of class-wise filter-blocks randomly;

determining a class-wise filter-block from amongst the set of class-wise filter-blocks selected for a distinct action class from amongst the plurality of action classes; and

sending an inhibition signal to remaining class-wise filter-blocks from amongst the set of class-wise filter-blocks to prevent the remaining class-wise filter-blocks from being activated; and

identifying, by the classifier layer, a type of the action from amongst the one or more actions performed by the target based on the set of features, wherein the spatial features and the temporal features from the CSNN layer corresponding to respective actions are taken as an input to the classifier layer; and

using the type of action identified by the classifier layer in at least one of monitoring, surveillance and healthcare applications, wherein the plurality of CSNN layers are trained using an unsupervised training technique for identifying the plurality of actions using the training data, wherein training the plurality of CSNN layers comprises:

receiving, by the data preprocessing layer, the training data acquired by the one or more radar sensors, the training data indicative of the plurality of actions performed by one or more targets, wherein the training data comprises time series data comprising a plurality of Doppler frequencies reflected from the plurality of targets upon motion of the plurality of targets with respect to the one or more radar sensors;

determining a plurality of second binarized matrices associated with each of the plurality of actions by the preprocessing layer, wherein determining the plurality of second binarized matrices comprises:

computing a plurality of spectrograms for the plurality of actions as a time-frequency domain representation of the time series data by using a Short-time Fourier Transform (STFT) model;

performing a modulus operation on the STFT model to obtain a real valued matrix;

consecutively converting the real valued matrix into a grayscale image and then into a second binarized matrix of the plurality of second binarized matrices using a threshold;

extracting, by the plurality of CSNN layers, a set of training features associated with the plurality of actions of the target based on the plurality of second binarized matrices, the set of training features comprising a second set of spatial features and a second set of temporal features; and

identifying, by the classifier layer, a type of the action from amongst the plurality of actions performed by the target based on the second set of features.
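
For illustration only, the preprocessing steps recited in claim 1 (STFT spectrogram, modulus, grayscale conversion, thresholding into a binarized matrix) might be sketched as follows. This is a minimal sketch and not the patented implementation: the function names `stft` and `binarize_spectrogram`, the Hann window, the frame length, the hop size, the 0 to 255 grayscale scaling, the threshold of 128, and the synthetic chirp standing in for radar returns are all assumptions that do not appear in the claim.

```python
import numpy as np

def stft(signal, frame_len=64, hop=32):
    """Naive STFT (assumed parameters): Hann-windowed frames, one FFT per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Transpose so rows are frequency bins and columns are time frames.
    return np.fft.fft(frames, axis=1).T

def binarize_spectrogram(signal, threshold=128, frame_len=64, hop=32):
    """STFT -> modulus -> 8-bit grayscale -> thresholded binary matrix."""
    spec = stft(signal, frame_len, hop)       # complex time-frequency matrix
    magnitude = np.abs(spec)                  # modulus gives a real-valued matrix
    span = magnitude.max() - magnitude.min()
    if span > 0:
        norm = (magnitude - magnitude.min()) / span
    else:
        norm = np.zeros_like(magnitude)
    gray = np.round(255 * norm).astype(np.uint8)    # grayscale image
    return (gray >= threshold).astype(np.uint8)     # binarized matrix (spikes)

# Hypothetical stand-in for radar time series data: a complex chirp whose
# frequency drifts over time, loosely mimicking a changing Doppler shift.
fs = 1000
t = np.arange(1000) / fs
signal = np.exp(2j * np.pi * (50 * t + 40 * t ** 2))

spikes = binarize_spectrogram(signal)
print(spikes.shape, spikes.dtype)
```

Each column of `spikes` is one time frame of zeros and ones, which is the kind of input a spiking layer could consume. A real system would choose the window, frame length, and threshold to suit the radar's sampling rate and noise floor.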