US 11,748,658 B2
System and method for categorical time-series clustering
Sakyajit Bhattacharya, Kolkata (IN); and Avik Ghose, Kolkata (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Sep. 18, 2020, as Appl. No. 17/25,137.
Claims priority of application No. 201921037652 (IN), filed on Sep. 18, 2019.
Prior Publication US 2021/0081844 A1, Mar. 18, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/28 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/285 (2019.01)] 9 Claims
 
1. A processor implemented method, comprising:
obtaining, from a plurality of sensors, a plurality of categorical time-series associated with a plurality of subjects, via one or more hardware processors, wherein each categorical time-series from amongst the plurality of categorical time-series is associated with a distinct subject from amongst the plurality of subjects; and
clustering, based on the plurality of categorical time-series, the plurality of subjects into a plurality of clusters in an unsupervised manner by using a Markov chain model to assign the plurality of subjects to the plurality of clusters, via the one or more hardware processors, wherein assigning a subject from amongst the plurality of subjects into a cluster from amongst the plurality of clusters comprises:
determining a plurality of cluster-specific transition matrices (Mh) based on a transitional probability of the subject's transitioning from a first state to a second state, the plurality of cluster-specific transition matrices (Mh) associated with the Markov Chain Model and obtained from the plurality of categorical time-series;
constructing, for each of the plurality of cluster-specific transitional matrices, a semi-distance function between the first state and the second state at multiple time instances, the semi-distance function indicative of a conditional probability of movement of the subject from the first state at a first time instant to the second state at a second time instant; and
obtaining, by using an expectation maximization (EM) model, one or more latent variables of each of the cluster-specific transitional matrices to determine an association of the subject to the cluster;
wherein the one or more latent variables comprises a logarithmic function as shown below:

[Equation image for Log L not reproduced in the text]
where
$L_i^{M_1} = \prod_{j,k} (m_{jk}^{1})^{N_{i,jk}}$ and $L_i^{M_2} = \prod_{j,k} (m_{jk}^{2})^{N_{i,jk}}$,
wherein the one or more latent variables are $Y_i = 1$ if $y_i$ is generated by $M_2$ and $Y_i = 0$ otherwise, for $i = 1, 2, \ldots, n$,
wherein the logarithmic function Log L is a summation of cluster-specific transitional matrices based on the transitional probability of the subject's transitioning from the first state $L_i^{M_1}$ to the second state $L_i^{M_2}$ associated with the Markov Chain Model,
where $m_{jk}^{1}$ is an off-diagonal element of $M_1$, $m_{jk}^{2}$ is an off-diagonal element of $M_2$, $N_{i,jk}$ is a cardinality of transitions from state $j$ to state $k$ observed in time series $i$, and $s_j$ is a $j$th diagonal element in a diagonal matrix with a state $j$ of the Markov Chain Model.
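
The following is an illustrative sketch only, not the claimed implementation. It shows one common way to fit a two-cluster mixture of Markov chains with expectation maximization, using the per-series transition counts (the cardinality of transitions from state j to state k in time series i) and one transition matrix per cluster. Several things here are assumptions that do not come from the claim text: the chains are treated as discrete-time with row-stochastic matrices that include the diagonal, a mixing weight `pi` is estimated, a small `smoothing` constant avoids zero probabilities, and the function names `transition_counts` and `em_two_markov_chains` are hypothetical. The semi-distance function recited above is not modelled.

```python
import numpy as np


def transition_counts(series, n_states):
    """Return N where N[j, k] counts observed transitions from state j to k."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(series[:-1], series[1:]):
        counts[a, b] += 1
    return counts


def em_two_markov_chains(all_series, n_states, n_iter=50, smoothing=1e-3, seed=0):
    """Soft-assign each series to one of two Markov chains (M1 or M2) via EM.

    Returns (M, resp): M has shape (2, n_states, n_states) and resp[i] is the
    estimated probability that series i was generated by the second chain,
    i.e. a soft version of the latent indicator Y_i.
    """
    rng = np.random.default_rng(seed)
    # N[i, j, k]: transitions from state j to state k in series i.
    N = np.stack([transition_counts(s, n_states) for s in all_series])
    # Assumption: random soft initialisation of the responsibilities.
    resp = rng.uniform(0.25, 0.75, size=len(all_series))

    for _ in range(n_iter):
        # M-step: pool the counts, weighted by the current responsibilities,
        # then normalise each row so that it sums to one.
        weights = np.stack([1.0 - resp, resp])                    # (2, n)
        pooled = np.einsum("hi,ijk->hjk", weights, N) + smoothing
        M = pooled / pooled.sum(axis=2, keepdims=True)
        pi = np.clip(resp.mean(), 1e-6, 1.0 - 1e-6)  # weight of the second chain

        # E-step: log L_i under each chain is sum_{j,k} N[i,j,k] * log m_jk.
        loglik = np.einsum("ijk,hjk->hi", N, np.log(M))           # (2, n)
        log_odds = (np.log(pi) + loglik[1]) - (np.log(1.0 - pi) + loglik[0])
        resp = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -500.0, 500.0)))

    return M, resp


# Usage with made-up two-state sequences (states encoded as 0 and 1).
example_series = [
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
]
M_est, resp_est = em_two_markov_chains(example_series, n_states=2)
hard_labels = (resp_est > 0.5).astype(int)  # hard cluster assignment per subject
print(hard_labels)
```

Cluster labels from such a procedure are arbitrary: which group ends up as the first or second chain depends on the initialisation, so only the grouping itself is meaningful.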