US 11,748,658 B2
System and method for categorical time-series clustering
Sakyajit Bhattacharya, Kolkata (IN); and Avik Ghose, Kolkata (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Sep. 18, 2020, as Appl. No. 17/25,137.
Claims priority of application No. 201921037652 (IN), filed on Sep. 18, 2019.
Prior Publication US 2021/0081844 A1, Mar. 18, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 16/28 (2019.01)

CPC G06N 20/00 (2019.01) [G06F 16/285 (2019.01)]

9 Claims

1. A processor implemented method, comprising:

obtaining, from a plurality of sensors, a plurality of categorical time-series associated with a plurality of subjects, via one or more hardware processors, wherein each categorical time-series from amongst the plurality of categorical time-series is associated with a distinct subject from amongst the plurality of subjects; and

clustering, based on the plurality of categorical time-series, the plurality of subjects into a plurality of clusters in an unsupervised manner by using a Markov chain model to assign the plurality of subjects to the plurality of clusters, via the one or more hardware processors, wherein assigning a subject from amongst the plurality of subjects into a cluster from amongst the plurality of clusters comprises:

determining a plurality of cluster-specific transition matrices (M_h) based on a transitional probability of the subject's transitioning from a first state to a second state, the plurality of cluster-specific transition matrices (M_h) associated with the Markov Chain Model and obtained from the plurality of categorical time-series;

constructing, for each of the plurality of cluster-specific transitional matrices, a semi-distance function between the first state and the second state at multiple time instances, the semi-distance function indicative of a conditional probability of movement of the subject from the first state at a first time instant to the second state at a second time instant; and

obtaining, by using an expectation maximization (EM) model, one or more latent variables of each of the cluster-specific transitional matrices to determine an association of the subject to the cluster;

wherein the one or more latent variables comprises a logarithmic function as shown below:

where,

$L_i^{M_1} = \prod_{j,k} (m_{jk}^{1})^{N_{i,jk}}$ and $L_i^{M_2} = \prod_{j,k} (m_{jk}^{2})^{N_{i,jk}}$

wherein the one or more latent variables are $Y_i = 1$ if $y_i$ is generated by $M_2$ and $Y_i = 0$ otherwise, for $i = 1, 2, \ldots, n$,

wherein the logarithmic function Log L is a summation of cluster-specific transitional matrices based on the transitional probability of the subject's transitioning from the first state $L_i^{M_1}$ to the second state $L_i^{M_2}$ associated with the Markov Chain Model,

where $m_{jk}^{1}$ is an off-diagonal element of $M_1$, $m_{jk}^{2}$ is an off-diagonal element of $M_2$, $N_{i,jk}$ is a cardinality of transitions from state j to state k observed in time series i, and $s_j$ is a j-th diagonal element in a diagonal matrix with a state j of the Markov Chain Model.
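
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes two clusters, states coded as integers 0..S-1, and a discrete-time chain in which every consecutive pair of observations counts as a transition (self-transitions included, so the diagonal term s_j is not modeled separately). The mixing proportion `pi`, the `smoothing` constant, and the function names `transition_counts` and `em_two_clusters` are hypothetical choices made for illustration and do not appear in the claim.

```python
import numpy as np

def transition_counts(series, n_states):
    """Return N with N[j, k] = number of observed transitions j -> k."""
    counts = np.zeros((n_states, n_states))
    for j, k in zip(series[:-1], series[1:]):
        counts[j, k] += 1
    return counts

def em_two_clusters(all_series, n_states, n_iter=50, smoothing=1e-2, seed=0):
    """Sketch of EM for a two-cluster mixture of Markov chains.

    Returns the two cluster-specific transition matrices M (shape 2 x S x S)
    and resp, where resp[i] approximates P(Y_i = 1 | series i).
    """
    rng = np.random.default_rng(seed)
    # N[i] holds the transition counts N_{i,jk} of series i
    N = np.stack([transition_counts(s, n_states) for s in all_series])
    # Random soft initialization of the latent indicators Y_i
    resp = rng.uniform(0.25, 0.75, size=len(all_series))
    for _ in range(n_iter):
        # M-step: responsibility-weighted counts, smoothed (assumption)
        # so that every entry of M stays strictly positive
        weights = np.stack([1.0 - resp, resp])               # (2, n)
        weighted = np.einsum("hi,ijk->hjk", weights, N) + smoothing
        M = weighted / weighted.sum(axis=2, keepdims=True)   # rows sum to 1
        pi = np.clip(resp.mean(), 1e-6, 1 - 1e-6)            # assumed mixing weight
        # E-step: log L_i^{M_h} = sum_{j,k} N_{i,jk} * log m_{jk}^h
        log_l = np.einsum("ijk,hjk->hi", N, np.log(M))       # (2, n)
        log_post = log_l + np.log([[1.0 - pi], [pi]])
        log_post -= log_post.max(axis=0)                     # numerical stability
        post = np.exp(log_post)
        resp = post[1] / post.sum(axis=0)
    return M, resp

# Toy data: three "sticky" subjects and three "alternating" subjects
sticky = [0] * 10 + [1] * 10
alternating = [0, 1] * 10
series = [sticky] * 3 + [alternating] * 3

M, resp = em_two_clusters(series, n_states=2)
labels = resp > 0.5   # hard cluster assignment per subject
print(labels)
```

Which of the two groups receives label 1 depends on the random initialization; only the separation between the groups is meaningful. Working with log-likelihoods rather than the raw products avoids numerical underflow on long series.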