US 11,755,626 B1
Systems and methods for classifying data objects
Ningwei Liu, Palo Alto, CA (US); Deepanjan Basu, Oakland, CA (US); Todd M. Miller, Gilroy, CA (US); and Craig Morea, Newark, DE (US)
Assigned to Splunk Inc., San Francisco, CA (US)
Filed by Splunk Inc., San Francisco, CA (US)
Filed on Jul. 30, 2021, as Appl. No. 17/390,289.
Int. Cl. G06F 16/28 (2019.01); G06F 16/22 (2019.01); G06F 16/93 (2019.01)

CPC G06F 16/285 (2019.01) [G06F 16/2237 (2019.01); G06F 16/2264 (2019.01); G06F 16/93 (2019.01)]

15 Claims

1. A computer-implemented method, comprising:

receiving a ticketing document to be classified, wherein the ticketing document includes a text portion;

performing pre-processing operations on the text portion of the ticketing document resulting in generation of a tokenized document;

performing first word embedding operations on the tokenized document including mapping each token of the tokenized document to a corresponding pre-generated vector resulting in generation of a vectorized document, wherein a first pre-generated vector mapped to a token of the tokenized document represents a semantic meaning of the token of the tokenized document, and wherein the mapping of each token of the tokenized document to the corresponding pre-generated vector is performed without use of machine learning techniques;

performing second word embedding operations on a set of tokenized topics, wherein each tokenized topic is a set of tokens comprising a set of related words, wherein the second word embedding operations include mapping each token of each tokenized topic of the set of tokenized topics to a corresponding pre-generated vector resulting in generation of a set of topic vectors, wherein a second pre-generated vector mapped to a token of the set of tokenized topics represents a semantic meaning of the token of the set of tokenized topics, and wherein the mapping of each token of each tokenized topic to the corresponding pre-generated vector is performed without use of the machine learning techniques;

performing text similarity operations on the vectorized document and each of the set of topic vectors without use of the machine learning techniques resulting in a set of similarity scores indicating a level of semantic similarity between the vectorized document and a topic vector, wherein a first similarity score indicates a level of similarity between the vectorized document and a first topic vector, and wherein each topic vector represents one of a predetermined set of topics; and

classifying the ticketing document into one of the predetermined set of topics based on the set of similarity scores.
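
For illustration only, the steps recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The embedding table, stop-word list, topic word sets, and all function names are hypothetical. The claim does not say how token vectors are combined or which similarity measure is used, so the sketch assumes mean pooling and cosine similarity. Consistent with the claim, token-to-vector mapping is a plain table lookup and no model is trained or invoked.

```python
import math
import re

# Hypothetical pre-generated vectors. A real system would load a published
# embedding table; the claim does not name one.
EMBEDDINGS = {
    "password": [0.9, 0.1, 0.0],
    "login": [0.8, 0.2, 0.0],
    "reset": [0.7, 0.3, 0.1],
    "account": [0.6, 0.3, 0.1],
    "invoice": [0.0, 0.9, 0.1],
    "billing": [0.1, 0.9, 0.0],
    "payment": [0.1, 0.8, 0.2],
    "refund": [0.0, 0.8, 0.3],
    "server": [0.1, 0.0, 0.9],
    "outage": [0.0, 0.1, 0.9],
    "crash": [0.1, 0.1, 0.8],
}

# Hypothetical stop words removed during pre-processing.
STOP_WORDS = {"i", "my", "the", "a", "to", "and", "is", "cannot", "after"}

# Hypothetical predetermined topics, each a set of related words.
TOPICS = {
    "authentication": ["password", "login", "account"],
    "billing": ["invoice", "billing", "payment", "refund"],
    "infrastructure": ["server", "outage", "crash"],
}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def embed(tokens):
    """Look up each token's vector and average them (assumed pooling).

    Tokens missing from the table are skipped. Returns None if no token
    has a vector.
    """
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text):
    """Return (best_topic, scores) for a ticket's text portion."""
    doc_vec = embed(preprocess(text))
    if doc_vec is None:
        return None, {}
    scores = {}
    for name, topic_tokens in TOPICS.items():
        topic_vec = embed(topic_tokens)
        scores[name] = cosine(doc_vec, topic_vec) if topic_vec else 0.0
    # Classify into the topic with the highest similarity score.
    best = max(scores, key=scores.get)
    return best, scores


label, scores = classify("I cannot login after the password reset")
print(label, scores)
```

With this toy table, the sample ticket scores highest against the hypothetical "authentication" topic. Picking the maximum score is only one way to classify "based on the set of similarity scores"; a threshold or tie-breaking rule could equally be applied.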