US 12,277,588 B2
Categorization based on text and attribute factorization
Bagya Lakshmi Vasudevan, Chennai (IN); and Sudesna Baruah, Chennai (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Oct. 27, 2022, as Appl. No. 17/974,763.
Claims priority of application No. 202121053177 (IN), filed on Nov. 18, 2021.
Prior Publication US 2023/0153880 A1, May 18, 2023
Int. Cl. G06Q 30/00 (2023.01); G06F 40/30 (2020.01); G06Q 30/0601 (2023.01)
CPC G06Q 30/0627 (2013.01) [G06F 40/30 (2020.01)] 12 Claims
 
1. A processor implemented method for product data categorization, the method comprising:
acquiring, via one or more hardware processors, an input describing a set of product data from an application data store for categorization;
preprocessing, via the one or more hardware processors, the set of product data by removing one of extraneous text and unwanted text based on a predefined template, wherein the extraneous text includes commas, dots, error codes, and product data text that does not match the predefined template, and wherein the preprocessing further includes dropping unnecessary columns;
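A minimal sketch of the preprocessing step, assuming the predefined template is simply a tuple of required columns. `TEMPLATE_COLUMNS`, the `ERR-<digits>` error-code format, and the helper names are hypothetical; the claim does not define the template or the error-code syntax.

```python
import re

# Hypothetical template: the only columns a product record should keep.
TEMPLATE_COLUMNS = ("category", "product")
# Hypothetical error-code format such as "ERR-404"; the claim does not define one.
ERROR_CODE = re.compile(r"\bERR-\d+\b")


def clean_text(text: str) -> str:
    """Remove error codes, commas and dots, then normalize whitespace and case."""
    text = ERROR_CODE.sub(" ", text)
    text = re.sub(r"[.,]", " ", text)
    # Lower-casing is an extra normalization assumed here, not stated in the claim.
    return " ".join(text.split()).lower()


def preprocess(records: list[dict]) -> list[dict]:
    """Drop records that do not match the template and strip extraneous text."""
    cleaned = []
    for record in records:
        # Unmatched product data: skip records missing any template column.
        if not all(record.get(column) for column in TEMPLATE_COLUMNS):
            continue
        # Dropping unnecessary columns: keep only the template columns.
        cleaned.append(
            {column: clean_text(str(record[column])) for column in TEMPLATE_COLUMNS}
        )
    return cleaned


raw = [
    {"category": "Dairy", "product": "Milk, 1L. ERR-404", "sku": "A1"},
    {"category": "Bakery", "product": ""},
    {"product": "Bread"},
]
print(preprocess(raw))
# → [{'category': 'dairy', 'product': 'milk 1l'}]
```

The extra `sku` column is dropped, and the two records that do not satisfy the template are discarded rather than repaired.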
creating, via the one or more hardware processors, a dictionary for the set of product data based on a set of attributes further comprising a product key and its corresponding product value, wherein the dictionary data is built with a product category and its products as a key-value pair;
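One plausible reading of the dictionary step, sketched under the assumption that each cleaned record carries `category` and `product` fields: the category becomes the key and the list of its products the value. The function name `build_category_dictionary` is hypothetical.

```python
from collections import defaultdict


def build_category_dictionary(records: list[dict]) -> dict[str, list[str]]:
    """Key each product category to the list of products recorded under it."""
    dictionary = defaultdict(list)
    for record in records:
        dictionary[record["category"]].append(record["product"])
    return dict(dictionary)


rows = [
    {"category": "dairy", "product": "milk"},
    {"category": "bakery", "product": "bread"},
    {"category": "dairy", "product": "cheese"},
]
print(build_category_dictionary(rows))
# → {'dairy': ['milk', 'cheese'], 'bakery': ['bread']}
```

Duplicates are kept in the value lists on purpose, so that later weighting can use how often a product occurs within its category.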
extracting, via the one or more hardware processors, multi-level contextual data for the set of product data by assigning a weight to each product data based on the likelihood of the product arriving at a suitable relevancy or probability score, and creating a set of datapoints for each product data, wherein the weight adds sharpness or steepness to the input, which leverages semantic similarity, the iteration of the weights across each product data depicts a probability scenario of contextual data understanding, and the weighted data represents contextuality; and
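The claim does not say how the likelihood or the "sharpness" is computed. The sketch below assumes, purely for illustration, that a product's likelihood is its relative frequency within its category, and that sharpening means raising those likelihoods to the power `1 / temperature` (with `temperature < 1`) and renormalizing. `assign_weights` and `temperature` are hypothetical names, and this temperature-style sharpening is a swapped-in technique, not the patented one.

```python
from collections import Counter


def assign_weights(
    dictionary: dict[str, list[str]], temperature: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (category, product, weight) datapoints.

    Assumption: likelihood = relative frequency of the product within its
    category; exponent 1/temperature (< 1 sharpens) then renormalize.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    datapoints = []
    for category, products in dictionary.items():
        counts = Counter(products)
        total = sum(counts.values())
        sharpened = {p: (c / total) ** (1.0 / temperature) for p, c in counts.items()}
        norm = sum(sharpened.values())
        for product, score in sharpened.items():
            datapoints.append((category, product, score / norm))
    return datapoints


weights = assign_weights({"dairy": ["milk", "milk", "milk", "cheese"]})
for category, product, weight in weights:
    print(category, product, round(weight, 3))
# dairy milk 0.9
# dairy cheese 0.1
```

With `temperature=1.0` the weights equal the raw likelihoods (0.75 and 0.25). Lowering the temperature steepens the distribution toward the dominant product, which is one way to read "adds sharpness or steepness to the input".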
categorizing, via the one or more hardware processors, the set of product data by feeding the set of datapoints to a set of predefined parameters to compute a minimum count, a total size, a total number of epochs, a skip-gram value, and a hierarchical softmax, upon pivoting the set of product data to obtain counts based on the product category and reindexing the assigned weights to align with a pivot table index, followed by creating the set of datapoints based on the assigned weights using the pivot table index for each product categorization, wherein contextually aware datapoints created through the dictionary and the weight assignment are passed to a pretrained Word2Vec model, and wherein the pretrained Word2Vec model is trained by feeding it the created datapoints for similarity mapping, the training inputs being a product of the word vectors of the text data and the assigned weights.
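The pivot, reindex, and weighting steps of the categorizing element can be sketched in plain Python. The helper names, the per-category weights, and the toy word vectors below are hypothetical; a real system would take word vectors from the trained Word2Vec model rather than from a hand-written dictionary.

```python
from collections import Counter


def pivot_counts(records: list[dict]) -> dict[str, int]:
    """Pivot the product rows: count rows per category, index sorted by category."""
    counts = Counter(record["category"] for record in records)
    return {category: counts[category] for category in sorted(counts)}


def reindex_weights(
    weights: dict[str, float], index: list[str], fill: float = 0.0
) -> list[float]:
    """Align per-category weights with the pivot index; missing categories get `fill`."""
    return [weights.get(category, fill) for category in index]


def create_datapoints(records, category_weights):
    """Build (category, count, weight) datapoints in pivot-index order."""
    pivot = pivot_counts(records)
    aligned = reindex_weights(category_weights, list(pivot))
    return [(category, pivot[category], weight) for category, weight in zip(pivot, aligned)]


def weighted_vectors(tokens, word_vectors, weight):
    """Training input: each known token's word vector multiplied by the weight."""
    return [[weight * x for x in word_vectors[token]] for token in tokens if token in word_vectors]


rows = [
    {"category": "dairy", "product": "milk"},
    {"category": "bakery", "product": "bread"},
    {"category": "dairy", "product": "cheese"},
]
print(create_datapoints(rows, {"dairy": 0.9}))
# → [('bakery', 1, 0.0), ('dairy', 2, 0.9)]
print(weighted_vectors(["milk", "unknown"], {"milk": [1.0, 2.0]}, 0.5))
# → [[0.5, 1.0]]
```

The claim names no library. If gensim 4.x were used, the predefined parameters would plausibly map onto the `min_count`, `vector_size`, `epochs`, `sg=1` (skip-gram), and `hs=1` (hierarchical softmax) arguments of `gensim.models.Word2Vec`. With pandas, the pivot and reindex steps would likely be `DataFrame.pivot_table(..., aggfunc="count")` followed by `Series.reindex(pivot.index, fill_value=0.0)`.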