| CPC G06F 16/313 (2019.01) | 6 Claims |

|
1. A method comprising:
receiving, at a classification data store, a plurality of subject text strings, a plurality of associated class text strings, and a relationship of a description of a class of each class text string, the plurality of subject text strings each comprising a name of an organization, the class text strings each comprising a merchant category code that describes the class;
generating, by a machine learning model, a subject vector embedding based on each of the plurality of subject text strings and a class vector embedding based on each of the plurality of class text strings;
receiving, at a scoring engine from the machine learning model and as input to a binary search process, each of the subject vector embeddings and each of the class vector embeddings;
generating, by the scoring engine, a similarity score, wherein the similarity score is a measurement of similarity between the subject vector embedding and the class vector embedding;
determining, by the scoring engine, that the similarity score is below a threshold value;
splitting, by the scoring engine, the plurality of subject text strings into a first new plurality of subject text strings and a second new plurality of subject text strings;
receiving, by the scoring engine, a new subject vector embedding, wherein the new subject vector embedding is generated from the first new plurality of subject text strings;
calling, by the scoring engine, the binary search process using the new subject vector embedding and the class vector embedding as input to the binary search process;
generating, by the scoring engine executing the binary search process and from the classification data store, two or more subject text strings of the plurality of subject text strings that are concatenated with a separation character and removing the separation character;
providing, from the scoring engine and to a large language model, the concatenated string and one class text string of the class text strings, the one class text string being associated with the class vector embedding; and
generating, by the large language model in communication with the classification data store and as a result of a query, the concatenated string using the one class text string as a lookup key, the large language model determining that a subject text string from the query is similar to the one class text string.
|
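
The scoring and splitting steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The function names (`embed`, `embed_group`, `cosine_similarity`, `binary_search_subjects`) are hypothetical. The letter-count embedding is a toy stand-in for the machine learning model, cosine similarity is assumed as the similarity measurement, and the sketch recurses into both halves of a split group, whereas the claim recites only the first new plurality. The concatenation and large language model steps are not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def embed(text: str) -> List[float]:
    """Toy stand-in for the machine learning model (assumption):
    a 26-dimensional vector of letter counts."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def embed_group(texts: Sequence[str]) -> List[float]:
    """Embed a group of subject text strings as the mean of their vectors
    (assumption: the claim does not say how a group is embedded)."""
    vectors = [embed(t) for t in texts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity score between two embeddings; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def binary_search_subjects(
    subjects: Sequence[str], class_vector: Sequence[float], threshold: float
) -> List[str]:
    """Return subject strings whose group embedding meets the threshold.

    A group scoring below the threshold is split in half and each half
    is scored again; a single string below the threshold is dropped.
    """
    if not subjects:
        return []
    score = cosine_similarity(embed_group(subjects), class_vector)
    if score >= threshold:
        return list(subjects)
    if len(subjects) == 1:
        return []
    mid = len(subjects) // 2
    return binary_search_subjects(
        subjects[:mid], class_vector, threshold
    ) + binary_search_subjects(subjects[mid:], class_vector, threshold)


# Hypothetical usage: one organization name and one unrelated string.
class_vector = embed("coffee shop")
matches = binary_search_subjects(["coffee shop", "zzzz"], class_vector, 0.9)
```

In this usage the two-string group scores below the 0.9 threshold, so it is split; the first half then matches the class embedding and the second half is discarded.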