US 12,326,883 B2
Systems and methods for detecting miscategorized text-based objects
Francesca Mosca, London (GB); Jessica Staddon, Redwood City, CA (US); Vineeth Ravi, Jersey City, NJ (US); Simran Lamba, Manhattan, NY (US); Jay Katukuri, San Jose, CA (US); and Cecilia Tilli, London (GB)
Assigned to JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed by JPMORGAN CHASE BANK, N.A., New York, NY (US)
Filed on Oct. 31, 2023, as Appl. No. 18/498,936.
Prior Publication US 2025/0139135 A1, May 1, 2025
Int. Cl. G06F 16/00 (2019.01); G06F 16/31 (2019.01)
CPC G06F 16/313 (2019.01)
6 Claims
 
1. A method comprising:
receiving, at a classification data store, a plurality of subject text strings, a plurality of associated class text strings, and a relationship of a description of a class of each class text string, the plurality of subject text strings each comprising a name of an organization, the class text strings each comprising a merchant category code that describes the class;
generating, by a machine learning model, a subject vector embedding based on each of the plurality of subject text strings and a class vector embedding based on each of the plurality of class text strings;
receiving, at a scoring engine from the machine learning model and as input to a binary search process, each of the subject vector embeddings and each of the class vector embeddings;
generating, by the scoring engine, a similarity score, wherein the similarity score is a measurement of similarity between the subject vector embedding and the class vector embedding;
determining, by the scoring engine, that the similarity score is below a threshold value;
splitting, by the scoring engine, the plurality of subject text strings into a first new plurality of subject text strings and a second new plurality of subject text strings;
receiving, by the scoring engine, a new subject vector embedding, wherein the new subject vector embedding is generated from the first new plurality of subject text strings;
calling, by the scoring engine, the binary search process using the new subject vector embedding and the class vector embedding as input to the binary search process;
generating, by the scoring engine executing the binary search process and from the classification data store, two or more subject text strings of the plurality of subject text strings that are concatenated with a separation character and removing the separation character;
providing, from the scoring engine and to a large language model, the concatenated string and one class text string of the class text strings, the one class string being associated with the class vector embedding;
generating, by the large language model in communication with the classification data store and as a result of a query, the concatenated string using the one class text string as a lookup key, the large language model determining a subject text string from the query is similar to the one class text string.
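
Illustrative sketch (not part of the claim): the scoring and splitting steps recited above can be pictured as a recursive halving search over the subject text strings. The following Python is a minimal sketch under stated assumptions, not the patented implementation. All names (make_keyword_embedder, cosine_similarity, find_low_scoring_subjects) are hypothetical; a keyword-count function stands in for the machine learning model; cosine similarity is assumed as the similarity measure; and the sketch recurses into both halves, whereas the claim recites only the first new plurality.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Embedder = Callable[[Sequence[str]], Vector]


def make_keyword_embedder(vocabulary: Sequence[str]) -> Embedder:
    """Hypothetical stand-in for the machine learning model.

    Embeds a group of strings as counts of each vocabulary word.
    """
    def embed(texts: Sequence[str]) -> Vector:
        tokens = [tok for text in texts for tok in text.lower().split()]
        return [float(tokens.count(word)) for word in vocabulary]
    return embed


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_low_scoring_subjects(
    subjects: Sequence[str],
    class_vector: Vector,
    embed: Embedder,
    threshold: float,
) -> List[str]:
    """Halve the subject strings until the low-scoring ones are isolated."""
    if not subjects:
        return []
    score = cosine_similarity(embed(subjects), class_vector)
    if score >= threshold:
        return []  # the group as a whole matches the class
    if len(subjects) == 1:
        return list(subjects)  # a single string scoring below the threshold
    mid = len(subjects) // 2
    # Assumption: search both halves (the claim recites the first half only).
    return (
        find_low_scoring_subjects(subjects[:mid], class_vector, embed, threshold)
        + find_low_scoring_subjects(subjects[mid:], class_vector, embed, threshold)
    )


# Toy data: organization names filed under a class described as "cafe".
embed = make_keyword_embedder(["cafe", "hardware"])
class_vector = embed(["cafe"])
subjects = ["blue cafe", "corner cafe", "ace hardware", "sun cafe"]
flagged = find_low_scoring_subjects(subjects, class_vector, embed, threshold=0.99)
print(flagged)  # ['ace hardware']
```

In this toy run the full group scores below the threshold, the first half scores at the threshold or above and is dropped, and the second half is halved again until the single low-scoring name remains. Because a group embedding can dilute one outlier among many matching names, the threshold in such a scheme would likely need tuning to the group size; that is an observation about this sketch, not a statement from the patent.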