| CPC G06Q 10/087 (2013.01) [G06F 16/90335 (2019.01); G06F 18/213 (2023.01); G06N 3/08 (2013.01)] | 9 Claims |

|
1. A computer-implemented method for identifying a parent company in a corporate hierarchy, the method comprising:
using a number of processors to perform the steps of:
training a neural network iteratively with training data of known corporate hierarchies using a triplet-based loss function that creates vector embeddings wherein distances between companies in a same corporate hierarchy are shorter than distances between unrelated companies and wherein the training data includes randomly transformed company names comprising at least one of:
adding common shipping terms to a beginning of a company name;
adding common shipping terms to an end of the company name;
changing an order of words in the company name;
deleting random letters from the company name; or
adding random spaces in the company name;
building and maintaining an approximate nearest neighbors (ANN) index by indexing the vector embeddings according to Euclidean distances between the vector embeddings, wherein the ANN index enables similarity searches across corporate databases;
converting, by the trained neural network, a number of company names into respective vector embeddings wherein the neural network utilizes a character-based convolutional architecture that accounts for minor misspellings in shipping records;
receiving a query regarding a specified query company within the number of company names, wherein the query is for identifying a parent company of the specified query company;
in response to the query, extracting from the ANN index a top N number of nearest neighbor companies to the specified query company;
determining, by a machine learning voting model, which, if any, of the extracted nearest neighbor companies has the parent company that best corresponds to the specified query company, wherein the machine learning voting model evaluates multiple features derived from the extracted nearest neighbor companies to identify the corporate hierarchy that best corresponds to the specified query company, including determining a most frequently occurring (modal) parent company of the extracted nearest neighbor companies, wherein, if there is no mode among the extracted nearest neighbor companies, the machine learning voting model designates a parent company of a nearest neighbor company having a greatest Euclidean distance from the specified query company as a default modal parent company for feature calculation purposes; and
responsive to a determination that an extracted nearest neighbor company has the parent company that best corresponds to the specified query company, displaying the parent company to a user through a user interface.
|
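
The mechanics recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It shows three pieces only: the random company-name transformations used for training data, a triplet loss over Euclidean distances, and the modal-parent feature with its fallback. The names `SHIPPING_TERMS`, `augment_name`, `triplet_loss`, and `modal_parent` are hypothetical, as are the term list and the `margin` value, since the claim does not specify them. The sketch also assumes that "no mode" means no single parent occurs strictly more often than every other parent.

```python
import math
import random
from collections import Counter

# Hypothetical term list; the claim does not enumerate "common shipping terms".
SHIPPING_TERMS = ["C/O", "FREIGHT", "LOGISTICS", "ON BEHALF OF"]


def augment_name(name, rng):
    """Apply one randomly chosen transformation from the five listed in the claim."""
    choice = rng.randrange(5)
    if choice == 0:  # shipping term at the beginning
        return f"{rng.choice(SHIPPING_TERMS)} {name}"
    if choice == 1:  # shipping term at the end
        return f"{name} {rng.choice(SHIPPING_TERMS)}"
    if choice == 2:  # change the order of the words
        words = name.split()
        rng.shuffle(words)
        return " ".join(words)
    if choice == 3:  # delete one random letter
        letters = [i for i, ch in enumerate(name) if ch.isalpha()]
        if not letters:
            return name
        i = rng.choice(letters)
        return name[:i] + name[i + 1:]
    i = rng.randrange(len(name) + 1)  # insert one space at a random position
    return name[:i] + " " + name[i:]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the same-hierarchy pair is closer than the unrelated pair by `margin`.

    The margin value is an assumption; the claim only requires a triplet-based loss.
    """
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def modal_parent(neighbors):
    """Return the modal parent among (parent_company, distance_to_query) pairs.

    Assumption: a mode exists only if one parent strictly outnumbers all others.
    """
    if not neighbors:
        return None
    ranked = Counter(parent for parent, _ in neighbors).most_common()
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    # No mode: as recited in the claim, default to the parent of the
    # extracted neighbor with the greatest Euclidean distance from the query.
    return max(neighbors, key=lambda pair: pair[1])[0]
```

For example, `modal_parent([("Acme Holdings", 0.1), ("Acme Holdings", 0.2), ("Other Corp", 0.3)])` returns `"Acme Holdings"`. When every parent appears exactly once, the function returns the parent of the farthest neighbor. In the claimed method this value is one input feature to the voting model rather than the final answer.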