US 12,265,919 B2
Method, non-transitory machine-readable medium, and system for identifying comparable entities using machine learning and profile graphs
Balasubramaniam Raju, Palo Alto, CA (US)
Assigned to CLARI INC., Sunnyvale, CA (US)
Filed by Clari Inc., Sunnyvale, CA (US)
Filed on Nov. 22, 2019, as Appl. No. 16/693,136.
Prior Publication US 2021/0158181 A1, May 27, 2021
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01)

CPC G06N 5/04 (2013.01) [G06N 20/00 (2019.01)]

20 Claims

1. A computer-implemented method, comprising:

scanning, periodically with a web crawler, a plurality of web sites to identify instances of a technological signature across the plurality of web sites, wherein the technological signature corresponds to a specific technology;

determining a set of entities that utilize the specific technology based on the instances of the technological signature across the plurality of web sites;

in response to a request received from a client device over a network for profiling a benchmark entity, obtaining a benchmark profile data set for the benchmark entity;

obtaining, by a computing device, a plurality of profile data sets, each of the plurality of profile data sets corresponding to a candidate entity, wherein each of the plurality of profile data sets and the benchmark profile data set comprises a plurality of dimensions with at least one dimension corresponding to each of firmographic, technographic, and public information, and wherein a dimension of the plurality of dimensions corresponds to utilization of the specific technology;

scanning, periodically, unstructured data from a plurality of sources utilizing natural language processing to infer relationships between a plurality of entities and a particular technology, wherein at least one dimension of the plurality of dimensions in the benchmark profile data set corresponds to a relationship with the particular technology by the candidate entity,

wherein obtaining the plurality of profile data sets and the benchmark profile data set comprises:

for each of the profile data sets,

scraping information for each of the plurality of dimensions,

converting the information for each of the plurality of dimensions into a numerical value that is usable by a machine learning algorithm, the machine learning algorithm including at least one of a K-nearest neighbors algorithm, a neighborhood component analysis algorithm, or a large margin nearest neighbors algorithm, and

structuring the numerical values for the plurality of dimensions into a format expected by the machine learning algorithm;

determining one or more of the plurality of profile data sets that are most similar to the benchmark profile data set with the machine learning algorithm, wherein determining the one or more of the plurality of profile data sets that are most similar comprises:

for each of the plurality of profile data sets,

determining a distance representing a difference between corresponding values of the profile data set and the benchmark profile data set for each dimension of the benchmark profile data set, and

determining an overall distance between the profile data set and the benchmark profile data set based on an average distance between the profile data set and the benchmark profile data set for each dimension of the benchmark profile data set;

identifying the one or more candidate entities corresponding to the one or more profile data sets as companies most similar to the benchmark entity;

generating a graphical representation representing the benchmark entity and the identified candidate entities, the graphical representation including one or more graphics attributes representing a degree of a similarity between the benchmark entity and each of the identified candidate entities; and

transmitting the graphical representation to the client device over the network to be presented therein.
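
The similarity steps recited in claim 1 (converting each dimension to a numerical value, computing a per-dimension distance, and averaging those distances into an overall distance) can be illustrated with a minimal sketch. It is not the patented implementation. The dimension names (`employees`, `uses_technology`, `news_mentions`), the log scaling, the absolute-difference metric, and the helper names `encode`, `overall_distance`, and `most_similar` are all assumptions chosen for illustration; the claim names K-nearest neighbors, neighborhood component analysis, and large margin nearest neighbors as candidate algorithms, and this sketch only mimics a brute-force nearest-neighbor lookup.

```python
from math import log10
from statistics import mean

# Hypothetical dimensions, one per category named in the claim:
#   employees        -> firmographic
#   uses_technology  -> technographic (utilization of the specific technology)
#   news_mentions    -> public information

def encode(raw):
    """Convert scraped values into a fixed-order numeric vector.

    Log scaling is an assumption made here so that count-like
    dimensions do not dominate the boolean one.
    """
    return [
        log10(raw["employees"]),
        1.0 if raw["uses_technology"] else 0.0,
        log10(1 + raw["news_mentions"]),
    ]

def overall_distance(profile, benchmark):
    """Average of the per-dimension absolute differences."""
    per_dimension = [abs(p - b) for p, b in zip(profile, benchmark)]
    return mean(per_dimension)

def most_similar(candidates, benchmark, k=2):
    """Return the k candidate names with the smallest overall distance."""
    scored = [
        (name, overall_distance(vector, benchmark))
        for name, vector in candidates.items()
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:k]

# Illustrative data only; none of these entities come from the patent.
benchmark = encode({"employees": 1000, "uses_technology": True, "news_mentions": 99})
candidates = {
    "A": encode({"employees": 1000, "uses_technology": True, "news_mentions": 99}),
    "B": encode({"employees": 10, "uses_technology": False, "news_mentions": 9}),
    "C": encode({"employees": 100, "uses_technology": True, "news_mentions": 99}),
}
ranked = most_similar(candidates, benchmark, k=2)
```

In this sketch a smaller overall distance means a more similar entity, so the ranked distances could be mapped onto a graphics attribute (for example, edge length or node size) when drawing the graphical representation.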