US 11,675,790 B1
Computing company competitor pairs by rule based inference combined with empirical validation
Aditya Jami, San Francisco, CA (US); Jinsong Guo, London (GB); Eric Aichinger, Vienna (AT); Tim Furche, London (GB); Giovanni Grasso, London (GB); Jørn Lyseggen, San Francisco, CA (US); Markus Kröll, Vienna (AT); Stéphane Reissfelder, Cambridge (GB); Lukas Schweizer, Dresden (DE); and Georg Gottlob, Oxford (GB)
Assigned to MELTWATER NEWS INTERNATIONAL HOLDINGS GMBH, Schaffhausen (CH)
Filed by Meltwater News International Holdings GmbH, Schaffhausen (CH)
Filed on Jun. 24, 2022, as Appl. No. 17/849,441.
Claims priority of provisional application 63/326,507, filed on Apr. 1, 2022.
Int. Cl. G06F 16/2455 (2019.01); G06N 7/02 (2006.01); G06F 16/2458 (2019.01); G06N 5/022 (2023.01); G06N 5/025 (2023.01); G06F 16/33 (2019.01); G06F 18/21 (2023.01)

CPC G06F 16/24556 (2019.01) [G06F 16/2468 (2019.01); G06F 16/24558 (2019.01); G06F 16/3334 (2019.01); G06F 18/217 (2023.01); G06N 5/022 (2013.01); G06N 5/025 (2013.01); G06N 7/02 (2013.01)]

22 Claims

1. A method of determining pairs of competing companies comprising:

(i) accessing, by at least one processor, a company information system (CIS) and retrieving data from the CIS, the CIS storing data regarding companies comprising business entities, institutions, and organisations in non-transitory memory;

(ii) determining, by the at least one processor, candidate competitor pairs of companies from already established competitor pairs stored in, and retrieved from, the CIS, and from other data stored in, and retrieved from, the CIS, and determining for each candidate competitor pair (company A, company C) one or more semi-final plausibility scores according to one or more criteria, each semi-final plausibility score for (company A, company C) expressing a degree of plausibility that company C is a competitor of company A;

(iii) validating, by the at least one processor, candidate competitor pairs by accessing a searchable document store and performing searches to obtain for each candidate competitor pair statistics based on frequencies of co-occurrences in documents of the document store of names of two companies in the competitor pair, the frequencies determined from numbers or from sets of identifiers of result documents for search queries issued to the document store, and determining a co-occurrence-based competition likelihood score (CLS) expressing a degree of relatedness of the companies relative to the co-occurrences in documents of the document store; and

(iv) aggregating, by the at least one processor, for each candidate competitor pair the one or more semi-final plausibility scores with the CLS to obtain a final plausibility score and selecting candidate competitor pairs as effective competitor pairs, having a final plausibility score that is in a predefined range of final plausibility scores,

where at least one candidate competitor pair is determined by applying restricted transitivity, whereby the candidate competitor pair (company A, company C) is generated from already established competitor pairs (company A, company B) and (company B, company C) that fulfill additional constraints, where at least one of the additional constraints comprises:

(i) company A and company C are in a same industrial sector;

(ii) company A is in a sector that is compatible with a sector associated with company C; and

(iii) sets of keywords associated to company A and to company B are sufficiently similar,

where, by applying the restricted transitivity, a proximity score (PScore) to the competitor candidate pair (company A, company C) is determined, the PScore expressing the plausibility of company A and company C being competitors based on the data retrieved from the CIS,
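As an illustration only, the restricted transitivity step recited above might be sketched as follows. The function names, the in-memory data structures, the Jaccard similarity with its 0.5 threshold, and the PScore formula (fraction of constraints satisfied) are all assumptions made for this sketch; the claim does not prescribe them. Constraint (iii) is applied to the keyword sets of company A and company B, following the claim wording.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def restricted_transitivity(established, sector, compatible, keywords, min_sim=0.5):
    """Derive candidate pairs (A, C) from established pairs (A, B) and (B, C).

    A candidate is kept only if at least one additional constraint holds.
    The returned PScore is a hypothetical choice: the fraction of the three
    constraints that are satisfied.
    """
    candidates = {}
    for (a, b) in established:
        for (b2, c) in established:
            # Need a shared middle company, a genuinely new pair, and A != C.
            if b != b2 or a == c or (a, c) in established:
                continue
            checks = [
                sector[a] == sector[c],                       # (i) same sector
                (sector[a], sector[c]) in compatible,         # (ii) compatible sectors
                jaccard(keywords[a], keywords[b]) >= min_sim  # (iii) similar keywords
            ]
            if any(checks):
                pscore = sum(checks) / len(checks)
                candidates[(a, c)] = max(pscore, candidates.get((a, c), 0.0))
    return candidates


# Toy data standing in for records retrieved from the CIS.
established = {("A", "B"), ("B", "C"), ("B", "D")}
sector = {"A": "software", "B": "software", "C": "software", "D": "mining"}
compatible = set()  # no cross-sector compatibility declared in this toy example
keywords = {"A": {"crm", "cloud"}, "B": {"erp", "cloud"}, "C": {"crm"}, "D": {"ore"}}

candidates = restricted_transitivity(established, sector, compatible, keywords)
```

In this toy data only (A, C) survives: A and C share a sector, whereas (A, D) satisfies none of the constraints and is discarded.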

where the CLS of each candidate competitor pair (company A, company C) is determined based on a comparison of a number of search results of groups of queries to the document store comprising: (i) first queries for co-occurrences of names of A and of C together with names of some competitors of A or some competitors of C, (ii) second queries, corresponding to the first queries, where either A or C is replaced by random companies from the CIS not known to be in a competitor relationship with A or C, respectively, whereby a higher CLS is achieved if an average number of search results by the first queries is higher than an average number of search results by the second queries,
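A minimal sketch of the query comparison recited above, under stated assumptions: the document store is modelled as a list of sets of company names, `count_hits` stands in for a search query returning a result count, and the control companies are passed in already drawn (in practice they would be sampled at random from CIS companies not known to compete with A or C). The mapping from the two averages to a score in [0, 1] is a hypothetical choice; the claim only requires that a higher average for the first queries yields a higher CLS.

```python
from statistics import mean


def count_hits(docs, names):
    """Number of documents mentioning every name (stand-in for a search query)."""
    return sum(1 for doc in docs if all(n in doc for n in names))


def competition_likelihood(docs, a, c, context, controls):
    """Compare co-occurrence counts for (A, C) against control replacements.

    context  : known competitors of A or of C, added to each query
    controls : companies not known to compete with A or C (assumed pre-sampled)
    """
    # First queries: A and C together with a known competitor.
    first = [count_hits(docs, (a, c, x)) for x in context]
    # Second queries: same shape, with A or C replaced by a control company.
    second = []
    for r in controls:
        for x in context:
            second.append(count_hits(docs, (r, c, x)))  # A replaced
            second.append(count_hits(docs, (a, r, x)))  # C replaced
    avg_first, avg_second = mean(first), mean(second)
    total = avg_first + avg_second
    # Assumed scoring: 0.5 when both averages agree, approaching 1.0 when
    # the first queries dominate.
    return avg_first / total if total else 0.5


docs = [
    {"Acme", "Bolt", "Core"},
    {"Acme", "Bolt", "Core"},
    {"Acme", "Zed", "Core"},
    {"Bolt", "Core"},
]
cls = competition_likelihood(docs, "Acme", "Bolt", context=["Core"], controls=["Zed"])
```

Both `context` and `controls` must be non-empty here, since `statistics.mean` raises an error on an empty list.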

where aggregating the PScore, the CLS, and the semi-final plausibility scores to compute the final plausibility score is based on normalizing to a same numeric range, and then applying at least one of:

(i) standard numeric score aggregation functions comprising arithmetic mean, geometric mean and its variants, or the median,

(ii) fuzzy aggregation functions, when semi-final scoring functions are interpreted as membership functions in a fuzzy set,

(iii) weighted aggregation functions whose weight is at least one of fixed manually, automatically computed, and learnt from data in the CIS,

(iv) ranking-based aggregation functions which produce an aggregate ranking from the rankings induced by the single scores, and where a final plausibility score is determined from the aggregate ranking, possibly also considering the various scores,

(v) obtaining a more robust final scoring with respect to outlier scores, by aggregating a subset of the scores to compute the final plausibility score, by a score computation method comprising (a) disregarding a lowest score, (b) disregarding a highest score, (c) disregarding the lowest score and the highest score, (d) replacing the highest score, the lowest score, or both the highest and the lowest score by the arithmetic or geometric mean or an adjusted geometric mean or the median of other scores, and

(vi) in case of missing CIS data, replacing non-available score values by the arithmetic or geometric mean or an adjusted geometric mean or the median of the available scores, and, where appropriate, applying a penalty to the semi-final plausibility score to obtain the final plausibility score.
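The aggregation options listed above can be sketched for a few of the simpler cases: normalization to a common range, standard aggregation functions from option (i), outlier trimming from option (v)(c), and mean imputation with a penalty from option (vi). The function names, the [0, 1] target range, the per-missing-score penalty, and the selection threshold are assumptions for illustration, not the patented implementation.

```python
from statistics import mean, median, geometric_mean


def normalize(value, lo, hi):
    """Linearly map a raw score from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


def aggregate(scores, method="mean", trim=False, missing_penalty=0.0):
    """Combine normalized scores into a final plausibility score.

    scores          : list of floats in [0, 1]; None marks a missing score
    trim            : drop the lowest and highest score before aggregating
    missing_penalty : assumed deduction applied once per missing score
    """
    available = [s for s in scores if s is not None]
    if not available:
        raise ValueError("no scores available")
    n_missing = len(scores) - len(available)
    # Option (vi): replace missing values by the mean of the available ones.
    filled = available + [mean(available)] * n_missing
    # Option (v)(c): disregard the lowest and the highest score.
    if trim and len(filled) > 2:
        filled = sorted(filled)[1:-1]
    # Option (i): standard aggregation functions. The geometric mean is only
    # meaningful for strictly positive scores.
    funcs = {"mean": mean, "median": median, "geometric": geometric_mean}
    result = funcs[method](filled)
    return max(0.0, result - missing_penalty * n_missing)


# A PScore on an assumed 0..5 scale, a CLS already in [0, 1], and one
# semi-final score that is missing from the CIS.
scores = [normalize(3, 0, 5), 0.8, None]
final_score = aggregate(scores, missing_penalty=0.1)

# Selecting effective competitor pairs: keep pairs whose final score falls
# in a predefined range (the lower bound 0.5 is an assumption).
is_effective = 0.5 <= final_score <= 1.0
```

Fuzzy, weighted, and ranking-based aggregation (options (ii) to (iv)) are not covered by this sketch.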