US 12,244,631 B2
Method and system for detecting outliers in processes
Amarnath Chatterjee, Bangalore (IN); and Rajat Mohanty, Fairfax, VA (US)
Assigned to BULL SAS, Les Clayes-sous-Bois (FR)
Filed by BULL SAS, Les Clayes-sous-Bois (FR)
Filed on Oct. 4, 2022, as Appl. No. 17/960,084.
Claims priority of application No. 21200770 (EP), filed on Oct. 4, 2021.
Prior Publication US 2023/0164161 A1, May 25, 2023
Int. Cl. H04L 9/40 (2022.01); G06F 18/23213 (2023.01)
CPC H04L 63/1433 (2013.01) [G06F 18/23213 (2023.01); H04L 63/1425 (2013.01)]
14 Claims
 
1. A method for detecting outliers in processes running in a group of machines, wherein the method is carried out by a computer and the method comprising:
a clustering stage carried out at a first frequency and comprising
fetching a list of software contained in all machines of said group of machines,
calculating a term frequency-inverse document frequency (tf-idf) value for each installed software of said list of software and for each machine of said all machines,
performing clustering of the all machines by applying a clustering algorithm and using a Jaccardian weighted distance method between said all machines based on the tf-idf value that is calculated for each installed software of said list of software and for said each machine of said all machines, to form clusters,
a preliminary outliers detection stage carried out at a second frequency to detect outliers, the second frequency being greater than the first frequency, and said preliminary outliers detection stage comprising
fetching information of processes running in the all machines,
for each cluster of said clusters, calculating tf-idf values for each process of said processes,
wherein if a tf-idf value of a process of said each process is greater than a first predetermined threshold, the process is considered as an outlier,
for all of said clusters, calculating an itf-idf value for said each process that is considered as said outlier to identify a false positive or confirm said outlier,
wherein if said itf-idf value is lower than a second predetermined threshold, the process is confirmed as said outlier;
wherein when confirming said outlier, via said clustering stage and said preliminary outliers detection stage, said method further comprises
choosing a right cluster of said clusters by calculating fitment for other clusters if said false positive is confirmed, by starting analysis of a day one forensic snapshot to detect malware, in order to reduce false positives on the go due to large software variance of said processes.
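
The clustering stage and the first threshold test of the preliminary outliers detection stage recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every function name (tf_idf, weighted_jaccard_distance, cluster_machines, candidate_outliers), the sample machines, and both numeric thresholds are hypothetical. The tf-idf normalisation (term count divided by list length, natural-log idf) and the weighted Jaccard form (one minus the sum of minima over the sum of maxima) are assumed conventions, since the claim does not spell them out. The claim does not name a clustering algorithm either, so a simple distance-threshold grouping stands in for it. The itf-idf confirmation step is left out because the claim does not define how that value is computed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps an id to a list of terms; returns {id: {term: weight}}.

    Assumed convention: tf = count / list length, idf = ln(N / df).
    """
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        weights[doc_id] = {
            t: (c / len(terms)) * math.log(n / df[t]) for t, c in counts.items()
        }
    return weights


def weighted_jaccard_distance(a, b):
    """1 - sum(min) / sum(max) over the union of keys (assumed definition)."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return 0.0 if den == 0 else 1.0 - num / den


def cluster_machines(vectors, max_dist):
    """Stand-in clustering: group machines linked by distance <= max_dist."""
    clusters = []
    unassigned = set(vectors)
    while unassigned:
        seed = min(unassigned)  # deterministic choice of starting machine
        unassigned.remove(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            cur = frontier.pop()
            near = {
                m for m in unassigned
                if weighted_jaccard_distance(vectors[cur], vectors[m]) <= max_dist
            }
            unassigned -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def candidate_outliers(processes, clusters, threshold):
    """Per cluster, flag (machine, process) pairs whose tf-idf exceeds threshold."""
    flagged = set()
    for cluster in clusters:
        scores = tf_idf({m: processes[m] for m in cluster})
        for machine, per_process in scores.items():
            flagged |= {(machine, p) for p, s in per_process.items() if s > threshold}
    return flagged


# Hypothetical inventory: installed software per machine (clustering stage).
software = {
    "web1": ["nginx", "openssl", "python"],
    "web2": ["nginx", "openssl", "python"],
    "db1": ["postgres", "openssl", "pgbouncer"],
    "db2": ["postgres", "openssl", "pgbouncer"],
}
# Hypothetical running processes per machine (detection stage).
processes = {
    "web1": ["nginx", "sshd"],
    "web2": ["nginx", "sshd", "cryptominer"],
    "db1": ["postgres", "sshd"],
    "db2": ["postgres", "sshd"],
}

clusters = cluster_machines(tf_idf(software), max_dist=0.5)
flagged = candidate_outliers(processes, clusters, threshold=0.2)
print(sorted(sorted(c) for c in clusters))
print(sorted(flagged))
```

In this sketch, software or processes present on every machine of a group receive an idf of zero and so never separate machines or raise a flag, while a process seen on a single machine of its cluster scores highest. In a deployment following the claim, the clustering call would run at the lower first frequency and the candidate check at the higher second frequency.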