US 12,437,239 B2
Methods and apparatus for management of a machine-learning model to adapt to changes in landscape of potentially malicious artifacts
Richard Harang, Alexandria, VA (US); and Felipe Ducau, Oxford (GB)
Assigned to Sophos Limited, Abingdon (GB)
Filed by Sophos Limited, Abingdon (GB)
Filed on Feb. 5, 2021, as Appl. No. 17/168,913.
Application 17/168,913 is a continuation of application No. PCT/GB2019/052222, filed on Aug. 7, 2019.
Claims priority of provisional application 62/715,762, filed on Aug. 7, 2018.
Prior Publication US 2021/0241175 A1, Aug. 5, 2021
Int. Cl. G06N 20/00 (2019.01); G06F 18/21 (2023.01); G06F 18/213 (2023.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 5/01 (2023.01); G06N 20/20 (2019.01)

CPC G06N 20/20 (2019.01) [G06F 18/213 (2023.01); G06F 18/214 (2023.01); G06F 18/2178 (2023.01); G06F 18/22 (2023.01); G06N 5/01 (2023.01); G06N 20/00 (2019.01)]

14 Claims

1. An apparatus, comprising:

a memory; and

a processor operatively coupled to the memory, the processor configured to:

train, at a first time, a machine learning model to output (1) an identification of whether an artifact is malicious and (2) a confidence value associated with the identification of whether the artifact is malicious;

receive a set of artifacts during each time period from a set of time periods, each time period from the set of time periods being after the first time;

for each time period from the set of time periods, provide a feature vector representative of each artifact from the set of artifacts received during that time period to the machine learning model to obtain as an output of the machine learning model an indication of whether that artifact is malicious and a confidence value associated with the indication of whether that artifact is malicious based on a degree of similarity of the feature vector with a set of feature vectors representative of a set of training artifacts used to train the machine learning model at the first time;

calculate a confidence metric for each time period from the set of time periods based on the confidence value associated with a number of artifacts from the set of artifacts received during that time period meeting a confidence value threshold;

calculate a rate of change in confidence based on the confidence metric for at least two time periods from the set of time periods;

in response to the rate of change in confidence meeting a retraining criterion:

receive, over a network and at a second time after the first time, an updated set of training artifacts; and

retrain, based on the updated set of training artifacts, the machine learning model.
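
The monitoring steps recited in claim 1 (a per-period confidence metric, a rate of change across periods, and a retraining criterion) can be illustrated with a minimal sketch. The claim does not fix any particular formula, so everything below is an assumption made for illustration: the metric is taken to be the fraction of artifacts in a period whose confidence value meets the threshold, the rate of change is taken to be the average per-period change between the first and last periods, and the criterion is taken to be a decline of at least `max_decline` per period. The function names and the default values `0.9` and `0.05` are hypothetical and do not come from the patent.

```python
from typing import Sequence


def confidence_metric(confidences: Sequence[float], threshold: float) -> float:
    """Fraction of artifacts in one time period whose confidence value
    meets the threshold (assumed form of the claimed confidence metric)."""
    if not confidences:
        raise ValueError("time period contains no artifacts")
    meeting = sum(1 for c in confidences if c >= threshold)
    return meeting / len(confidences)


def rate_of_change(metrics: Sequence[float]) -> float:
    """Average per-period change in the metric between the first and last
    time periods (assumed form of the claimed rate of change)."""
    if len(metrics) < 2:
        raise ValueError("at least two time periods are required")
    return (metrics[-1] - metrics[0]) / (len(metrics) - 1)


def should_retrain(
    periods: Sequence[Sequence[float]],
    threshold: float = 0.9,    # hypothetical confidence value threshold
    max_decline: float = 0.05,  # hypothetical retraining criterion
) -> bool:
    """True when the metric falls by at least max_decline per period."""
    metrics = [confidence_metric(p, threshold) for p in periods]
    return rate_of_change(metrics) <= -max_decline


if __name__ == "__main__":
    # Each inner list holds the model's confidence values for the
    # artifacts received during one time period after initial training.
    drifting = [
        [0.95, 0.97, 0.99, 0.92],
        [0.95, 0.85, 0.99, 0.92],
        [0.95, 0.85, 0.50, 0.92],
        [0.60, 0.85, 0.50, 0.92],
    ]
    if should_retrain(drifting):
        print("retraining criterion met: request updated training artifacts")
```

In this sketch a positive result would correspond to the claimed response of receiving an updated set of training artifacts over a network and retraining the model; those steps are left out because the claim does not specify how they are performed.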