US 12,229,640 B2
Machine learning model trained using features extracted from n-grams of mouse event data
Andrey Finkelshtein, Beer Sheva (IL); Anton Puzanov, Mitzpe Ramon (IL); Noga Agmon, Givat Shmuel (IL); and Eitan Menahem, Beer Sheva (IL)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Nov. 30, 2020, as Appl. No. 17/106,858.
Prior Publication US 2022/0172102 A1, Jun. 2, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 3/03 (2006.01); G06F 9/54 (2006.01); G06N 5/01 (2023.01)
CPC G06N 20/00 (2019.01) [G06F 3/0312 (2013.01); G06F 9/542 (2013.01); G06N 5/01 (2023.01)]
14 Claims
 
1. A system, comprising a processor to:
receive mouse event data of a plurality of online sessions;
split the mouse event data of the plurality of online sessions into mouse event n-grams;
select a subset of n-grams of the mouse event n-grams based on a term frequency-inverse document frequency (TF-IDF) heuristic that uses the n-grams as terms and the sessions as documents;
calculate features based on the subset of n-grams; and
train a machine learning model based on the calculated features;
receive mouse event data of a session, wherein the mouse event data of the session includes consecutive mouse actions taken during the session, a first mouse action having at least an event type, a start time, a starting screen coordinate, an end time, and an ending screen coordinate;
split the mouse event data of the session into session mouse event n-grams, wherein a first mouse event n-gram aggregates at least the start time, the starting screen coordinate, the end time, and the ending screen coordinate of the first mouse action;
extract features from the session mouse event n-grams;
send the extracted features to the trained machine learning model; and
receive an output decision from the trained machine learning model indicating whether the session is malicious.
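
The n-gram splitting, TF-IDF selection, and feature extraction steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not specify several details, so the following are assumptions made only for illustration: the names `MouseAction`, `to_ngrams`, `ngram_key`, `select_ngrams`, and `extract_features` are hypothetical; an n-gram is identified as a "term" by its sequence of event types; TF-IDF scores are combined across sessions by taking the maximum; and the three features per selected n-gram (count, mean duration, mean path length) are one possible choice among many.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class MouseAction:
    """One mouse action with the fields named in the claim (hypothetical layout)."""
    event_type: str                 # e.g. "move" or "click"
    start_time: float
    start_xy: Tuple[float, float]
    end_time: float
    end_xy: Tuple[float, float]


def to_ngrams(actions: Sequence[MouseAction], n: int = 2) -> List[tuple]:
    """Slide a window of n consecutive actions over one session."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def ngram_key(ngram: tuple) -> tuple:
    """Assumption: an n-gram 'term' is its sequence of event types."""
    return tuple(a.event_type for a in ngram)


def select_ngrams(sessions, n: int = 2, top_k: int = 2) -> List[tuple]:
    """Rank n-gram terms by TF-IDF, treating each session as a document."""
    per_session = [Counter(ngram_key(g) for g in to_ngrams(s, n)) for s in sessions]
    # Document frequency: number of sessions in which each term occurs.
    doc_freq = Counter(key for counts in per_session for key in counts)
    scores = Counter()
    for counts in per_session:
        total = sum(counts.values())
        for key, cnt in counts.items():
            tf = cnt / total
            idf = math.log(len(sessions) / doc_freq[key])
            # Assumption: keep the best per-session score for each term.
            scores[key] = max(scores[key], tf * idf)
    return [key for key, _ in scores.most_common(top_k)]


def extract_features(actions, selected, n: int = 2) -> List[float]:
    """For each selected term: occurrence count, mean duration, mean path length."""
    grams = to_ngrams(actions, n)
    feats: List[float] = []
    for key in selected:
        matches = [g for g in grams if ngram_key(g) == key]
        count = len(matches)
        if count:
            # Duration spans from the first action's start to the last action's end.
            mean_dur = sum(g[-1].end_time - g[0].start_time for g in matches) / count
            # Path length sums the straight-line distance of each action.
            mean_dist = sum(
                sum(math.dist(a.start_xy, a.end_xy) for a in g) for g in matches
            ) / count
        else:
            mean_dur = mean_dist = 0.0
        feats.extend([count, mean_dur, mean_dist])
    return feats
```

In this sketch, the vectors returned by `extract_features` for labeled training sessions would be passed to any standard classifier, and the same function would be applied to a new session before requesting a malicious or benign decision from the trained model. Terms that occur in every session receive an IDF of zero and are therefore never preferred by the selection step.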