US 12,032,687 B2
Command classification using active learning
Jack Wilson Stokes, III, North Bend, WA (US); Jonathan Bar Or, Redmond, WA (US); Christian Seifert, Seattle, WA (US); Talha Ongun, Boston, MA (US); and Farid Tajaddodianfar, Seattle, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on Sep. 30, 2021, as Appl. No. 17/491,438.
Prior Publication US 2023/0096895 A1, Mar. 30, 2023
Int. Cl. G06F 21/55 (2013.01); G06F 18/214 (2023.01); G06F 21/54 (2013.01); G06F 21/56 (2013.01); G06N 20/00 (2019.01)

CPC G06F 21/554 (2013.01) [G06F 18/214 (2023.01); G06F 21/54 (2013.01); G06F 21/566 (2013.01); G06N 20/00 (2019.01)]

20 Claims

1. A method comprising:

receiving a data set comprising a plurality of labeled command line inputs;

transforming each of the labeled command line inputs to generate a sequence of individual terms;

translating each of the sequences of individual terms into a sequence of numerical representations comprising an activity class and a term representation, where individual terms correspond to individual numerical representations, wherein the activity class indicates a program that is attempting to execute with respect to a term associated with the term representation;

using the activity class to calculate a term score for each of the individual numerical representations based on the individual numerical representations that represents a probability of malicious intent for the corresponding individual term;

generating an aggregated numerical representation comprising at least one of a select number of term scores, a number of terms, a number of rare terms, and the activity class; and

identifying a malicious command line input based on the aggregated numerical representation.
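
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything below is an assumption made for illustration: whitespace tokenization, deriving the activity class from the first term, the `ACTIVITY_CLASSES` table, the Laplace-smoothed frequency estimate used as the term score, the constants `TOP_K` and `RARE_BELOW`, and the final threshold rule in `is_malicious`, which stands in for whatever classifier consumes the aggregated representation.

```python
from collections import Counter

# Hypothetical ids for the program a command line invokes (0 = unknown).
ACTIVITY_CLASSES = {"powershell": 1, "cmd": 2, "bash": 3}
TOP_K = 3        # assumed "select number" of term scores to keep
RARE_BELOW = 2   # assumed: a term seen fewer than this many times is rare


def tokenize(command_line):
    """Transform a command line into a sequence of lowercase terms."""
    return command_line.lower().split()


def activity_class(terms):
    """Assumption: the first term names the executing program."""
    if not terms:
        return 0
    program = terms[0].rsplit(".", 1)[0]  # drop an extension such as ".exe"
    return ACTIVITY_CLASSES.get(program, 0)


def fit_counts(labeled):
    """Count each (activity class, term) pair overall and in malicious inputs."""
    total, malicious = Counter(), Counter()
    for command_line, label in labeled:
        terms = tokenize(command_line)
        cls = activity_class(terms)
        for term in terms:
            total[(cls, term)] += 1
            if label:
                malicious[(cls, term)] += 1
    return total, malicious


def term_score(key, total, malicious):
    """Laplace-smoothed estimate of P(malicious | activity class, term)."""
    return (malicious[key] + 1) / (total[key] + 2)


def aggregate(command_line, total, malicious):
    """Build [top-k term scores..., number of terms, number of rare terms, class]."""
    terms = tokenize(command_line)
    cls = activity_class(terms)
    keys = [(cls, term) for term in terms]
    scores = sorted((term_score(k, total, malicious) for k in keys), reverse=True)
    top = (scores + [0.0] * TOP_K)[:TOP_K]  # pad short inputs with zeros
    rare = sum(1 for k in keys if total[k] < RARE_BELOW)
    return top + [len(terms), rare, cls]


def is_malicious(features, threshold=0.6):
    """Hypothetical decision rule: flag when the highest term score is large."""
    return features[0] >= threshold
```

For example, after calling `fit_counts` on a handful of labeled command lines, `aggregate("powershell.exe -enc ZXZpbA==", total, malicious)` yields a fixed-length feature list regardless of how many terms the input contains, which is what lets a downstream classifier compare command lines of different lengths.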