| CPC G06N 3/047 (2023.01) [G05B 13/027 (2013.01); G06N 3/045 (2023.01)] | 20 Claims |

|
1. A method comprising:
receiving a pre-determined constraint on a plurality of user actions;
generating a constraint vector based on the pre-determined constraint;
inputting the constraint vector into a machine learning model;
generating a first output from the machine learning model by executing the machine learning model using the constraint vector as a first input to the machine learning model;
converting the constraint vector into a legal action mask;
generating a probability vector by executing a masked softmax operator, wherein:
the masked softmax operator takes, as a second input, the first output,
the masked softmax operator takes, as a third input, the legal action mask, and
the masked softmax operator generates, as a second output, the probability vector; and
generating a plurality of action outputs by applying a sampling system to the probability vector, wherein the plurality of action outputs comprise a subset of the plurality of user actions, and wherein the subset includes only allowed user actions.
|
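The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix these details, so each of the following is an assumption. Actions are indexed `0..n-1`. The pre-determined constraint is a set of allowed indices, encoded as a 0/1 constraint vector. The legal action mask is the boolean form of that vector. The masked softmax gives illegal actions probability zero, which is equivalent to setting their logits to negative infinity. The sampling system draws `k` distinct actions without replacement. `toy_model` is a hypothetical stand-in for the trained machine learning model, and every function name here is illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Set


def build_constraint_vector(num_actions: int, allowed: Set[int]) -> List[float]:
    # Hypothetical encoding of the pre-determined constraint: 1.0 = allowed.
    return [1.0 if i in allowed else 0.0 for i in range(num_actions)]


def to_legal_action_mask(constraint_vector: Sequence[float]) -> List[bool]:
    # Convert the constraint vector into a boolean legal action mask.
    return [value > 0.0 for value in constraint_vector]


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    # Softmax over legal entries only; masked-out entries get probability 0.
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    legal = [x for x, ok in zip(logits, mask) if ok]
    if not legal:
        raise ValueError("mask allows no actions")
    peak = max(legal)  # subtract the max logit for numerical stability
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_actions(probs: Sequence[float], k: int, rng: random.Random) -> List[int]:
    # Assumed sampling system: draw up to k distinct indices without
    # replacement, never selecting an index whose probability is zero.
    remaining = list(probs)
    chosen: List[int] = []
    for _ in range(k):
        legal = [i for i, p in enumerate(remaining) if p > 0.0]
        if not legal:
            break  # fewer than k allowed actions are left
        total = sum(remaining[i] for i in legal)
        threshold = rng.random() * total
        cumulative = 0.0
        pick = legal[-1]  # fallback guards against float round-off
        for i in legal:
            cumulative += remaining[i]
            if threshold < cumulative:
                pick = i
                break
        chosen.append(pick)
        remaining[pick] = 0.0  # remove the picked action from later draws
    return chosen


def toy_model(constraint_vector: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for the trained model: returns one logit per action.
    return [2.0 * c + 0.1 * i for i, c in enumerate(constraint_vector)]


def select_actions(num_actions: int, allowed: Set[int],
                   model: Callable[[Sequence[float]], List[float]],
                   k: int, seed: int = 0) -> List[int]:
    cv = build_constraint_vector(num_actions, allowed)
    logits = model(cv)                      # first output of the model
    mask = to_legal_action_mask(cv)         # legal action mask
    probs = masked_softmax(logits, mask)    # probability vector
    return sample_actions(probs, k, random.Random(seed))
```

For example, `select_actions(6, {1, 3, 4}, toy_model, k=2)` returns two distinct indices drawn only from `{1, 3, 4}`. Because the mask is applied inside the softmax, disallowed actions get exactly zero probability. This design removes the need to reject and resample illegal actions after sampling.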