| CPC G06N 3/08 (2013.01) [G06F 16/9024 (2019.01); G06F 18/214 (2023.01); G06N 3/082 (2013.01); G06N 5/01 (2023.01)] | 21 Claims |

1. A method for improving a computer-implemented machine learning model by way of constrained training, the method comprising:
during a training phase, using training data as input to train the machine learning model to derive a parametric function that minimizes error across input data associated with a prediction of output values,
the machine learning model implemented over a plurality of nodes configured to represent a neural network as a directed graph comprised of nodes and edges, edges representing connections, the directed graph representing the machine learning model's computation, and
one or more edges connecting the nodes, an edge representing a connection between a first node and a second node, the connection being either conforming or non-conforming and associated with at least one weight parameter;
continuing to train the machine learning model by iteratively adjusting weight parameters associated with the neural network nodes, where the adjustment of the weight parameters is driven by input training data, output training data, one or more predicted values from the machine learning model, and a loss function associated with the prediction and training data;
determining, during the training phase, that a first connection between two nodes in the neural network is conforming or non-conforming based on a constraint formula corresponding to a sign of the weight parameter assigned to the first connection and polarities of a source node and a destination node connected by the first connection, the constraint formula being based on a function of a signum value of the first connection and polarity values associated with the source node and the destination node,
wherein P_s is the polarity of the source node associated with an output value to the destination node, P_d is the polarity of the destination node associated with an input value received from the source node, sgn refers to the signum value, and w_sd is the weight parameter assigned to the first connection connecting the source node and the destination node such that:
[Constraint formula not reproduced in the source text.]
and
sparsifying connections between nodes in the neural network by adjusting the weight parameter associated with the connection towards a first value for non-conforming connections,
the sparsifying being performed iteratively and gradually during the training phase to satisfy one or more constraints on the weight parameters of the neural network, wherein the corresponding constraint formula induces sparsity,
the constraint formula being applied to the weights in a plurality of connections in the neural network by interleaving one or more constraining operations such that a weight change is applied to one or more weight parameters in the neural network to improve the satisfaction of one or more constraints and such that a first quantity of connections becomes conforming and a second quantity of connections remains non-conforming,
wherein during the one or more constraining operations, a function of the weight parameters of a subset of the non-conforming connections is constrained to be less than or equal to the value of a function associated with a constraint schedule, and
wherein at least a weight parameter of a non-conforming connection is maintained at a first value so that the trained machine learning model has one or more non-conforming weight parameters set to the first value during a production phase.
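
The conformity test and one interleaved constraining operation recited above might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the constraint formula itself is not reproduced in this text, so the rule used here (a connection conforms when sgn(w_sd) equals P_s times P_d) is an assumption, as are the "first value" of 0.0, the linear budget schedule, and every function and variable name.

```python
# Illustrative sketch only. The conformity rule, FIRST_VALUE = 0.0, and the
# linear budget schedule are assumptions; none of them is taken from the claim.

FIRST_VALUE = 0.0  # assumed "first value" that non-conforming weights move toward


def sgn(x):
    """Signum: returns -1, 0, or +1."""
    return (x > 0) - (x < 0)


def is_conforming(p_src, p_dst, w):
    """Assumed rule: a connection conforms when sgn(w) == p_src * p_dst."""
    return sgn(w) == p_src * p_dst


def budget(step, total_steps, initial_budget):
    """Assumed constraint schedule: linear decay from initial_budget to 0."""
    return initial_budget * max(0.0, 1.0 - step / total_steps)


def constrain(weights, polarity, step, total_steps, initial_budget):
    """One constraining operation, meant to be interleaved with training steps.

    weights:  dict mapping (source, destination) -> weight
    polarity: dict mapping node -> +1 or -1

    Rescales the non-conforming weights toward FIRST_VALUE so that the sum of
    their absolute distances from FIRST_VALUE is at most the scheduled budget.
    Conforming weights are left untouched.
    """
    bad = [
        edge for edge, w in weights.items()
        if not is_conforming(polarity[edge[0]], polarity[edge[1]], w)
    ]
    total = sum(abs(weights[edge] - FIRST_VALUE) for edge in bad)
    limit = budget(step, total_steps, initial_budget)
    if total > limit:  # implies total > 0, so the division is safe
        scale = limit / total
        for edge in bad:
            weights[edge] = FIRST_VALUE + scale * (weights[edge] - FIRST_VALUE)
    return weights


# Hypothetical three-node network: only the edge a -> c is non-conforming.
polarity = {"a": 1, "b": 1, "c": -1}
weights = {("a", "b"): 0.5, ("a", "c"): 0.4, ("b", "c"): -0.3}
for step in range(1, 11):
    # An ordinary gradient update on `weights` would run here.
    constrain(weights, polarity, step, total_steps=10, initial_budget=0.4)
```

In this sketch the budget shrinks to zero by the last step, so the non-conforming weight is driven gradually to the assumed first value and stays there, while the two conforming weights are never modified by the constraining operation.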