US 11,676,035 B2
Learning non-differentiable weights of neural networks using evolutionary strategies
Karel Lenc, London (GB); Karen Simonyan, London (GB); Tom Schaul, London (GB); and Erich Konrad Elsen, San Francisco, CA (US)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Jan. 23, 2020, as Appl. No. 16/751,169.
Claims priority of provisional application 62/796,066, filed on Jan. 23, 2019.
Prior Publication US 2020/0234142 A1, Jul. 23, 2020
Int. Cl. G06N 3/08 (2023.01); G06N 3/086 (2023.01); G06N 3/044 (2023.01)

CPC G06N 3/086 (2013.01) [G06N 3/08 (2013.01); G06N 3/044 (2023.01)]

20 Claims

1. A method of training a neural network having a plurality of differentiable weights and a plurality of non-differentiable weights to determine trained values of the plurality of differentiable weights and the plurality of non-differentiable weights, the method comprising:

maintaining (i) current values of the plurality of differentiable weights and (ii) current values of a plurality of distribution parameters;

repeatedly performing the following operations:

generating a plurality of non-differentiable weight value samples in accordance with the current values of the plurality of distribution parameters, wherein each non-differentiable weight value sample of the plurality of non-differentiable weight value samples defines a respective value for each of the plurality of non-differentiable weights of the neural network;

assigning each of the plurality of non-differentiable weight value samples to a respective worker computing device of a plurality of worker devices;

for each non-differentiable weight value sample of the plurality of non-differentiable weight value samples:

determining, by the respective worker computing device to which the non-differentiable weight value sample is assigned, a fitness of the neural network when the plurality of differentiable weights are set to the current values of the plurality of differentiable weights and the plurality of non-differentiable weights are set to the respective values defined by the non-differentiable weight value sample, and

determining, by the respective worker computing device to which the non-differentiable weight value sample is assigned and while the plurality of differentiable weights are set to the current values of the plurality of differentiable weights and the plurality of non-differentiable weights are set to the respective values defined by the non-differentiable weight value sample, a respective update to the current values of the plurality of differentiable weights using a machine learning gradient-based training technique;

updating the current values of the plurality of differentiable weights using the respective update to the current values of the plurality of differentiable weights determined for each of the non-differentiable weight value samples;

determining, using an evolution strategies (ES) technique, an update to the current values of the plurality of distribution parameters using the fitness determined for each non-differentiable weight value sample; and

updating the current values of the plurality of distribution parameters using the update to the current values of the plurality of distribution parameters.
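
The operations recited in claim 1 can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the "neural network" is a single linear layer, the non-differentiable weights are a binary mask over its inputs, the distribution parameters are Bernoulli logits (`theta`), fitness is the negative mean squared error, the gradient-based technique is plain gradient descent, and the ES update is a score-function (natural-evolution-strategies-style) estimator with a mean-fitness baseline. The worker computing devices are simulated by a sequential loop, and the names `train`, `sigmoid`, `lr_w` and `lr_theta` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Map logits to Bernoulli probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def train(x, y, num_steps=50, num_samples=8, lr_w=0.1, lr_theta=0.5, seed=0):
    """Toy hybrid loop: gradient steps on w, ES-style steps on theta."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = np.zeros(d)      # current values of the differentiable weights
    theta = np.zeros(d)  # current values of the distribution parameters

    for _ in range(num_steps):
        p = sigmoid(theta)
        # Generate samples of the non-differentiable weights (binary masks).
        masks = (rng.random((num_samples, d)) < p).astype(float)

        fitness = np.empty(num_samples)
        grads = np.empty((num_samples, d))
        # Each iteration stands in for one worker device (assumption:
        # simulated sequentially here instead of running in parallel).
        for k, m in enumerate(masks):
            err = x @ (w * m) - y
            fitness[k] = -np.mean(err ** 2)            # fitness of this sample
            grads[k] = 2.0 * (x.T @ err) * m / len(y)  # d(MSE)/dw for this mask

        # Combine the per-sample updates to the differentiable weights.
        w -= lr_w * grads.mean(axis=0)

        # Score-function update of the distribution parameters:
        # d/dtheta log Bernoulli(m; sigmoid(theta)) = m - p.
        advantage = fitness - fitness.mean()
        theta += lr_theta * (advantage[:, None] * (masks - p)).mean(axis=0)

    return w, theta
```

Calling `train(x, y)` with a feature matrix `x` of shape `(n, d)` and targets `y` of shape `(n,)` returns the final differentiable weights and the mask logits. In this sketch the per-sample updates are simply averaged; the claim itself does not fix how the updates are combined.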