| CPC G06N 3/082 (2013.01) [G06N 3/04 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01)] | 20 Claims |

|
1. A method of determining an architecture for a neural network for performing a particular neural network task, the method comprising:
generating, in accordance with current values of a plurality of controller parameters, a batch of output sequences, each output sequence in the batch specifying a respective subset of a plurality of components of a large neural network, wherein the large neural network includes the respective subset of the plurality of components and one or more other components of the plurality of components that are not in the respective subset, wherein the large neural network has a plurality of large network parameters, wherein the large neural network comprises a plurality of layers, and wherein the respective subset of the plurality of components of the large neural network forms a smaller neural network that (i) includes only the respective subset of the plurality of components, (ii) does not include the one or more other components of the plurality of components of the large neural network that are not in the respective subset and (iii) has, for each component in the respective subset, current values of the large neural network parameters for that component;
for each output sequence in the batch:
determining a performance metric of the smaller neural network on the particular neural network task in accordance with current values of the large network parameters for the components in the smaller neural network; and
using, by an updating engine, the performance metrics for the output sequences in the batch to adjust the current values of the controller parameters;
generating, in accordance with the adjusted values of the controller parameters, a new output sequence that specifies a new subset of the plurality of components of the large neural network, the new subset of the plurality of components forming a new neural network; and
training, by a training engine in collaboration with the updating engine, the new neural network with only the components in the new subset specified by the new output sequence on training data to determine adjusted values of the large network parameters for the components in the new subset.
|
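The steps recited in claim 1 can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the patented implementation. The "large neural network" is reduced to a linear model with four components that share one weight each, and an "output sequence" is a list of 0/1 decisions, one per component. The controller is a vector of independent logits updated with a REINFORCE-style rule and a mean baseline. The claim does not specify the update rule, so that choice is an assumption, as are all names (`shared_w`, `theta`, `sample_sequence`, `performance`, `update_controller`, `train_subnetwork`), the synthetic data, and the hyperparameters.

```python
import math
import random

random.seed(0)
K = 4                            # components in the (toy) large network
TRUE_W = [2.0, 0.0, -1.0, 0.0]   # hypothetical task: only components 0 and 2 matter

def make_data(n):
    xs = [[random.gauss(0, 1) for _ in range(K)] for _ in range(n)]
    return [(x, sum(w * xi for w, xi in zip(TRUE_W, x))) for x in xs]

train_data, valid_data = make_data(64), make_data(64)

shared_w = [0.0] * K   # large network parameters, shared by every subset
theta = [0.0] * K      # controller parameters: one logit per component

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_sequence():
    """Generate an output sequence: a 0/1 choice for each component."""
    return [1 if random.random() < sigmoid(t) else 0 for t in theta]

def predict(seq, x):
    # The smaller network contains only the selected components and
    # reuses the current shared weights for them.
    return sum(shared_w[i] * x[i] for i in range(K) if seq[i])

def performance(seq):
    """Performance metric: negative mean squared error on held-out data."""
    return -sum((predict(seq, x) - y) ** 2 for x, y in valid_data) / len(valid_data)

def update_controller(batch, rewards, lr=0.1):
    """Assumed REINFORCE-style step with a mean-reward baseline."""
    baseline = sum(rewards) / len(rewards)
    probs = [sigmoid(t) for t in theta]
    grad = [0.0] * K
    for seq, r in zip(batch, rewards):
        for i in range(K):
            # d/d(theta_i) of log p(seq) for independent Bernoulli choices
            grad[i] += (r - baseline) * (seq[i] - probs[i])
    for i in range(K):
        theta[i] += lr * grad[i] / len(batch)

def train_subnetwork(seq, lr=0.05):
    """One SGD pass that adjusts only the weights of selected components."""
    for x, y in train_data:
        err = predict(seq, x) - y
        for i in range(K):
            if seq[i]:
                shared_w[i] -= lr * err * x[i]

for step in range(100):
    batch = [sample_sequence() for _ in range(8)]   # batch of output sequences
    rewards = [performance(s) for s in batch]       # metric per smaller network
    update_controller(batch, rewards)               # adjust controller parameters
    new_seq = sample_sequence()                     # new sequence, adjusted controller
    train_subnetwork(new_seq)                       # train only the new subset

print("controller logits:", [round(t, 2) for t in theta])
print("shared weights:   ", [round(w, 2) for w in shared_w])
```

In this sketch the controller update and the weight update alternate in a single loop, and every sampled subset is evaluated with whatever values the shared weights currently hold. No subset is trained from scratch before it is scored, which is the property that elements (iii) and the "determining a performance metric" step of the claim describe.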