CPC G06N 3/063 (2013.01) [G06F 9/4887 (2013.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01)] | 18 Claims |
1. A data processing system comprising a plurality of processing nodes configured to perform operations to implement a process for performing an iteration of a neural network over a plurality of time periods, the data processing system comprising at least one data storage connected to at least one of the plurality of processing nodes,
wherein at least one processor of a first of the plurality of processing nodes is configured to, during a first of the plurality of time periods, load weights associated with a second layer of the neural network and required during a subsequent one of the plurality of time periods for performing calculations to determine activations of the second layer of the neural network from the at least one data storage,
wherein at least one processor of a second of the plurality of processing nodes is configured to, during the first of the plurality of time periods, perform calculations to determine activations of a first layer of the neural network, using weights associated with the first layer loaded from the at least one data storage during an earlier one of the plurality of time periods,
wherein the at least one processor of at least one of the first or the second of the plurality of processing nodes is configured to:
following the first of the plurality of time periods, receive data from the other of the first or the second of the plurality of processing nodes, the data including at least one of the activations of the first layer or the weights associated with the second layer;
during the subsequent one of the plurality of time periods, perform calculations to determine the activations of the second layer of the neural network using the data received from the other of the first or the second of the plurality of processing nodes; and
wherein the plurality of processing nodes are configured to, as part of the process, use the activations of the second layer of the neural network to determine activations of one or more further layers of the neural network, including determining output values of the neural network,
wherein the plurality of processing nodes are configured to use the process to perform training of the neural network by:
determining the output values of the neural network by performing the process for performing the iteration of the neural network;
comparing the output values of the neural network to labels of the neural network to calculate loss; and
performing a backward propagation through the neural network of the loss to produce a revised set of weights of the neural network;
wherein each of the plurality of processing nodes is configured to alternate between operating in a compute phase in which compute instructions are executed by at least one processor of the processing node and an exchange phase in which data is exchanged between the plurality of processing nodes,
wherein each of the at least one processor of the plurality of processing nodes is configured to execute a compiled code sequence comprising a synchronisation instruction indicating a barrier between the compute phase and the exchange phase, and
wherein the receiving data from the other of the first or the second of the plurality of processing nodes is performed during the exchange phase for the first and the second of the plurality of processing nodes.
|
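The schedule recited in claim 1 can be illustrated with a small, single-threaded simulation. This is only a sketch under stated assumptions, not the patented implementation: the function names (`run_iteration`, `matvec`, `relu`), the use of plain Python lists as the "data storage", the ReLU activation on every layer, and the choice of the variant in which activations (rather than weights) are sent to the other node are all hypothetical and chosen for illustration. Each loop pass stands for one time period: one node computes a layer with weights it loaded in the previous period, while the other node loads the weights for the next layer. The point where the barrier and exchange phase would occur is marked in a comment.

```python
def relu(v):
    # Element-wise ReLU (assumed activation; the claim does not specify one).
    return [max(0.0, a) for a in v]


def matvec(w, v):
    # Dense matrix-vector product on plain lists.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def run_iteration(storage, x):
    """Simulate one forward iteration of a layered network on two nodes.

    storage: list of weight matrices, one per layer (stands in for the
             data storage of the claim).
    x:       input vector.
    Returns (output, log); log[p] maps node index -> action in period p.
    """
    num_layers = len(storage)
    weights = [None, None]  # weights held in each node's local memory
    acts = x                # activations handed over in the exchange phase
    log = []

    for period in range(num_layers + 1):
        loader = period % 2      # node that loads weights this period
        computer = 1 - loader    # node that computes this period
        events = {}
        new_acts = None

        # Compute phase: both nodes work independently.
        if period >= 1:
            layer = period - 1
            # Uses weights this node loaded during the previous period.
            new_acts = relu(matvec(weights[computer], acts))
            events[computer] = f"compute L{layer}"
        if period < num_layers:
            # Prefetch weights needed in the subsequent period.
            weights[loader] = storage[period]
            events[loader] = f"load W{period}"

        # A synchronisation barrier would sit here, followed by the
        # exchange phase: the computing node sends its activations to
        # the node that has just loaded the next layer's weights.
        if new_acts is not None:
            acts = new_acts
        log.append(events)

    return acts, log


if __name__ == "__main__":
    storage = [
        [[1.0, 0.0], [0.0, 1.0]],   # layer 0 weights
        [[2.0, 0.0], [0.0, -1.0]],  # layer 1 weights
    ]
    out, log = run_iteration(storage, [1.0, 2.0])
    print(out)
    for period, events in enumerate(log):
        print(period, events)
```

In this sketch the two nodes swap roles every period, so a node only ever needs to hold the weights of one layer at a time. The alternative recited in the claim, in which the loaded weights are sent to the computing node instead, would keep the roles fixed and move `weights` rather than `acts` at the exchange point.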