CPC G06F 30/392 (2020.01) [G06F 30/398 (2020.01); G06N 3/08 (2013.01)] (20 Claims)

1. A method of training a node placement neural network that comprises:
an encoder neural network that is configured to, at each of a plurality of time steps, receive an input representation comprising data representing a current state of a placement of a netlist of nodes on a surface of an integrated circuit chip as of the time step and process the input representation to generate an encoder output, and
a policy neural network configured to, at each of the plurality of time steps, receive an encoded representation generated from the encoder output generated by the encoder neural network and process the encoded representation to generate a score distribution over a plurality of positions on the surface of the integrated circuit chip, the method comprising:
obtaining supervised training data comprising:
a plurality of training input representations, each training input representation representing a respective placement of a respective netlist of nodes, and
for each training input representation, a respective target value of a reward function that measures a quality of the respective placement of the respective netlist of nodes; and
training at least the encoder neural network on the plurality of training input representations using the target values of the reward function through supervised learning;
after the training through supervised learning:
receiving a new netlist of nodes; and
training the node placement neural network on the new netlist of nodes through reinforcement learning.
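The claim describes a two-stage recipe: supervised pretraining of (at least) the encoder to predict reward-function values for placements, followed by reinforcement-learning fine-tuning of the full network on a new netlist. The following is a minimal NumPy sketch of that recipe, not the patented implementation: the grid size, state features, reward proxy, and network shapes are toy assumptions chosen only to make the two stages runnable, and REINFORCE stands in for whatever RL algorithm the method actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 4                 # toy 4x4 grid of candidate positions on the chip surface
N_POS = GRID * GRID
STATE_DIM = 8            # hypothetical feature vector for the current placement state
HID = 16

# Toy "node placement network": a linear+tanh encoder shared by a value head
# (used only during supervised pretraining) and a softmax policy head that
# scores the candidate positions.
W_enc = rng.normal(0.0, 0.1, (STATE_DIM, HID))
w_val = rng.normal(0.0, 0.1, HID)
W_pol = rng.normal(0.0, 0.1, (HID, N_POS))

def encode(x):
    return np.tanh(x @ W_enc)

def policy(x):
    scores = encode(x) @ W_pol            # one score per candidate position
    e = np.exp(scores - scores.max())
    return e / e.sum()                    # score distribution over positions

# --- Stage 1: supervised pretraining of the encoder on reward targets ---
# Synthetic data: each row is a placement-state representation and y is a
# stand-in quality target (NOT the patent's actual reward function).
X = rng.normal(size=(200, STATE_DIM))
y = np.tanh(X[:, 0])

def mse():
    return float(np.mean((np.tanh(X @ W_enc) @ w_val - y) ** 2))

mse_before = mse()
lr = 0.05
for _ in range(500):
    h = np.tanh(X @ W_enc)
    err = h @ w_val - y
    w_val -= lr * (h.T @ err) / len(X)
    W_enc -= lr * (X.T @ (np.outer(err, w_val) * (1.0 - h ** 2))) / len(X)
mse_after = mse()

# --- Stage 2: RL fine-tuning (REINFORCE) on a "new netlist" ---
def toy_reward(state, pos):
    # Toy proxy: the best position depends on the sign of one state feature.
    target = 0 if state[0] < 0 else N_POS - 1
    return 1.0 if pos == target else 0.0

def greedy_reward(n=200):
    states = rng.normal(size=(n, STATE_DIM))
    return float(np.mean([toy_reward(s, int(np.argmax(policy(s)))) for s in states]))

rew_before = greedy_reward()
for _ in range(2000):
    x = rng.normal(size=STATE_DIM)
    p = policy(x)
    a = int(rng.choice(N_POS, p=p))
    r = toy_reward(x, a)
    # REINFORCE update: grad log pi(a|x) w.r.t. W_pol is h outer (onehot(a) - p)
    h = encode(x)
    W_pol += 0.1 * r * np.outer(h, np.eye(N_POS)[a] - p)
rew_after = greedy_reward()

print(f"supervised MSE: {mse_before:.3f} -> {mse_after:.3f}")
print(f"greedy placement reward: {rew_before:.3f} -> {rew_after:.3f}")
```

The design choice worth noting is the one the claim itself makes: the encoder is trained first on a dense supervised signal (reward targets over many placements), so the subsequent RL stage on a new netlist starts from representations that already correlate with placement quality.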