US 11,861,474 B2
Dynamic placement of computation sub-graphs
Jakob Nicolaus Foerster, San Francisco, CA (US); and Matthew Sharifi, Kilchberg (CH)
Assigned to DeepMind Technologies Limited, London (GB)
Filed by DeepMind Technologies Limited, London (GB)
Filed on Jan. 6, 2023, as Appl. No. 18/094,308.
Application 18/094,308 is a continuation of application No. 16/761,653, granted, now 11,551,144, previously published as PCT/US2018/015896, filed on Jan. 30, 2018.
Prior Publication US 2023/0237375 A1, Jul. 27, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 20/00 (2019.01); G06F 16/901 (2019.01)
CPC G06N 20/00 (2019.01) [G06F 16/9024 (2019.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method for training a machine learning model, the training comprising:
obtaining data characterizing a computational graph comprising a plurality of nodes representing operations and directed edges representing data dependencies;
receiving context information for a computational environment in which to perform the operations of the computational graph, the context information including data representing a network connecting a plurality of computing devices in the computational environment;
generating, as a training example for training the machine learning model, a model input based at least upon the context information and the data characterizing the computational graph, wherein the machine learning model has a plurality of model parameter values;
processing the training example using the machine learning model in accordance with the plurality of model parameter values to generate an output defining placement assignments of the operations of the computational graph to the plurality of computing devices, each placement assignment of the placement assignments specifying an assignment of a respective operation in the computational graph to be performed by one or more respective computing devices in the computational environment;
calculating a reward for the output that measures a quality of the placement assignments defined in the output; and
updating the plurality of model parameter values based on the reward using a reinforcement learning algorithm.
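As one hedged illustration of the training loop recited in claim 1, the Python sketch below pairs a toy computational graph and a two-device context with a tabular softmax placement policy trained by REINFORCE. Every concrete choice here (the op list, edge list, cost numbers, the load-plus-communication reward, and the policy form) is an assumption introduced only for illustration; the claim does not specify the model architecture, the reward definition, or the particular reinforcement learning algorithm.

"""Toy sketch of the claimed training loop: sample placement assignments for
the operations of a computational graph, score them with a reward, and update
the model parameters with a policy-gradient (REINFORCE) step.

All names and numbers below are hypothetical and do not come from the patent.
"""
import numpy as np

# Data characterizing a computational graph: nodes (operations) and
# directed edges (data dependencies).
OPS = ["read", "embed", "matmul_1", "matmul_2", "softmax"]   # nodes
EDGES = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]             # directed edges
OP_COST = np.array([1.0, 2.0, 4.0, 4.0, 1.0])                # hypothetical per-op compute cost

# Context information for the computational environment: the devices and a
# single number standing in for the network connecting them.
NUM_DEVICES = 2
LINK_COST = 0.5   # hypothetical cost of sending a tensor between devices

rng = np.random.default_rng(0)

# Model parameters: one logit per (operation, device) pair -- a deliberately
# tiny stand-in for the machine learning model of the claim.
theta = np.zeros((len(OPS), NUM_DEVICES))

def sample_placements(theta):
    """Generate an output defining placement assignments (operation -> device)."""
    probs = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
    placements = np.array([rng.choice(NUM_DEVICES, p=p) for p in probs])
    return placements, probs

def reward(placements):
    """Toy quality measure: penalize device load imbalance and cross-device edges."""
    loads = np.array([OP_COST[placements == d].sum() for d in range(NUM_DEVICES)])
    comm = sum(LINK_COST for u, v in EDGES if placements[u] != placements[v])
    return -(loads.max() + comm)   # higher (less negative) means a better placement

learning_rate = 0.1
baseline = 0.0
for step in range(300):
    placements, probs = sample_placements(theta)
    r = reward(placements)
    baseline = 0.9 * baseline + 0.1 * r          # running-average baseline
    advantage = r - baseline
    # REINFORCE: nudge the log-probability of each sampled assignment in
    # proportion to how much better than the baseline the reward was.
    for op, dev in enumerate(placements):
        grad = -probs[op]
        grad[dev] += 1.0
        theta[op] += learning_rate * advantage * grad

print("learned placements:", sample_placements(theta)[0])

In a realistic system the tabular policy would be replaced by a learned model that consumes features of the graph and of the network context, but the loop structure (generate a training example, sample placement assignments, score them with a reward, update the model parameter values with a reinforcement learning algorithm) follows the claimed steps.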