US 12,265,903 B2
Distributing tensor computations across computing devices
Noam M. Shazeer, Palo Alto, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Oct. 5, 2020, as Appl. No. 17/063,034.
Application 17/063,034 is a continuation of application No. 16/532,381, filed on Aug. 5, 2019, now Pat. No. 10,796,225.
Claims priority of provisional application 62/714,586, filed on Aug. 3, 2018.
Prior Publication US 2021/0019626 A1, Jan. 21, 2021
Int. Cl. G06N 3/063 (2023.01); G06F 16/901 (2019.01); G06F 18/214 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01); G06F 17/11 (2006.01)

CPC G06N 3/063 (2013.01) [G06F 16/9024 (2019.01); G06F 18/214 (2023.01); G06N 3/04 (2013.01); G06N 3/08 (2013.01); G06F 17/11 (2013.01)]

20 Claims

1. A computer-implemented method comprising:

assigning, for each of one or more operations from a plurality of operations and using i) a computational graph that represents the plurality of operations and ii) specification data that specifies a distribution of one or more dimensions of a tensor to a plurality of components of a system, the operation to a component that will perform the operation on a portion of a corresponding dimension from the one or more dimensions of the tensor specified by the specification data,

the computational graph comprising a plurality of nodes and a plurality of edges, wherein each node from the plurality of nodes represents a respective operation from the plurality of operations, and each edge from the plurality of edges connects a respective first node to a respective second node that represents an operation that receives, as input, an output of an operation represented by the respective first node,

the specification data defining a mapping of the one or more dimensions of the tensor to a corresponding component from the plurality of components of the system; and

causing one or more components from the plurality of components to perform the assigned operations on corresponding dimensions of the tensor;

wherein the specification data is generated based on a received identifier that identifies at least one parallelism technique for the system to implement.
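The steps of claim 1 can be illustrated with a minimal Python sketch. It assumes NumPy and simulates the "components" as in-process loop iterations rather than separate devices. Every name in it (`PARALLELISM_TO_SPLIT_DIM`, `Graph`, `make_specification`, `assign_operations`, `run`) is hypothetical. The parallelism identifiers `"data"` and `"model"`, and their mapping to a `batch` or `hidden` dimension, are also illustrative assumptions. This is not the patented implementation. It shows one plausible reading of the claim: an identifier produces specification data that maps a tensor dimension to components, each graph operation is assigned a portion of that dimension on each component, and the components then run their assigned operations.

```python
from dataclasses import dataclass, field

import numpy as np

# Hypothetical identifiers for parallelism techniques (illustrative only):
# each names the tensor dimension that gets split across components.
PARALLELISM_TO_SPLIT_DIM = {
    "data": "batch",    # data parallelism: split the batch dimension
    "model": "hidden",  # model parallelism: split the hidden dimension
}


@dataclass
class Graph:
    """Nodes are named operations; an edge (a, b) means b consumes a's output."""
    ops: dict = field(default_factory=dict)    # node name -> callable
    edges: list = field(default_factory=list)  # (producer, consumer) pairs

    def topological_order(self):
        # Kahn's algorithm: emit a node once all of its producers are emitted.
        indegree = {name: 0 for name in self.ops}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = [name for name, deg in indegree.items() if deg == 0]
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for src, dst in self.edges:
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        return order


def make_specification(parallelism_id, num_components):
    """Generate specification data (dimension -> components) from an identifier."""
    dim = PARALLELISM_TO_SPLIT_DIM[parallelism_id]
    return {dim: list(range(num_components))}


def assign_operations(graph, spec, dim_sizes):
    """For every op, give each component a contiguous portion of the split dim."""
    (dim, components), = spec.items()  # this sketch splits a single dimension
    bounds = np.linspace(0, dim_sizes[dim], len(components) + 1).astype(int).tolist()
    portions = {c: slice(bounds[i], bounds[i + 1]) for i, c in enumerate(components)}
    return dim, {op: dict(portions) for op in graph.topological_order()}


def run(graph, dim, assignments, x, w):
    """Simulate each component running its assigned ops on its portion.

    Assumes a linear chain graph whose first op takes the pair (x, w).
    """
    order = graph.topological_order()
    results = {}
    for component, portion in assignments[order[0]].items():
        # Shard the input along the dimension named in the specification data.
        value = (x[portion], w) if dim == "batch" else (x, w[:, portion])
        for op in order:
            value = graph.ops[op](value)
        results[component] = value
    # Reassemble the per-component shards along the split axis.
    axis = 0 if dim == "batch" else 1
    return np.concatenate([results[c] for c in sorted(results)], axis=axis)


# Usage: a two-node graph (matmul -> relu) run under both techniques.
graph = Graph(
    ops={"matmul": lambda xw: xw[0] @ xw[1], "relu": lambda h: np.maximum(h, 0.0)},
    edges=[("matmul", "relu")],
)
rng = np.random.default_rng(0)
x, w = rng.standard_normal((4, 3)), rng.standard_normal((3, 6))
for technique in ("data", "model"):
    spec = make_specification(technique, num_components=2)
    dim, assignments = assign_operations(graph, spec, {"batch": 4, "hidden": 6})
    y = run(graph, dim, assignments, x, w)
    print(technique, spec, np.allclose(y, np.maximum(x @ w, 0.0)))
```

In this sketch, data parallelism splits the input rows (the `batch` dimension), while model parallelism splits the weight columns (the `hidden` dimension). Both rebuild the same result as the unsplit computation. Only the specification data changes between the two runs; the graph and the assignment logic stay the same.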