US 12,112,198 B2
Asynchronous distributed data flow for machine learning workloads
Jeffrey Adgate Dean, Palo Alto, CA (US); Sudip Roy, San Jose, CA (US); Michael Acheson Isard, San Francisco, CA (US); Aakanksha Chowdhery, Mountain View, CA (US); Brennan Saeta, Kirkland, WA (US); Chandramohan Amyangot Thekkath, Palo Alto, CA (US); Daniel William Hurt, Westminster, CO (US); Hyeontaek Lim, Palo Alto, CA (US); Laurent El Shafey, Mountain View, CA (US); Parker Edward Schuh, Mountain View, CA (US); Paul Ronald Barham, San Francisco, CA (US); Ruoming Pang, New York, NY (US); Ryan Sepassi, Palo Alto, CA (US); Sanjay Ghemawat, Mountain View, CA (US); and Yonghui Wu, Fremont, CA (US)
Assigned to Google LLC, Mountain View, CA (US)
Filed by Google LLC, Mountain View, CA (US)
Filed on Dec. 15, 2022, as Appl. No. 18/082,415.
Application 18/082,415 is a continuation of application No. 17/738,909, filed on May 6, 2022, granted, now Pat. No. 11,556,381.
Claims priority of provisional application 63/186,031, filed on May 7, 2021.
Prior Publication US 2023/0118303 A1, Apr. 20, 2023
Int. Cl. G06F 17/10 (2006.01); G06F 9/48 (2006.01); G06N 3/063 (2023.01); G06N 3/08 (2023.01)
CPC G06F 9/4881 (2013.01) [G06N 3/063 (2013.01); G06N 3/08 (2013.01)] 18 Claims
OG exemplary drawing
 
1. A system comprising:
a plurality of accelerator islands, each accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators, wherein the hardware accelerators within each accelerator island are interconnected with one another over an interconnect network, and are connected to the hardware accelerators within another accelerator island over a data center network through their corresponding hosts; and
a respective scheduler for each of the plurality of accelerator islands that is configured to schedule workloads across the plurality of hardware accelerators and corresponding hosts in the accelerator island, wherein the system is configured to:
receive data representing a machine learning workload; and
assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for that accelerator island, wherein, when the respective portion assigned to an accelerator island is a regular computation, the respective scheduler is configured to schedule that portion using parallel asynchronous dispatch.
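
To make the claimed topology concrete, the following is a minimal sketch of the island/host/scheduler structure in Python. All names (Island, Host, IslandScheduler, and so on) are illustrative assumptions for exposition, not identifiers from the patent or from any Google system; each island holds accelerators joined by a fast interconnect, one corresponding host per accelerator, and its own scheduler, while cross-island traffic travels over the data center network through the hosts.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    device_id: int            # one hardware accelerator (e.g. a TPU chip)

@dataclass
class Host:
    host_id: int
    accelerator: Accelerator  # each accelerator has a corresponding host

@dataclass
class Island:
    """Accelerators joined by an island-local interconnect; traffic to
    other islands crosses the data center network via the hosts."""
    island_id: int
    hosts: list               # list[Host]

class IslandScheduler:
    """One scheduler per island; it sees only the portion of the
    machine learning workload assigned to its island."""
    def __init__(self, island):
        self.island = island

    def schedule(self, portion):
        # Per the claim: a regular computation (resource needs known
        # before execution) is scheduled with parallel asynchronous
        # dispatch; see the next sketch for that mechanism.
        kind = "parallel asynchronous" if portion.get("regular") else "sequential"
        print(f"island {self.island.island_id}: {kind} dispatch of {portion['name']}")

def assign_workload(portions, islands):
    # The system splits the workload and hands each portion to the
    # respective scheduler of one island.
    for portion, island in zip(portions, islands):
        IslandScheduler(island).schedule(portion)

islands = [Island(i, [Host(h, Accelerator(h)) for h in range(4)]) for i in range(2)]
assign_workload(
    [{"name": "fwd/bwd pass", "regular": True},
     {"name": "data pipeline", "regular": False}],
    islands,
)
```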
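Parallel asynchronous dispatch, the mechanism named in the claim, exploits the regularity condition: for a regular computation, each node's resource needs and output shapes are known before execution, so host-side work (allocating output buffers, enqueueing device programs) can be issued for every stage up front, overlapping with upstream execution instead of waiting for it. Below is a minimal, hedged sketch of that idea; `allocate_buffer`, `run_stage`, and the thread pool standing in for per-host device queues are assumptions made for illustration, not the patented implementation.

```python
import concurrent.futures as cf
import numpy as np

def allocate_buffer(shape):
    # Host-side work that, for a regular computation, needs only the
    # statically known output shape, never the upstream data itself.
    return np.empty(shape)

def run_stage(fn, upstream_future, out):
    # "Device-side" execution: the only step that must wait on real data.
    x = upstream_future.result()
    out[...] = fn(x)
    return out

def parallel_async_dispatch(stages, inp):
    """Issue every stage's host-side work immediately; execution of each
    stage still orders itself behind its producer via the future chain."""
    with cf.ThreadPoolExecutor() as pool:
        upstream = pool.submit(lambda: inp)
        for shape, fn in stages:
            out = allocate_buffer(shape)   # dispatched now, before any
            upstream = pool.submit(run_stage, fn, upstream, out)
        return upstream.result()

# Toy two-stage pipeline whose output shapes are known in advance.
stages = [((4,), lambda x: x * 2),
          ((4,), lambda x: x + 1)]
print(parallel_async_dispatch(stages, np.ones(4)))
```

The regularity condition is what makes the up-front allocation safe: if a stage's output shape depended on upstream data, its host-side work could not begin until the producer finished, and dispatch would fall back to a sequential pattern.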