US 12,333,311 B2
Cooperative group arrays
Greg Palmer, Cedar Park, TX (US); Gentaro Hirota, San Jose, CA (US); Ronny Krashinsky, Portola Valley, CA (US); Ze Long, San Jose, CA (US); Brian Pharris, Cary, NC (US); Rajballav Dash, San Jose, CA (US); Jeff Tuckey, Saratoga, CA (US); Jerome F. Duluk, Jr., Palo Alto, CA (US); Lacky Shah, Los Altos Hills, CA (US); Luke Durant, San Jose, CA (US); Jack Choquette, Palo Alto, CA (US); Eric Werness, San Jose, CA (US); Naman Govil, Sunnyvale, CA (US); Manan Patel, San Jose, CA (US); Shayani Deb, Seattle, WA (US); Sandeep Navada, San Jose, CA (US); John Edmondson, Arlington, MA (US); Prakash Bangalore Prabhakar, San Jose, CA (US); Wish Gandhi, Sunnyvale, CA (US); Ravi Manyam, San Ramon, CA (US); Apoorv Parle, San Jose, CA (US); Olivier Giroux, Santa Clara, CA (US); Shirish Gadre, Fremont, CA (US); and Steve Heinrich, Madison, AL (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Mar. 10, 2022, as Appl. No. 17/691,621.
Prior Publication US 2023/0289215 A1, Sep. 14, 2023
Int. Cl. G06F 9/38 (2018.01); G06F 9/30 (2018.01); G06F 9/48 (2006.01); G06F 9/54 (2006.01)
CPC G06F 9/3888 (2023.08) [G06F 9/3009 (2013.01); G06F 9/3851 (2013.01); G06F 9/4881 (2013.01); G06F 9/544 (2013.01)]
20 Claims
 
1. A processing system comprising:
a work distributor hardware circuit configured to launch a collection of thread groups on a set of plural processors while providing a hardware-based guarantee that all thread groups of the collection can be launched at the same time;
the work distributor hardware circuit being further configured to speculatively launch the thread groups in the collection to confirm that the thread groups are able to launch and/or run concurrently on the set of plural processors before launching any of the thread groups in the collection.
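
The all-or-nothing behavior recited in claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware circuit: the function name `try_launch_collection`, the per-processor `free_slots` counts, and the "most free slots first" placement policy are all hypothetical and chosen for illustration. The model places every thread group against a scratch copy of the resource state (the speculative launch) and commits to the real state only if the whole collection fits.

```python
from typing import List, Optional


def try_launch_collection(free_slots: List[int], num_groups: int) -> Optional[List[int]]:
    """Software model of an all-or-nothing launch (hypothetical, for illustration).

    free_slots[i] is the number of additional thread groups processor i can host.
    Returns the processor index chosen for each thread group, or None if the
    whole collection cannot be resident at the same time.
    """
    # Speculative phase: work on a scratch copy so the real state is untouched.
    trial = list(free_slots)
    placement: List[int] = []
    for _ in range(num_groups):
        # Assumed policy: pick the processor with the most remaining capacity.
        target = max(range(len(trial)), key=lambda i: trial[i], default=None)
        if target is None or trial[target] == 0:
            # At least one group cannot be placed, so nothing is launched.
            return None
        trial[target] -= 1
        placement.append(target)

    # Commit phase: every group fit, so apply the reservations to the real state.
    free_slots[:] = trial
    return placement
```

In this model, a collection of three thread groups succeeds on processors with capacities `[2, 1]` and consumes all slots, while the same collection on `[1, 1]` returns `None` and leaves the capacities unchanged, so no partial launch ever occurs.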