US 11,907,717 B2
Techniques for efficiently transferring data to a processor
Andrew Kerr, Santa Clara, CA (US); Jack Choquette, Palo Alto, CA (US); Xiaogang Qiu, San Jose, CA (US); Omkar Paranjape, Austin, TX (US); Poornachandra Rao, Cedar Park, TX (US); Shirish Gadre, Fremont, CA (US); Steven J. Heinrich, Madison, AL (US); Manan Patel, San Jose, CA (US); Olivier Giroux, Santa Clara, CA (US); and Alan Kaatz, Santa Clara, CA (US)
Assigned to NVIDIA Corporation, Santa Clara, CA (US)
Filed by NVIDIA Corporation, Santa Clara, CA (US)
Filed on Feb. 8, 2023, as Appl. No. 18/107,374.
Application 18/107,374 is a division of application No. 17/363,561, filed on Jun. 30, 2021, granted, now 11,604,649.
Application 17/363,561 is a division of application No. 16/712,083, filed on Dec. 12, 2019, granted, now 11,080,051, issued on Aug. 3, 2021.
Claims priority of provisional application 62/927,417, filed on Oct. 29, 2019.
Claims priority of provisional application 62/927,511, filed on Oct. 29, 2019.
Prior Publication US 2023/0185570 A1, Jun. 15, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 9/30 (2018.01); G06F 9/52 (2006.01); G06F 12/0808 (2016.01); G06F 12/0888 (2016.01)
CPC G06F 9/30043 (2013.01) [G06F 9/3009 (2013.01); G06F 9/522 (2013.01); G06F 12/0808 (2013.01); G06F 12/0888 (2013.01); G06F 9/3004 (2013.01)]
21 Claims
OG exemplary drawing
 
1. A processing system comprising:
a multithreaded processor configured to concurrently execute a plurality of threads;
a plurality of data registers, each of the data registers assigned to executing threads; and
on-chip memory including a cache memory and a shared memory,
wherein the system is configured to, in response to at least one of the threads, execute a first instruction to retrieve data stored in a memory external to the processing system through the data registers and the cache memory and store the data into the shared memory, and in response to at least one of the threads, execute a second instruction to retrieve data stored in the memory external to the processing system and store the data into the shared memory without first storing the data in the data registers.
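The two data paths recited in claim 1 can be loosely illustrated with the public CUDA programming interface. The sketch below is not taken from the patent and does not define the claimed instructions. It assumes a CUDA 11 or later toolkit and uses the `__pipeline_memcpy_async` primitives from `cuda_pipeline.h`. The names `kTile`, `copy_via_registers`, `copy_async_to_shared`, and `run_and_check` are hypothetical and chosen only for this example. Whether the second kernel actually avoids the register file depends on the target GPU: on devices of compute capability 8.0 or higher the primitive may be lowered to a hardware asynchronous copy, while on older devices it falls back to an ordinary load followed by a store.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_pipeline.h>   // __pipeline_memcpy_async, __pipeline_commit, __pipeline_wait_prior

constexpr int kTile = 256;   // hypothetical tile size: one int per thread

// Path 1 (analogous to the "first instruction"): global memory -> register -> shared memory.
__global__ void copy_via_registers(const int* in, int* out) {
    __shared__ int tile[kTile];
    int i = blockIdx.x * kTile + threadIdx.x;
    int r = in[i];                 // value is staged in a per-thread register
    tile[threadIdx.x] = r;         // register is then stored to shared memory
    __syncthreads();
    out[i] = tile[kTile - 1 - threadIdx.x];   // read another thread's element
}

// Path 2 (analogous to the "second instruction"): the copy from global to shared
// memory is requested as a single asynchronous operation, with no named register.
__global__ void copy_async_to_shared(const int* in, int* out) {
    __shared__ int tile[kTile];
    int i = blockIdx.x * kTile + threadIdx.x;
    __pipeline_memcpy_async(&tile[threadIdx.x], &in[i], sizeof(int));
    __pipeline_commit();           // close the current batch of async copies
    __pipeline_wait_prior(0);      // wait until this thread's copies have landed
    __syncthreads();               // make all threads' copies visible block-wide
    out[i] = tile[kTile - 1 - threadIdx.x];
}

// Host helper (hypothetical): runs one of the kernels and verifies the result.
bool run_and_check(bool use_async, int blocks = 4) {
    assert(blocks > 0);
    const int n = blocks * kTile;
    const size_t bytes = n * sizeof(int);
    std::vector<int> h_in(n), h_out(n, -1);
    for (int i = 0; i < n; ++i) h_in[i] = 3 * i + 1;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    if (use_async) copy_async_to_shared<<<blocks, kTile>>>(d_in, d_out);
    else           copy_via_registers<<<blocks, kTile>>>(d_in, d_out);

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && (cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok) return false;

    // Each block writes its tile in reversed order.
    for (int b = 0; b < blocks; ++b)
        for (int t = 0; t < kTile; ++t)
            if (h_out[b * kTile + t] != h_in[b * kTile + (kTile - 1 - t)]) return false;
    return true;
}
```

Calling `run_and_check(false)` exercises the register-staged path and `run_and_check(true)` the asynchronous path. Both kernels produce the same output; the difference lies only in how the data reaches shared memory, which is the distinction the claim draws between its first and second instructions.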