US 11,995,029 B2
Multi-tile memory management for detecting cross tile access providing multi-tile inference scaling and providing page migration
Lakshminarayanan Striramassarma, Folsom, CA (US); Prasoonkumar Surti, Folsom, CA (US); Varghese George, Folsom, CA (US); Ben Ashbaugh, Folsom, CA (US); Aravindh Anantaraman, Folsom, CA (US); Valentin Andrei, San Jose, CA (US); Abhishek Appu, El Dorado Hills, CA (US); Nicolas Galoppo Von Borries, Portland, OR (US); Altug Koker, El Dorado Hills, CA (US); Mike Macpherson, Portland, OR (US); Subramaniam Maiyuran, Gold River, CA (US); Nilay Mistry, Bangalore (IN); Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US); Selvakumar Panneer, Portland, OR (US); Vasanth Ranganathan, El Dorado Hills, CA (US); Joydeep Ray, Folsom, CA (US); Ankur Shah, Folsom, CA (US); and Saurabh Tangri, Folsom, CA (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Appl. No. 17/428,527
Filed by Intel Corporation, Santa Clara, CA (US)
PCT Filed Mar. 14, 2020, PCT No. PCT/US2020/022836
§ 371(c)(1), (2) Date Aug. 4, 2021,
PCT Pub. No. WO2020/190798, PCT Pub. Date Sep. 24, 2020.
Claims priority of provisional application 62/819,337, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,435, filed on Mar. 15, 2019.
Claims priority of provisional application 62/819,361, filed on Mar. 15, 2019.
Prior Publication US 2022/0114096 A1, Apr. 14, 2022
Int. Cl. G06F 12/00 (2006.01); G06F 7/544 (2006.01); G06F 7/575 (2006.01); G06F 7/58 (2006.01); G06F 9/30 (2018.01); G06F 9/38 (2018.01); G06F 9/50 (2006.01); G06F 12/02 (2006.01); G06F 12/06 (2006.01); G06F 12/0802 (2016.01); G06F 12/0804 (2016.01); G06F 12/0811 (2016.01); G06F 12/0862 (2016.01); G06F 12/0866 (2016.01); G06F 12/0871 (2016.01); G06F 12/0875 (2016.01); G06F 12/0882 (2016.01); G06F 12/0888 (2016.01); G06F 12/0891 (2016.01); G06F 12/0893 (2016.01); G06F 12/0895 (2016.01); G06F 12/0897 (2016.01); G06F 12/1009 (2016.01); G06F 12/128 (2016.01); G06F 15/78 (2006.01); G06F 15/80 (2006.01); G06F 17/16 (2006.01); G06F 17/18 (2006.01); G06T 1/20 (2006.01); G06T 1/60 (2006.01); H03M 7/46 (2006.01); G06N 3/08 (2023.01); G06T 15/06 (2011.01)
CPC G06F 15/7839 (2013.01) [G06F 7/5443 (2013.01); G06F 7/575 (2013.01); G06F 7/588 (2013.01); G06F 9/3001 (2013.01); G06F 9/30014 (2013.01); G06F 9/30036 (2013.01); G06F 9/3004 (2013.01); G06F 9/30043 (2013.01); G06F 9/30047 (2013.01); G06F 9/30065 (2013.01); G06F 9/30079 (2013.01); G06F 9/3887 (2013.01); G06F 9/5011 (2013.01); G06F 9/5077 (2013.01); G06F 12/0215 (2013.01); G06F 12/0238 (2013.01); G06F 12/0246 (2013.01); G06F 12/0607 (2013.01); G06F 12/0802 (2013.01); G06F 12/0804 (2013.01); G06F 12/0811 (2013.01); G06F 12/0862 (2013.01); G06F 12/0866 (2013.01); G06F 12/0871 (2013.01); G06F 12/0875 (2013.01); G06F 12/0882 (2013.01); G06F 12/0888 (2013.01); G06F 12/0891 (2013.01); G06F 12/0893 (2013.01); G06F 12/0895 (2013.01); G06F 12/0897 (2013.01); G06F 12/1009 (2013.01); G06F 12/128 (2013.01); G06F 15/8046 (2013.01); G06F 17/16 (2013.01); G06F 17/18 (2013.01); G06T 1/20 (2013.01); G06T 1/60 (2013.01); H03M 7/46 (2013.01); G06F 9/3802 (2013.01); G06F 9/3818 (2013.01); G06F 9/3867 (2013.01); G06F 2212/1008 (2013.01); G06F 2212/1021 (2013.01); G06F 2212/1044 (2013.01); G06F 2212/302 (2013.01); G06F 2212/401 (2013.01); G06F 2212/455 (2013.01); G06F 2212/60 (2013.01); G06N 3/08 (2013.01); G06T 15/06 (2013.01)]
17 Claims
1. A graphics processor having a multi-tile architecture, comprising:
a first graphics processing unit (GPU) having a memory and a memory controller;
a second graphics processing unit (GPU) having a memory; and
a cross-GPU fabric to communicatively couple the first and second GPUs, wherein the memory controller is configured to determine whether frequent cross tile memory accesses occur between the first GPU and the second GPU in the multi-GPU configuration and to cause initiation of a data transfer between the memory of the first GPU and the memory of the second GPU when frequent cross tile memory accesses occur between the first GPU and the second GPU, wherein the memory controller is configured to detect transfer patterns automatically including accesses to page N of the memory of the second GPU and to start transferring pages N+1 and N+2 prior to requests for pages N+1 and N+2.
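Claim 1 recites two behaviors of the first GPU's memory controller: initiating a data transfer between the two GPUs' memories when cross-tile accesses become frequent, and, after an access to page N in the second GPU's memory, starting transfers of pages N+1 and N+2 before they are requested. The Python sketch below models both behaviors in software purely as an illustration. The claim does not say what counts as "frequent", so the per-page access counter, the threshold value, and the names `CrossTileMonitor`, `access`, `MIGRATION_THRESHOLD`, and `PREFETCH_DISTANCE` are hypothetical assumptions, not the patented hardware design.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical: remote accesses to one page before they count as "frequent".
MIGRATION_THRESHOLD = 4
# Claim 1 recites transferring pages N+1 and N+2 after an access to page N.
PREFETCH_DISTANCE = 2


@dataclass
class CrossTileMonitor:
    """Illustrative software model of the first GPU's memory controller.

    Tiles (GPUs) and pages are plain integers; a real controller would
    track physical addresses and issue transfers over the cross-GPU fabric.
    """
    local_tile: int
    threshold: int = MIGRATION_THRESHOLD
    # (tile, page) -> number of cross-tile accesses observed so far
    remote_counts: defaultdict = field(default_factory=lambda: defaultdict(int))
    migrated: set = field(default_factory=set)
    prefetched: set = field(default_factory=set)
    # Ordered record of the transfers this model would initiate
    log: list = field(default_factory=list)

    def access(self, tile: int, page: int) -> None:
        if tile == self.local_tile:
            return  # local access: no cross-tile traffic to track

        key = (tile, page)
        self.remote_counts[key] += 1

        # Frequent cross-tile access: initiate a one-time migration of the page.
        if self.remote_counts[key] >= self.threshold and key not in self.migrated:
            self.migrated.add(key)
            self.log.append(("migrate", tile, page))

        # Start transferring pages N+1..N+PREFETCH_DISTANCE before they are requested.
        for n in range(page + 1, page + 1 + PREFETCH_DISTANCE):
            if (tile, n) not in self.prefetched:
                self.prefetched.add((tile, n))
                self.log.append(("prefetch", tile, n))


mon = CrossTileMonitor(local_tile=0, threshold=3)
for _ in range(3):
    mon.access(tile=1, page=10)  # remote page 10 on GPU 1
mon.access(tile=0, page=5)       # local access, ignored
print(mon.log)
# [('prefetch', 1, 11), ('prefetch', 1, 12), ('migrate', 1, 10)]
```

In this sketch the first remote access to page 10 immediately triggers transfers of pages 11 and 12, while migration of page 10 itself waits until the counter reaches the (assumed) threshold. A hardware implementation might instead require a confirmed sequential stream before prefetching, or decay the counters over time; the claim leaves those choices open.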