US 12,248,406 B2
Management of IOMMU TLB entries for compute units of a SIMD processing device
Lide Duan, Sunnyvale, CA (US); Qichen Zhang, Shanghai (CN); Shijian Zhang, Beijing (CN); and Yen-Kuang Chen, Palo Alto, CA (US)
Assigned to ALIBABA (CHINA) CO., LTD., Zhejiang Province (CN)
Filed by ALIBABA (CHINA) CO., LTD., Zhejiang Province (CN)
Filed on Dec. 13, 2022, as Appl. No. 18/065,175.
Prior Publication US 2024/0045809 A1, Feb. 8, 2024
Int. Cl. G06F 12/1027 (2016.01); G06F 12/1072 (2016.01); G06F 12/1081 (2016.01)
CPC G06F 12/1027 (2013.01) [G06F 12/1072 (2013.01); G06F 12/1081 (2013.01); G06F 2212/1016 (2013.01)]
20 Claims
 
1. A computing system, comprising:
a memory;
a slave computing device, comprising:
a plurality of computing units, wherein each computing unit is configured to perform multiple computations in parallel according to a single instruction multiple data (SIMD) manner; and
a first translation lookaside buffer (TLB), configured to store a plurality of virtual address entries; and
a master computing device, comprising:
a memory controller, configured to perform a read operation and a write operation to the memory;
at least one processing unit, configured to access the memory via the memory controller to execute a program; and
an input-output memory management unit (IOMMU), comprising a second translation lookaside buffer, configured to store a plurality of virtual address entries, wherein each virtual address entry is configured to store a virtual address requested by the slave computing device, a physical address corresponding to the virtual address, a recent use time and a dependent workload, wherein the virtual address is used in a specific instruction performed by a specific computing unit of the plurality of computing units, and the dependent workload of the virtual address is an amount of virtual address translations requested by the specific computing unit to perform the specific instruction;
wherein:
when the plurality of computing units access a first virtual address, the plurality of computing units transfer the first virtual address to the first translation lookaside buffer to obtain a first physical address corresponding to the first virtual address;
when the first translation lookaside buffer does not store a virtual address entry comprising the first virtual address, the first translation lookaside buffer is configured to send a first translation request to the input-output memory management unit to obtain the first physical address corresponding to the first virtual address;
when the input-output memory management unit receives the first translation request, and the second translation lookaside buffer does not store a virtual address entry comprising the first virtual address, the input-output memory management unit is configured to traverse a plurality of page tables of the memory controller to obtain the first physical address corresponding to the first virtual address, select a first virtual address entry from the plurality of virtual address entries according to a recent use time and a dependent workload of each of the plurality of virtual address entries, and clear the first virtual address entry to store the first virtual address and the first physical address.
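The lookup and replacement flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names TlbEntry, IommuTlb, DeviceTlb, candidate_window and page_walks are hypothetical, the page table walk is modeled as a plain dictionary lookup, and the first TLB is left unbounded for brevity. Claim 1 states only that the entry to clear is selected according to recent use time and dependent workload. The specific rule used here (among the least recently used candidates, clear the one with the largest dependent workload) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TlbEntry:
    """One virtual address entry of the second (IOMMU) TLB."""
    virtual_address: int
    physical_address: int
    recent_use_time: int      # logical timestamp of the last use
    dependent_workload: int   # translations requested by the issuing instruction


class IommuTlb:
    """Hypothetical model of the IOMMU with its second TLB."""

    def __init__(self, capacity: int, page_table: Dict[int, int],
                 candidate_window: int = 2) -> None:
        self.capacity = capacity
        self.page_table = page_table          # stand-in for a page table walk
        self.candidate_window = candidate_window
        self.entries: Dict[int, TlbEntry] = {}
        self.clock = 0                        # logical time source
        self.page_walks = 0                   # counts second-TLB misses

    def select_victim(self) -> TlbEntry:
        # Assumed policy: take the least recently used candidates, then
        # pick the one with the largest dependent workload.
        by_age = sorted(self.entries.values(), key=lambda e: e.recent_use_time)
        candidates = by_age[:self.candidate_window]
        return max(candidates, key=lambda e: e.dependent_workload)

    def translate(self, virtual_address: int, dependent_workload: int) -> int:
        self.clock += 1
        entry = self.entries.get(virtual_address)
        if entry is not None:                 # second-TLB hit
            entry.recent_use_time = self.clock
            return entry.physical_address
        # Second-TLB miss: walk the page tables (modeled as a dict lookup).
        self.page_walks += 1
        physical_address = self.page_table[virtual_address]
        if len(self.entries) >= self.capacity:
            victim = self.select_victim()
            del self.entries[victim.virtual_address]   # clear the selected entry
        self.entries[virtual_address] = TlbEntry(
            virtual_address, physical_address, self.clock, dependent_workload)
        return physical_address


class DeviceTlb:
    """Hypothetical model of the first TLB in the slave computing device."""

    def __init__(self, iommu: IommuTlb) -> None:
        self.iommu = iommu
        self.entries: Dict[int, int] = {}     # unbounded here for brevity

    def translate(self, virtual_address: int, dependent_workload: int) -> int:
        if virtual_address in self.entries:   # first-TLB hit
            return self.entries[virtual_address]
        # First-TLB miss: send a translation request to the IOMMU.
        physical_address = self.iommu.translate(virtual_address, dependent_workload)
        self.entries[virtual_address] = physical_address
        return physical_address


if __name__ == "__main__":
    page_table = {0x1000: 0xA000, 0x2000: 0xB000, 0x3000: 0xC000}
    iommu = IommuTlb(capacity=2, page_table=page_table)
    device = DeviceTlb(iommu)
    for va, workload in [(0x1000, 1), (0x2000, 8), (0x3000, 2)]:
        print(hex(va), "->", hex(device.translate(va, workload)))
    print("entries kept in IOMMU TLB:", [hex(v) for v in iommu.entries])
```

In this sketch, a pure least-recently-used policy would clear the entry for 0x1000 on the third request. The assumed policy instead clears the entry for 0x2000, because its dependent workload is the largest among the candidates.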