US 11,989,536 B1
Methods and apparatus for automatic communication optimizations in a compiler based on a polyhedral representation
Muthu Manikandan Baskaran, Old Tappan, NJ (US); Richard A. Lethin, New York, NY (US); Benoit J. Meister, New York, NY (US); and Nicolas T. Vasilache, New York, NY (US)
Assigned to QUALCOMM Incorporated, San Diego, CA (US)
Filed by QUALCOMM Incorporated, San Diego, CA (US)
Filed on Jul. 28, 2021, as Appl. No. 17/387,871.
Application 17/387,871 is a continuation of application No. 15/822,996, filed on Nov. 27, 2017, granted, now 11,200,035.
Application 15/822,996 is a continuation of application No. 13/712,659, filed on Dec. 12, 2012, granted, now 9,830,133, issued on Nov. 28, 2017.
Claims priority of provisional application 61/569,413, filed on Dec. 12, 2011.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 8/41 (2018.01)
CPC G06F 8/41 (2013.01) [G06F 8/453 (2013.01); G06F 8/457 (2013.01)]
16 Claims
 
1. A method of orchestrating data movements of a program on a multi-execution unit computing apparatus, the method comprising:
receiving in memory on a first computing apparatus, a computer program comprising a set of operations and at least one loop nest, the first computing apparatus comprising the memory and a processor;
transforming the computer program for execution on a second computing apparatus, the second computing apparatus comprising at least one main memory, at least one local memory, and at least one computation unit, each computation unit comprising at least one private memory region, the transformation comprising:
producing a tiled variant of the computer program;
generating operations to perform data movements for elements produced and consumed by tiles between the at least one main memory and the at least one local memory;
optimizing the operations to perform the data movements to reduce communication cost and memory traffic by eliminating redundant transfers based on placement functions and dependence information of the operations within the tiles; and
producing an optimized computer program for execution on the second computing apparatus,
wherein the redundant transfers elimination includes: a value stored in a local memory location addressable by at least two processing elements in the at least one computation unit is reused to replace one of the redundant transfers of the value from the at least one main memory to the at least one local memory.
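
The effect of the redundant-transfer elimination recited in claim 1 can be illustrated with a small sketch. It is not the claimed compiler method: the claim derives the optimization at compile time from placement functions and dependence information, whereas the sketch below only simulates at run time what the optimized program would do. The 3-point stencil, the dictionary standing in for a shared local memory, and the names tile_footprint and run_tiled are all assumptions made for illustration.

```python
# Hypothetical illustration only. The stencil, the dict-based "local memory",
# and every name below are assumptions, not taken from the patent.

def tile_footprint(tile, tile_size, n):
    """Main-memory indices read by one tile of a 3-point stencil."""
    lo = max(tile * tile_size - 1, 0)
    hi = min((tile + 1) * tile_size + 1, n)
    return range(lo, hi)


def run_tiled(main, tile_size, reuse_local):
    """Run the tiled stencil and count main-to-local element transfers.

    With reuse_local=False every tile reloads its whole footprint.
    With reuse_local=True a value already held in the shared local
    memory replaces the redundant transfer from main memory.
    """
    n = len(main)
    num_tiles = (n + tile_size - 1) // tile_size
    local = {}       # models a local memory shared by processing elements
    transfers = 0
    out = [0] * n

    for tile in range(num_tiles):
        if not reuse_local:
            local = {}  # naive variant: nothing survives between tiles

        # Data movement: copy only the elements not already resident.
        for idx in tile_footprint(tile, tile_size, n):
            if idx not in local:
                local[idx] = main[idx]
                transfers += 1

        # Computation of the tile, reading exclusively from local memory.
        start = tile * tile_size
        stop = min(start + tile_size, n)
        for i in range(start, stop):
            left = local[i - 1] if i > 0 else 0
            right = local[i + 1] if i < n - 1 else 0
            out[i] = left + local[i] + right

        # Keep only the elements the next tile's footprint can still use.
        next_lo = stop - 1
        local = {k: v for k, v in local.items() if k >= next_lo}

    return out, transfers


if __name__ == "__main__":
    data = list(range(8))
    naive_out, naive_transfers = run_tiled(data, 4, reuse_local=False)
    opt_out, opt_transfers = run_tiled(data, 4, reuse_local=True)
    print(naive_out == opt_out, naive_transfers, opt_transfers)
```

In this sketch, adjacent tiles overlap by two elements, so the reuse variant skips exactly those loads while producing the same result. A polyhedral compiler would instead determine the overlap statically and emit transfer code that omits the redundant copies.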