US 12,353,863 B2
Method and system for converting a single-threaded software program into an application-specific supercomputer
Kemal Ebcioglu, Cheshire, CT (US); and Emre Kultursay, Kirkland, WA (US)
Assigned to Global Supercomputing Corporation, Yorktown Heights, NY (US)
Filed by Kemal Ebcioglu, Cheshire, CT (US); and Emre Kultursay, Kirkland, WA (US)
Filed on Jan. 16, 2023, as Appl. No. 18/097,420.
Application 18/097,420 is a continuation of application No. 17/411,116, filed on Aug. 25, 2021, granted, now 11,579,854.
Application 17/411,116 is a continuation of application No. 16/819,405, filed on Mar. 16, 2020, granted, now 11,132,186, issued on Sep. 28, 2021.
Application 16/819,405 is a continuation of application No. 16/166,164, filed on Oct. 22, 2018, granted, now 10,642,588, issued on May 5, 2020.
Application 16/166,164 is a continuation of application No. 15/257,319, filed on Sep. 6, 2016, granted, now 10,146,516, issued on Dec. 4, 2018.
Application 15/257,319 is a continuation of application No. 14/581,169, filed on Dec. 23, 2014, granted, now 9,495,223, issued on Nov. 15, 2016.
Application 14/581,169 is a continuation of application No. 13/296,232, filed on Nov. 15, 2011, granted, now 8,966,457, issued on Feb. 24, 2015.
Prior Publication US 2023/0153087 A1, May 18, 2023
Int. Cl. G06F 8/41 (2018.01); G06F 8/40 (2018.01); G06F 9/52 (2006.01); G06F 15/173 (2006.01); G06F 30/30 (2020.01); G06F 30/323 (2020.01); G06F 30/392 (2020.01); G06F 115/10 (2020.01)
CPC G06F 8/4452 (2013.01) [G06F 8/40 (2013.01); G06F 8/452 (2013.01); G06F 9/52 (2013.01); G06F 15/17381 (2013.01); G06F 30/30 (2020.01); G06F 30/323 (2020.01); G06F 30/392 (2020.01); G06F 2115/10 (2020.01)]
2 Claims
 
1. A hardware system automatically compiled, by a compiler, from a single-threaded software program;
where the single-threaded software program includes a recursive function f, which is defined to be a function that calls itself;
where the recursive function f is implemented within the hardware system by a customized hardware unit comprising a finite state machine;
where a plurality of copies of the customized hardware unit are interconnected by a task network, so that each customized hardware unit on the task network is able to send a task for implementing the recursive function f to any other customized hardware unit on the task network, or receive a task for implementing the recursive function f from any other customized hardware unit on the task network;
where the recursive function f calling itself is implemented within the customized hardware unit as follows:
(i) if the task network is accepting tasks at a point of call, sending a task for implementing the recursive function f over the task network toward another customized hardware unit on the task network; and
(ii) otherwise, if the task network is not accepting tasks at the point of call, performing the recursive function f locally inside the customized hardware unit with an ordinary recursive call, without sending any task to the task network at the point of call;
for purposes of avoiding a deadlock resulting from all customized hardware units on the task network having to wait after the task network gets filled with tasks, ensuring forward progress, and achieving high task parallelism in addition to fine-grain parallelism;
where the hardware system is functionally equivalent to the single-threaded software program that the hardware system is compiled from; and
where design of the hardware system is partitioned and further comprises:
a. a plurality of finite-state machines and a plurality of memory units;
b. networks connected to the plurality of finite-state machines and the plurality of memory units for managing task invocations in hardware and memory data transfers in hardware;
c. a hardware synchronization unit to ensure executing memory instructions in a same order as within the single-threaded software program; and
d. a coherent memory hierarchy which signals completion of memory instructions for supporting the hardware synchronization unit.
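
The call policy recited in clauses (i) and (ii) of claim 1 can be illustrated with a small software analogy. The sketch below is not the claimed hardware and is not taken from the patent: worker threads stand in for the copies of the customized hardware unit, a bounded queue stands in for the task network, and the names count_tree_nodes, call_f, unit, num_units and network_capacity are all hypothetical. The example function f only counts the nodes of a full binary tree and returns nothing to its caller, an assumption made here to keep the sketch short, since the claim does not say how results travel back. Because a unit never blocks when the network is full, every unit keeps making progress, which is the deadlock-avoidance purpose stated in the claim.

```python
import queue
import threading


def count_tree_nodes(depth, num_units=4, network_capacity=8):
    """Count the nodes of a full binary tree with a recursive function f,
    using the policy: send a task if the network accepts it, else recurse locally.

    Returns (total_nodes, calls_sent_as_tasks, calls_done_locally).
    """
    # Bounded queue: hypothetical stand-in for the task network.
    network = queue.Queue(maxsize=network_capacity)
    lock = threading.Lock()
    total = 0
    sent = 0
    local_calls = 0

    def call_f(d):
        """The point of call: where f calls itself."""
        nonlocal sent, local_calls
        try:
            network.put_nowait(d)       # (i) network is accepting: send a task
        except queue.Full:
            with lock:
                local_calls += 1
            f(d)                        # (ii) not accepting: ordinary recursive call
        else:
            with lock:
                sent += 1

    def f(d):
        """Example recursive function: visit one node, then both subtrees."""
        nonlocal total
        with lock:
            total += 1
        if d > 0:
            call_f(d - 1)
            call_f(d - 1)

    def unit():
        """One copy of the unit: take tasks from the network and run f."""
        while True:
            d = network.get()
            if d is None:               # shutdown sentinel (illustration only)
                network.task_done()
                return
            f(d)
            network.task_done()

    threads = [threading.Thread(target=unit) for _ in range(num_units)]
    for t in threads:
        t.start()
    network.put(depth)                  # root task
    network.join()                      # wait until every task has finished
    for _ in threads:
        network.put(None)
    for t in threads:
        t.join()
    return total, sent, local_calls


if __name__ == "__main__":
    print(count_tree_nodes(10))
```

For a tree of depth 10 the first returned value is always 2047 (2^11 - 1 nodes), while the split between tasks sent and local calls varies from run to run with thread timing. Note that network_capacity must be at least 1 in this sketch, because Python's queue.Queue treats a maxsize of 0 as unbounded.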