US 11,709,701 B2
Iterative learning processes for executing code of self-optimizing computation graphs based on execution policies
David Williams, Aptos, CA (US)
Assigned to PAYPAL, INC., San Jose, CA (US)
Filed by PayPal, Inc., San Jose, CA (US)
Filed on Dec. 31, 2019, as Appl. No. 16/732,029.
Prior Publication US 2021/0200575 A1, Jul. 1, 2021
Int. Cl. G06F 9/46 (2006.01); G06N 20/00 (2019.01)

CPC G06F 9/463 (2013.01) [G06N 20/00 (2019.01)]

19 Claims

8. A method, comprising:

receiving code of an application, the code structured as a plurality of instructions in a computation graph that corresponds to operational logic of the application;

processing the code according to an iterative learning process, each iteration of the iterative learning process comprising:

determining that a state of a computing environment in which the code is being processed is different from a previous state of the computing environment in a previous iteration of the iterative learning process based on a number of processing cores in the computing environment;

changing an exploration rate in response to the determining that the state of the computing environment is different from the previous state;

executing the plurality of instructions of the computation graph according to an execution policy in the state of the computing environment being different, wherein the execution policy indicates a first subset of the plurality of instructions to be executed in parallel with a second subset of the plurality of instructions;

determining an execution time for executing the plurality of instructions of the computation graph in the state of the computing environment;

based on the execution time and the exploration rate associated with the iterative learning process, adjusting the execution policy to reduce the execution time in a subsequent iteration, the adjusted execution policy indicating whether the first subset is to be executed in parallel or in series with the second subset based on the exploration rate, wherein the exploration rate indicates an amount of an adjustment to the execution policy of the code in the execution policy based on a difference in the state of the computing environment from the number of the processing cores, and wherein the execution policy includes the execution time associated with the exploration rate; and

decreasing the exploration rate based on the adjusted execution policy and a decay rate that is dynamically adjusted based on the iterative learning process, the decreased exploration rate not below a minimum amount that is allowable for the processing the code;

determining, based on the decay rate and the processing the code according to the iterative learning process, an optimization threshold for the execution policy has been met or exceeded when the exploration rate is at or near a zero amount; and

determining, based on the iteratively processing the code and the determining that the optimization threshold has been met or exceeded, a final execution policy.
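
The iterative process recited in claim 8 resembles an epsilon-greedy search over a two-way choice, namely whether to run the first and second subsets in parallel or in series. The following Python sketch is illustrative only and is not taken from the patent. The function name learn_execution_policy, the measure callback, the parameter names and defaults, the reset of the exploration rate when the core count changes, and the rule that speeds up the decay when the policy is left unchanged are all assumptions chosen to make the steps concrete.

```python
import random


def learn_execution_policy(measure, core_counts, eps_start=1.0, eps_min=0.01,
                           decay=0.9, near_zero=0.05, seed=0):
    """Hypothetical epsilon-greedy sketch of the loop recited in claim 8.

    measure(parallel, cores) returns the execution time for running the two
    instruction subsets in parallel (True) or in series (False).
    core_counts yields the number of processing cores seen at each iteration.
    """
    rng = random.Random(seed)
    eps = eps_start              # exploration rate
    best_time = {}               # (cores, parallel) -> best observed time
    prev_cores = None
    policy = True                # initial policy: run the subsets in parallel

    def best_known(cores):
        # Fastest mode observed so far for this core count.
        known = {p: best_time.get((cores, p), float("inf")) for p in (True, False)}
        return min(known, key=known.get)

    for cores in core_counts:
        # The environment state is compared via the number of processing cores.
        if prev_cores is not None and cores != prev_cores:
            eps = eps_start      # assumption: "changing" the rate means resetting it
        prev_cores = cores

        # Execute under the current policy and record the execution time.
        elapsed = measure(policy, cores)
        key = (cores, policy)
        best_time[key] = min(elapsed, best_time.get(key, float("inf")))

        # Adjust the policy: explore with probability eps, otherwise exploit.
        if rng.random() < eps:
            new_policy = not policy
        else:
            new_policy = best_known(cores)

        # Assumption: the decay is adjusted dynamically by decaying faster
        # when the adjustment left the policy unchanged.
        step = decay if new_policy != policy else decay * decay
        policy = new_policy

        # Decrease the exploration rate, but never below the allowed minimum.
        eps = max(eps_min, eps * step)

        # Optimization threshold: the exploration rate is at or near zero.
        if eps <= near_zero:
            return {"parallel": best_known(cores), "exploration_rate": eps,
                    "converged": True}

    # Iterations ran out before the threshold was reached.
    return {"parallel": best_known(prev_cores), "exploration_rate": eps,
            "converged": False}
```

With a synthetic cost model such as measure = lambda parallel, cores: (1.0 / min(cores, 2) + 0.1) if parallel else 1.0, the sketch settles on parallel execution when four cores are reported throughout, and on serial execution once the reported core count drops to one. Passing the timing function as a callback keeps the sketch deterministic; a real system would time the actual execution of the computation graph instead.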