| CPC G06N 20/00 (2019.01) [G06N 5/02 (2013.01)] | 12 Claims |

|
1. A method of reinforcement learning in a processing element, the method comprising:
receiving, by a receiving module implemented in the processing element physically arranged in an N×N array structure of processing elements on an integrated circuit, at least one reward;
computing, by a computing module implemented in the processing element, at least one Q-value in a two-dimensional array at time t_n, based on the at least one reward;
storing, by the computing module, the at least one Q-value;
repeating the Q-value computation in a three-dimensional array of the processing element at time t_{n+1}; and
performing, by a time-division multiplexing module implemented in the processing element, time-division multiplexing to replace the at least one Q-value in the two-dimensional array with at least one Q-value computed in the three-dimensional array, wherein performing the time-division multiplexing comprises mapping the three-dimensional Q-value computation onto a same hardware, of the integrated circuit, on which the two-dimensional Q-value computation is mapped, thereby reducing a silicon area of the integrated circuit.
|
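The steps of claim 1 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the claim does not give an update rule, so a standard tabular Q-learning update is assumed, and the three-dimensional array is assumed to be a stack of two-dimensional layers that are processed one per time slot through a single shared buffer. The names `ProcessingElement`, `compute_q`, `time_multiplex`, `ALPHA`, and `GAMMA` are hypothetical and do not come from the claim.

```python
ALPHA = 0.5  # learning rate (assumed; not specified in the claim)
GAMMA = 0.9  # discount factor (assumed; not specified in the claim)


class ProcessingElement:
    """Toy model of one processing element (hypothetical)."""

    def __init__(self, n_states, n_actions):
        # One 2-D buffer stands in for the shared hardware resource.
        self.q2d = [[0.0] * n_actions for _ in range(n_states)]

    def compute_q(self, state, action, reward, next_state):
        # Tabular Q-learning update (assumed rule), stored in the 2-D buffer.
        target = reward + GAMMA * max(self.q2d[next_state])
        self.q2d[state][action] += ALPHA * (target - self.q2d[state][action])
        return self.q2d[state][action]

    def time_multiplex(self, q3d, transitions):
        # One time slot per layer: load the layer into the shared buffer,
        # run the same compute path, then write the result back.
        for layer, (s, a, r, s2) in zip(q3d, transitions):
            for i, row in enumerate(layer):
                self.q2d[i][:] = row  # in-place load; the buffer is reused
            self.compute_q(s, a, r, s2)
            for i, row in enumerate(self.q2d):
                layer[i][:] = row
        return q3d


pe = ProcessingElement(n_states=2, n_actions=2)
buffer_id = id(pe.q2d)

# Time t_n: compute and store one Q-value in the 2-D array from a reward.
q_tn = pe.compute_q(state=0, action=1, reward=1.0, next_state=1)

# Time t_{n+1}: repeat the computation over a 3-D array (3 layers of 2x2),
# replacing the contents of the same 2-D buffer in each time slot.
q3d = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
pe.time_multiplex(q3d, [(0, 0, 1.0, 1), (1, 1, 2.0, 0), (0, 1, 0.0, 1)])
```

In this model the three-dimensional workload never needs its own compute buffer: every layer passes through `pe.q2d`, which loosely mirrors the claim's point that reusing the same hardware for both computations reduces silicon area.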