US 12,468,978 B2
Reinforcement learning in a processing element method and system thereof
Sameer Sadashiv Pawanekar, Indore (IN); and Upendra Narayan Tripathi, Bangalore (IN)
Assigned to US Technology International Pvt. Ltd., Trivandrum (IN)
Filed by US Technology International Pvt. Ltd., Trivandrum (IN)
Filed on Nov. 11, 2021, as Appl. No. 17/454,551.
Claims priority of application No. 202141042994 (IN), filed on Sep. 22, 2021.
Prior Publication US 2023/0087326 A1, Mar. 23, 2023
Int. Cl. G06N 20/00 (2019.01); G06N 5/02 (2023.01)

CPC G06N 20/00 (2019.01) [G06N 5/02 (2013.01)]

12 Claims

1. A method of reinforcement learning in a processing element, the method comprising:

receiving, by a receiving module implemented in the processing element physically arranged in an N×N array structure of processing elements on an integrated circuit, at least one reward;

computing, by a computing module implemented in the processing element, at least one Q-value in a two-dimensional array at time t_n, based on the at least one reward;

storing, by the computing module, the at least one Q-value;

repeating the Q-value computation in a three-dimensional array of the processing element at time t_{n+1}; and

performing, by a time-division multiplexing module implemented in the processing element, time-division multiplexing to replace the at least one Q-value in the two-dimensional array with at least one Q-value computed in the three-dimensional array, wherein performing the time-division multiplexing comprises mapping the three-dimensional Q-value computation onto a same hardware, of the integrated circuit, on which the two-dimensional Q-value computation is mapped, thereby reducing a silicon area of the integrated circuit.
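Claim 1 does not state the Q-value update rule, what the third dimension of the three-dimensional array indexes, or how the time-division multiplexing is scheduled. The Python sketch below fills those gaps with labeled assumptions so the flow of the claim can be followed. It assumes a standard Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. It treats the third dimension as a hypothetical stack of `depth` Q-tables and uses the simplest possible schedule, alternating one slot for t_n and one for t_{n+1}. The names `ProcessingElement`, `q_update`, `receive`, `layer`, `ALPHA`, and `GAMMA` are all hypothetical. This is an illustration under those assumptions, not the patented implementation.

```python
"""Hypothetical sketch of claim 1: one shared Q-update unit, time-multiplexed
between a 2-D Q-array (slot t_n) and a 3-D Q-array (slot t_{n+1})."""

ALPHA = 0.5  # assumed learning rate (not specified in the claim)
GAMMA = 0.9  # assumed discount factor (not specified in the claim)


def q_update(q_sa: float, reward: float, max_next: float) -> float:
    """The single shared compute unit: a standard Q-learning update (assumed rule)."""
    return q_sa + ALPHA * (reward + GAMMA * max_next - q_sa)


class ProcessingElement:
    def __init__(self, n_states: int, n_actions: int, depth: int) -> None:
        # 2-D array: Q2[state][action]
        self.q2 = [[0.0] * n_actions for _ in range(n_states)]
        # 3-D array: Q3[layer][state][action]; what "layer" means is a hypothetical choice
        self.q3 = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(depth)]
        self.slot = 0  # 0 = time t_n (2-D computation), 1 = time t_{n+1} (3-D computation)

    def receive(self, state: int, action: int, reward: float,
                next_state: int, layer: int = 0) -> None:
        """Receive a reward and run whichever computation owns the current time slot."""
        if self.slot == 0:
            # t_n: compute and store a Q-value in the 2-D array
            self.q2[state][action] = q_update(
                self.q2[state][action], reward, max(self.q2[next_state]))
        else:
            # t_{n+1}: repeat the computation in one layer of the 3-D array,
            # reusing the same q_update function (the "same hardware")
            table = self.q3[layer]
            table[state][action] = q_update(
                table[state][action], reward, max(table[next_state]))
            # time-division multiplexing write-back: replace the 2-D Q-value
            # with the Q-value computed in the 3-D array
            self.q2[state][action] = table[state][action]
        self.slot ^= 1  # hand the shared unit to the other computation


pe = ProcessingElement(n_states=2, n_actions=2, depth=3)
pe.receive(0, 1, reward=1.0, next_state=1)           # slot t_n: 2-D update
pe.receive(0, 1, reward=2.0, next_state=1, layer=2)  # slot t_{n+1}: 3-D update, then write-back
print(pe.q2[0][1])  # prints 1.0
```

Both branches call the same `q_update`, so the sketch contains only one update unit. That is the software analogue of the silicon-area saving the claim attributes to mapping the three-dimensional computation onto the hardware already used for the two-dimensional one.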