US 12,153,385 B2
Methods and systems to adapt PID coefficients through reinforcement learning
Samuel Barrett, Cambridge, MA (US); James MacGlashan, Riverside, RI (US); Varun Kompella, Kanata (CA); Peter Wurman, Acton, MA (US); Goker Erdogan, London (GB); and Fabrizio Santini, Arlington, MA (US)
Assigned to SONY GROUP CORPORATION, Tokyo (JP); and SONY CORPORATION OF AMERICA, New York, NY (US)
Filed by Sony Group Corporation, Tokyo (JP); and Sony Corporation of America, New York, NY (US)
Filed on May 7, 2021, as Appl. No. 17/314,351.
Prior Publication US 2022/0365493 A1, Nov. 17, 2022
Int. Cl. G05B 13/02 (2006.01); G05B 6/02 (2006.01); G06N 20/00 (2019.01)

CPC G05B 13/0265 (2013.01) [G05B 6/02 (2013.01); G06N 20/00 (2019.01)]

21 Claims

1. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of:

operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output;

training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and

operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps:

receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation;

selecting an action based on the observation by feeding the observation into the trained policy;

predicting a result of taking the action, the action including changing the PID coefficients; and

updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.
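
The two-rate structure recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Everything below is an assumption made for illustration: the first-order plant, the tabular epsilon-greedy Q-learning policy, the discrete set of multiplicative gain adjustments, the gain bounds, and all names (`PID`, `ACTIONS`, `observe`, `run`). The inner loop runs the PID controller at the first (fast) frequency; the outer loop runs the learner at the second (slow) frequency, building an observation from summary statistics of the error terms, selecting an action that changes the coefficients, using the stored Q-value as the predicted result, and updating the policy from the reward.

```python
import random
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """One fast-loop update; returns the control signal."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical action set: multiplicative tweaks to (kp, ki, kd).
ACTIONS = [(1.0, 1.0, 1.0), (1.1, 1.0, 1.0), (0.9, 1.0, 1.0),
           (1.0, 1.1, 1.0), (1.0, 0.9, 1.0), (1.0, 1.0, 1.1), (1.0, 1.0, 0.9)]


def clamp(value, low, high):
    return max(low, min(high, value))


def observe(errors):
    """Summary statistics of the error terms, binned into a hashable observation."""
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return (min(int(mean_abs * 10), 9), (bias > 0) - (bias < 0))


def run(slow_steps=200, pid_steps_per_rl_step=50, dt=0.01, seed=0):
    rng = random.Random(seed)
    pid = PID(kp=0.5, ki=0.1, kd=0.01)
    q = {}  # (observation, action index) -> predicted return
    alpha, gamma, eps = 0.2, 0.9, 0.2  # assumed learning hyperparameters
    y = 0.0  # process output of the assumed plant dy/dt = -y + u
    obs = observe([1.0 - y])
    history = []
    for k in range(slow_steps):
        # Variable setpoint: toggles every 20 slow steps (assumed schedule).
        setpoint = 1.0 if (k // 20) % 2 == 0 else 0.5

        # Slow loop: select an action by feeding the observation to the policy.
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((obs, i), 0.0))
        predicted = q.get((obs, a), 0.0)  # predicted result of the action

        # The action changes the PID coefficients (bounds are assumptions).
        fp, fi, fd = ACTIONS[a]
        pid.kp = clamp(pid.kp * fp, 0.01, 10.0)
        pid.ki = clamp(pid.ki * fi, 0.01, 10.0)
        pid.kd = clamp(pid.kd * fd, 0.0, 0.5)

        # Fast loop: the PID controller drives the error toward zero.
        errors = []
        for _ in range(pid_steps_per_rl_step):
            error = setpoint - y
            u = pid.step(error, dt)
            y += dt * (-y + u)  # forward-Euler step of the assumed plant
            errors.append(error)

        # Reward derived from sensed information after the coefficient update.
        reward = -sum(abs(e) for e in errors) / len(errors)
        next_obs = observe(errors)
        best_next = max(q.get((next_obs, i), 0.0) for i in range(len(ACTIONS)))
        q[(obs, a)] = predicted + alpha * (reward + gamma * best_next - predicted)
        obs = next_obs
        history.append(reward)
    return pid, history


if __name__ == "__main__":
    tuned, rewards = run()
    print(tuned.kp, tuned.ki, tuned.kd, rewards[-1])
```

With the defaults, the controller executes 50 updates for every policy update, so the ratio of the first frequency to the second is 50:1. A tabular learner is used here only because it keeps the example short; the claim itself does not specify the form of the policy.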