CPC G05B 13/0265 (2013.01) [G05B 6/02 (2013.01); G06N 20/00 (2019.01)] | 21 Claims |
1. A reinforcement learning process for automatically tuning proportional-integral-derivative (PID) coefficients, the process performing the steps of:
operating a PID controller at a first frequency to minimize an error between a variable setpoint and a process output;
training a policy by a reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients; and
operating a reinforcement learning model at a second frequency, the reinforcement learning model performing the following steps:
receiving summary statistics about error terms from the PID controller and sensed information on the environment to calculate an observation;
selecting an action based on the observation by feeding the observation into the trained policy;
predicting a result of taking the action, the action including changing the PID coefficients; and
updating the policy by the reward function derived from sensed information after the policy acts on the PID controller by updating the PID coefficients.
|
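The two-rate structure recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation. Everything named here is an assumption introduced for illustration: the first-order toy plant, the discrete set of multiplicative gain adjustments in `ACTIONS`, the gain clamp `GAIN_MAX`, the bucketing of the mean absolute error into an observation, and the use of an epsilon-greedy value table as a stand-in for the trained policy. The PID loop runs on every step (the first frequency); the learning step runs once every `slow_every` steps (the second frequency).

```python
import random
import statistics


class PID:
    """Textbook discrete PID controller; gains may be changed between steps."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, output):
        error = setpoint - output
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        control = self.kp * error + self.ki * self.integral + self.kd * derivative
        return control, error


# Hypothetical action set: multiplicative tweaks to (Kp, Ki, Kd).
ACTIONS = [(1.0, 1.0, 1.0), (1.1, 1.0, 1.0), (0.9, 1.0, 1.0),
           (1.0, 1.1, 1.0), (1.0, 0.9, 1.0), (1.0, 1.0, 1.1), (1.0, 1.0, 0.9)]
GAIN_MAX = 10.0  # assumed safety clamp, not taken from the claim


def observe(errors, sensed):
    """Reduce a window of PID errors plus a sensed value to an observation."""
    mean_abs = statistics.fmean([abs(e) for e in errors])
    spread = statistics.pstdev(errors)
    bucket = min(int(mean_abs * 10), 9)  # coarse discretisation (assumption)
    return bucket, mean_abs, spread, sensed


def run(total_steps=2000, slow_every=50, dt=0.01, epsilon=0.1, alpha=0.2, seed=0):
    rng = random.Random(seed)
    pid = PID(kp=1.0, ki=0.1, kd=0.01, dt=dt)
    q = {}          # (bucket, action index) -> estimated reward
    y = 0.0         # process output of the toy plant
    errors = []     # error terms collected since the last slow tick
    pending = None  # last (bucket, action) pair, still awaiting its reward
    history = []

    for k in range(total_steps):
        # Fast loop (first frequency): PID tracks a variable setpoint.
        setpoint = 1.0 if (k // 500) % 2 == 0 else 0.5
        u, e = pid.step(setpoint, y)
        y += dt * (-y + u)  # toy first-order plant, forward Euler
        errors.append(e)

        # Slow loop (second frequency): observe, reward, act.
        if (k + 1) % slow_every == 0:
            bucket, mean_abs, _, _ = observe(errors, y)
            reward = -mean_abs  # reward derived from sensed tracking error

            # Update the value estimate for the action applied last tick.
            if pending is not None:
                old = q.get(pending, 0.0)
                q[pending] = old + alpha * (reward - old)

            # Select an action from the observation (epsilon-greedy).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q.get((bucket, i), 0.0))
            predicted = q.get((bucket, a), 0.0)  # predicted result of the action

            # Apply the action: change the PID coefficients, within the clamp.
            fp, fi, fd = ACTIONS[a]
            pid.kp = min(max(pid.kp * fp, 0.0), GAIN_MAX)
            pid.ki = min(max(pid.ki * fi, 0.0), GAIN_MAX)
            pid.kd = min(max(pid.kd * fd, 0.0), GAIN_MAX)

            pending = (bucket, a)
            history.append((k + 1, a, predicted, reward))
            errors.clear()

    return pid, q, history


if __name__ == "__main__":
    pid, q, history = run()
    print(f"slow ticks: {len(history)}, gains: {pid.kp:.3f} {pid.ki:.3f} {pid.kd:.3f}")
```

In this sketch the reward for an action only becomes available at the next slow tick, after the updated coefficients have been driving the fast loop for a full window. That mirrors the claim language that the reward is derived from sensed information after the policy acts on the PID controller.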