CPC F24F 11/62 (2018.01) [B60H 1/0073 (2019.05); B60H 1/3205 (2013.01); F24F 2110/10 (2018.01); F24F 2110/20 (2018.01); F24F 2110/50 (2018.01); G05B 13/0265 (2013.01); G06N 20/00 (2019.01)] | 9 Claims |
1. A method for controlling an air conditioning device, performed by a computing device, the method comprising:
determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent, wherein the control action is used to control the air conditioning device, and wherein the air conditioning device includes a heating, ventilation, and air conditioning (HVAC) system or a thermal management system;
determining a reward for the control action at the first time point based on a delay time by using the reinforcement learning agent, wherein the delay time corresponds to a time for a target temperature to be reached using the air conditioning device, and wherein determining the reward for the control action at the first time point is based on an assumption that the control action for the air conditioning device at the first time point is maintained to a third time point; and
performing reinforcement learning related to the control of the air conditioning device based on the determined reward,
wherein a time point when the delay time elapses from the first time point corresponds to a second time point, and
wherein the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point, and
generating a control signal to control the air conditioning device based on the control action.
|
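The steps recited in claim 1 can be illustrated with a minimal sketch; it is not the patented implementation. It assumes a hypothetical first-order thermal model (`ThermalModel`), a scalar control action, discrete time steps, and a reward equal to the negative delay time. None of these details are specified in the claim. The names `delay_time`, `reward_for_action`, `update_value`, and `make_control_signal` are illustrative only. The reward is computed as if the action chosen at the first time point were held until a horizon (the third time point), and the temperatures between the first time point and the second time point (when the target is reached) are not scored.

```python
from dataclasses import dataclass


@dataclass
class ThermalModel:
    """Hypothetical first-order model: each step shifts temperature by gain * action."""
    gain: float = 0.5  # degrees C per step per unit of action (assumed value)

    def step(self, temp: float, action: float) -> float:
        return temp + self.gain * action


def delay_time(model, temp, target, action, horizon):
    """Steps until the target is reached if `action` is held up to `horizon`.

    `horizon` plays the role of the third time point. Returns None when the
    target is not reached within the horizon.
    """
    start_error = temp - target
    if start_error == 0:
        return 0
    for k in range(1, horizon + 1):
        temp = model.step(temp, action)
        # The target counts as reached once the error hits zero or changes sign.
        if start_error * (temp - target) <= 0:
            return k
    return None


def reward_for_action(model, temp, target, action, horizon):
    """Reward depends only on the delay time, not on intermediate temperatures."""
    d = delay_time(model, temp, target, action, horizon)
    if d is None:
        return -float(horizon + 1)  # assumed penalty when the target is never reached
    return -float(d)


def update_value(values, action, reward, lr=0.5):
    """Simple incremental value update standing in for the learning step."""
    old = values.get(action, 0.0)
    values[action] = old + lr * (reward - old)
    return values[action]


def make_control_signal(action):
    """Hypothetical control-signal format for the air conditioning device."""
    return {"command": action}


model = ThermalModel()
values = {}
for candidate in (-1.0, -2.0, 1.0):
    r = reward_for_action(model, temp=30.0, target=24.0, action=candidate, horizon=20)
    update_value(values, candidate, r)

best_action = max(values, key=values.get)
signal = make_control_signal(best_action)
```

In this sketch, faster cooling yields a shorter delay time and therefore a higher reward, so the stronger cooling action is selected. A practical agent would replace the toy model with measured or learned dynamics and the value table with whichever reinforcement learning method is actually used.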