US 12,188,672 B2
Method for controlling air conditioning device based on delayed reward
Minseop Kim, Seoul (KR); Hongje Park, Gyeonggi-do (KR); Jongwon Park, Seoul (KR); Sanghyeok Choi, Seoul (KR); Jeonghoon Lee, Sejong-si (KR); Joongjae Kim, Daejeon (KR); and Ikchan Ju, Daejeon (KR)
Assigned to MAKINAROCKS CO., LTD., Seoul (KR); and HANON SYSTEMS, Daejeon (KR)
Filed by MakinaRocks Co., Ltd., Seoul (KR); and Hanon Systems, Daejeon (KR)
Filed on Oct. 23, 2023, as Appl. No. 18/492,215.
Claims priority of application No. 10-2022-0151351 (KR), filed on Nov. 14, 2022; and application No. 10-2023-0059155 (KR), filed on May 8, 2023.
Prior Publication US 2024/0159413 A1, May 16, 2024
Int. Cl. F24F 11/62 (2018.01); B60H 1/00 (2006.01); B60H 1/32 (2006.01); F24F 110/10 (2018.01); F24F 110/20 (2018.01); F24F 110/50 (2018.01); G05B 13/02 (2006.01); G06N 20/00 (2019.01)
CPC F24F 11/62 (2018.01) [B60H 1/0073 (2019.05); B60H 1/3205 (2013.01); F24F 2110/10 (2018.01); F24F 2110/20 (2018.01); F24F 2110/50 (2018.01); G05B 13/0265 (2013.01); G06N 20/00 (2019.01)]
9 Claims
 
1. A method for controlling an air conditioning device, performed by a computing device, the method comprising:
determining a control action for the air conditioning device at a first time point by using a reinforcement learning agent, wherein the control action is used to control the air conditioning device, and wherein the air conditioning device includes a heating, ventilation, and air conditioning (HVAC) system or a thermal management system;
determining a reward for the control action at the first time point based on a delay time by using the reinforcement learning agent, wherein the delay time corresponds to a time for a target temperature to be reached using the air conditioning device, and wherein determining the reward for the control action at the first time point is based on an assumption that the control action for the air conditioning device at the first time point is maintained to a third time point; and
performing reinforcement learning related to the control of the air conditioning device based on the determined reward,
wherein a time point when the delay time elapses from the first time point corresponds to a second time point, and
wherein the reward for the control action at the first time point is calculated while excluding situations after the first time point and before the second time point, and
generating a control signal to control the air conditioning device based on the control action.
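
The delayed-reward step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The thermal model `ToyCabin`, its parameters, the function name `delayed_reward`, and the negative-absolute-error reward are all hypothetical choices made for illustration. The sketch also assumes that the delay time is given as a fixed number of steps and that the third time point coincides with the second time point; claim 1 does not state either of these.

```python
from dataclasses import dataclass


@dataclass
class ToyCabin:
    """Hypothetical first-order thermal model (an assumption, not from the patent)."""
    ambient: float = 30.0  # ambient temperature, deg C
    leak: float = 0.05     # fraction of the gap to ambient closed per step
    gain: float = 0.5      # deg C of cooling per step at action = 1.0

    def step(self, temp: float, action: float) -> float:
        # Drift toward ambient, minus the cooling contributed by the action.
        return temp + self.leak * (self.ambient - temp) - self.gain * action


def delayed_reward(model: ToyCabin, temp_t1: float, action: float,
                   target: float, delay_steps: int) -> float:
    """Reward for the action chosen at the first time point.

    The action is assumed to be held constant for delay_steps steps.
    Only the temperature at the second time point (first time point plus
    delay) enters the reward; intermediate temperatures are advanced
    through the model but never scored.
    """
    temp = temp_t1
    for _ in range(delay_steps):
        temp = model.step(temp, action)  # same action maintained every step
    # Assumed reward shape: closer to the target at the second time point is better.
    return -abs(temp - target)


if __name__ == "__main__":
    cabin = ToyCabin()
    # Compare two candidate actions taken at the first time point.
    for candidate in (0.2, 1.0):
        r = delayed_reward(cabin, temp_t1=30.0, action=candidate,
                           target=22.0, delay_steps=20)
        print(f"action={candidate:.1f} delayed reward={r:.3f}")
```

In a full training loop, the value returned by `delayed_reward` would be the reward passed to the reinforcement learning update for the action taken at the first time point, and the chosen action would then be converted into a control signal for the device. Both of those steps are left out of this sketch.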