US 12,222,122 B2
Optimized HVAC control using domain knowledge combined with deep reinforcement learning (DRL)
Sagar Kumar Verma, Pune (IN); Supriya Agrawal, Pune (IN); Venkatesh Ramanathan, Pune (IN); Ulka Shrotri, Pune (IN); Srinarayana Nagarathinam, Chennai (IN); Rajesh Jayaprakash, Chennai (IN); and Aabriti Dutta, Chennai (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Sep. 26, 2022, as Appl. No. 17/935,420.
Claims priority of application No. 202121047480 (IN), filed on Oct. 19, 2021.
Prior Publication US 2023/0125620 A1, Apr. 27, 2023
Int. Cl. G05B 13/02 (2006.01); F24F 11/63 (2018.01); F24F 120/20 (2018.01)

CPC F24F 11/63 (2018.01) [G05B 13/0265 (2013.01); F24F 2120/20 (2018.01)]

12 Claims

1. A processor implemented method for an optimized Heating, Ventilation, and Air-conditioning (HVAC) control of a building, the method comprising:

receiving, via an Expressive Decision Tables (EDT) engine executed by one or more hardware processors, a plurality of HVAC parameters of the building, measured for a current time instance (t), wherein the plurality of HVAC parameters comprises: (i) a return air temperature (RAT), (ii) an occupancy count (OpCnt), (iii) an outside air temperature (OAT), (iv) an occupant discomfort measured in terms of predicted percentage dissatisfied (PPD) metric, (v) an HVAC energy consumption (E_HVAC) and (vi) a current time, and wherein the occupant discomfort and the HVAC energy consumption (E_HVAC) are measured with respect to a previous action item (a_t−1) triggered at a previous time instance (t−1);

analyzing, by the EDT engine executed by the one or more hardware processors, the plurality of HVAC parameters in accordance with a rule set predefined for the HVAC control of the building to determine an action space (A_t) comprising more than one action item (a_t1 ... a_tn) for the current time instance (t) corresponding to more than one rule satisfied from among the rule set, wherein the rule set is predefined in the EDT engine via a formal requirement specifier consumable by the EDT engine to capture domain knowledge of the building for the HVAC control, and wherein presence of more than one action item is indicative of presence of one or more conflicts in the domain knowledge;

receiving, by a Deep Reinforcement Learning (DRL) agent executed by the one or more hardware processors, the action space (A_t), a current state (S_t) of the building from the EDT engine, and a current reward (R_t) received by the DRL agent for the previous action item (a_t−1), wherein the current state (S_t) is represented by a state tuple {OAT_t, RAT_t, OpCnt_t, τ} comprising one or more HVAC parameters from among the plurality of HVAC parameters, and τ representing time-of-day capturing time related variations in the one or more HVAC parameters; and

selecting, by the DRL agent executed by the one or more hardware processors, an optimal control action item from among the action space (A_t) comprising the one or more action items (a_t1 ... a_tn) that resolves the conflicts by maximizing a cumulative reward received over an episode, wherein a target cumulative reward is computed for a current state-action pair (S_t, a_t) providing an expected return over the episode starting from the current state S_t, following a policy, taking an action item a_t.
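
The flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule conditions, the action names (`setpoint_22C` and so on), the thresholds, the Q-value table standing in for a trained DRL network, and the one-step bootstrapped target are all assumptions made only for illustration. The sketch shows a rule set producing an action space A_t from the state tuple {OAT_t, RAT_t, OpCnt_t, τ}, and an agent resolving the resulting conflict by picking the allowed action with the highest estimated return.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class State:
    """State tuple {OAT_t, RAT_t, OpCnt_t, tau} named in the claim."""
    oat: float   # outside air temperature
    rat: float   # return air temperature
    op_cnt: int  # occupancy count
    tau: int     # time of day (hour of day here, an assumption)


# A rule pairs a condition on the state with the action item it proposes.
Rule = Tuple[Callable[[State], bool], str]

# Hypothetical rule set; the real rules live in the EDT specification.
RULES: List[Rule] = [
    (lambda s: s.op_cnt == 0, "setpoint_26C"),  # empty building: relax cooling
    (lambda s: s.rat > 25.0, "setpoint_22C"),   # warm return air: cool harder
    (lambda s: s.oat < 18.0, "setpoint_24C"),   # mild weather: moderate setpoint
]


def action_space(state: State, rules: List[Rule] = RULES) -> List[str]:
    """Collect the action items of every satisfied rule (the action space A_t)."""
    actions: List[str] = []
    for condition, action in rules:
        if condition(state) and action not in actions:
            actions.append(action)
    return actions


def select_action(q_values: Dict[str, float], allowed: List[str]) -> str:
    """Resolve conflicts by choosing the allowed action with the highest Q-value."""
    return max(allowed, key=lambda a: q_values[a])


def td_target(reward: float, gamma: float,
              next_q: Dict[str, float], next_allowed: List[str]) -> float:
    """One-step bootstrapped target (an assumed, DQN-style choice):
    reward plus the discounted best Q-value among the next allowed actions."""
    best_next = max((next_q[a] for a in next_allowed), default=0.0)
    return reward + gamma * best_next


state = State(oat=30.0, rat=26.0, op_cnt=0, tau=14)
allowed = action_space(state)  # two rules fire, so the domain knowledge conflicts
q = {"setpoint_26C": 0.4, "setpoint_22C": 0.1, "setpoint_24C": 0.9}
chosen = select_action(q, allowed)
target = td_target(reward=1.0, gamma=0.9, next_q=q, next_allowed=allowed)
```

In this example two rules are satisfied at once, so the action space holds two conflicting action items. The agent never considers `setpoint_24C`, even though it has the highest Q-value, because no satisfied rule proposed it; restricting the argmax to A_t is what keeps the learned policy inside the domain knowledge.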