| CPC F24F 11/63 (2018.01) [G05B 13/0265 (2013.01); F24F 2120/20 (2018.01)] | 12 Claims |

|
1. A processor-implemented method for an optimized Heating, Ventilation, and Air-conditioning (HVAC) control of a building, the method comprising:
receiving, via an Expressive Decision Tables (EDT) engine executed by one or more hardware processors, a plurality of HVAC parameters of the building, measured for a current time instance (t), wherein the plurality of HVAC parameters comprises: (i) a return air temperature (RAT), (ii) an occupancy count (OpCnt), (iii) an outside air temperature (OAT), (iv) an occupant discomfort measured in terms of a predicted percentage dissatisfied (PPD) metric, (v) an HVAC energy consumption (E_HVAC), and (vi) a current time, and wherein the occupant discomfort and the HVAC energy consumption (E_HVAC) are measured with respect to a previous action item (a_{t−1}) triggered at a previous time instance (t−1);
analyzing, by the EDT engine executed by the one or more hardware processors, the plurality of HVAC parameters in accordance with a rule set predefined for the HVAC control of the building to determine an action space (A_t) comprising more than one action item (a_{t1} . . . a_{tn}) for the current time instance (t), corresponding to more than one rule satisfied from among the rule set, wherein the rule set is predefined in the EDT engine via a formal requirement specifier consumable by the EDT engine to capture domain knowledge of the building for the HVAC control, and wherein the presence of more than one action item is indicative of the presence of one or more conflicts in the domain knowledge;
receiving, by a Deep Reinforcement Learning (DRL) agent executed by the one or more hardware processors, the action space (A_t), a current state (S_t) of the building from the EDT engine, and a current reward (R_t) received by the DRL agent for the previous action item (a_{t−1}), wherein the current state (S_t) is represented by a state tuple {OAT_t, RAT_t, OpCnt_t, τ} comprising one or more HVAC parameters from among the plurality of HVAC parameters, and τ representing a time of day that captures time-related variations in the one or more HVAC parameters; and
selecting, by the DRL agent executed by the one or more hardware processors, an optimal control action item from among the action space (A_t) comprising the action items (a_{t1} . . . a_{tn}) that resolves the conflicts by maximizing a cumulative reward received over an episode, wherein a target cumulative reward is computed for a current state-action pair (S_t, a_t), providing an expected return over the episode starting from the current state S_t, following a policy, and taking an action item a_t.
|
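
The flow recited in claim 1 can be illustrated with a minimal sketch: satisfied rules contribute action items to the action space, and the agent then picks, from that space only, the action item with the highest estimated return. Everything concrete here is an assumption made for illustration: the `Rule` class, the `action_space` and `select_action` helpers, the rule thresholds, the action names, and the return estimates are hypothetical and do not come from the patent. A plain dictionary of estimates stands in for the DRL agent's learned value function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# State tuple from the claim: (OAT_t, RAT_t, OpCnt_t, tau).
State = Tuple[float, float, int, int]


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one EDT rule: a condition and the action it proposes."""
    condition: Callable[[State], bool]
    action: str


def action_space(rules: List[Rule], state: State) -> List[str]:
    """Collect the action items of every satisfied rule, without duplicates."""
    actions: List[str] = []
    for rule in rules:
        if rule.condition(state) and rule.action not in actions:
            actions.append(rule.action)
    return actions


def select_action(q_values: Dict[str, float], allowed: List[str]) -> str:
    """Pick the allowed action with the highest estimated return Q(S_t, a)."""
    if not allowed:
        raise ValueError("no rule was satisfied, so the action space is empty")
    return max(allowed, key=lambda a: q_values[a])


# Illustrative rules and numbers only; none of these come from the patent.
RULES = [
    Rule(lambda s: s[1] > 26.0, "lower_setpoint"),  # return air is warm
    Rule(lambda s: s[2] < 5, "raise_setpoint"),     # few occupants: save energy
    Rule(lambda s: s[0] < 10.0, "enable_heating"),  # cold outside
]

state: State = (32.0, 27.5, 3, 14)  # OAT, RAT, OpCnt, hour of day
A_t = action_space(RULES, state)    # two rules fire, so their actions conflict

# Assumed return estimates; a trained DRL agent would supply these instead.
q_estimates = {"lower_setpoint": 0.4, "raise_setpoint": 0.9, "enable_heating": 0.7}
chosen = select_action(q_estimates, A_t)
print(A_t, chosen)
```

In this sketch the warm return air and the low occupancy trigger two rules with opposing action items, which is the kind of conflict the claim describes. Restricting the maximization to the action space means that an action with a high estimate, such as `enable_heating` here, is never chosen unless some rule proposed it.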