CPC H02J 3/06 (2013.01) [G06N 20/00 (2019.01); G06Q 50/06 (2013.01)] | 2 Claims |
1. A method for intelligently adjusting a power flow based on a Q-learning algorithm, comprising:
step 1: converting a variable, an action, and a goal of a power grid into a state, an action, and a reward of the Q-learning algorithm, respectively;
step 2: selecting an action from an action space, giving an immediate reward based on a result of a power flow calculation, and updating the state to the next state;
step 3: looking ahead to select the next exploration action according to the strategy of the Q-learning algorithm;
step 4: updating the Q value at the corresponding position in the Q-value table based on the obtained reward;
step 5: if a final state is not reached, returning to step 2; if a final state is reached, increasing the number of iterations by 1;
step 6: if the number of iterations has not reached a predetermined value K, that is, Episode<K, returning to step 2; if the number of iterations reaches the predetermined value K, that is, Episode=K, outputting the Q-value table; and
step 7: outputting an optimal unit combination, and adjusting the power flow of an IEEE 39-bus standard test system and of an actually operating power system based on the optimal unit combination;
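The training loop of steps 2 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the epsilon-greedy strategy, the learning rate `alpha`, the discount factor `gamma`, the per-episode step cap `max_steps`, and all helper names (`train_q_table`, `greedy_combination`, `toy_reward`, `toy_final`) are hypothetical, and a toy load-matching reward stands in for the power flow calculation and for formula (3). The update uses the standard Q-learning target (maximum over next actions); the look-ahead action from step 3 is then executed in the next pass of step 2.

```python
import random
from collections import defaultdict

def available_actions(state):
    # One action per unit: power off a unit that is on, or power on a unit that is off.
    return [(i, "off") if p != 0 else (i, "on") for i, p in enumerate(state)]

def apply_action(state, action, initial_outputs):
    # Next state: the chosen unit returns to its initial output or drops to 0.
    i, act = action
    nxt = list(state)
    nxt[i] = initial_outputs[i] if act == "on" else 0.0
    return tuple(nxt)

def train_q_table(initial_outputs, reward_fn, is_final, episodes=200,
                  alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-value table keyed by (state, action), initialized to 0

    def choose(state):
        # Assumed epsilon-greedy strategy; the claim only says "a strategy".
        acts = available_actions(state)
        if rng.random() < epsilon:
            return rng.choice(acts)
        return max(acts, key=lambda a: q[(state, a)])

    start = tuple(initial_outputs)
    for _ in range(episodes):                    # step 6: repeat until Episode = K
        s, a = start, choose(start)
        for _ in range(max_steps):
            s_next = apply_action(s, a, initial_outputs)  # step 2: act, get next state
            r = reward_fn(s_next)                         # step 2: immediate reward
            a_next = choose(s_next)                       # step 3: look-ahead action
            done = is_final(s_next)
            best_next = 0.0 if done else max(
                q[(s_next, b)] for b in available_actions(s_next))
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])  # step 4
            if done:                                      # step 5: episode ends
                break
            s, a = s_next, a_next
    return q

def greedy_combination(q, initial_outputs, is_final, max_steps=20):
    # Step 7: follow the highest-valued actions to read out a unit combination.
    s = tuple(initial_outputs)
    for _ in range(max_steps):
        if is_final(s):
            break
        a = max(available_actions(s), key=lambda b: q[(s, b)])
        s = apply_action(s, a, initial_outputs)
    return s

# Usage with a hypothetical toy stand-in for power flow: three units, 150 MW load.
OUTPUTS = [100.0, 50.0, 30.0]

def toy_reward(state):
    return 10.0 if sum(state) == 150.0 else -1.0

def toy_final(state):
    return sum(state) == 150.0

q_table = train_q_table(OUTPUTS, toy_reward, toy_final)
combo = greedy_combination(q_table, OUTPUTS, toy_final)
print(combo)
```

In a real setting, `toy_reward` and `toy_final` would be replaced by a power flow solver that reports convergence and network loss; that interface is not specified in the claim.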
wherein the step 1 comprises:
using the active power output of each generation unit as the state, and establishing an expression of a state space representing the combination of the states of all generation units, as shown in formula (1):
S={Power output of generation unit 1,Power output of generation unit 2, . . . ,Power output of generation unit N} (1),
wherein
S denotes the state space, N is a positive integer denoting the number of generation units, and the power output of each generation unit is either its respective initial power output or 0;
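Because each unit in formula (1) takes only two values (its initial output or 0), the state space has 2^N members. The sketch below enumerates it, assuming outputs are given as a list of floats; `build_state_space` is a hypothetical helper name, not part of the claim.

```python
from itertools import product

def build_state_space(initial_outputs):
    """Enumerate all states of formula (1): every unit sits at its initial
    active power output or at 0 (hypothetical helper, illustration only)."""
    # A unit whose initial output is already 0 contributes a single option.
    options = [(p, 0.0) if p != 0 else (0.0,) for p in initial_outputs]
    return [tuple(combo) for combo in product(*options)]

states = build_state_space([100.0, 50.0, 30.0])
print(len(states))  # 2**3 combinations
```

The exponential growth (for example, 10 units already give 1,024 states) is one reason a tabular Q-value table stays practical only for a moderate number of units.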
establishing an expression of an action space in which powering-on or powering-off of each generation unit is used as the action, as shown in formula (2):
A={Powering on of generation unit 1,Powering off of generation unit 1, . . . ,Powering off of generation unit N} (2),
wherein
A denotes the action space, N is a positive integer, and, once a generation unit reaches the power-on state or the power-off state, the action corresponding to that state of the generation unit is removed from the action space; and
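The action space of formula (2) and its pruning rule can be sketched as below. The sketch assumes one reading of the removal rule: a unit that is already on loses its power-on action, and a unit that is already off loses its power-off action. The function names and the `(unit_index, "on" | "off")` encoding are hypothetical.

```python
def full_action_space(n_units):
    """Formula (2): a power-on and a power-off action for every unit (2N actions)."""
    return [(i, act) for i in range(n_units) for act in ("on", "off")]

def pruned_action_space(state):
    """Remove, for each unit, the action matching the state it is already in
    (assumed reading of the claim's removal rule)."""
    current = {(i, "on" if p != 0 else "off") for i, p in enumerate(state)}
    return [a for a in full_action_space(len(state)) if a not in current]

actions = pruned_action_space((100.0, 0.0, 30.0))
print(actions)
```

Under this reading, exactly one action per unit remains available at any time, which halves the number of Q-table entries per state.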
reward design: establishing an expression R of the reward for adjusting the power flow from non-convergent to convergent while minimizing the network loss of the power grid, as shown in formula (3):