Abstract:
With the application of wide-area measurement systems to transient stability control, the random time delay of wide-area information during the control process introduces uncertainty into the power system state. Moreover, the dimension of the discrete decision variables for machine tripping and load shedding is extremely high, which makes online emergency control decision-making for the power grid challenging. Therefore, the transient stability emergency control problem is modeled as a Markov decision process, and a decision-making method combining deep Q-network (DQN) reinforcement learning with the transient energy function is proposed, which handles the time-delay uncertainty of emergency control through a multi-step sequential decision-making process. The reward function combines a short-term reward that accounts for the control objectives and constraints with a long-term reward that accounts for system stability. The potential energy index of the transient energy function is introduced into the reward function to improve learning efficiency. With the objective of maximizing the cumulative reward, the DQN algorithm learns the optimal emergency control strategy in the discrete action space, thereby solving the transient stability emergency control problem. The effectiveness of the proposed method for emergency control decision-making is verified on the IEEE 39-bus system.
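To make the reward structure described above concrete, the following minimal Python sketch illustrates how a per-step reward (control cost, constraint penalty, and a potential-energy shaping term), a terminal stability reward, and the standard one-step DQN target could be composed. All function names, weights, and thresholds here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: reward composition and DQN target, assuming
# hypothetical weights (w_cost, w_violation, w_pe, w_stable) and inputs.
import numpy as np


def step_reward(shed_load_mw, tripped_gen_mw, constraint_violated,
                potential_energy, critical_energy,
                w_cost=1e-3, w_violation=10.0, w_pe=1.0):
    """Short-term reward for one decision step."""
    cost = -w_cost * (shed_load_mw + tripped_gen_mw)           # penalize control amount
    violation = -w_violation if constraint_violated else 0.0   # penalize constraint violation
    # Potential-energy shaping: reward keeping the transient potential energy
    # below an assumed critical energy level.
    pe_term = w_pe * (1.0 - potential_energy / critical_energy)
    return cost + violation + pe_term


def terminal_reward(stable, w_stable=100.0):
    """Long-term reward granted at episode end according to stability."""
    return w_stable if stable else -w_stable


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard one-step DQN target: r + gamma * max_a' Q(s', a')."""
    return reward + (0.0 if done else gamma * float(np.max(next_q_values)))


if __name__ == "__main__":
    r = step_reward(shed_load_mw=120.0, tripped_gen_mw=300.0,
                    constraint_violated=False,
                    potential_energy=4.2, critical_energy=6.0)
    print(dqn_target(r, next_q_values=np.array([0.5, 1.2, -0.3]), done=False))
```

In such a scheme, the cumulative discounted sum of these rewards is what the DQN maximizes over the sequence of machine-tripping and load-shedding actions.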