WANG Yidi, LI Lixin, YU Yijun, YANG Nan, LIU Meng, LI Tong. Power System Security Correction Control Based on Deep Reinforcement Learning [J]. Automation of Electric Power Systems, 2023, 47(12): 121-129.


Power System Security Correction Control Based on Deep Reinforcement Learning


     

    Abstract: In the new power system, uncertainty on both the source and load sides greatly increases power flow fluctuations. Security correction control can eliminate power flow over-limits in the system and ensure secure operation of the power grid. However, traditional security correction control methods involve numerous constraints and complex computation, and they struggle to make real-time, multi-step decisions for large-scale power grids. Therefore, this paper proposes a two-stage training method based on the deep deterministic policy gradient (DDPG) to determine the security correction control strategy. First, the security correction control problem is linked to deep reinforcement learning: by designing the state, action, and reward function, a Markov decision process (MDP) model of security correction is constructed. Second, a two-stage training framework is proposed to obtain the optimal correction strategy. In the imitation-learning pre-training stage, imitation learning of an expert strategy provides the agent with an initial neural network, accelerating training; in the reinforcement-learning training stage, the agent is further trained through continuous interaction between the DDPG agent and the environment. The trained agent can then be applied in real time to obtain the optimal decision. Finally, the effectiveness of the proposed method is verified by a simulation case based on a provincial power grid in China.
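The MDP design described in the abstract (state derived from line loadings, action as generator redispatch, reward penalizing over-limit flows) and the imitation-learning pre-training stage can be sketched on a toy system. Everything below is an illustrative assumption rather than the paper's actual model: the 2-generator/2-line sensitivity matrix, the heuristic expert, and the linear behaviour-cloned policy (the paper uses a neural network); the DDPG fine-tuning stage is only indicated in a comment.

```python
import numpy as np

# Toy security-correction MDP: two generators feed two lines through a
# fixed PTDF-like sensitivity matrix. All numbers are illustrative
# assumptions, not the paper's provincial-grid model.
SENS = np.array([[0.6, 0.2],
                 [0.4, 0.8]])       # d(line flow) / d(generator output)
LIMIT = np.array([1.0, 1.0])        # thermal limits of the two lines

def line_flows(gen):
    """State ingredient: power flow on each line for a dispatch `gen`."""
    return SENS @ gen

def reward(gen):
    """Reward: negative sum of over-limit flow; 0 when the grid is secure."""
    over = np.maximum(line_flows(gen) - LIMIT, 0.0)
    return -float(over.sum())

def expert_action(gen, step=0.05):
    """Heuristic expert: back off the generator with the largest
    sensitivity to the most overloaded line (zero action if secure)."""
    over = line_flows(gen) - LIMIT
    if over.max() <= 0.0:
        return np.zeros_like(gen)
    worst_line = int(np.argmax(over))
    action = np.zeros_like(gen)
    action[int(np.argmax(SENS[worst_line]))] = -step
    return action

# Stage 1 (imitation-learning pre-training): behaviour-clone the expert
# with a linear policy a = s @ W fitted by least squares. A linear map
# keeps this sketch dependency-free.
rng = np.random.default_rng(0)
demo_states = rng.uniform(0.5, 1.5, size=(200, 2))
demo_actions = np.array([expert_action(s) for s in demo_states])
W, *_ = np.linalg.lstsq(demo_states, demo_actions, rcond=None)

def policy(gen):
    return gen @ W

# Stage 2 would refine `policy` with DDPG through continued interaction
# with the environment; here we only check that the expert itself
# restores security from an insecure dispatch.
g = np.array([1.2, 1.1])            # insecure dispatch: line 2 overloaded
for _ in range(12):
    g = g + expert_action(g)
print(reward(g))                    # close to 0.0: over-limit eliminated
```

The pre-trained `policy` plays the role of the initial actor network handed to the DDPG stage; in the paper this warm start is what accelerates the subsequent reinforcement-learning training.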

     

