LI Dongying, ZHU Jianquan, CHEN Yixi. Security-constrained Economic Dispatch of Power Systems Based on Dual Buffer Generative Adversarial Imitation Learning[J]. Power System Technology, 2025, 49(3): 1121-1129. DOI: 10.13335/j.1000-3673.pst.2024.0849


Security-constrained Economic Dispatch of Power Systems Based on Dual Buffer Generative Adversarial Imitation Learning


    Abstract: The escalating penetration of renewable energy intensifies the volatility and stochasticity of power systems, posing severe challenges to their safe and economic operation. To address this challenge, this paper presents a real-time security-constrained economic dispatch method based on an improved generative adversarial imitation learning algorithm. First, the multi-period security-constrained economic dispatch problem of renewable-integrated power systems is formulated as a Markov decision process. Second, given the limitations of conventional deep reinforcement learning algorithms, notably long training times and strong design subjectivity, a generative adversarial imitation learning algorithm is employed to solve this Markov decision process. Furthermore, an improved generative adversarial imitation learning algorithm is proposed: a dual-buffer mechanism renders generative adversarial imitation learning compatible with off-policy deep reinforcement learning algorithms, allowing it to be combined with the Soft Actor-Critic algorithm and thereby significantly improving training performance. Simulation results show that, compared with traditional algorithms, the proposed method not only markedly accelerates convergence during offline training but also improves the economy and security of online decision-making while maintaining millisecond-level decision speed.
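The dual-buffer idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the logistic-regression discriminator, buffer handling, and reward shaping below are simplifying assumptions, and the off-policy agent (Soft Actor-Critic in the paper) that would consume the surrogate reward is omitted. One buffer holds expert demonstrations (e.g. dispatch results from an optimization solver); the other is the replay buffer filled by the learning policy. A discriminator is trained to tell the two apart, and its output defines a reward that makes expert-like transitions attractive to any off-policy learner.

```python
import numpy as np

rng = np.random.default_rng(0)


class DualBufferGAIL:
    """Sketch of a dual-buffer GAIL discriminator (illustrative only).

    expert_buf holds expert (state, action) features; policy_buf is the
    replay buffer filled by the learning policy. The discriminator D is
    trained to output P(expert | x); -log(1 - D(x)) then serves as a
    surrogate reward for an off-policy RL agent such as SAC.
    """

    def __init__(self, dim):
        self.expert_buf = []    # expert demonstrations
        self.policy_buf = []    # transitions collected by the policy
        self.w = np.zeros(dim)  # logistic-regression discriminator weights
        self.b = 0.0

    def _d(self, x):
        # Discriminator output: probability that x came from the expert.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def update_discriminator(self, lr=0.1, batch=64):
        # Sample one minibatch from each buffer (label: expert=1, policy=0).
        xe = np.array([self.expert_buf[i]
                       for i in rng.integers(len(self.expert_buf), size=batch)])
        xp = np.array([self.policy_buf[i]
                       for i in rng.integers(len(self.policy_buf), size=batch)])
        x = np.vstack([xe, xp])
        y = np.concatenate([np.ones(batch), np.zeros(batch)])
        p = self._d(x)
        # Gradient of the binary cross-entropy w.r.t. logits is (p - y).
        self.w -= lr * (x.T @ (p - y)) / len(y)
        self.b -= lr * np.mean(p - y)

    def surrogate_reward(self, x):
        # Higher when the discriminator thinks x looks expert-like.
        return -np.log(1.0 - self._d(x) + 1e-8)
```

In the full method, the surrogate reward would replace the environment reward in the Soft Actor-Critic update, with minibatches drawn from the policy buffer, which is what makes the dual-buffer scheme compatible with off-policy training.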

     
