YANG Zhixue, REN Zhouyang, SUN Zhiyuan, LIU Mosi, JIANG Jing, YIN Yue. Security-constrained Economic Dispatch of Renewable Energy Integrated Power Systems Based on Proximal Policy Optimization Algorithm[J]. Power System Technology, 2023, 47(3): 988-997. DOI: 10.13335/j.1000-3673.pst.2022.0959

Security-constrained Economic Dispatch of Renewable Energy Integrated Power Systems Based on Proximal Policy Optimization Algorithm

  • Abstract: High penetration of renewable energy makes security-constrained economic dispatch (SCED) of power systems difficult to solve efficiently. To address this, a SCED method based on the proximal policy optimization (PPO) algorithm is proposed. First, a SCED model of the renewable energy integrated power system is established based on AC power flow, and the Markov reward process of this model is defined under the deep reinforcement learning framework. A reward function mechanism for the PPO algorithm is then designed to guide the agents to efficiently generate dispatch plans that satisfy both the AC power flow and the N-1 security constraints. Next, a mechanism for integrating the dispatch model with the PPO algorithm is designed, which establishes how dispatch training samples are generated and extracted and how the value network and the policy network are trained. Finally, the effectiveness and adaptability of the proposed method are validated on the standard IEEE 30-node and IEEE 118-node test systems.
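For readers unfamiliar with proximal policy optimization, the clipped surrogate objective that the standard algorithm maximizes can be sketched as follows. This is a generic illustration only: the function name `ppo_clip_objective`, the clipping parameter `eps=0.2` (a commonly used default), and the sample inputs are assumptions, and the abstract does not specify the paper's own reward design, network structure, or hyperparameters.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates for the same samples
    eps        -- clipping range (0.2 is an assumed, commonly used value)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must not be empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Clipping removes the incentive to move the new policy far from the policy that generated the training samples, which is the property that makes each policy update conservative.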

     
