XUE Ming-feng, MAO Xiao-bo, XIAO Hao, PU Xiao-wei, PEI Wei. A Novel Energy Management Method Based on Modified Deep Q Network Algorithm for Multi-park Integrated Energy System[J]. Electric Power Construction, 2022, 43(12): 83-93.


A Novel Energy Management Method Based on Modified Deep Q Network Algorithm for Multi-park Integrated Energy System

    Abstract: A multi-park integrated energy system can significantly improve operating economy through multi-energy complementarity among parks. However, the complex interactions between parks and the coupled multi-energy decisions pose challenging problems for its energy management, such as a huge decision space and difficult algorithm convergence. To solve these problems, an energy management method based on a modified deep Q network (MDQN) algorithm is proposed for multi-park integrated energy systems. First, external meteorological data and historical interactive-power data, which are independent of each park, are used to build a long short-term memory (LSTM) deep-network-based equivalent model of the external interactive environment of each park's integrated energy system, reducing the computational complexity of the reinforcement-learning reward function. Second, an MDQN algorithm based on a k-first sampling strategy is proposed: the ε-greedy strategy is replaced with k-first sampling to overcome the low exploration efficiency in large-scale action spaces. Finally, the method is validated on a test case containing three park integrated energy systems. The results show that the MDQN algorithm achieves better convergence and stability than the original DQN algorithm, while improving the economic benefit of the parks by up to 29.16%.
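The LSTM-based equivalent model described in the abstract maps externally observable inputs (meteorological data and historical interactive power) to a prediction of a park's interactive power, so the reward can be evaluated without simulating neighboring parks. A minimal sketch in PyTorch follows; the layer sizes, feature count, and window length are illustrative assumptions, as the abstract does not specify the paper's exact architecture.

```python
# Minimal sketch of an LSTM equivalent model of a park's external interactive
# environment. Inputs: a time window of external features (e.g. meteorological
# data and historical interactive power); output: predicted interactive power.
# All dimensions are illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class ExternalEnvModel(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # predicted interactive power

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features) -- window of external observations
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])   # predict from the last time step

model = ExternalEnvModel()
seq = torch.randn(8, 24, 4)   # 8 samples, 24-hour window, 4 features
pred = model(seq)             # shape: (8, 1)
```

Such a surrogate lets the reward function query a fixed, cheap network instead of recomputing the full inter-park power exchange at every training step.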

     
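The abstract's k-first sampling strategy can be contrasted with ε-greedy in a short sketch. One plausible reading (an assumption, since the abstract does not define it precisely) is that the agent samples its action uniformly from the k actions with the highest estimated Q-values, so exploration stays concentrated on promising actions even when the discrete action space is large.

```python
# Sketch contrasting epsilon-greedy exploration with a k-first (top-k)
# sampling strategy for DQN action selection. Assumption: "k-first sampling"
# draws uniformly among the k actions with the largest estimated Q-values.
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values: np.ndarray, epsilon: float) -> int:
    """Uniform over ALL actions with probability epsilon, else the argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def k_first_sampling(q_values: np.ndarray, k: int) -> int:
    """Sample uniformly among the top-k actions by estimated Q-value."""
    top_k = np.argsort(q_values)[-k:]   # indices of the k largest Q-values
    return int(rng.choice(top_k))

q = np.array([0.1, 0.9, 0.85, 0.2, 0.88])  # toy Q-value estimates
a = k_first_sampling(q, k=3)               # one of actions {1, 2, 4}
```

With ε-greedy, a large action space wastes most exploratory steps on clearly poor actions; restricting sampling to the top-k keeps every exploratory step informative, which matches the abstract's motivation for the change.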
