SHI Yiru, ZHANG Dahai, LI Lixin, LI Yaping, YUN Yunyun, SUN Kai. Optimal Energy Dispatch for Integrated Energy Systems Based on Generative Adversarial Imitation Learning[J]. High Voltage Engineering, 2024, 50(8): 3535-3544. DOI: 10.13336/j.1003-6520.hve.20230537

Optimal Energy Dispatch for Integrated Energy Systems Based on Generative Adversarial Imitation Learning


    Abstract: In recent years, significant progress has been made in optimal dispatch of integrated energy systems (IES) based on deep reinforcement learning (DRL). However, as IES structures, scales, and technologies continue to evolve, the drawbacks of conventional DRL, such as long training times and high design complexity, have become increasingly apparent. To address this, a generative adversarial imitation learning method for IES energy-optimization dispatch is proposed. First, the IES agent adaptively learns its action-exploration process by imitating expert dispatch strategies with high reward feedback, avoiding the time and computing power wasted by blind exploration. Second, based on generative adversarial theory, a discriminator network is added to distinguish generated strategies from expert strategies; its output serves as an internal reward function that assists the neural-network parameter updates, avoiding the influence that the subjective preferences and limited experience of hand-crafted rewards would have on IES dispatch results. Finally, a case study of an electricity-heat coupled system shows that, during training, the proposed method converges 52% faster than traditional DRL algorithms and achieves a 10% better converged result, while the IES agent acquires decision-making ability close to that of the expert dispatch experience. In online application, the method achieves fast real-time decision-making without relying on accurate prediction or precise modeling of the external environment.
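The adversarial imitation mechanism summarized in the abstract (a discriminator scores state-action pairs as expert-like or policy-generated, and its output is fed back to the agent as an internal reward) can be sketched in miniature as follows. This is an illustrative sketch only, not the paper's implementation: the logistic-regression discriminator, the toy Gaussian (state, action) data, and the common GAIL reward shaping r = -log(1 - D(s, a)) are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Discriminator:
    """Logistic-regression discriminator D(s, a) -> probability the pair is expert."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        return sigmoid(x @ self.w + self.b)

    def update(self, expert_x, policy_x):
        # Gradient ascent on log D(expert) + log(1 - D(policy)).
        for x, label in [(expert_x, 1.0), (policy_x, 0.0)]:
            grad = label - self.prob(x)          # per-sample gradient w.r.t. logit
            self.w += self.lr * (grad[:, None] * x).mean(axis=0)
            self.b += self.lr * grad.mean()

def internal_reward(disc, x):
    # GAIL-style internal reward: large when the discriminator
    # mistakes policy-generated pairs for expert pairs.
    return -np.log(1.0 - disc.prob(x) + 1e-8)

# Toy (state, action) pairs: expert behavior clusters near +1, the
# untrained policy near -1.
expert = rng.normal(loc=1.0, scale=0.3, size=(256, 2))
policy = rng.normal(loc=-1.0, scale=0.3, size=(256, 2))

disc = Discriminator(dim=2)
for _ in range(200):
    disc.update(expert, policy)

# Policy samples that resemble expert behavior earn a larger internal reward,
# which is the signal that steers the agent toward the expert strategy.
r_near_expert = internal_reward(disc, np.array([[1.0, 1.0]]))[0]
r_far = internal_reward(disc, np.array([[-1.0, -1.0]]))[0]
print(r_near_expert > r_far)  # True
```

In the full method, the agent's policy network would be updated with this internal reward in place of a hand-designed one, and the discriminator and policy would be trained in alternation.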
