Citation: XU Yeyan, YAO Liangzhong, LIAO Siyang, CHENG Fan, XU Jian, PU Tianjiao, WANG Xinying. Real-time Optimal Dispatch Method of Source-grid-load-storage Based on Multi-agent Actor-double-critic Deep Reinforcement Learning[J]. Proceedings of the CSEE, 2025, 45(2): 513-526. DOI: 10.13334/j.0258-8013.pcsee.231054


Real-time Optimal Dispatch Method of Source-grid-load-storage Based on Multi-agent Actor-double-critic Deep Reinforcement Learning

Abstract: To ensure the safe and efficient operation of new-type power systems, this paper proposes a real-time optimal dispatch method for source-grid-load-storage based on multi-agent Actor-double-critic deep reinforcement learning, aiming to overcome the problems of model-driven dispatch methods, namely hard-to-solve optimization models and slow real-time decision-making. A constrained Markov cooperative game model is designed by constructing a real-time optimal dispatch model that considers the operating constraints of controllable resources and the security constraints of the system, and by introducing the Vickrey-Clarke-Groves (VCG) auction mechanism; the centralized dispatch model is thereby transformed into a distributed optimization problem solved among multiple agents. A multi-agent Actor-double-critic algorithm is then proposed, in which a Self-critic network and a Cons-critic network evaluate each agent's action-value and action-cost, respectively. This reduces the training difficulty, avoids the impact of sparse immediate rewards and sparse safety-constraint costs, accelerates the convergence of multi-agent training, and ensures that real-time dispatch decisions satisfy the system's secure-operation constraints. Finally, simulation cases verify that the proposed method greatly shortens real-time dispatch decision-making time and realizes source-grid-load-storage real-time dispatch that guarantees the secure, reliable, and economic operation of the power system.
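The abstract names the key components of the Actor-double-critic architecture (an actor plus a Self-critic for action-value and a Cons-critic for action-cost) without giving implementation details. The following is a minimal PyTorch sketch of how one such agent could be wired together; the Lagrangian-style cost penalty, the network sizes, the cost budget, and all hyper-parameters are assumptions made for illustration and are not taken from the paper.

```python
# A minimal, self-contained sketch (not the authors' implementation) of one
# agent in an actor-double-critic scheme: the actor proposes a dispatch
# adjustment, a "Self-critic" estimates the action-value (expected reward) and
# a "Cons-critic" estimates the action-cost (expected safety-constraint cost).
# The Lagrangian-style penalty, network sizes and hyper-parameters below are
# assumptions made for illustration only.
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class ActorDoubleCriticAgent:
    def __init__(self, obs_dim, act_dim, cost_budget=0.0, lr=1e-3, gamma=0.99):
        self.actor = MLP(obs_dim, act_dim)             # policy network
        self.self_critic = MLP(obs_dim + act_dim, 1)   # action-value  Q(s, a)
        self.cons_critic = MLP(obs_dim + act_dim, 1)   # action-cost   C(s, a)
        self.log_lambda = torch.tensor(0.0, requires_grad=True)  # Lagrange multiplier
        self.cost_budget, self.gamma = cost_budget, gamma
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=lr)
        self.critic_opt = torch.optim.Adam(
            list(self.self_critic.parameters()) + list(self.cons_critic.parameters()), lr=lr)
        self.lambda_opt = torch.optim.Adam([self.log_lambda], lr=lr)

    def act(self, obs):
        with torch.no_grad():
            return torch.tanh(self.actor(obs))         # bounded dispatch action

    def update(self, obs, act, reward, cost, next_obs, done):
        # All arguments are (batch, dim) tensors; reward, cost and done are (batch, 1).
        # Critic update: one-step TD targets for both the value and the cost critic
        # (a full implementation would add replay buffers and target networks).
        with torch.no_grad():
            next_act = torch.tanh(self.actor(next_obs))
            nxt = torch.cat([next_obs, next_act], dim=-1)
            q_tgt = reward + self.gamma * (1 - done) * self.self_critic(nxt)
            c_tgt = cost + self.gamma * (1 - done) * self.cons_critic(nxt)
        sa = torch.cat([obs, act], dim=-1)
        critic_loss = ((self.self_critic(sa) - q_tgt) ** 2).mean() \
                    + ((self.cons_critic(sa) - c_tgt) ** 2).mean()
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()

        # Actor update: maximize the value critic while penalizing predicted
        # constraint cost above the budget (Lagrangian relaxation).
        new_sa = torch.cat([obs, torch.tanh(self.actor(obs))], dim=-1)
        q, c = self.self_critic(new_sa), self.cons_critic(new_sa)
        lam = self.log_lambda.exp().detach()
        actor_loss = (-q + lam * (c - self.cost_budget)).mean()
        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()

        # Multiplier update: increase lambda when the predicted cost exceeds the budget.
        lambda_loss = -self.log_lambda.exp() * (c.detach().mean() - self.cost_budget)
        self.lambda_opt.zero_grad(); lambda_loss.backward(); self.lambda_opt.step()
```

In a full multi-agent setting, each dispatchable resource would hold such an actor and critic pair, typically with the critics trained on joint observations and actions; replay buffers and target networks are omitted here for brevity.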

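The abstract also refers to the Vickrey-Clarke-Groves (VCG) auction mechanism used to couple the agents in the cooperative game. The toy Python example below illustrates only the generic VCG payment rule (each provider is paid the cost it saves the rest of the system) on a hypothetical merit-order dispatch; the unit names, capacities, and marginal costs are invented for illustration and do not reflect the paper's market model.

```python
# Toy illustration of the VCG payment rule on a hypothetical dispatch problem.
# A fixed demand is served at minimum reported cost (merit order), and each
# unit is paid the externality it saves the others: (cost of serving demand
# without the unit) minus (cost borne by the others when the unit is present).

def dispatch(demand, units):
    """Merit-order dispatch: fill demand from the cheapest units first.
    units: list of (name, capacity_MW, marginal_cost_per_MWh)."""
    remaining, plan = demand, {}
    for name, cap, cost in sorted(units, key=lambda u: u[2]):
        take = min(cap, remaining)
        plan[name] = take
        remaining -= take
    if remaining > 1e-9:
        raise ValueError("demand cannot be met")
    return plan


def total_cost(plan, units):
    price = {name: cost for name, _, cost in units}
    return sum(q * price[name] for name, q in plan.items())


def vcg_payments(demand, units):
    base_plan = dispatch(demand, units)
    base_cost = total_cost(base_plan, units)
    payments = {}
    for name, _, cost in units:
        others = [u for u in units if u[0] != name]
        cost_without = total_cost(dispatch(demand, others), others)
        cost_others_with = base_cost - base_plan[name] * cost
        payments[name] = cost_without - cost_others_with
    return base_plan, payments


units = [("wind", 60, 5.0), ("gas", 80, 40.0), ("storage", 50, 25.0)]
plan, pay = vcg_payments(100.0, units)
print(plan)  # wind 60, storage 40, gas 0
print(pay)   # wind 2250.0, storage 1600.0, gas 0.0
```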