LIANG Zeting, ZHENG Jiehui, FANG Jiakun, LI Zhigang, WU Qinghua. Multi-agent Reinforcement Learning for Differentiated Prosumers Participating in Community Energy Trading[J]. Power System Technology, 2025, 49(5): 1826-1836. DOI: 10.13335/j.1000-3673.pst.2024.1396

Multi-agent Reinforcement Learning for Differentiated Prosumers Participating in Community Energy Trading

Abstract: Considering the differentiated characteristics of prosumers in new power systems, prosumers' strong concern for privacy in energy trading, and the limitations of traditional model-based optimization methods under multiple uncertainties, this paper proposes a multi-agent reinforcement learning method for community energy trading that accounts for differentiated characteristics and privacy preservation. Firstly, the differentiated characteristics of prosumers, such as geographical location, type of distributed energy resources, and entity type, are analyzed, and corresponding typical prosumer models are established. Secondly, based on the community market structure, a community energy trading model with mid-market rate pricing is constructed. Finally, taking market revenue and operating cost as the optimization objectives, the energy trading optimization problem of prosumers participating in community energy trading is formulated as a partially observable Markov decision process. To address the sparse reward problem introduced by the cyclic state-of-charge constraint of energy storage, this paper improves the reward function with cosine distance-based dynamic reward shaping. To address the non-stationarity of the multi-agent environment, this paper approximates the Q function of the soft actor-critic algorithm with a mean-field approximation mechanism, and the resulting algorithm is used to obtain the prosumers' energy management decisions. Case study results show that, in solving the energy trading problem considering differentiated characteristics and privacy preservation, the proposed algorithm improves training efficiency by 1.39%-54.32% and reduces the average cumulative daily cost by 0.46%-50.34%.
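The mid-market rate pricing mentioned in the abstract is a community clearing rule widely used in peer-to-peer energy trading studies. The sketch below shows that rule in its common form from the literature, assuming the internal price is the midpoint of the retail price and the feed-in tariff and that any community imbalance is settled with the utility grid; the function name, variable names, and example figures are illustrative and may differ from the paper's exact formulation.

```python
# Illustrative sketch of mid-market rate (MMR) pricing as commonly defined in the
# community / peer-to-peer energy trading literature. Names and numbers are
# assumptions for illustration, not taken from the paper.

def mid_market_rate(total_sell_kwh, total_buy_kwh, retail_price, feed_in_tariff):
    """Return (internal_buy_price, internal_sell_price) for one trading interval.

    total_sell_kwh : energy offered by prosumers with a surplus in this interval
    total_buy_kwh  : energy requested by prosumers with a deficit in this interval
    retail_price   : price of importing from the utility grid (currency/kWh)
    feed_in_tariff : price of exporting to the utility grid (currency/kWh)
    """
    mid = (retail_price + feed_in_tariff) / 2.0

    if total_sell_kwh == total_buy_kwh:
        # Community supply and demand balance: all internal trades clear at the mid rate.
        return mid, mid

    if total_sell_kwh < total_buy_kwh:
        # Shortage: the deficit is imported from the grid at the retail price,
        # and the extra cost is shared among the buyers.
        buy_price = (total_sell_kwh * mid
                     + (total_buy_kwh - total_sell_kwh) * retail_price) / total_buy_kwh
        return buy_price, mid

    # Surplus: the excess is exported to the grid at the feed-in tariff,
    # and the reduced revenue is shared among the sellers.
    sell_price = (total_buy_kwh * mid
                  + (total_sell_kwh - total_buy_kwh) * feed_in_tariff) / total_sell_kwh
    return mid, sell_price


# Example: 80 kWh offered, 100 kWh requested, retail 0.8 and feed-in 0.4 currency/kWh.
# Sellers receive the mid rate 0.6; buyers pay (80*0.6 + 20*0.8)/100 = 0.64.
print(mid_market_rate(80.0, 100.0, 0.8, 0.4))
```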
