LIU Shuo, FENG Bin, GUO Chuangxin, JI Wenxuan, WANG Wei, ZHANG Yong. Volt-var Control Strategy of Distribution Network Based on Deep Reinforcement Learning Considering Self-attention and Temporal-memory[J]. Proceedings of the CSEE, 2025, 45(2): 565-576. DOI: 10.13334/j.0258-8013.pcsee.231348


Volt-var Control Strategy of Distribution Network Based on Deep Reinforcement Learning Considering Self-attention and Temporal-memory


Abstract: The increasing penetration of distributed renewable energy has caused distribution networks to face severe challenges such as voltage limit violations and increased power losses. For the volt-var control problem, deep reinforcement learning can effectively overcome the shortcomings of traditional optimization methods in terms of model dependence and solution speed. However, existing deep reinforcement learning methods have limited feature-extraction capability and poor control performance when faced with complex scenarios in large-scale distribution networks. Therefore, this paper proposes a multi-agent deep reinforcement learning control strategy considering self-attention and temporal memory. First, the volt-var control problem is modeled as a decentralized partially observable Markov decision process. Then, based on a self-attention encoder and a temporal-memory neuron, four neural network structures are designed: a feature extraction network, an auxiliary training network, an improved policy network, and an improved value network. Next, self-supervised learning is introduced, and the centralized-training, decentralized-execution process of the proposed algorithm is described. Finally, a case study is carried out on the modified IEEE 141-bus distribution system. The experimental results show that the proposed control strategy can effectively extract state features, memorize temporal information, and identify key components, achieving superior voltage stabilization and loss reduction, along with enhanced robustness, interpretability, and training stability.
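The architecture outlined in the abstract — per-node observations passed through a self-attention encoder, whose pooled output feeds a temporal-memory unit that drives a bounded reactive-power output — can be illustrated with a minimal sketch. This is not the paper's implementation: the class names, dimensions, pooling choice, and the use of a GRU as the temporal-memory neuron are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SelfAttentionEncoder:
    """Single-head self-attention over per-node observations (illustrative)."""
    def __init__(self, d_in, d_model):
        s = 1.0 / np.sqrt(d_in)
        self.Wq = rng.normal(0, s, (d_in, d_model))
        self.Wk = rng.normal(0, s, (d_in, d_model))
        self.Wv = rng.normal(0, s, (d_in, d_model))
        self.d_model = d_model

    def __call__(self, X):
        # X: (n_nodes, d_in) -> features (n_nodes, d_model), weights (n_nodes, n_nodes)
        Q, K, V = X @ self.Wq, X @ self.Wk, X @ self.Wv
        A = softmax(Q @ K.T / np.sqrt(self.d_model))  # attention over nodes
        return A @ V, A

class GRUCell:
    """Minimal GRU standing in for the temporal-memory neuron (assumption)."""
    def __init__(self, d_in, d_h):
        s = 1.0 / np.sqrt(d_in + d_h)
        self.W = rng.normal(0, s, (3, d_in + d_h, d_h))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = 1 / (1 + np.exp(-(xh @ self.W[0])))              # update gate
        r = 1 / (1 + np.exp(-(xh @ self.W[1])))              # reset gate
        n = np.tanh(np.concatenate([x, r * h]) @ self.W[2])  # candidate state
        return (1 - z) * h + z * n

# Toy rollout: 5 nodes, 4 measurements each, 3 control steps.
enc = SelfAttentionEncoder(d_in=4, d_model=8)
mem = GRUCell(d_in=8, d_h=8)
h = np.zeros(8)
for t in range(3):
    obs = rng.normal(size=(5, 4))    # per-node measurements (e.g. V, P, Q)
    feats, attn = enc(obs)
    h = mem(feats.mean(axis=0), h)   # pooled features into temporal memory
# tanh keeps the hypothetical var setpoints within device limits [-1, 1]
q_setpoint = np.tanh(h @ rng.normal(size=(8, 2)))
print(q_setpoint.shape)  # prints (2,)
```

The attention matrix `A` is what would lend the strategy some interpretability: its rows show which nodes each node's encoding attends to, loosely matching the paper's claim of identifying key components.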
