GAO Fang, YAO Haotian, GAO Qing, et al. Two-stage Distributed Generators Optimization Based on Deep Reinforcement Learning With Parameter Sharing[J]. 2025, 45(19): 7493-7509. DOI: 10.13334/j.0258-8013.pcsee.240635.
Two-stage Distributed Generators Optimization Based on Deep Reinforcement Learning With Parameter Sharing
As renewable energy sources such as solar and wind are integrated into the grid in increasingly high proportions, the optimization and scheduling of distributed generators face challenges from frequent changes in system topology, which affect the stability and economic operation of the distribution network. Existing methods, designed for systems with fixed topology, rely on precise models and are time-consuming, making real-time control difficult. Current deep reinforcement learning approaches struggle to balance distributed training with mixed discrete-continuous action spaces. This study introduces a two-stage distributed generator optimization strategy based on multi-agent deep reinforcement learning with parameter sharing. First, the problem is vertically decoupled: a dynamic distribution network reconfiguration model with distributed generation is formulated as a mixed-integer second-order cone program to determine the network topology. The distribution network environment is then horizontally decoupled into several regions. In the second stage, a centralized-training, decentralized-execution framework with parameter sharing is proposed, built on a multi-agent prioritized twin delayed deep deterministic policy gradient algorithm with a prioritized experience replay mechanism. Topology information is embedded into the distribution network environment and mapped to the agents through power flow calculations, with the objective of minimizing active network power loss in the optimization scheduling model. Case studies demonstrate that, by accounting for changes in the distribution network topology and improving learning efficiency through policy and experience sharing among agents as well as prioritized experience replay, the proposed algorithm meets the efficiency requirements of real-time online decision-making and achieves superior voltage stability and loss reduction compared to other strategies.
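The prioritized experience replay mechanism mentioned in the abstract can be illustrated with a minimal proportional-priority buffer. This is a generic sketch of the standard technique, not the paper's implementation: the class name, hyperparameters (`alpha`, `beta`), and ring-buffer layout are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (a generic
    sketch; the paper's exact variant and settings are not shown here)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.data = []              # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                # next write index (ring buffer)

    def add(self, transition):
        # New transitions get the current max priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced
        # by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error,
        # so surprising transitions are replayed more often.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In training, the critic's TD errors for a sampled batch are fed back through `update_priorities`, which is what concentrates replay on informative transitions and speeds up learning.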
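The parameter-sharing idea, where agents in different regions share one policy while still acting differently, can be sketched as a single network whose input is the local observation concatenated with a one-hot agent identifier. This is a toy linear policy for illustration only; the paper uses deep networks under centralized training with decentralized execution, and all names and dimensions below are assumptions.

```python
import numpy as np

class SharedActor:
    """One set of policy parameters shared by all regional agents
    (a toy linear sketch of parameter sharing, not the paper's model)."""

    def __init__(self, obs_dim, act_dim, n_agents, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents = n_agents
        # A single weight matrix serves every agent; appending a one-hot
        # agent ID to the observation lets behaviors still differ.
        self.W = rng.normal(0.0, 0.1, (obs_dim + n_agents, act_dim))

    def act(self, agent_id, obs):
        one_hot = np.zeros(self.n_agents)
        one_hot[agent_id] = 1.0
        x = np.concatenate([obs, one_hot])
        return np.tanh(x @ self.W)   # bounded continuous action
```

Because every agent's gradient update lands on the same `W`, experience gathered in one region improves the policy used in all regions, which is the learning-efficiency benefit the abstract attributes to strategy and experience sharing.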