Constraint-enhanced Safe Reinforcement Learning-based Decision-making Method for Re/active Power Optimization in Highly Penetrated PV-storage-charging Distribution Network
Updated: 2026-01-04
HONG Lucheng, WU Minghe, ZHU Jin, et al. Constraint-enhanced Safe Reinforcement Learning-based Decision-making Method for Re/active Power Optimization in Highly Penetrated PV-storage-charging Distribution Network[J]. Proceedings of the CSEE, 2025, (22): 8764-8778. DOI: 10.13334/j.0258-8013.pcsee.241080.
Abstract
To address the uncertainty of power flow state transitions and the complexity of power optimization problems brought about by the high penetration of electric vehicles (EVs) and photovoltaics (PVs) in distribution networks, this paper proposes an active/reactive power optimization framework for high-penetration PV-storage-charging distribution networks based on a constraint-enhanced safe reinforcement learning method. First, an EV charging station (EVCS) model based on transfer reinforcement learning is constructed to reasonably describe the charging and discharging processes of multiple types of EVs considering demand response. Then, the active/reactive power optimization problem is formulated as a Markov decision process, and a mixed-integer programming-continuous double deep Q network (MIP-CDDQN) algorithm is proposed to solve it. The algorithm transforms the Max-Q problem of the continuous action-value network into an MIP model while considering the real-time operational constraints of the distribution network, which ensures the efficiency and security of the optimization strategy. Finally, simulation experiments on the IEEE 33-bus system demonstrate the effectiveness of the proposed MIP-CDDQN algorithm and its advantages in constraint enforcement and computational efficiency.
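The core idea of the Max-Q reformulation described above is that maximizing the output of a trained ReLU Q-network over a continuous action can be encoded exactly as a mixed-integer program, since each ReLU can be linearized with a binary variable and big-M constraints. The following is a minimal sketch of that encoding, not the paper's implementation: the tiny hand-set "Q-network" (one scalar action, two hidden ReLU units), the big-M value, and the use of SciPy's `milp` solver are all illustrative assumptions.

```python
# Minimal sketch: encode max_a Q(a) for a small ReLU network as an MIP.
# The network weights below are made up for illustration; a real
# MIP-CDDQN-style solver would also add the grid's operational constraints
# as additional linear rows.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy Q-network: Q(a) = relu(2a - 0.5) + relu(-3a + 1), action a in [0, 1]
w = np.array([2.0, -3.0])   # hidden-layer weights
b = np.array([-0.5, 1.0])   # hidden-layer biases
M = 4.0                     # big-M bound on the pre-activation magnitudes

# Decision variables: [a, h1, h2, z1, z2]
# h_j models relu(w_j*a + b_j); binary z_j selects the active/inactive branch.
rows, lb, ub = [], [], []
for j in range(2):
    hj, zj = 1 + j, 3 + j
    r = np.zeros(5); r[0] = -w[j]; r[hj] = 1.0           # h_j >= w_j*a + b_j
    rows.append(r); lb.append(b[j]); ub.append(np.inf)
    r = np.zeros(5); r[0] = -w[j]; r[hj] = 1.0; r[zj] = M  # h_j <= pre_j + M(1-z_j)
    rows.append(r); lb.append(-np.inf); ub.append(b[j] + M)
    r = np.zeros(5); r[hj] = 1.0; r[zj] = -M               # h_j <= M*z_j
    rows.append(r); lb.append(-np.inf); ub.append(0.0)

res = milp(
    c=np.array([0.0, -1.0, -1.0, 0.0, 0.0]),   # minimize -(h1 + h2) = maximize Q
    constraints=LinearConstraint(np.array(rows), lb, ub),
    integrality=np.array([0, 0, 0, 1, 1]),     # z1, z2 are binary
    bounds=Bounds([0, 0, 0, 0, 0], [1, M, M, 1, 1]),
)
a_opt, q_max = res.x[0], -res.fun
print(f"argmax_a Q(a) = {a_opt:.3f}, max Q = {q_max:.3f}")
```

For this toy network the MIP recovers the boundary optimum a = 1 with Q = 1.5; real-time feasibility constraints (voltage limits, branch flows) would enter as extra linear constraints on the action variables, which is what distinguishes the constraint-enhanced approach from an unconstrained greedy Max-Q step.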