Citation: CHEN Ning, LI Fashe, WANG Shuang, ZHANG Huicong, TANG Cunjin, NI Zihao. Intelligent Scheduling of Distributed Photovoltaic EV Complementary Systems Based on Deep Reinforcement Learning Algorithm[J]. High Voltage Engineering, 2025, 51(3): 1454-1463. DOI: 10.13336/j.1003-6520.hve.20231846


Intelligent Scheduling of Distributed Photovoltaic EV Complementary Systems Based on Deep Reinforcement Learning Algorithm

Abstract: To address the impact that large-scale grid integration of distributed photovoltaics (PV) and electric vehicles (EVs) imposes on the power system, we develop a distributed PV-EV complementary scheduling model with the objectives of smoothing PV grid-connection fluctuations and improving the economics for EV users. The model accounts for the stochasticity of PV output, load power fluctuations, the randomness of EV access times and charge levels, real-time electricity prices, and battery aging costs, and is solved with an improved proximal policy optimization algorithm that applies random perturbations to the gradient (gradient random perturbation-proximal policy optimization, GRP-PPO). By adjusting the model's objective function, two real-time operation strategies based on different optimization objectives are obtained. The case study shows that the real-time scheduling strategy effectively smooths power fluctuations at the grid-connection point, and its scheduling effect improves on the conventional PPO algorithm by 3.48%. Strategy 1 takes the users' travel demand and the smoothing of grid-connection power fluctuations as its primary objectives; it guarantees the users' 24 h driving demand while the grid-connection power stabilization rate reaches 91.84%. Strategy 2 takes the users' economic benefit as its primary optimization objective; an EV participating in dispatch for the whole day can earn up to 82.6 RMB, which encourages users to take part in dispatching.
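The algorithmic contribution named in the abstract is a PPO variant with random perturbations applied to the gradient (GRP-PPO). As a rough illustration only, and not the authors' implementation, the sketch below shows a standard clipped-surrogate PPO update in which Gaussian noise is injected into the policy gradients before the optimizer step; the network architecture, noise scale noise_std, and all hyperparameters are placeholder assumptions.

```python
# Minimal sketch (assumption): clipped-surrogate PPO step with Gaussian noise
# added to the gradients, illustrating one reading of "gradient random
# perturbation". Sizes and hyperparameters are placeholders, not the paper's.
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs):
        # Categorical policy over a discrete set of charge/discharge actions
        return torch.distributions.Categorical(logits=self.body(obs))


def grp_ppo_update(policy, optimizer, obs, actions, advantages, old_log_probs,
                   clip_eps=0.2, noise_std=1e-3):
    """One PPO clipped-surrogate update; noise is injected into the gradients
    before the optimizer step (the 'GRP' part, as interpreted here)."""
    dist = policy(obs)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    loss = -torch.min(ratio * advantages, clipped * advantages).mean()

    optimizer.zero_grad()
    loss.backward()
    for p in policy.parameters():
        if p.grad is not None:
            # stochastic perturbation of the gradient before the parameter update
            p.grad.add_(noise_std * torch.randn_like(p.grad))
    optimizer.step()
    return loss.item()


# Example usage with illustrative dimensions (hypothetical, for shape only):
# policy = PolicyNet(obs_dim=6, act_dim=5)
# optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
# grp_ppo_update(policy, optimizer, obs, actions, advantages, old_log_probs)
```

In this reading, the perturbation acts as a simple exploration/regularization term on the parameter updates; the actual GRP-PPO formulation, the reward design behind the two strategies, and the PV-EV environment model are defined in the paper itself.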

     
