SUN Guangming, CHEN Liangliang, WANG Ruisheng, CHEN Zhong, XING Qiang. A deep reinforcement learning-based scheduling strategy of photovoltaic-storage-charging integrated energy stations[J]. Electric Power Engineering Technology, 2021, 40(5): 17-24. DOI: 10.12158/j.2096-3203.2021.05.003

A deep reinforcement learning-based scheduling strategy of photovoltaic-storage-charging integrated energy stations

  • Abstract: Scheduling models for large-scale electric vehicle (EV) fleets are complex to solve and computationally demanding, so machine learning methods are attracting growing attention for EV charging navigation and scheduling. This paper proposes a deep reinforcement learning (DRL)-based scheduling strategy for photovoltaic-storage-charging integrated energy stations. First, the operating strategy of the energy station and the basic theory of DRL are analyzed. Second, regret theory is used to characterize users' psychological response to the time and cost of different charging schemes, and a model is established that gives the agent full perception of the user-vehicle-station state environment; a time-varying ε-greedy strategy is introduced as the agent's action-selection method to speed up convergence. Finally, multi-scenario simulations are designed using the actual road network and energy station distribution in Nanjing. The results show that, while accounting for users' psychological effects, the proposed method effectively improves the photovoltaic consumption rate of the energy stations, offering a new approach to EV charging scheduling.
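
The abstract names a time-varying ε-greedy strategy but does not give its schedule. As a minimal sketch of the general idea only, assuming a linear decay of ε over training steps (the function names, the default values, and the linear form are illustrative assumptions, not the authors' settings), the agent explores often early in training and acts mostly greedily later:

```python
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps.

    All defaults are hypothetical; the paper's actual schedule is not given here.
    """
    frac = min(max(step, 0) / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, step, rng=random):
    """Pick a random action with probability epsilon(step), else the greedy one."""
    if rng.random() < epsilon_at(step):
        # Exploration: uniform choice over all candidate actions.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Here `q_values` stands for the agent's estimated values of the candidate actions (for example, candidate energy stations for one vehicle), and `step` is a training-step counter. A decaying ε trades early exploration for later exploitation, which is the usual motivation for making ε time-varying.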

     
