LI Qi, LIU Xin, MENG Xiang, TAN Yi, YANG Mingze, ZHANG Shicong, CHEN Weirong. Comprehensive Value Depletion Minimization Energy Management Method for Fuel Cell Hybrid Systems Based on Proximal Policy Optimization Algorithm[J]. Proceedings of the CSEE, 2024, 44(12): 4788-4798. DOI: 10.13334/j.0258-8013.pcsee.230023

Comprehensive Value Depletion Minimization Energy Management Method for Fuel Cell Hybrid Systems Based on Proximal Policy Optimization Algorithm

    Abstract: To reduce the fuel cost of fuel cell hybrid power systems in city EMUs and to improve fuel cell durability, this paper proposes an energy management method based on the proximal policy optimization (PPO) algorithm. The method models the hybrid system energy management problem as a Markov decision process and builds the reward function around a single optimization objective: minimizing the comprehensive value depletion, which accounts for both fuel economy and fuel cell durability. PPO, a deep reinforcement learning algorithm with relatively fast convergence, is then used to solve the problem and to distribute the load power between the fuel cell and the lithium battery. Finally, the method is verified experimentally using actual operating conditions of city EMUs. Under the training condition, the proposed method reduces the comprehensive value depletion by 19.71% compared with the energy management method based on equivalent hydrogen consumption minimization and by 5.87% compared with the Q-learning-based energy management method; under the unknown condition, the reductions are 18.05% and 13.52%, respectively. These results indicate that the proposed method effectively reduces the comprehensive value depletion and adapts well to different operating conditions.
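
The two ingredients named in the abstract, a reward tied to the comprehensive value depletion and the PPO update, can be illustrated with a minimal sketch. The abstract gives neither the reward formula nor any hyperparameters, so everything specific here is an assumption made for illustration: the function names, the additive form of the two cost terms, and the clipping range `eps=0.2`. Only the clipped surrogate objective itself is the standard PPO formulation.

```python
def step_reward(hydrogen_cost, degradation_cost):
    """Hypothetical per-step reward: the negative of an assumed additive
    'comprehensive value depletion' (fuel cost plus fuel cell wear cost).
    The paper's actual cost terms and weights are not given in the abstract."""
    return -(hydrogen_cost + degradation_cost)


def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over a batch.

    ratios     : new-policy / old-policy probability ratios, one per sample
    advantages : advantage estimates for the same samples
    eps        : clipping range (0.2 is a common default, assumed here)
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps].
        r_clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(r * a, r_clipped * a))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # A positive advantage with a large ratio is capped by the clip.
    print(ppo_clip_objective([1.5], [1.0]))
    # A negative advantage with a large ratio is not capped.
    print(ppo_clip_objective([1.5], [-1.0]))
    print(step_reward(2.0, 0.5))
```

In this sketch the clip limits how far a single update can move the power-split policy away from the one that collected the data.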

     
