Citation: WANG Tian-jing, TANG Yong, GUO Qiang, HUANG Yan-hao, CHEN Xing-lei, HUANG He-kai. Automatic Adjustment Method of Power Flow Calculation Convergence for Large-scale Power Grid Based on Knowledge Experience and Deep Reinforcement Learning[J]. Proceedings of the CSEE, 2020, 40(8): 2396-2406. DOI: 10.13334/j.0258-8013.pcsee.191443

Automatic Adjustment Method of Power Flow Calculation Convergence for Large-scale Power Grid Based on Knowledge Experience and Deep Reinforcement Learning

  • Abstract: Non-convergent power flow calculations for large-scale power grids are costly to correct in both manpower and time. To address this, an automatic power flow convergence adjustment method based on knowledge experience and deep reinforcement learning is proposed. First, the knowledge experience used in power flow convergence adjustment and the basic concepts and principles of deep reinforcement learning are introduced. Then, the state space, action space and multiple rewards of reinforcement learning are designed, together with the architecture of the deep neural network. Next, knowledge experience is incorporated into reinforcement learning to narrow the search space, and the manual adjustment process is imitated by balancing active power first and reactive power second, which gives the search a direction; on this basis the power flow adjustment strategy is constructed. When balancing reactive power, the Dijkstra algorithm is used to find optimal (shortest) paths and thereby locate the capacitors and reactors near the positions where reactive power may be unbalanced. Finally, the effectiveness of the method is verified on the modified CEPRI 36-node system and on the actual Northeast China power grid.
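The reactive-power step in the abstract can be illustrated with a minimal sketch. It assumes the grid is represented as a weighted graph of buses and that the edge weight stands for some electrical distance (for example, branch reactance); neither the weighting nor the names `shortest_distances`, `nearest_compensators`, and the toy bus labels come from the paper. It shows only the general idea of ranking capacitor/reactor buses by Dijkstra shortest-path distance from a bus where reactive power may be unbalanced.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra: shortest-path distance from `source` to every reachable bus.

    `graph` maps each bus to a dict {neighbor: non-negative edge weight}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, bus = heapq.heappop(heap)
        if d > dist.get(bus, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(bus, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def nearest_compensators(graph, problem_bus, compensator_buses, k=2):
    """Return up to k compensator buses closest to `problem_bus`.

    Hypothetical helper: "closest" means smallest shortest-path distance
    under the assumed edge weights; unreachable buses are skipped.
    """
    dist = shortest_distances(graph, problem_bus)
    reachable = [(dist[b], b) for b in compensator_buses if b in dist]
    return [b for _, b in sorted(reachable)[:k]]


# Toy 5-bus network (illustrative weights, not data from the paper).
graph = {
    "B1": {"B2": 0.10, "B3": 0.40},
    "B2": {"B1": 0.10, "B3": 0.15, "B4": 0.50},
    "B3": {"B1": 0.40, "B2": 0.15, "B5": 0.20},
    "B4": {"B2": 0.50},
    "B5": {"B3": 0.20},
}

closest = nearest_compensators(graph, "B1", ["B4", "B5"], k=1)
```

In this toy case the compensator at `B5` is selected for a suspected imbalance at `B1`, because the path B1-B2-B3-B5 is shorter than B1-B2-B4. In the method described above, such candidate devices would then be the ones considered for switching during reactive power balancing.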

     
