Citation: ZHU Jizhong, HUANG Linying, CHEN Yixi. Surrogate Gradient-based Deep Reinforcement Learning for Power System Post-contingency Safety Control Against Cyber-attacks[J]. Power System Technology, 2024, 48(10): 4041-4049. DOI: 10.13335/j.1000-3673.pst.2024.0643

Surrogate Gradient-based Deep Reinforcement Learning for Power System Post-contingency Safety Control Against Cyber-attacks

Abstract: To maintain power system security and stability during restoration after a cyber-attack, while coping with uncertainties in the environment, this paper proposes a post-contingency safety control strategy against cyber-attacks based on surrogate gradient deep reinforcement learning. First, cyber-attack models targeting information system data and information system functions are established, a security control strategy model for the attacked system is constructed, and the evolution of system events under cyber-attacks is analyzed. Second, the Markov decision process of the security control strategy is defined within the deep reinforcement learning framework. Then, a surrogate gradient-based deep reinforcement learning algorithm is designed: a population of agents is generated by perturbing the agent parameters, and the weighted average of the fitness values corresponding to the perturbations in the population is used as the surrogate gradient. Finally, the effectiveness and superiority of the proposed method are verified on the IEEE 39-bus system.
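The population-based gradient estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes an evolution-strategies-style estimator: Gaussian parameter perturbations, with mean-centered fitness values as the weights. The abstract does not specify the perturbation distribution, the weighting scheme, or the fitness function, so `surrogate_gradient`, `train`, `toy_fitness`, and all hyperparameters below are hypothetical, and a toy quadratic stands in for the return of the power system control task.

```python
import random


def surrogate_gradient(fitness, theta, n_agents=50, sigma=0.1, rng=None):
    """Estimate an ascent direction from a population of perturbed agents.

    Assumption (not from the paper): perturbations are i.i.d. Gaussian and
    the weights are fitness values centered on the population mean.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    # One perturbation vector per agent in the population.
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_agents)]
    # Fitness of each perturbed agent.
    scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in population]
    baseline = sum(scores) / n_agents  # mean-centering reduces variance
    grad = [0.0] * dim
    for eps, score in zip(population, scores):
        for j in range(dim):
            grad[j] += (score - baseline) * eps[j]
    # Fitness-weighted average of the perturbations, rescaled by sigma.
    return [g / (n_agents * sigma) for g in grad]


def train(fitness, theta, steps=200, lr=0.05, seed=0):
    """Plain gradient ascent on the surrogate gradient."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        grad = surrogate_gradient(fitness, theta, rng=rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def toy_fitness(params):
    """Hypothetical stand-in for an episode return; maximized at (1, -2)."""
    target = (1.0, -2.0)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    print(train(toy_fitness, [0.0, 0.0]))
```

Because the estimator only needs fitness evaluations, it never differentiates through the environment; this is the usual motivation for such population-based estimates, although the paper's own reasoning is not stated in the abstract.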

     
