MING Weiyu, LI Yan, CHENG Shijie, LONG Yu, XU Jing, WANG Shaorong. Self-learning Optimal Scheduling Method of Demand Response Based on Situation Orientation[J]. Automation of Electric Power Systems, 2022, 46(23): 109-116.

Self-learning Optimal Scheduling Method of Demand Response Based on Situation Orientation

  • Abstract: To address the combinatorial explosion of scenarios that arises for the consumer choice resource (CCR) under multiple stochastic scenarios, this paper uses a deep reinforcement learning algorithm to select the optimal CCR groups and to optimally schedule the nodes they contain. First, based on the constraints and objective function of CCR optimal scheduling, the mathematical model and its solution complexity over a daily scheduling cycle are analyzed. Then, the CCR optimal scheduling process is mapped to a situation awareness tuple based on the Markov decision process, and a situation orientation function is established on the dueling deep Q-network architecture. Through repeated situation deductions, the gradient of the situation orientation function is computed with mini-batch gradient descent, and the algorithm parameters are updated continuously through feedback to optimize the decisions. Finally, on the IEEE 33-bus case, using random sample sets of different sizes, the candidate CCR groups are selected under random operation modes and the corresponding optimal scheduling strategies are formulated.
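The abstract names the dueling deep Q-network architecture but gives no details of the network itself. As a rough illustration only, the sketch below shows the standard mean-subtracted aggregation used in dueling networks, Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The function names `dueling_q_values` and `greedy_action`, the numeric values, and the idea of treating each candidate CCR group as one action are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted dueling aggregation:
        Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, .)
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical example: three candidate actions (e.g. three CCR groups, assumed).
q = dueling_q_values(state_value=2.0, advantages=[1.0, 3.0, 2.0])
best = greedy_action(q)
```

Subtracting the mean advantage keeps the average of the Q-values equal to V(s), which is the usual way to make the value and advantage streams identifiable. In a full implementation both streams would be neural network outputs trained on mini-batches of sampled transitions.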

     
