HAN Dong, HUANG Wei, YAN Zheng. Deep Reinforcement Learning for Virtual Bidding in Electricity Markets[J]. Proceedings of the CSEE, 2022, 42(4): 1443-1454. DOI: 10.13334/j.0258-8013.pcsee.202092

Deep Reinforcement Learning for Virtual Bidding in Electricity Markets

    Abstract: Price differences between the day-ahead (DA) and real-time (RT) markets in the electricity spot market lead to high operating risk and low market efficiency. Virtual bidding (VB) can be used to arbitrage the DA-RT price spread, whose distribution is unknown to the bidder, and thereby promote convergence of the two prices. This paper builds a market framework for virtual bidding along the dimensions of time and space and divides virtual bids into two types, generation-type and load-type (increment and decrement bids). With the objective of maximizing the virtual bidder's cumulative payoff, a multi-location, multi-period virtual bidding model with a budget constraint is established; the model can be formulated as a classic 0-1 knapsack problem. Conditional value-at-risk (CVaR) is used to quantify the risk faced by three types of virtual bidders (risk-seeking, risk-averse and risk-neutral), which yields a virtual bidding strategy model that accounts for risk. To solve the problem, a deep reinforcement learning (DRL) framework is constructed: the state space, action space and reward function are designed, and a deep Q network interacts with the environment, obtains feedback and optimizes the parameters of the neural network to find the optimal bidding strategy. Data from the PJM electricity market in the United States from June to December 2018 are used to compute the cumulative profit and Sharpe ratio of virtual bidders. Comparison with the greedy algorithm, dynamic programming and other methods verifies the effectiveness and superiority of the proposed model and algorithm.
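The 0-1 knapsack formulation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes the DA-RT spread payoff of each candidate bid is already known, which corresponds only to a perfect-foresight dynamic programming benchmark, whereas the abstract states that the spread distribution is unknown and is handled by a deep Q network. The function name `select_bids`, the integer bid costs and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def select_bids(payoffs: List[float], costs: List[int], budget: int) -> Tuple[float, List[int]]:
    """Pick a subset of candidate bids maximizing total payoff within the budget.

    Classic 0-1 knapsack solved by dynamic programming. Costs and budget are
    assumed to be non-negative integers (hypothetical budget units).
    """
    n = len(payoffs)
    # best[i][b]: best total payoff using the first i candidates with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip candidate i-1
            if costs[i - 1] <= b:
                # option 2: place candidate i-1 if it improves the payoff
                take = best[i - 1][b - costs[i - 1]] + payoffs[i - 1]
                if take > best[i][b]:
                    best[i][b] = take

    # Walk back through the table to recover which bids were placed.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Hypothetical example: three candidate (location, hour) bids.
total, picked = select_bids(payoffs=[6.0, 10.0, 12.0], costs=[1, 2, 3], budget=5)
print(total, picked)  # 22.0 [1, 2]
```

Because the payoffs are not observable when bids are submitted, a table like this can only serve as a hindsight comparison, which is consistent with dynamic programming appearing in the abstract as a baseline rather than as the proposed solver.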

     
