Abstract:
Prosumers in new power systems have differentiated characteristics, privacy is highly important in their energy trading, and traditional model-based optimization methods are limited in environments with multiple uncertainties. To address these issues, this paper proposes a multi-agent reinforcement learning method for community energy trading that accounts for differentiated characteristics and preserves privacy. First, the differentiated characteristics of prosumers, such as geographical location, type of distributed energy resource, and intrinsic type, are analyzed, and corresponding typical prosumer models are established. Second, a community energy trading model is constructed from the market structure using mid-market rate pricing. Finally, with market benefits and operating costs as the optimization objectives, the energy trading optimization of prosumers participating in community energy trading is formulated as a partially observable Markov decision process. To address the sparse reward problem caused by the state-of-content recurrent constraint of energy storage, the reward function is modified by cosine distance-based dynamic reward shaping. To address the non-stationarity of the multi-agent environment, the Q function of the soft actor-critic algorithm is approximated by a mean-field approximation mechanism. The proposed algorithm is then used to obtain the prosumers' energy management decisions. Case study results show that, for energy trading optimization considering differentiated characteristics and privacy preservation, the proposed algorithm improves training efficiency by 1.39%-54.32% and reduces the average cumulative daily cost by 0.46%-50.34%.
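The abstract names mid-market rate pricing without stating the rule. As a rough illustration only, the sketch below follows the commonly used form of the mechanism: the internal price is the midpoint of the grid import and export prices when community generation and demand balance, and it is blended with the relevant grid price otherwise. The function name, argument names, and example prices are hypothetical, and the paper's own model may differ in detail.

```python
def mid_market_rate_prices(total_generation, total_demand,
                           grid_buy_price, grid_sell_price):
    """Return (internal_buy_price, internal_sell_price) for one time step.

    Assumed (hypothetical) conventions:
      total_generation, total_demand : community totals in kWh, non-negative
      grid_buy_price  : price paid when importing from the utility grid
      grid_sell_price : price received when exporting to the utility grid
    """
    if total_generation < 0 or total_demand < 0:
        raise ValueError("generation and demand must be non-negative")

    # Midpoint between the grid import and export prices.
    mid = (grid_buy_price + grid_sell_price) / 2.0

    if total_generation == total_demand:
        # Balanced community: everyone trades at the mid-market rate.
        return mid, mid

    if total_generation > total_demand:
        # Surplus: buyers pay the mid rate; sellers receive a blend of the
        # mid rate (for locally consumed energy) and the grid export price
        # (for the exported surplus).
        surplus = total_generation - total_demand
        sell = (total_demand * mid + surplus * grid_sell_price) / total_generation
        return mid, sell

    # Deficit: sellers receive the mid rate; buyers pay a blend of the
    # mid rate (for locally supplied energy) and the grid import price
    # (for the imported shortfall).
    shortfall = total_demand - total_generation
    buy = (total_generation * mid + shortfall * grid_buy_price) / total_demand
    return buy, mid


# Example with illustrative prices (not taken from the paper).
buy_price, sell_price = mid_market_rate_prices(10.0, 5.0, 0.20, 0.04)
```

Under this rule the internal buy price never exceeds the grid import price and the internal sell price never falls below the grid export price, which is the usual incentive for prosumers to trade within the community.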