The "dual-carbon" goal drives large-scale renewable integration and introduces strong stochastic disturbances, making it difficult for traditional control methods to obtain multi-region coordinated optimal solutions. Although reinforcement learning can address this issue, its performance remains affected by overestimation bias and reward noise arising from agent-environment interactions. Therefore,
this paper proposes a novel multi-agent coordinated automatic generation control algorithm for integrated energy systems, i.e., a composite-value-estimation twin-delayed deep deterministic policy gradient based on behavioral cloning, to obtain the multi-area coordinated optimal solution. Behavioral cloning imposes a policy constraint that alleviates overestimation bias during interaction with the environment, while composite Q-learning provides a composite estimate of action values to better accommodate randomness and noise in the reward signal. Simulations on a two-area load frequency control model dominated by renewable energy and a four-area integrated energy system model based on the Hubei power grid
verify that the proposed algorithm obtains the multi-area coordinated stochastic optimal solution and delivers superior control performance compared with several other reinforcement learning algorithms.
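The two mechanisms named above (clipped double-Q targets in TD3 and a behavioral-cloning policy constraint) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear critics, the actor, the noise scale, and the weighting coefficient `lam` are all illustrative assumptions, and the BC regularizer follows the widely used TD3+BC formulation rather than the specific composite-value-estimation variant of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear twin critics and a linear actor over a 1-D state/action
# space. All shapes and coefficients are illustrative assumptions.
w1, w2 = rng.normal(size=3), rng.normal(size=3)  # twin critic weights
theta = np.array([0.5])                          # actor weight

def q(w, s, a):
    """Linear critic: Q(s, a) = w . [s, a, s*a]."""
    return float(w @ np.array([s, a, s * a]))

def actor(s):
    """Deterministic policy a = theta * s."""
    return float(theta[0] * s)

def td3_target(s_next, r, gamma=0.99, noise_std=0.1, clip=0.2):
    """Clipped double-Q target with target-policy smoothing:
    the min over twin critics curbs overestimation bias."""
    eps = float(np.clip(rng.normal(scale=noise_std), -clip, clip))
    a_next = actor(s_next) + eps
    return r + gamma * min(q(w1, s_next, a_next), q(w2, s_next, a_next))

def bc_actor_loss(s, a_demo, lam=2.5):
    """TD3+BC-style actor objective: maximize Q1 while staying close
    to the logged action a_demo (behavioral-cloning constraint)."""
    a = actor(s)
    return -lam * q(w1, s, a) + (a - a_demo) ** 2
```

With `lam = 0` the objective reduces to pure behavioral cloning; increasing `lam` shifts the balance toward value maximization, which is how the policy constraint tempers overestimation while still improving on the logged behavior.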