To address the low disaggregation accuracy of non-intrusive load disaggregation methods in multi-device concurrent scenarios and their heavy dependence on large-scale labeled data, a multi-device transfer learning load disaggregation method based on the Conformer and a mixture of experts (MoE) is proposed. The method leverages the Conformer to combine the local perception of convolution with the global modeling capacity of the self-attention mechanism. By introducing a sparsely activated MoE module, model capacity is expanded at low computational cost, enhancing the model's ability to represent power consumption patterns. Furthermore, a "backbone-branch" transfer learning framework is constructed, which transfers knowledge across datasets by pre-training the shared backbone on source domains and fine-tuning appliance-specific branches on the target domain. Case studies demonstrate that the proposed method significantly improves disaggregation accuracy in multi-device concurrent scenarios and strengthens generalization in cross-dataset transfer.
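The sparse activation underlying the MoE module can be sketched in plain Python: a gate scores every expert, but only the top-k experts are actually evaluated and their outputs mixed, which is what keeps the added compute low. This is an illustrative sketch under assumed simplifications (a dot-product gate standing in for the learned gating network, list-based vectors); it is not the paper's implementation.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sparse_moe(x, experts, gate_weights, k=2):
    """Combine the outputs of only the top-k experts by gate score.

    `experts` is a list of callables mapping a vector to a vector;
    `gate_weights` holds one weight vector per expert (a dot product
    stands in for the learned gating network). Experts outside the
    top-k are never evaluated -- the source of the low computational
    cost of sparse activation.
    """
    scores = [sum(w * v for w, v in zip(gw, x)) for gw in gate_weights]
    topk = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in topk])
    outs = [experts[i](x) for i in topk]  # only k expert forward passes
    # gate-probability-weighted mixture of the selected experts' outputs
    return [sum(p * o[j] for p, o in zip(probs, outs))
            for j in range(len(outs[0]))]
```

With k=1 this reduces to hard routing (a single expert per input); larger k trades a little extra compute for a smoother mixture.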