
1. Electric Power Research Institute of Guizhou Power Grid Co., Ltd.
2. Guizhou Power Grid Co., Ltd.
Published: 2025
Siwu Yu, Yumin He, Guobang Ban, et al. Construction and Application of the General Knowledge Base for Electric Power Safety Industry[J]. 2025, (6).
With the development of the power system, the demand for a unified, standardized general knowledge base in power safety control has become increasingly urgent. However, current knowledge base construction suffers from low efficiency, insufficient utilization of unstructured data, heavy dependence on manual labor, and lagging knowledge updates. This paper proposes a general knowledge base construction method for the power safety field, aiming to explore an efficient, intelligent construction path and to lay a high-quality data foundation for upper-layer applications based on large language models. The construction method combines the power industry's essential-safety analysis framework with the information-extraction strengths of large language models to form a systematic strategy for integrating multi-source heterogeneous data, covering the analysis of historical essential-safety data, the structured processing of safety regulations and operating procedures, and related content. The application analysis focuses on a power intelligent question-answering system based on retrieval-augmented generation and on the practice of constructing power safety corpora for fine-tuning large models; the feasibility and effectiveness of the method are verified through quantitative indicators. Finally, the research results are summarized, and the continued optimization of knowledge base construction technology and its application potential in broader business scenarios are discussed.
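The question-answering application described above follows the standard retrieval-augmented generation pattern: retrieve the knowledge base passages most relevant to a query, then pass them to the language model as grounding context. The sketch below illustrates only that flow; the toy bag-of-words embedding and sample passages are illustrative assumptions, not the paper's method, and a production system would use a neural text encoder and a vector database.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy embedding: term-frequency vector over lowercase word tokens.
    # Stands in for a neural encoder so the sketch is self-contained.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank knowledge-base passages by similarity to the query; keep top-k.
    qv = embed(query)
    return sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Retrieved passages ground the LLM's answer in the knowledge base.
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

# Hypothetical knowledge-base entries for illustration only.
kb = [
    "Grounding must be installed and verified before maintenance on de-energized lines.",
    "Operating procedures require two-person verification for switching operations.",
    "Annual safety training records are kept for five years.",
]
prompt = build_prompt(
    "What is required for switching operations?",
    retrieve("requirements for switching operations", kb, k=1),
)
```

The assembled prompt would then be sent to the language model, which answers from the supplied context rather than from its parametric memory alone.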
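For the corpus-construction practice mentioned above, supervised fine-tuning data is commonly stored as instruction/input/output records, the Alpaca-style format that toolkits such as LlamaFactory accept. The record below is a hypothetical example of that shape; its field names follow the convention, but the safety content is an illustrative assumption, not taken from the paper's corpus.

```python
import json

# Hypothetical fine-tuning corpus record in the Alpaca-style format.
record = {
    "instruction": "State the grounding requirement before line maintenance.",
    "input": "",
    "output": ("Before maintenance on a de-energized line, grounding must be "
               "installed and verified per the operating procedure."),
}

# Corpora are typically serialized one JSON object per line (JSONL).
line = json.dumps(record, ensure_ascii=False)
```

Thousands of such records, extracted from regulations and historical safety data, form the training set for parameter-efficient fine-tuning methods such as LoRA.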
Author profile: Yu Siwu (b. 1984), male, senior engineer, M.S., mainly engaged in power grid safety risk systems, safety supervision, and operational risk control. Email: swyu2012@163.com.