Chinese Journal of Scientific and Technical Periodicals ›› 2025, Vol. 36 ›› Issue (1): 37-43. doi: 10.11946/cjstp.202408280941

Application of generative AI technology in indexing and summarization for scientific literature

SHEN Xibin1,2), LIU Hongxia1,2), WANG Hongjian1,2), WANG Lilei1,2)

  1) New Media Department, Chinese Medical Association Publishing House, 69 Dongheyan Street, Xicheng District, Beijing 100052, China
  2) Key Laboratory of Knowledge Mining and Service for Medical Journals, National Press and Publication Administration, 69 Dongheyan Street, Xicheng District, Beijing 100052, China
  • Received: 2024-08-28 Revised: 2024-12-26 Online: 2025-01-15 Published: 2025-02-11

Chinese title: 生成式人工智能技术在科技期刊论文关键信息提取与总结中的应用 (Application of generative AI technology in key information extraction and summarization for scientific and technical journal papers)

Authors (Chinese names): 沈锡宾, 刘红霞, 王红剑, 王立磊
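  • About the authors:

    SHEN Xibin (ORCID: 0000-0002-7310-8157), master's degree, senior editor;

    LIU Hongxia, bachelor's degree, associate senior editor; WANG Hongjian, bachelor's degree, senior editor; WANG Lilei, bachelor's degree, product manager.

    Author contributions: SHEN Xibin: designed the framework of the paper, collected materials, processed data, and wrote the paper; LIU Hongxia: collected materials and revised the paper; WANG Hongjian: processed data and revised the paper; WANG Lilei: collected materials and revised the paper.
  • Funding:
    Clustering Pilot Project of the Excellence Action Plan for China STM Journals (卓越计划-集群-5)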

Abstract:

[Purposes] To evaluate how well four large language models (LLMs) extract and summarize key information from medical papers, and to provide an empirical reference for the technical pathways of knowledge services in scientific, technical, and medical (STM) journals. [Methods] One hundred research articles published in the National Medical Journal of China were randomly selected. Using prompt engineering, ChatGPT 4o, Kimi, ChatGLM 4.0, and iFLYTEK Spark were asked to extract information from the papers in JSON format, and each model's abilities in knowledge extraction, text comprehension, and summarization were evaluated. [Findings] All four models returned correctly formatted JSON data and showed high accuracy in extracting the study sample, sample size, disease, research type, discipline, and keywords. The models also performed well in summary generation; their only notable weakness was in understanding research methods. [Conclusions] LLMs have strong capabilities in text comprehension, knowledge extraction, and summarization, but some shortcomings remain. If these technical difficulties can be overcome, generative AI (GenAI) could play a significant role in STM journal content dissemination, knowledge services, and decision support in vertical domains.

Key words: Scientific journal, Knowledge indexing, Large language models, Generative AI, Knowledge services
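
The extraction-and-evaluation workflow described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the JSON key names in `FIELDS`, the prompt wording, and the exact-match scoring rule are all assumptions made for illustration, since the abstract lists only the types of information extracted. The call to an actual model is deliberately left out; the sketch covers building the prompt, parsing the reply, and scoring one field against human annotations.

```python
import json

# Hypothetical key names: the abstract names the information types,
# but not the JSON keys the authors actually requested.
FIELDS = ["study_sample", "sample_size", "disease",
          "research_type", "discipline", "keywords", "summary"]

FENCE = "`" * 3  # Markdown code-fence marker that chat models often wrap JSON in


def build_prompt(article_text: str) -> str:
    """Assemble an extraction prompt that asks for a single JSON object."""
    field_list = ", ".join(FIELDS)
    return (
        "Read the medical research article below and return ONLY a JSON "
        f"object with exactly these keys: {field_list}.\n"
        "Use null for any item the article does not report.\n\n"
        f"Article:\n{article_text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse a model reply into a dict, tolerating a code fence around the JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example, the fence plus "json") ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward.
        text = text.rsplit(FENCE, 1)[0]
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("reply is valid JSON but not a JSON object")
    missing = [key for key in FIELDS if key not in record]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return record


def field_accuracy(predictions: list, references: list, field: str) -> float:
    """Share of articles whose extracted value equals the human annotation."""
    if not references:
        raise ValueError("no reference annotations supplied")
    hits = sum(p.get(field) == r.get(field)
               for p, r in zip(predictions, references))
    return hits / len(references)
```

In use, `build_prompt(text)` would be sent to each model, each reply passed through `parse_reply`, and `field_accuracy` computed per field over the 100 articles. Exact matching is the simplest possible scoring rule; free-text fields such as the summary would need human or rubric-based rating instead.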
