Research on intelligent early warning of academic paper retraction risks: Hybrid enhanced detection framework based on large language models

doi:10.11946/cjstp.202506240737

Abstract

Purposes: Delayed retraction of academic papers allows incorrect knowledge to spread. This study uses AI techniques to shorten the time needed to identify papers with abnormal status and to maintain the reliability and integrity of the scholarly communication system.

Methods: Metadata and peer-review comments for 12,098 papers from authoritative platforms such as PubPeer and PubMed were integrated as supporting data. A hybrid enhanced detection framework for paper retraction risk was developed, which makes its decision from the results of retrieval-enhanced generation and expert-enhanced generation.

Findings: The framework identifies retracted papers accurately and effectively. On verification of retracted-paper status, accuracy reaches 91.91% and the F₁ score reaches 73.72% (recall: 76.29%).

Conclusions: The study confirms the technical feasibility of large language models for academic early warning and for maintaining academic integrity. The hybrid enhanced detection framework can give publishers a practical retraction early-warning scheme. Openly sharing the framework will promote the standardization of research on retraction risk detection with large language models, help build a research integrity management system based on active monitoring, improve oversight efficiency, and support the healthy development of the research ecosystem.

Key words: Retracted papers, Retraction risk detection, Large language models, Research integrity

ZHAO Xiansong, CHEN Xiaohui, LU Ye, YANG Ming, LIN Yuan. Research on intelligent early warning of academic paper retraction risks: Hybrid enhanced detection framework based on large language models[J]. Chinese Journal of Scientific and Technical Periodicals, 2025, 36(11): 1478-1486.

赵显嵩, 陈晓晖, 陆晔, 杨茗, 林原. 学术撤稿风险智能预警研究：基于大语言模型的混合增强检测框架[J]. 中国科技期刊研究, 2025, 36(11): 1478-1486.

URL: https://www.cjstp.cn/EN/10.11946/cjstp.202506240737

https://www.cjstp.cn/EN/Y2025/V36/I11/1478

Figures/Tables 4

Model	Parameters	Accuracy/%	Precision/%	Recall/%	F₁/%
Zero-shot task-Gemma2	9B	42.86	19.46	90.58	32.04
Zero-shot task-Mistral	7B	75.68	33.70	65.65	44.54
Zero-shot task-Qwen2.5	7B	73.64	34.16	83.28	48.45
Zero-shot task-LLaMA3	8B	47.06	20.93	92.10	34.10
Zero-shot task-DeepSeek-r1	API	85.40	53.85	12.77	20.64
Training task-Gemma2	9B	77.62	37.46	75.38	50.05
Training task-Mistral	7B	88.43	61.37	59.88	60.62
Training task-Qwen2.5	7B	88.52	63.74	52.89	57.81
Training task-LLaMA3	8B	89.51	64.06	67.17	65.58
Hybrid enhanced detection framework-LLaMA3	8B	91.91	71.31	76.29	73.72
Hybrid enhanced detection framework-DeepSeek-r1	8B	90.55	64.93	79.33	71.41
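
The F₁ column is the harmonic mean of the precision and recall columns, so it can be recomputed from them as a consistency check. The following is a minimal sketch under that standard definition; the function names `f1_percent` and `check_rows` and the 0.02 tolerance are illustrative assumptions, not taken from the paper, and only three rows are copied from the table.

```python
def f1_percent(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from three rows of the table.
# The dictionary keys are shorthand labels chosen for this sketch.
ROWS = {
    "zero-shot DeepSeek-r1": (53.85, 12.77, 20.64),
    "trained LLaMA3": (64.06, 67.17, 65.58),
    "hybrid framework LLaMA3": (71.31, 76.29, 73.72),
}


def check_rows(rows=ROWS, tolerance=0.02):
    """Return {label: True/False} for whether the reported F1 matches the recomputed one."""
    return {
        label: abs(f1_percent(p, r) - reported) <= tolerance
        for label, (p, r, reported) in rows.items()
    }


if __name__ == "__main__":
    for label, ok in check_rows().items():
        print(f"{label}: {'consistent' if ok else 'mismatch'}")
```

The zero-shot DeepSeek-r1 row shows why accuracy alone is misleading here: its accuracy is 85.40%, but with a recall of 12.77% its F₁ is only 20.64%.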

Retrieval-enhanced agent	Expert-enhanced agent	Accuracy/%	Precision/%	Recall/%	F₁/%
×	×	89.83	64.29	71.12	67.53
√	×	90.87	69.91	67.78	68.83
×	√	91.18	67.72	77.81	72.42
√	√	91.91	71.31	76.29	73.72
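
One way to read this ablation table is to subtract the row with both agents disabled from each of the other rows. The sketch below does only that arithmetic on the figures copied from the table; the names `ABLATION`, `METRICS`, and `gains_over_baseline` are illustrative and not from the paper.

```python
# Ablation rows copied from the table:
# (retrieval-enhanced agent on?, expert-enhanced agent on?)
#   -> (accuracy, precision, recall, F1), all in percent.
ABLATION = {
    (False, False): (89.83, 64.29, 71.12, 67.53),
    (True, False): (90.87, 69.91, 67.78, 68.83),
    (False, True): (91.18, 67.72, 77.81, 72.42),
    (True, True): (91.91, 71.31, 76.29, 73.72),
}
METRICS = ("accuracy", "precision", "recall", "f1")


def gains_over_baseline(table=ABLATION):
    """Percentage-point change of each metric relative to the no-agent row."""
    baseline = table[(False, False)]
    gains = {}
    for config, values in table.items():
        if config == (False, False):
            continue  # the baseline has no gain over itself
        gains[config] = {
            metric: round(value - base, 2)
            for metric, value, base in zip(METRICS, values, baseline)
        }
    return gains


if __name__ == "__main__":
    for (retrieval, expert), delta in gains_over_baseline().items():
        print(f"retrieval={retrieval}, expert={expert}: {delta}")
```

On these figures, the retrieval-enhanced agent alone raises precision by 5.62 points but lowers recall by 3.34, while the expert-enhanced agent alone raises recall by 6.69. With both enabled, F₁ is 6.19 points above the baseline.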
