Chinese Journal of Scientific and Technical Periodicals ›› 2025, Vol. 36 ›› Issue (9): 1280-1287. doi: 10.11946/cjstp.202507300919

• Evaluation and Analysis •

Analysis of the application effectiveness of three general-purpose AI tools in proofreading medical journal articles

WU Yuxin, YU Xi*

  1. Editorial Office of Journal of China Medical University, Journal Center of China Medical University, 92 Beier Road, Heping District, Shenyang 110001, China
  • Received: 2025-07-30 Online: 2025-10-28 Published: 2025-10-28
  • Corresponding author: YU Xi
  • About the authors:

    WU Yuxin (ORCID: 0009-0006-9057-0606), Master's degree, Associate Research Fellow, Director of the Editorial Office, E-mail:

    Author contributions: WU Yuxin: designed the framework of the paper, collected and analyzed the data, and drafted and revised the manuscript; YU Xi: reviewed the framework of the paper and drafted and revised the manuscript.
  • Funding:
    Special Fund of the Liaoning Provincial Periodical Association, "Role Positioning and Competence Enhancement of Young Editors of Medical Journals from the Perspective of New Quality Productive Forces" (LJA2024-GJ-032)

Abstract:

Purpose This study compares the efficacy of three popular AI tools (Doubao, Kimi, and DeepSeek) in proofreading medical journal articles, to provide a reference for their rational application in improving editorial efficiency. Methods A total of 43 final-accepted, author-confirmed research articles published in issues 1-12 (2024) of the Journal of China Medical University were randomly selected. With manual proofreading as the gold standard, the proofreading performance of Doubao, Kimi, and DeepSeek was assessed on indicators including grammar, punctuation, quantities and units, numeral usage, terminology, company names, URLs, and software names; chi-square tests or Fisher's exact tests were used for comparisons among the three tools. Findings Manual proofreading detected 1718 errors in total. Doubao, Kimi, and DeepSeek detected 287 (16.71%), 127 (7.39%), and 505 (29.39%) errors, respectively, and the differences among the three tools were statistically significant (P < 0.05). Pairwise comparisons revealed that DeepSeek significantly outperformed the other two tools, with an error detection rate 1.76 times that of Doubao and 3.98 times that of Kimi; Doubao also performed significantly better than Kimi (P < 0.05). Across error categories, DeepSeek achieved significantly higher detection rates for company names (61.54%), URLs (92.31%), capitalization (80.00%), and software names (100.00%) than both Doubao and Kimi (all P < 0.05). Conclusions DeepSeek demonstrates an advantage in the breadth of error detection, followed by Doubao, while Kimi exhibits the weakest overall detection capability. DeepSeek's performance in detecting errors in URLs, capitalization, and software names approaches human-level performance, suggesting its potential for trial use in practical proofreading. However, the total number of errors detected by each of the three AI tools was far lower than that identified through manual proofreading, indicating that general-purpose AI tools still have limitations and pose potential risks in the proofreading of medical articles. Further optimization is needed to enhance their efficacy in this specialized domain.

Key words: Medical journal, Research article, Doubao, Kimi, DeepSeek, Proofreading
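
To make the statistical comparison described above concrete, the following is a minimal sketch (not the authors' analysis code) of the overall and pairwise detection-rate comparisons in Python with SciPy, using only the counts reported in the abstract; applying Fisher's exact test to every pairwise 2x2 table here is an illustrative assumption, since the paper selects between chi-square and Fisher's tests case by case:

```python
# Sketch of the detection-rate comparison, assuming the published totals:
# manual proofreading (gold standard) found 1718 errors in 43 articles.
from scipy.stats import chi2_contingency, fisher_exact

TOTAL = 1718  # errors identified by manual proofreading
detected = {"Doubao": 287, "Kimi": 127, "DeepSeek": 505}

# Overall test: 2x3 contingency table, rows = detected / missed, columns = tools
table = [
    list(detected.values()),                # errors each tool detected
    [TOTAL - n for n in detected.values()], # errors each tool missed
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"overall: chi2 = {chi2:.2f}, dof = {dof}, P = {p:.3g}")

# Pairwise 2x2 comparisons (illustrative; the paper uses chi-square or
# Fisher's exact test depending on expected counts)
for a, b in [("DeepSeek", "Doubao"), ("DeepSeek", "Kimi"), ("Doubao", "Kimi")]:
    sub = [[detected[a], detected[b]],
           [TOTAL - detected[a], TOTAL - detected[b]]]
    _, p_pair = fisher_exact(sub)
    ratio = detected[a] / detected[b]
    print(f"{a} vs {b}: rate ratio = {ratio:.2f}, P = {p_pair:.3g}")
```

The printed rate ratios reproduce the 1.76 (DeepSeek vs. Doubao) and 3.98 (DeepSeek vs. Kimi) figures quoted in the Findings, and with counts of this size all three pairwise differences fall below the 0.05 threshold, consistent with the reported results.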