Chinese Journal of Scientific and Technical Periodicals ›› 2021, Vol. 32 ›› Issue (11): 1355-1361. doi: 10.11946/cjstp.202105310452

• Special Topic on Preventing Academic Misconduct •

Applicability of similarity indexes to plagiarism check of scientific papers

ZHANG Jiao

  1. Editorial Office of Frontiers of Environmental Science & Engineering, School of Environment, Tsinghua University, 1 Qinghuayuan, Haidian District, Beijing 100084, China
  • Received: 2021-05-31 Revised: 2021-09-02 Online: 2021-11-15 Published: 2021-11-15
  • About the author: ZHANG Jiao (ORCID: 0000-0001-8761-7561), Ph.D., editor, E-mail: jiaozhang@tsinghua.edu.cn
  • Funding:
    Key Journal Project of the Excellence Action Plan for China STM Journals (2019-B10); Strategic Research and Consulting Project of the Chinese Academy of Engineering, Environment and Light Textile Engineering Division (2021-ZD-01-10)

Abstract:

[Purposes] This study evaluates how reliable the similarity indexes in plagiarism check reports are as criteria for judging whether a manuscript duplicates existing work, and identifies the causes of misjudgment. [Methods] The plagiarism check reports of 642 manuscripts generated by CrossCheck/iThenticate were checked manually. Evaluation metrics for classification algorithms were used to assess the reliability of the similarity indexes, and the causes of misjudgment were analyzed. [Findings] The overall similarity methods [total similarity (TS) and main-body similarity (MS)] and the single-source similarity (SS) method all had an accuracy below 75%. With a recall of 85% and a precision of 47%, the SS method balanced the two comparatively well (F1 = 0.61). Reliability decreased in the order SS method, MS method, TS method. Even so, a large number of manuscripts were misjudged on the basis of the similarity percentages. [Conclusions] With appropriate thresholds, MS and SS can serve as references for judging whether a manuscript is duplicated, but error-prone items still need to be checked manually, and the results of the plagiarism check system should not be relied on excessively.

Key words: Scientific journal, Manuscript, Plagiarism, Similarity index, Index, Evaluation
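
The classification metrics named in the abstract (accuracy, precision, recall, F1) can be illustrated with a minimal Python sketch. It is not the author's code: the function names, the threshold of 10 and the six toy manuscripts are hypothetical and serve only to show how a similarity threshold turns into flags that are then scored against manual verdicts. The only figures taken from the abstract are the SS recall (85%) and precision (47%), which reproduce the reported F1 of 0.61.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # flagged by the threshold, confirmed as duplicated on manual check
    fp: int  # flagged by the threshold, cleared on manual check
    tn: int  # not flagged, cleared on manual check
    fn: int  # not flagged, but duplicated according to manual check


def flag_by_threshold(similarities, threshold):
    """Flag a manuscript when its similarity percentage reaches the threshold."""
    return [s >= threshold for s in similarities]


def confusion(flags, truths):
    """Count the four outcomes of threshold flags against manual verdicts."""
    tp = sum(f and t for f, t in zip(flags, truths))
    fp = sum(f and not t for f, t in zip(flags, truths))
    tn = sum(not f and not t for f, t in zip(flags, truths))
    fn = sum(not f and t for f, t in zip(flags, truths))
    return Confusion(tp, fp, tn, fn)


def accuracy(c):
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical toy data: similarity percentages and manual verdicts.
toy_similarities = [5, 12, 30, 8, 45, 20]
toy_truths = [False, True, True, False, True, False]
toy_flags = flag_by_threshold(toy_similarities, threshold=10)  # assumed threshold
toy_confusion = confusion(toy_flags, toy_truths)

# F1 from the precision and recall reported for the SS method in the abstract.
ss_f1 = f1_score(0.47, 0.85)
print(round(ss_f1, 2))
```

A high recall with a low precision, as reported for the SS method, means the threshold catches most duplicated manuscripts but also flags many that manual checking clears, which is consistent with the abstract's call for manual verification.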