Identification and classification of journals of impact factor manipulation

Chinese Journal of Scientific and Technical Periodicals, 2023, Vol. 34, Issue 2: 136-143. doi: 10.11946/cjstp.202209250730

JIANG Fenghui¹⁾, LIU Xiangpeng²⁾, SHAO Wei³⁾, CHEN Chunping¹⁾, YU Longzhen⁴⁾*

1) Editorial Office of Journal of Qingdao University of Science and Technology (Natural Science Edition), 99 Songling Road, Laoshan District, Qingdao 266061, China
2) School of Mathematics and Physics, Qingdao University of Science and Technology, 99 Songling Road, Laoshan District, Qingdao 266061, China
3) College of Automation and Electronic Engineering, Qingdao University of Science and Technology, 99 Songling Road, Laoshan District, Qingdao 266061, China
4) College of Economics and Management, Qingdao University of Science and Technology, 99 Songling Road, Laoshan District, Qingdao 266061, China

Received: 2022-09-25 Revised: 2023-01-08 Online: 2023-02-15 Published: 2023-03-20
Corresponding author:
*YU Longzhen (ORCID: 0000-0002-6594-6679), PhD, associate professor, E-mail: yulongzhen@qust.edu.cn.
About the authors:
JIANG Fenghui (ORCID: 0000-0001-8655-2305), master's degree, editor, E-mail: jfhict@qust.edu.cn;
LIU Xiangpeng, master's degree, associate professor;
SHAO Wei, PhD, professor;
CHEN Chunping, PhD, senior editor.
Author contributions: JIANG Fenghui: proposed the research idea, surveyed the literature, collected and processed data, selected algorithms, wrote programs, analyzed experimental results, and wrote the paper; LIU Xiangpeng: processed data, selected algorithms, wrote programs, and revised the paper; SHAO Wei: selected algorithms, designed programs, and revised the paper; CHEN Chunping: analyzed the feasibility of the scheme and revised the paper; YU Longzhen: analyzed the feasibility of the scheme, processed data, selected algorithms, wrote programs, analyzed experimental results, and revised the paper.
Funding:
Society of China University Journals project "Pattern recognition and countermeasures for journal impact factor manipulation based on big data and artificial intelligence algorithms" (CUJS-CX-2021-029); Shandong Provincial Department of Education project "High-quality development construction project for journals of Shandong higher education institutions" (JYTQKB202211)

Abstract

[Purposes] Serious manipulation of journal impact factors undermines the objectivity of the impact factor. Such improper behavior should be strictly prohibited, and effective methods for identifying manipulated journals are urgently needed. [Methods] Taking the annual JCR data published on the Web of Science platform as the research object, multi-year data on 14 bibliometric indicators were selected for normal journals and abnormal journals (those suppressed because of impact factor manipulation), forming a normal and an abnormal journal data set. Machine-learning programs written with the Python Scikit-learn library were trained, validated, and tested on the training, validation, and test sets generated by merging the two data sets. [Findings] The machine-learning algorithms classify the normal and abnormal journal data effectively, with accuracy, precision, and recall on the validation set all above 98%. The five most important features together account for 91.55% of the feature importance. For journals that returned to normal after suppression, the recognition performance of some algorithms begins to decline on data from the fifth year after suppression. All editorial-concern journals are classified as abnormal, and the suppressed journals and suppression-warning journals in the 2021 edition of JCR are all correctly classified as abnormal. The support vector machine algorithm gives the best predictive performance. [Conclusions] Machine-learning algorithms have inherent advantages of speed and objectivity in identifying journals whose impact factors have been manipulated. As manipulation techniques and bibliometric indicators continue to multiply, manually combining indicators to identify and judge manipulated journals becomes increasingly difficult, and the advantages of machine-learning algorithms become increasingly apparent.

Key words: Impact factor manipulation, JCR suppression journal, JCR editorial concern journal, JCR indicator, Machine learning, Automatic identification
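
The workflow summarized in the abstract (four Scikit-learn classifiers, a training/validation/test split, and accuracy, precision, recall, and F₁ as metrics) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' program: the data are synthetic samples from `make_classification` rather than JCR indicators, the 60/20/20 split and all hyperparameters are assumed, and the feature scaling before the SVM is a choice made here. The last lines show one way a "share of importance held by the five most important features" could be computed from a random forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the journal data (NOT JCR data): 14 features per
# sample, label 1 = abnormal (suppressed) journal, label 0 = normal journal.
X, y = make_classification(n_samples=2000, n_features=14, n_informative=5,
                           n_redundant=2, random_state=0)

# Assumed 60/20/20 split into training, validation, and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# The four algorithm families named in the result tables; settings are assumed.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "adaboost": AdaBoostClassifier(random_state=0),
}


def evaluate(model, X_eval, y_eval):
    """Return accuracy, precision, recall, and F1 for the abnormal class."""
    y_pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, y_pred),
        "precision": precision_score(y_eval, y_pred, zero_division=0),
        "recall": recall_score(y_eval, y_pred, zero_division=0),
        "f1": f1_score(y_eval, y_pred, zero_division=0),
    }


# Fit each model on the training set and score it on the validation set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = evaluate(model, X_val, y_val)

# Share of total importance carried by the five most important features,
# read from the fitted random forest (importances sum to 1).
importances = models["random_forest"].feature_importances_
top5_share = float(np.sort(importances)[::-1][:5].sum())

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
print("top-5 feature importance share:", round(top5_share, 4))
```

The held-out test set (`X_test`, `y_test`) is left untouched here and would be scored once, with the same `evaluate` helper, after model selection on the validation set.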

JIANG Fenghui, LIU Xiangpeng, SHAO Wei, CHEN Chunping, YU Longzhen. Identification and classification of journals of impact factor manipulation[J]. Chinese Journal of Scientific and Technical Periodicals, 2023, 34(2): 136-143.

https://www.cjstp.cn/CN/Y2023/V34/I2/136



Algorithm	Decision tree			Random forest			Support vector machine			AdaBoost
Data type	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data
Accuracy /%	99.66	99.88	99.85	99.88	99.92	99.92	98.66	98.55	98.36	99.96	100.00	100.00
Precision /%	99.95	99.91	99.95	99.87	99.91	99.91	98.65	98.49	98.25	99.95	100.00	100.00
Recall /%	99.67	99.95	99.87	100.00	100.00	100.00	99.87	99.91	99.95	100.00	100.00	100.00
F₁ score /%	99.81	99.93	99.91	99.93	99.95	99.95	99.26	99.20	99.10	99.97	100.00	100.00

Algorithm	Decision tree			Random forest			Support vector machine			AdaBoost
Data type	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data
Accuracy /%	71.83	73.10	73.10	83.10	84.14	82.22	94.92	94.69	94.95	69.52	82.45	81.76
Precision /%	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00
Recall /%	71.83	73.10	73.10	83.10	84.14	82.22	94.92	94.69	94.95	69.52	82.45	81.76
F₁ score /%	83.60	84.46	84.46	90.77	91.39	90.24	97.39	97.27	97.41	82.02	90.38	89.96
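
The F₁ rows in these tables are consistent with the standard definition of F₁ as the harmonic mean of precision and recall. The small sketch below recomputes two entries of the table above from their precision and recall values; the helper name `f1_from_percent` is introduced here for illustration only. Because the published values are rounded to two decimals, a recomputed score can differ from the table in the last digit for some columns.

```python
def f1_from_percent(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    if precision_pct + recall_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Support vector machine, 1-year data: precision 100.00, recall 94.92
print(round(f1_from_percent(100.00, 94.92), 2))  # 97.39

# Random forest, 1-year data: precision 100.00, recall 83.10
print(round(f1_from_percent(100.00, 83.10), 2))  # 90.77
```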

Algorithm	Decision tree			Random forest			Support vector machine			AdaBoost
Data type	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data
Accuracy /%	83.94	99.83	86.69	99.87	100.00	98.48	99.87	99.95	99.95	87.30	87.51	93.69
Precision /%	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00
Recall /%	83.90	99.83	86.65	99.87	100.00	98.48	99.91	100.00	100.00	87.26	87.47	93.67
F₁ score /%	91.24	99.91	92.84	99.93	100.00	99.23	99.95	100.00	100.00	93.20	93.31	96.73


Algorithm	Decision tree			Random forest			Support vector machine			AdaBoost
Data type	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data	1-year data	3-year data	5-year data
Accuracy /%	1.99	1.83	1.99	3.32	3.16	2.82	56.90	32.77	27.45	19.13	2.32	1.99
Precision /%	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00
Recall /%	1.66	1.50	1.66	3.00	2.83	2.50	56.92	32.72	27.37	18.86	2.00	1.66
F₁ score /%	3.28	2.96	3.28	5.83	5.51	4.88	72.55	49.30	42.98	31.74	3.92	3.28
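
A precision of 100% alongside a very low recall, as in the table above, can look contradictory at first. The sketch below shows how the two metrics separate, using purely hypothetical confusion-matrix counts (they are not taken from the paper): a classifier that never labels a normal sample as abnormal has perfect precision, even if it misses most of the abnormal samples.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts.

    tp: abnormal samples predicted abnormal
    fp: normal samples predicted abnormal
    fn: abnormal samples predicted normal
    tn: normal samples predicted normal
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical evaluation set: 100 abnormal samples, of which only 20 are
# flagged, and no normal sample is ever flagged as abnormal.
acc, prec, rec = metrics_from_counts(tp=20, fp=0, fn=80, tn=0)
print(acc, prec, rec)  # 0.2 1.0 0.2
```

When an evaluation set contains only abnormal samples, as in this hypothetical case, accuracy and recall coincide.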