Abstract:
[Purposes] Serious manipulation of journal impact factors has undermined the objectivity of the indicator, and such misconduct should be strictly prohibited. Effective methods for identifying manipulated journals are urgently needed. [Methods] Taking the JCR data published over the years on the Web of Science platform as the research object, we selected yearly data on 14 bibliometric indicators for normal journals and abnormal journals (those suppressed for impact factor manipulation) to form two data sets, one normal and one abnormal. Machine-learning programs were written with the Python Scikit-learn library. The normal and abnormal data sets were merged and split into a training set, a validation set, and a test set, which were used to train, validate, and test the classifiers. [Findings] The machine-learning algorithms classify the normal and abnormal journal data effectively, with accuracy, precision, and recall on the validation set all exceeding 98%. The five most important features have a combined feature importance of 91.55%. For journals that returned to normal after suppression, the recognition performance of some algorithms begins to decline on data from the fifth year after suppression. All editorial-concern journals are classified as abnormal. The suppressed journals and suppression-warning journals in the 2021 edition of the JCR are all correctly classified as abnormal. The support vector machine algorithm gives the best prediction performance. [Conclusions] Machine-learning algorithms have inherent advantages in speed and objectivity when identifying journals that manipulate impact factors. As the methods of manipulating impact factors and the number of bibliometric indicators keep growing, it becomes increasingly difficult to combine the indicators manually to identify and judge manipulated journals, and the advantages of machine-learning algorithms become increasingly evident.
Key words: Impact factor manipulation; JCR suppression journal; JCR editorial concern journal; JCR indicator; Machine learning; Automatic identification
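The workflow summarized in the Methods and Findings can be sketched as follows. This is only an illustrative outline under stated assumptions, not the authors' program: the data are synthetic stand-ins produced by `make_classification` (no JCR values are used), the 60/20/20 split ratio, the RBF kernel, and the use of a random forest to obtain feature importances are all assumptions, since the abstract specifies none of them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the journal data: 14 features per journal-year,
# label 1 = abnormal (suppressed), label 0 = normal. Not real JCR data.
X, y = make_classification(
    n_samples=1000, n_features=14, n_informative=5, n_redundant=2,
    weights=[0.8, 0.2], class_sep=2.0, random_state=0,
)

# Assumed 60/20/20 split into training, validation, and test sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Support vector machine on standardized features (kernel choice assumed).
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)


def report(model, features, labels):
    """Return accuracy, precision, and recall for the abnormal class."""
    pred = model.predict(features)
    return {
        "accuracy": accuracy_score(labels, pred),
        "precision": precision_score(labels, pred),
        "recall": recall_score(labels, pred),
    }


val_scores = report(svm, X_val, y_val)
test_scores = report(svm, X_test, y_test)

# Feature importances from a random forest (an assumed choice of model);
# the share of the five largest importances is summed.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
importances = np.sort(forest.feature_importances_)[::-1]
top5_share = importances[:5].sum()

print("validation:", val_scores)
print("test:", test_scores)
print("top-5 importance share:", round(float(top5_share), 4))
```

With real data, `X` would hold the 14 indicator values per journal and year and `y` the normal/abnormal label. The validation set would be used to compare algorithms before the chosen model is scored once on the test set.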
JIANG Fenghui, LIU Xiangpeng, SHAO Wei, CHEN Chunping, YU Longzhen. Identification and classification of journals of impact factor manipulation[J]. Chinese Journal of Scientific and Technical Periodicals, 2023, 34(2): 136-143.
姜丰辉, 刘祥鹏, 邵巍, 陈春平, 于龙振. 影响因子操纵期刊识别与分类方法构建与应用[J]. 中国科技期刊研究, 2023, 34(2): 136-143.