Chinese Journal of Scientific and Technical Periodicals ›› 2023, Vol. 34 ›› Issue (3): 341-347. doi: 10.11946/cjstp.202206200474

Optimization of the mechanism for data mining of potential readers of Chinese academic journals based on big data

TIAN Haijiang1), HUANG Jianghua2)

  1) Journal Publishing Center of Chongqing University of Posts and Telecommunications, 2 Chongwen Road, Nan'an District, Chongqing 400065, China
  2) Journal Publishing Center of Yangtze Normal University, 16 Juxian Avenue, Fuling District, Chongqing 408100, China
  • Received: 2022-06-20 Revised: 2023-02-07 Online: 2023-03-15 Published: 2023-04-21

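  • About the authors:

    TIAN Haijiang (ORCID: 0009-0005-5890-4613), master's degree, associate senior editor;

    HUANG Jianghua, master's degree, associate senior editor.

    Author contributions: TIAN Haijiang wrote the paper; HUANG Jianghua revised the paper.
  • Funding:
    Ministry of Education Industry-University Cooperative Education Project "Precise communication system for academic journals based on artificial intelligence and big data" (JYB202002325051)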

Abstract:

[Purposes] This study examines the two mainstream data mining mechanisms for journal communication supported by current big data technology, namely the author relevance mechanism and the document-fragmentation natural language processing mechanism, through experimental design and statistical analysis along dimensions such as the relevance and timeliness of communication targets (potential readers). [Methods] From the CNKI database, 100 documents were randomly selected by discipline classification as communication samples, and a total of 10,000 communication targets were mined on the platforms that apply each mechanism, then assessed by Delphi-style judgment and statistical analysis. The analysis covered a series of library and information science indicators, including timeliness relevance, matching relevance, and publication frequency. [Findings] The author relevance mechanism has endogenous problems that make further optimization difficult. The document-fragmentation natural language processing mechanism faces a gap, not easily bridged, between the surface discipline classification and the actual matching clusters, but its results can be improved by optimizing the data mining logic. [Conclusions] Based on these results, an optimization path is proposed that improves the algorithm mapping and discards "over-age" data, and its effectiveness is verified experimentally.

Key words: Big data, Academic journal, Accurate communication, Potential reader, Data mining

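
The abstract names discarding "over-age" data as one part of the proposed optimization path but does not define the criterion. The following is only a minimal sketch of what such a filter might look like, not the authors' implementation. The record layout (`Candidate`), the field names, the five-year threshold, and the ranking by `match_score` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A mined communication target (hypothetical record layout)."""
    name: str
    last_pub_year: int   # year of the candidate's most recent related paper
    match_score: float   # assumed similarity to the sample document, in [0, 1]


def drop_over_age(candidates, current_year, max_age_years=5):
    """Keep candidates whose latest related paper is at most max_age_years old.

    The five-year default is an assumption; the abstract gives no threshold.
    """
    return [
        c for c in candidates
        if current_year - c.last_pub_year <= max_age_years
    ]


def rank_by_match(candidates):
    """Order the remaining candidates by descending match score."""
    return sorted(candidates, key=lambda c: c.match_score, reverse=True)


# Illustrative data only; these are not results from the study.
candidates = [
    Candidate("A", 2022, 0.6),
    Candidate("B", 2010, 0.9),
    Candidate("C", 2019, 0.8),
]
recent = rank_by_match(drop_over_age(candidates, current_year=2023))
```

In this toy input, candidate "B" has the highest match score but is dropped because its most recent related paper is 13 years old. This is the kind of trade-off between matching relevance and timeliness relevance that the abstract describes.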