Chinese Journal of Scientific and Technical Periodicals (中国科技期刊研究) ›› 2021, Vol. 32 ›› Issue (12): 1535-1548. doi: 10.11946/cjstp.202008100733

• Digital Publishing •

Influence of keyword normalization on literature subject information mining: A case study of the remote sensing field

BIAN Zhao1), TANG Ping2),*, YAN Jun1)

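  1. 1) Editorial Office of National Remote Sensing Bulletin (《遥感学报》), Aerospace Information Research Institute, Chinese Academy of Sciences, 19 North Fourth Ring Road West, Zhongguancun, Haidian District, Beijing 100190, China
    2) Aerospace Information Research Institute, Chinese Academy of Sciences, 9 Dengzhuang South Road, Haidian District, Beijing 100094, China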
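  • Received: 2020-08-10 Revised: 2021-10-10 Publication date: 2021-12-15 Release date: 2021-12-28
  • Corresponding author: TANG Ping E-mail: bianzhao@aircas.ac.cn; 358469594@qq.com
  • About the authors: BIAN Zhao (ORCID: 0000-0002-8807-8368), Ph.D., associate senior editor, E-mail: bianzhao@aircas.ac.cn; YAN Jun, Ph.D., professor-level senior engineer, director of the Department of Societies and Journals.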
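  • Funding:
    Youth Talent Support Project, a sub-project for cultivating high-level journal-publishing talent under the 2021 Excellence Action Plan for China STM Journals (2021ZZ051902)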

Influence of keyword normalization on literature subject information mining: A case study of remote sensing field

BIAN Zhao1), TANG Ping2),*, YAN Jun1)

  1. 1) Editorial Office of National Remote Sensing Bulletin, Aerospace Information Research Institute, Chinese Academy of Sciences, 19 North Fourth Ring Road West, Zhongguancun, Haidian District, Beijing 100190, China
    2) Aerospace Information Research Institute, Chinese Academy of Sciences, 9 Dengzhuang South Road, Haidian District, Beijing 100094, China
  • Received:2020-08-10 Revised:2021-10-10 Online:2021-12-15 Published:2021-12-28
  • Contact: TANG Ping E-mail:bianzhao@aircas.ac.cn;358469594@qq.com

Abstract (Chinese version):

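[Purposes] To analyze how normalized data affect subject information mining and to propose suggestions for formulating keyword indexing rules for remote sensing literature. [Methods] Remote sensing articles indexed by CNKI in 1996, 2006, and 2016 were taken as three data subsets. Keywords were extracted from the articles and sorted in descending order of frequency, and the top-ranked high-frequency keywords (for example, the top 10%) were taken as the objects of study. A similarity matrix was built from the co-occurrence frequency of keywords, journal impact factors, article citation counts, and related measures, and spectral clustering was used to mine the research topics of the remote sensing field. [Findings] Scale normalization and name normalization of the keywords moved the clustering results from a preliminary analysis to an optimized one, which amounts to a further refinement of the preliminary results. Much information was still missing from the final clustering results, however. The keywords were therefore reshaped according to certain rules and clustered again. The new clustering results brought out information on remote sensing data sources, sensors, and technical methods, making the research topics clearer. [Conclusions] Keyword normalization is a feasible way to improve literature subject information mining, so it is necessary to normalize keywords under defined rules that reflect the disciplinary characteristics of remote sensing. For scientific and technical journal publishing more broadly, normalized data can provide the data support for efficient and precise knowledge services.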

Key words: Data mining, Subject information, Keyword, Normalization

Abstract:

[Purposes] This paper analyzes the influence of data normalization on subject information mining and proposes suggestions for the indexing of keywords in remote sensing. [Methods] Articles on remote sensing indexed by CNKI in 1996, 2006, and 2016 were taken as three data subsets. Keywords were extracted from them and ranked in descending order of frequency, and the top-ranked high-frequency keywords (for example, the top 10%) were studied. The similarity matrix was constructed from the co-occurrence frequency of keywords, the impact factors of journals, and the citation counts of articles, and the hierarchical spectral clustering method was used to extract the research topics in remote sensing. In this process, the influence of scale normalization and name normalization of keywords on subject information mining was considered. [Findings] Normalizing keyword scale and name refined the clustering results, moving them from a preliminary analysis to an optimized one. However, much information was still not reflected in the final clustering results. Therefore, the keywords were reshaped according to certain rules and clustered again, which highlighted the remote sensing data sources, sensors, and technical methods and made the research topics clearer. [Conclusions] Keyword normalization is a viable way to improve literature subject information mining. Therefore, it is necessary to normalize keywords in remote sensing according to the characteristics of the discipline. For scientific journals, normalized data can support efficient and targeted knowledge services.
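
The first two steps of the method described above (ranking keywords by frequency and counting co-occurrences among the high-frequency ones) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy records, and the 50% cutoff used for the tiny example are all hypothetical, and the weighting by journal impact factor and citation count as well as the spectral clustering step are omitted.

```python
from collections import Counter
from itertools import combinations
import math


def top_keywords(records, fraction=0.10):
    """Return the most frequent keywords, keeping roughly the top `fraction`.

    `records` is a list of keyword lists, one list per article (assumed format).
    """
    # Count each keyword at most once per article.
    counts = Counter(kw for rec in records for kw in set(rec))
    keep = max(1, math.ceil(len(counts) * fraction))
    # Sort by descending frequency, then by keyword string for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, _ in ranked[:keep]]


def cooccurrence_matrix(records, vocab):
    """Build a symmetric matrix of co-occurrence counts over `vocab`."""
    index = {kw: i for i, kw in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]
    for rec in records:
        # Indices of the high-frequency keywords present in this article.
        present = sorted({index[kw] for kw in rec if kw in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical toy data: each inner list is the keyword list of one article.
records = [
    ["remote sensing", "MODIS", "classification"],
    ["remote sensing", "MODIS", "NDVI"],
    ["remote sensing", "classification", "SVM"],
    ["MODIS", "NDVI"],
]

# A 50% cutoff is used only because the toy vocabulary is so small.
vocab = top_keywords(records, fraction=0.5)
matrix = cooccurrence_matrix(records, vocab)
```

In a fuller pipeline, a matrix like this would be combined with the other measures named in the abstract to form the similarity matrix that is passed to a spectral clustering routine. Name normalization (for example, merging spelling variants of one sensor name) would be applied to `records` before counting.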

Key words: Data mining, Subject information, Keyword, Normalization