[Purposes] This paper analyzes how data normalization affects the mining of subject information and proposes suggestions for formulating keyword indexing rules for remote sensing literature. [Methods] Remote sensing articles indexed in CNKI for 1996, 2006, and 2016 were taken as three data subsets. Keywords were extracted from the articles and ranked in descending order of frequency, and the top-ranked high-frequency keywords (for example, the top 10%) were taken as the research objects. A similarity matrix was constructed from the co-occurrence frequency of keywords, the impact factors of journals, and the citation counts of articles, and a hierarchical spectral clustering method was used to mine research topics in remote sensing. In this process, the influence of scale normalization and name normalization of keywords on subject information mining was examined. [Findings] Scale normalization and name normalization of keywords moved the clustering results from a preliminary analysis to an optimized one, progressively refining the initial results. However, much information was still not reflected in the final clustering results. The keywords were therefore reshaped according to certain rules and clustered again; the new results highlighted information on remote sensing data sources, sensors, and technical methods, making the research topics clearer. [Conclusions] Keyword normalization is a feasible way to improve the mining of subject information from literature, so it is necessary to normalize keywords according to rules that reflect the disciplinary characteristics of remote sensing. For scientific journal publishing, normalized data can provide data support for efficient and precise knowledge services.
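The workflow described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the article keyword lists, the `SYNONYMS` table, and all function names are hypothetical; the similarity matrix uses co-occurrence counts only, because the abstract does not state how impact factors and citation counts are weighted; and a single spectral bipartition (the sign of the second eigenvector of the normalized Laplacian) stands in for the hierarchical spectral clustering, which could be approximated by applying the split recursively. The 50% cutoff replaces the 10% threshold only because the toy vocabulary is tiny.

```python
from collections import Counter
from itertools import combinations
import math

import numpy as np

# Toy keyword lists, one per article (invented for illustration only).
ARTICLES = [
    ["Landsat", "land cover", "classification", "urban expansion"],
    ["landsat", "Classification", "NDVI", "random forest"],
    ["Land Cover", "NDVI", "LANDSAT", "phenology"],
    ["SAR", "InSAR", "deformation", "earthquake"],
    ["InSAR", "deformation", "subsidence"],
    ["synthetic aperture radar", "Deformation", "InSAR", "landslide"],
    ["NDVI", "SAR", "soil moisture"],
]

# Hypothetical synonym table standing in for name normalization rules.
SYNONYMS = {"synthetic aperture radar": "sar"}


def normalize(keyword):
    """Trim, lowercase, and map known synonyms to one preferred name."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def top_keywords(docs, fraction):
    """Rank keywords by document frequency and keep the top fraction."""
    counts = Counter(k for doc in docs for k in set(doc))
    keep = math.ceil(len(counts) * fraction)
    ranked = sorted(counts, key=lambda k: (-counts[k], k))
    return ranked[:keep]


def cooccurrence_matrix(docs, vocab):
    """Symmetric matrix counting articles in which two keywords co-occur."""
    index = {k: i for i, k in enumerate(vocab)}
    weights = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        present = sorted({index[k] for k in doc if k in index})
        for i, j in combinations(present, 2):
            weights[i, j] += 1
            weights[j, i] += 1
    return weights


def spectral_bipartition(weights, vocab):
    """Split keywords by the sign of the Fiedler vector.

    Assumes every keyword co-occurs with at least one other keyword,
    so that no degree is zero.
    """
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(vocab)) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    positive = sorted(k for k, v in zip(vocab, fiedler) if v > 0)
    negative = sorted(k for k, v in zip(vocab, fiedler) if v <= 0)
    return sorted([positive, negative])


docs = [[normalize(k) for k in article] for article in ARTICLES]
vocab = top_keywords(docs, fraction=0.5)
clusters = spectral_bipartition(cooccurrence_matrix(docs, vocab), vocab)
for group in clusters:
    print(group)
```

Without the `normalize` step, variants such as "Landsat", "landsat", and "LANDSAT" would be counted as separate keywords and could fall below the frequency cutoff, which is the kind of effect on topic mining that the abstract attributes to name normalization. On this toy data the split should separate the optical keywords from the radar keywords.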