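Abstract: [Objective] To provide a method and reference for automatically extracting more comprehensive metadata for scientific and technical journals. [Methods] Taking Founder typesetting files as the object, a mathematical model for metadata extraction was established and a tail segmentation algorithm was proposed. A program for automatic metadata extraction was then written in VB, an object-based programming environment. [Results] After the characteristics of the Founder typesetting language were analyzed, string replacement was applied to the Founder typesetting files, a list file of segmentation keywords was built, and the extracted metadata were saved to an Excel file. [Conclusions] In practical use, the data for one issue can be extracted in only a few seconds, which greatly improves work efficiency.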
Keywords:
Online journal publishing system,
Metadata,
Founder typesetting,
VB,
Automatic extraction
Abstract:
[Purpose] This paper aims to provide a method and reference for automatically extracting more comprehensive metadata for scientific and technical journals. [Methodology] A mathematical model for metadata extraction is established with Founder typesetting files as the object, and a tail segmentation algorithm is proposed. The automatic metadata extraction program is then written in VB. [Findings] After the features of the Founder typesetting language are analyzed, string replacement is applied to the Founder typesetting files and a segmentation keyword list file is established. Finally, the extracted metadata are saved to an Excel file. [Conclusions] Practical application shows that extracting the data for one issue takes only a few seconds, which greatly improves work efficiency.
Key words:
Network publishing system,
Metadata,
Founder,
VB,
Automatic extraction
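The pipeline summarized in the abstract (string replacement, splitting on a keyword list, saving to a spreadsheet) might be sketched roughly as follows. This is an illustration in Python, not the authors' VB program, and it does not attempt the tail segmentation algorithm, which the abstract names but does not describe. It assumes that typesetting commands appear as `〖...〗` annotations. The keyword list, field names, sample text, and function names are all hypothetical. It writes CSV, which Excel can open, rather than a native Excel file.

```python
import csv
import io
import re

# Assumption: typesetting commands are enclosed in 〖...〗 annotations.
COMMAND = re.compile(r"〖[^〗]*〗")

# Hypothetical segmentation keywords: (marker in the text, metadata field name).
KEYWORDS = [("摘要:", "abstract"), ("关键词:", "keywords")]

# Hypothetical fragment of a typesetting file, for illustration only.
SAMPLE = "〖HT5H〗摘要:〖HT5SS〗目的 提取元数据。〖HT5H〗关键词:〖HT5SS〗元数据;方正排版"


def strip_commands(text):
    """String-replacement step: delete every 〖...〗 command."""
    return COMMAND.sub("", text)


def split_by_keywords(text, keywords=KEYWORDS):
    """Cut the text at each keyword marker that occurs in it.

    Each field runs from the end of its marker to the start of the
    next marker found, or to the end of the text for the last one.
    """
    hits = []
    for marker, field in keywords:
        pos = text.find(marker)
        if pos != -1:
            hits.append((pos, marker, field))
    hits.sort()

    record = {}
    for i, (pos, marker, field) in enumerate(hits):
        end = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        record[field] = text[pos + len(marker):end].strip()
    return record


def to_csv(records, fieldnames):
    """Serialize the extracted records as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buf.getvalue()


if __name__ == "__main__":
    record = split_by_keywords(strip_commands(SAMPLE))
    print(to_csv([record], ["abstract", "keywords"]))
```

Keeping the markers in a list mirrors the abstract's separate segmentation keyword list file: supporting another field would mean adding an entry to the list, not changing the splitting logic.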
杨海亮,徐用吉. 利用VB读取方正排版文件提取元数据[J]. 中国科技期刊研究, 2015, 26(6): 612-617.
YANG Hailiang,XU Yongji. Research on metadata extraction by using VB to read from founder typesetting files[J]. Chinese Journal of Scientific and Technical Periodicals, 2015, 26(6): 612-617.