Copyright protection of journal-published research data used for artificial intelligence model training

Chinese Journal of Scientific and Technical Periodicals, 2026, Vol. 37, Issue (2): 173-180. doi: 10.11946/cjstp.202510251279

NI Jing 1),2),3), HAO Xiuyuan 2), REN Shengli 4), ZHANG Jiuzhen 1),*

1) Department of Information Management, Peking University, 5 Yiheyuan Road, Haidian District, Beijing 100871, China
2) Editorial Office of Chinese Medical Journal, Chinese Medical Association, 69 Dongheyan Street, Xicheng District, Beijing 100052, China
3) Key Laboratory of Medical Knowledge Mining and Services, National Press and Publication Administration, 69 Dongheyan Street, Xicheng District, Beijing 100052, China
4) Science China Press, 16 Donghuangchenggen North Street, Dongcheng District, Beijing 100717, China

Received: 2025-10-25 Revised: 2025-12-07 Online: 2026-02-25 Published: 2026-04-01
Corresponding author: ZHANG Jiuzhen
About the authors:
NI Jing (ORCID: 0000-0002-8018-0683), master's degree, associate senior editor, E-mail: nijing@cmaph.org;
HAO Xiuyuan, PhD, senior editor;
REN Shengli, PhD, senior editor.
Author contributions: NI Jing: literature research and organization; data collection, cleaning, and analysis; drafting and revising the paper. HAO Xiuyuan: participating in revising the paper. REN Shengli: participating in revising and reviewing the paper. ZHANG Jiuzhen: designing the framework of the paper and revising the final version.

Abstract

Purposes: To explore the boundaries of rights and responsibilities between suppliers and users of published data in AI model training scenarios, providing a reference for improving copyright protection of data published in China's scientific journals and for building fair and reasonable circulation rules.

Methods: Stakeholders in scientific journal publishing, including publishing institutions, industry associations, and government departments, were taken as the research objects. Their publicly available copyright agreements, technical terms, position statements, policies, and regulatory texts were retrieved, and the features of the copyright agreements, technical restrictions, and obligation clauses were analyzed. Using a case study approach, the practices of five publishing institutions (Elsevier, Springer Nature, Sage, Wiley, and Taylor & Francis) were selected to compare the copyright infringement risks of different models of using published data for AI model training.

Findings: At the scientific journal level, copyright transfer agreements have yet to be substantially updated for AI model training. Among leading publishers, which are key players in protecting the copyright of published data, only a small number of copyright agreements explicitly address the various scenarios related to AI training. Publishing institutions either set technical restriction clauses, remain neutral, or adopt data management models better suited to AI training scenarios. Publishers with a higher proportion of open-access papers are more inclined to proactively provide data access. Publishing institutions that use published data for AI model training adopt two strategies: in-house development under "internal fair use", and licensing content to third parties. The two strategies differ in the scope of data circulation, and the copyright boundary disputes they raise concern, respectively, the definition of "fair use" and the completeness of the authorization chain.

Conclusions: To address the needs of AI model training, copyright agreements should be supplemented with provisions clarifying when "fair use" applies or with "sublicensing" clauses. Publishing institutions should develop copyright management frameworks tailored to AI training requirements. Holders of published data and AI model developers should clearly define data permissions and responsibility boundaries, providing a foundation for market-oriented mechanisms for distributing data circulation revenue and resolving disputes. Academic societies, associations, and government departments should issue dedicated copyright guidelines for the use of published data in AI training, regulate the compliant and efficient circulation of such data, explore the establishment of fair revenue-sharing mechanisms, and guide the orderly development of the industry.

Key words: Scientific journals, Published data, Web crawler, Text and data mining, Artificial intelligence, Model, Copyright

Cite this article: NI Jing, HAO Xiuyuan, REN Shengli, ZHANG Jiuzhen. Copyright protection of journal-published research data used for artificial intelligence model training[J]. Chinese Journal of Scientific and Technical Periodicals, 2026, 37(2): 173-180.

https://www.cjstp.cn/CN/Y2026/V37/I2/173

Figures/Tables (2)

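Publishing industry association	Document title	Release date	Core content
International Association of Scientific, Technical and Medical Publishers (STM)	Guidelines on using STM content for text and data mining (TDM) and for training AI models/systems	2024-03-27	Definition of TDM and scope of rights; recommendations on including disclaimers; technical requirements related to data mining; recommendations for preventing unlicensed TDM (AI training)
International Association of Scientific, Technical and Medical Publishers (STM)	STM statement regarding unlicensed use of STM's members' content in the training, development, and operation of AI models	2024-07-09	Using members' content for AI training without authorization, without payment, and without source attribution constitutes infringement
Association of American Publishers	Statement on the use of copyrighted works for AI training	2024-10-22	"The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted"
International Publishers Association (IPA), jointly with the UK Publishers Association	IPA position on generative AI and copyright	2024-11-01	Collecting, processing, storing, and copying works to train AI models implicates authors' exclusive rights; generative AI companies must obtain authorization to use works in the manner specified by rightsholders, should disclose information on the works used to train AI models, and should meet their legal responsibilities while respecting copyright; pirate websites should be treated as off-limits for AI training and profit; governments are urged to uphold copyright and to resist technology companies holding excessive power over the public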
