Four Key Legal Issues in AIGC Copyright


Large models have begun to reshape one industry after another, and legal professionals face numerous challenges from this revolutionary technology, particularly in copyright. As the saying goes, “A good question is half the success,” so it is crucial to identify the most critical copyright issues that AI raises. Drawing on current practical experience, the author attempts to summarize four of the most challenging issues for the reference of legal colleagues. Working through these challenges together will greatly benefit the rapid development of the AI industry.

1. Copyright Protection and Infringement Standards for AIGC

AIGC technology has reached a multimodal stage: images, text, and video can all be generated, and the quality of the output is high enough for practical use in many industries. In this context, it will become increasingly rare for people to complete an expression entirely by themselves. A creation model in which AI handles expression while humans provide the requirements and make the final judgments and selections is bound to become mainstream. This fundamental shift inevitably raises two questions: whether content generated through human-machine collaboration (AIGC) can be protected under the Copyright Law, and how the standards of protection should be set.

In my personal opinion, whether such content deserves copyright protection no longer requires discussion. If we still cling to the notion that only expressions completed personally by humans qualify as works, we are already severely out of step with current technological developments. It would be absurd if content generated through human-machine collaboration (AIGC) far exceeded human-only creations in quality and yet could not be recognized by law. The Copyright Law should shift from encouraging human expression to encouraging human creative ideas and aesthetic judgments (although protection would, of course, still ultimately attach to the works generated through human-machine collaboration). This is the principle often cited in legal circles: “Where there is value, there are rights.” In this regard, the Beijing Internet Court’s first case on AIGC-generated images and the Guangzhou Internet Court’s first case on AIGC platform liability have provided excellent examples.

Once the need for protection is accepted, the next challenge lies in the standards of protection. Not all AIGC content qualifies as a work eligible for copyright protection, just as not all human expression can be considered a work. A new standard for recognizing works must be established for AIGC, one that fits the trend of human-machine collaboration, and the requirement of “originality” may need to be re-evaluated in the process. Theoretically, even if a person merely provides a prompt and then selects a satisfactory result from several options generated by AI, that result may still be eligible for copyright protection. This is similar to photographic works: a photographer merely captures a moment of light and shadow with a camera and completes the creation with a click of the shutter, yet if the resulting work has aesthetic value, it can receive copyright protection. Many similar situations need to be considered category by category in order to establish standards, and these standards will in turn affect the future direction of, and enthusiasm for, human literary and artistic creation through AI.

Furthermore, as more and more AIGC content gains copyright protection, the question of how to judge infringement arises. Theoretically, two people can generate different content from the same set of prompts, or similar content from different prompts. How these new problems of infringement comparison are resolved also bears on the level of protection afforded to AIGC works.

2. Boundaries of Data Training and Fair Use for Large Models

So far, large models still need to be trained on existing data, and in general, the larger the volume of data, the more capable the model. This technical characteristic highlights the value of data as a factor of production.

Currently, data in the civil domain involves personal information, privacy, and intellectual property rights. How to balance the protection of these existing rights against the model’s demand for big data is a very difficult question. Many of us have been saying, “Not developing is the greatest insecurity,” reflecting the industry’s mindset of prioritizing technological development so as not to fall further behind foreign competitors. On careful reflection, however, is reckless technological development sustainable? Must technological development come at the expense of existing rights? In the AIGC platform liability case I represented, the court required the platform to exercise a certain duty of care when generating content similar to third-party IP. After the ruling, many commentators argued that it imposed restrictions on technology. A closer reading of the judgment, however, shows that one piece of evidence we submitted was precisely ChatGPT’s proactive measures to avoid generating well-known IP. This indicates that foreign counterparts have not abandoned rights protection, and that doing so has not hindered their technological progress.

In fact, my view is that any technological development should first conform to market logic, meaning that the application of the technology needs to be a win-win for all the parties involved. Overdrawing the interests of one or more parties for the sake of short-term technological advancement is akin to an athlete using performance-enhancing drugs: it may temporarily improve results, but it is unlikely to sustain a long career.

Looking back at the era of the Industrial Revolution, technology advanced by leaps and bounds, and intellectual property law emerged accordingly. It was precisely because these laws struck a good balance between encouraging innovation and public access that technological achievements flourished. Some people point to today’s open-source movement to deny the value of intellectual property, but one only needs to ask whether today’s intellectual achievements are dominated by open-source or by licensed use to see the truth. Moreover, open-source is premised on the voluntary relinquishment of rights, which presupposes that those rights are acknowledged; without protection of the underlying rights, how can there be open-source? The same applies to AIGC: if we disregard the interests of data sources and engage in over-exploitative development, it would be like draining the pond to catch the fish. We should let those who contribute training data be among the first to benefit from the development of AI technology, thereby encouraging more data resources to be invested in AI training.

Of course, I do not agree that all data, copyrighted data in particular, must be authorized before it can be used for training. The Copyright Law may be able to bring certain uses of copyrighted data for machine training within the scope of the fair use system. However, this raises the question of where to draw the boundaries so as to minimize the impact on the original rights holders while maximizing the benefit to technological advancement. Doing so will likely require extensive quantitative and qualitative assessment, so this issue is also professionally very difficult.

3. Layered Responsibility of AIGC Platforms

Since the advent of the internet, various types of platforms have emerged, and legally they all share a common designation: network service providers. In a complex network ecosystem, different network service providers occupy different ecological niches, each performing different tasks and playing a distinct role. It is therefore unsound to assess platform responsibility in a generalized manner; duties of care and responsibilities should be allocated according to the type of platform. This has been reflected in typical cases I have represented, such as mini-program platform liability cases and cloud computing SaaS platform liability cases. Moreover, since the Civil Code was enacted, Article 1195 has clearly stipulated that network service providers must take necessary measures based on the preliminary evidence of infringement and the type of service, which provides a firmer legal basis for imposing duties of care according to the type of network service.

This logic remains applicable to AIGC large model services. Both now and for a long time to come, large models will clearly be divided into three progressive layers. The bottom layer is basic infrastructure: the most fundamental general-purpose large models, which provide general model capabilities not targeted at any specific use case, similar to today’s cloud computing technology. The middle layer consists of large models for vertical or industry-specific fields, which have certain specialized training and usage scenarios, such as medical models, educational models, and financial models. These models still do not target any specific application scenario, however, and remain primarily technical services that happen to be trained on the knowledge or skills of a particular field. The top layer consists of application-level models, which are built on either of the two lower layers and combined with user needs and usage scenarios to provide solutions and results that directly meet those needs.

The number of models increases from the bottom layer to the top. Generally speaking, the most basic general models are very few, possibly in the single digits; the vertical models in the middle layer are likely also in the single digits for each industry or field; and the application layer will be very large in number, comparable to the number of apps today, because it is determined by the number of application scenarios.

Thus, both the obligations concerning data training and the duty of care concerning generated content should be set according to this layered structure. How to set them most soundly is clearly a very difficult question.

4. Compliance Issues of Open Source in Large Models

The development of large models in China relies heavily on existing open-source resources, including open-source models, open-source code, and open-source datasets. Moreover, many models will themselves be open-sourced after development, either because earlier open-source licenses require it or for commercial reasons. Most of these resources come from abroad, so numerous open-source licenses are involved. These include not only the established general-purpose licenses (GPL, BSD, Apache, MIT, etc.) but also terms redefined by the open-source companies themselves. There have already been cases concerning open-source licenses both domestically and internationally, and they show that failure to comply can lead to legal liability that goes beyond compensation in an individual case, potentially leaving the developed product unusable. This is therefore a critical issue for business sustainability. Beyond clarifying the license terms, it is also necessary to determine which open-source resources suit the company’s business needs and which technologies can be used to detect the open-source resources already in use, and then to establish development processes, internal rules, and relevant professional committees to handle complex issues. All of this requires in-depth research before solutions can be provided.

— END —

Author: Zhang Yanlai

Founding Partner and Chief Lawyer of Zhejiang Kending Law Firm

Patent Agent

Practical Mentor at China University of Political Science and Law

Practical Mentor for Master’s Degree in Law at Southwest University of Political Science and Law

Member of the Expert Guidance Committee on Antitrust in Zhejiang Province

Vice President of the Xihu Branch of Hangzhou Lawyers Association

Member of the Democratic National Construction Association

Arbitrator of Hangzhou Arbitration Commission

Since starting my practice, I have focused entirely on internet law, serving as long-term legal advisor to dozens of leading internet companies and acting in several landmark internet litigation cases, including the first case on infringement involving NFT digital collectibles, the first group control case, the first WeChat mini-program case, the first smartphone flashing case, the first 5G cloud gaming case, the first facial recognition case, and the first risk app governance case. Cases I have represented have been selected multiple times for the Supreme People’s Court’s “Top Ten Typical Intellectual Property Cases” and “Top Fifty Typical Intellectual Property Cases,” as well as “Most Research-Value Intellectual Property Cases in China,” “Top Ten Constitutional Cases in China,” and other top ten typical case lists at various levels of people’s courts.

I have been deeply involved in the legislative work on China’s “E-commerce Law,” the General Administration of Industry and Commerce’s “Network Transaction Management Measures,” and Hangzhou’s “Network Transaction Management Measures,” and have participated as a drafter in the standardization work on the “Platform Economy Data Governance Evaluation Guide Standard” and the “Generative Artificial Intelligence Data Application Compliance Guidelines.”

My personal works “Legal Eye on E-commerce,” “Notes from the Battlefield of Internet Law,” and “No Technology, No Law” have been published by Law Press and Legal Publishing House.
