AI companies oppose paying for copyrighted content in generative AI training

The biggest players in artificial intelligence (AI) have come out against paying for copyrighted material used as training data for generative AI models.

The US Copyright Office has been seeking public input on potential new rules governing the use of copyrighted content in AI model training, and several major AI companies have submitted arguments against any such changes.

Here is a summary of their arguments, as collected by The Verge:

Meta: Copyright holders wouldn’t get much money anyway

Imposing a first-of-its-kind licensing regime now, well after the fact, will cause chaos as developers seek to identify millions and millions of rightsholders, for very little benefit, given that any fair royalty due would be incredibly small in light of the insignificance of any one work among an AI training set.

Google: AI training is just like reading a book

If training could be accomplished without the creation of copies, there would be no copyright questions here. Indeed that act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like the act of reading a book and learning the facts and ideas within it, would not only be non-infringing, it would further the very purpose of copyright law. The mere fact that, as a technological matter, copies need to be made to extract those ideas and facts from copyrighted works should not alter that result.

Microsoft: Changing copyright law could hurt small AI developers

Any requirement to obtain consent for accessible works to be used for training would chill AI innovation. It is not feasible to achieve the scale of data necessary to develop responsible AI models even when the identity of a work and its owner is known. Such licensing schemes will also impede innovation from start-ups and entrants who don’t have the resources to obtain licenses, leaving AI development to a small set of companies with the resources to run large-scale licensing programs or to developers in countries that have decided that use of copyrighted works to train AI models is not infringement.

Anthropic: Current law is fine; don’t change it

Sound policy has always recognized the need for appropriate limits to copyright in order to support creativity, innovation, and other values, and we believe that existing law and continued collaboration among all stakeholders can harmonize the diverse interests at stake, unlocking AI’s benefits while addressing concerns.

Adobe: It’s fair use, like when Accolade copied Sega’s code

In Sega v. Accolade, the Ninth Circuit held that intermediate copying of Sega’s software was fair use. The defendant made copies while reverse engineering to discover the functional requirements—unprotected information—for making games compatible with Sega’s gaming console. Such intermediate copying also benefited the public: it led to an increase in the number of independently designed video games (which contain a mix of functional and creative aspects) available for Sega’s console. This growth in creative expression was precisely what the Copyright Act was intended to promote.

Anthropic: Copying is just an intermediate step

For Claude, as discussed above, the training process makes copies of information for the purposes of performing a statistical analysis of the data. The copying is merely an intermediate step, extracting unprotectable elements about the entire corpus of works, in order to create new outputs. In this way, the use of the original copyrighted work is non-expressive; that is, it is not re-using the copyrighted expression to communicate it to users.

Andreessen Horowitz: Investors have spent ‘billions and billions’

Over the last decade or more, there has been an enormous amount of investment—billions and billions of dollars—in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted. A change in this regime will significantly disrupt settled expectations in this area. Those expectations have been a critical factor in the enormous investment of private capital into U.S.-based AI companies which, in turn, has made the U.S. a global leader in AI. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.

Hugging Face: Training on copyrighted material is fair use

The use of a given work in training is of a broadly beneficial purpose: the creation of a distinctive and productive AI model. Rather than replacing the specific communicative expression of the initial work, the model is capable of creating a wide variety of different sort of outputs wholly unrelated to that underlying, copyrightable expression. For those and other reasons, generative AI models are generally fair use when they train on large numbers of copyrighted works. We use “generally” deliberately, however, as one can imagine patterns of facts that would raise tougher calls.

StabilityAI: Other countries call AI model training fair use

A range of jurisdictions including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have reformed their copyright laws to create safe harbors for AI training that achieve similar effects to fair use. In the United Kingdom, the Government Chief Scientific Advisor has recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise [sic] existing protections of copyright and IP law on the output of AI.”

Apple: Let us copyright our AI-made code

In circumstances where a human developer controls the expressive elements of output and the decisions to modify, add to, enhance, or even reject suggested code, the final code that results from the developer’s interactions with the tools will have sufficient human authorship to be copyrightable.

The debate over AI and copyright is still unfolding, with stakeholders offering sharply different perspectives on the issue. Whatever the US Copyright Office decides about potential rule changes will likely have significant implications for both the AI industry and copyright law.