A U.S. Senate hearing held July 16 gave some hope to publishers and authors that at least some members of Congress seem willing to step up the fight against Big Tech companies that knowingly violate copyright law to train their large language models.

The hearing, titled “Too Big to Prosecute? Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training,” was overseen by Senator Josh Hawley (R-MO), chair of the Senate Judiciary Subcommittee on Crime and Counterterrorism. Hawley, a frequent critic of Big Tech, left no doubt where his sentiment lies.

“Today’s hearing is about the largest intellectual property theft in American history,” Hawley began. “For all of the talk about artificial intelligence and innovation and the future that comes out of Silicon Valley, here’s the truth that nobody wants to admit: AI companies are training their models on stolen material.”

The hearing featured five witnesses, four of whom argued that the AI companies’ training methods are a clear violation of fair use. The fifth, Edward Lee, a professor at Santa Clara University School of Law, made the case that those methods could be protected by fair use, and he cautioned that before Congress takes any action it should let the issues play out in court.

In making his argument, Lee cited two recent rulings in lawsuits brought against Anthropic and Meta where both judges found that copying was indeed fair use. In each instance, however, the judges wrote that the tech companies were not in the clear. In the Anthropic case, the court ruled that while using legally acquired copyrighted books to train AI large language models constitutes fair use, downloading pirated copies of those books for permanent storage violates copyright law.

The use of pirate sites like LibGen was of particular concern to Maxwell Pritt, partner at the law firm Boies Schiller Flexner, which is suing Meta. Pritt said Meta and other AI companies have revived the fortunes of pirate sites by turning to them for training purposes. He noted that documents showed that Meta employees knew using pirate sites was illegal, but that Meta chair Mark Zuckerberg made the decision to proceed anyway. “There is no carve out in the Copyright Act for AI companies to engage in mass piracy,” Pritt said.

Both Hawley and Senator Dick Durbin of Illinois peppered Lee with suggestions that AI companies were the direct beneficiaries of not paying for content at the expense of creators. Lee didn’t disagree with that assertion, but said it is important that U.S. companies be able to compete with China in the field of AI, something that would benefit all Americans. Hawley took exception to that line of reasoning, asking whether it is fair that giant corporations can benefit from stealing the work of American citizens such as author David Baldacci. Lee stuck to his belief that the fair use doctrine is the best way to balance the need for innovation with the protection of copyright, and that Congress should let the lawsuits before the courts play out before intervening.

In his ruling in the Meta case, Judge Vince Chhabria supported the transformative aspects of AI, but also laid out why AI companies may be in some legal jeopardy over their training methods. Chhabria pointed out that "generative AI has the potential to flood the market with endless amounts of images, songs, articles, books, and more."

In his remarks, Baldacci, a plaintiff in the Authors Guild class action suit against OpenAI, addressed that very issue. Baldacci explained he has seen new books released that closely resemble his own plot lines and fears that these types of works will eventually flood the market. He noted that online vendors now require the “author” to disclose if a book was not human-created. “It’s getting to the point where they will have to limit the number of books that someone can publish on a weekly or even daily basis,” Baldacci said.

In fact, as early as September 2023, Amazon began limiting the number of books that authors using its KDP platform can post to three per day. The decision came a few weeks after KDP issued guidelines requiring authors to notify Amazon if AI materials were involved in writing a book.

Bhamati Viswanathan, a professor at New England Law School, said Baldacci is right to worry that books ripped off from him could overwhelm the market. Subpar work that substitutes for the original work “is not what fair use was intended to achieve,” she said.

The back-and-forth of the session, which AAP CEO Maria Pallante called a "terrific hearing" in a note to members, seemed to reinforce Hawley’s view of what needs to be done. If what AI companies are doing isn’t infringement, Hawley said, “Congress needs to do something. I mean, if the answer is that the biggest corporation in the world, worth trillions of dollars, can come take an individual author’s work like Mr. Baldacci’s, lie about it, hide it, profit off of it, and there’s nothing our law does about that, we need to change the law.”