For AI training: Meta violates the copyright of thousands of book authors

Meta is said to have violated the copyright of thousands of book authors in training its Llama language model. The Facebook parent company is now facing several class action lawsuits.

Artificial intelligence requires huge amounts of data. But it is not always clear whether that data may be used for AI training at all.

The main sticking point is copyright in texts, images and videos, and that is exactly what Meta is said to have violated on a massive scale. The US company is currently facing several class action lawsuits accusing it of infringing the copyright of thousands of book authors.

Did Meta violate copyright for AI training?

Novelist Christopher Farnsworth has filed one of these class action lawsuits in a US court. In it, he accuses Meta of having used his books and those of other authors without permission to train Llama.

He is demanding compensation and wants to stop the use of his works for AI training. He is not alone in this. Other authors have also filed similar class action lawsuits in the same court.

These include comedian Sarah Silverman and author Ta-Nehisi Coates. They also accuse Meta of violating the copyright of their works because the company is said to have used them for AI training.

Where does the data for AI training come from?

At the center of the dispute is a dataset called “The Pile”, which is 886 gigabytes in size and contains numerous English-language texts. EleutherAI released the dataset in 2020 and made it available for training large AI language models.

A subcategory of The Pile called Books3 contains 196,640 copyrighted books. It includes, among others, works by Stephen King, Margaret Atwood and the novelist Christopher Farnsworth.

According to the lawsuit, Meta has confirmed that it downloaded “The Pile” and used it “as part of its work in the training and development of its LLMs.” For this reason, Farnsworth accuses Meta of using the books contained in Books3 to train its Llama models and thus violating copyright.

The problem of AI and copyright

The conflict between AI companies and authors is not new. Companies that need data to train their AI models often invoke the fair use doctrine of US copyright law.

This doctrine allows copyrighted works to be used without authorization in certain contexts, such as public education. It is primarily aimed at science and the work of researchers and students.

Many big players in the AI industry nevertheless invoke the fair use doctrine as well, and accuse the plaintiffs of slowing down progress in the field of artificial intelligence.

While the AI companies rely on this defense, authors are demanding compensation for their works. The training of AI models is often compared to human learning. However, people who learn from books buy them or borrow them from libraries, Farnsworth’s class action lawsuit says.

Such readers “legally obtain” the works, which provides at least some compensation for authors and creators. The lawsuit continues: “Meta does not do this and has appropriated the content of authors to create a machine that generates exactly the type of content that authors are normally paid for.”

The article For AI training: Meta violates the copyright of thousands of book authors by Maria Gramsch appeared first on BASIC thinking.

As a tech industry expert, I am deeply concerned about the allegations that Meta violated the copyright of thousands of book authors for AI training purposes. Intellectual property rights are crucial in the digital age, and companies must respect these rights in order to foster innovation and creativity.

Using copyrighted material without permission not only undermines the hard work and creativity of authors but also sets a dangerous precedent for the future of AI training. It is essential for companies like Meta to adhere to copyright laws and obtain proper licenses or permissions for the content they use in their AI training.

Furthermore, this incident highlights the importance of ethical considerations and responsible practices in the tech industry. Companies should prioritize ethical behavior and ensure that they are not infringing on the rights of content creators in their pursuit of technological advancements.

In conclusion, Meta’s alleged violation of copyright law for AI training is a serious issue that must be addressed promptly and appropriately. It is imperative for companies to uphold ethical standards and respect intellectual property rights in order to maintain trust and integrity in the tech industry.
