Super HN

   Judge sides with Anthropic over training AI on books without authors' permission (techcrunch.com)
Broadly summarizing.

This is OK and fair use: Training LLMs on copyrighted work, since it's transformative.

This is not OK and not fair use: pirating data, or creating a big repository of pirated data that isn't necessarily for AI training.

Overall seems like a pretty reasonable ruling?

But those training the LLMs are still using the works, and not just to discuss them, which I think is the point of fair use doctrine. I guess I fail to see how it's any different from me using it in some other way? If I wanted to write a play very loosely inspired by Blood Meridian, it might be transformative, but that doesn't justify me pirating the book.

I tend to think copyright should be extremely limited compared to what it is now, but to me this ruling has no coherent logic other than "it's ok for a corporation to use lots of works without permission but not for an individual to use a single work without permission." Maybe if they suddenly loosened copyright enforcement for everyone I might feel differently.

"Kill one man, and you are a murderer. Kill millions of men, and you are a conqueror." (An admittedly hyperbolic comparison, but similar idea.)

Depends whether you actually agree it's transformative.
For textual purposes it seems fairly transformative.

If you train an LLM on Harry Potter and ask it to generate a story that isn't Harry Potter, then it's not a replacement.

However, if you train a model on stock imagery and use it to generate stock imagery then I think you'll run into an issue from the Warhol case.

What's the steelman case that is transformative? Because prima facie, it seems to only output original output - "intelligent" output.
The HN crowd dislikes brick-and-mortar landlords but often sides with charging rent for certain bits. Which side will prevail?

Interesting excerpt:

> “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages,” Judge Alsup wrote in the decision. “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for theft but it may affect the extent of statutory damages.”

The language of “pirated” and “theft” is from the article. If they did realize a mistake and purchased copies after the fact, why should that be insufficient?

Anthropic won't submit a spreadsheet of all the books and whether they were purchased or not. So trivially, not every book stolen is shown to be later purchased.

As just a matter of society, I don't think you want people, say, stealing a car and then coming back a month later with the money.

While no one wants anyone to steal a car, almost no one would mind freely cloning a car. The trouble truly is that 3D printing hasn't gotten that good yet.

> If they did realize a mistake and purchased copies after the fact, why should that be insufficient?

1. You're assuming this was some good faith "they didn't know they were stealing" factor. They use someone else's products for commercial use. I'm not so charitable in my interpretation.

2. I'm not absolved of theft just because I go back and put money on the register. I still stole, intentionally or not.

Why would it erase the mistake? You pirated first.