Meta Admits Use of 'Pirated' Book Dataset to Train AI (torrentfreak.com)

submitted 2 years ago by aprnu@feddit.ch to c/piracy@lemmy.dbzer0.com

msgraves@lemmy.dbzer0.com 17 points 2 years ago

ok, fair; but do consider the context that the models are open weight. You can download them and use them for free.

There is a slight catch though, which I’m very annoyed at: it’s not actually Apache. It’s this weird license where you can use the model commercially up until you have 700M monthly users, at which point you have to request a custom license from Meta. OK, I kinda understand them not wanting companies like ByteDance or Google using their models just like that, but Mistral has their models on Apache-2.0 open weight, so that license should definitely be reconsidered, especially for Llama 3.

It’s kind of a thing right now: publishers don’t want models trained on their books “because it breaks copyright”, even though the model doesn’t actually remember copyrighted passages from the book. Many arguments hinge on the publishers being mad that you can prompt the model to repeat a copyrighted passage, which it can do. IMO this is a bullshit reason.

Anyway, it will be an interesting two years as (hopefully) copyright will get turned inside out :)