Does a license like this exist?

[-] lobut@lemmy.ca 7 points 1 week ago

Some authors typed the first few sentences of their book and the LLM spit out the rest.

[-] FaceDeer@fedia.io 0 points 1 week ago

That generally only happens in cases of overfitting, where the model was trained on a poorly de-duplicated data set that contains many copies of that book (or excerpts, quotes, and so forth). This is considered a flaw by AI trainers and a lot of work goes into sanitizing the training data to prevent it.
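To illustrate the de-duplication step mentioned above, here is a minimal sketch of exact-match de-duplication by hashing normalized text. It is an illustration only, not any particular trainer's pipeline: the names `normalize` and `dedupe_documents` and the choice of lowercasing plus whitespace collapsing are assumptions made for the example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def dedupe_documents(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, comparing by normalized hash.

    Hypothetical helper for illustration; it only catches copies that are
    identical after normalization.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    corpus = ["Call me Ishmael.", "call me   ishmael.", "Some other text"]
    print(dedupe_documents(corpus))
```

A check like this would not catch excerpts or quotes embedded in other documents; those would presumably need near-duplicate techniques (MinHash-style similarity, for example), which is likely part of why the cleanup takes so much work.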

[-] XLE@piefed.social 4 points 1 week ago* (last edited 1 week ago)

But you're otherwise disgusted by the fact that material is plagiarized without consent to begin with...

...Right, FaceDeer?

[-] FaceDeer@fedia.io -1 points 1 week ago

You went digging through my Reddit comments to find a two-month-old thread; that must have taken a lot of effort. But I'm afraid I don't see what its relevance is, aside from a general "it's about AI". The bulk of the comments I wrote there were about water usage.

I'm genuinely puzzled. Are you saying that deduplicating data is "hiding unethical behaviour"? It's actually intended to improve the model's performance. Having a model spit out exact copies of its training data means you've produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.

this post was submitted on 24 Feb 2026
60 points (92.9% liked)
