Technology
This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.
Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.
Rules:
1: All Lemmy rules apply
2: Do not post low effort posts
3: NEVER post naziped*gore stuff
4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.
5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)
6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist
7: crypto related posts, unless essential, are disallowed
Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level. It will advance medicine and science in ways that are difficult to even imagine, and it will provide personalized educational tutoring to every student regardless of income. Yet these people are worried about the technicality of what the AI is trained on, and often don't understand enough about AI to even make an argument about it. If people like this win, whatever country's legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.
Your favorite musician listened to music that inspired them when they made their songs. Listening to other people's music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn't have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn't have to pay you because it read your book and millions like it to learn how to read and write.
I don't think that Sarah Silverman and the others are saying that the tech shouldn't exist. They're saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it's good for the industry and it's good for the people.
I don't need to negotiate with Sarah Silverman if I'm handed her book by a friend, and neither should an AI.
But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that's an extreme version, but it's the argument they're making.
I agree. But that isn't what AI is doing, because it doesn't store the actual book and it isn't possible to reproduce any part in a format that is recognizable as the original work.
Definitely not how that output works. It will come up with something that seems like a Sarah Silverman created work but isn't. It's like calling copyright on impersonations. I don't buy it.
Yes. Imagine how much trouble ANY actor would be in if they could be sued for impersonating someone, playing a character nearly identical to that person but not actually them. If Sarah Silverman ever interacted with a person and then imitated that person on stage for her own personal benefit without the other person's express consent, it would be no different. And comedians pick up their comedy from everything around them, both natural and imitation.
100%. I just can't get behind any of these arguments against AI from this segment of workers. This is no different than other rallies against technological evolution due to fear of job losses. Their scarce commodity will soon disappear and that's what they're actually afraid of.
It’s easy. They’re grasping at straws because their career isn’t what it used to be. It’s something new and viral so it must be an easy target to exploit for money. Personally I’d be on top of it and setting up contracts to allow AI to use my likeness for a small subset of the usual pay. I just can’t imagine not taking advantage of the ability to do absolutely nothing and still get paid for it. Instead they appear to actively be trying to tear it down. If they wanted to set guidelines, they would be rallying Congress, not suing a company based on how they FEEL it should be.
That’s not what this is. To use your example it would be like taking her book and rearranging ALL of the words to make another book and selling that book. But they’re not selling the book or its contents, they’re selling how their software interprets the book for the benefit of the user. This would be like suing teachers for teaching about their book.
An LLM isn't human and shouldn't be treated the same as a human. It's as foolish as corporate personhood.
The argument is less that an LLM is a human and more that it is not a copyright violation to use material to train the LLM. By current legal definitions, it is fair use unless the material is able to be reproduced in its entirety (or at least, in some meaningful way).
Yeah, definitions that were written before this technology existed. I don't base my opinions on what is legal; legality is nothing more than rules determined by those in power.
Instead, I base them on what is ethical, and the consumption of material by LLMs and other AIs without the express permission of its creator is unethical.
Except the AI owner does. It's like sampling music for a remix or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if you are handed a book by a friend. However, if you use material from that book in a work, it needs to be cited. If you create an IP based off that work, Sarah Silverman deserves compensation because you used material from her work.
It's no different with AI. If the AI used intellectual property from an author in its learning algorithm, then if that intellectual property is used in the AI's output, the original author is due compensation under certain circumstances.
Neither citation nor compensation is necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.
Sure, but fair use is rather narrowly defined. You must consider the purpose, nature, amount, and effect. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It fails to meet any criteria for fair use.
The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.
It doesn't have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You're not reproducing the original material in any way, but you're still heavily depending on it.
Breach of trademark, not copyright, whole different barrel of fish.
It is different. That knowledge from her book forms part of your processing and allows you to extract features and implement similar outputs yourself. The key difference between the AI model and its dataset is that it's codified in bits, versus whatever neural links we have in our brains. So if someone theoretically creates a way to codify your neural network, you might be subject to the same restrictions we're trying to levy on AI. And that's bullshit.