this post was submitted on 26 Jul 2023
850 points (96.4% liked)
Technology
It's an algorithm that's been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.
You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is. An execution of programmed tasks.
If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?
Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.
Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.
well no, because the summary is its own copyrighted work
The published summary is open to fair use by web crawlers. That was settled in Perfect 10 v Amazon.
Right, but not one the author of the book could go after. The article publisher would have the closest rights to a claim. But if I read the crib notes and a few reviews of a movie... Then go to summarize the movie myself... That's derivative content and is protected under copyright.
Haven't people asked it to reproduce specific chapters or pages of specific books and it's gotten it right?
I haven't been able to reproduce that, and at least so far, I haven't seen any very compelling screenshots of it that actually match. Usually it just generates text, but that text doesn't actually match.
Gotcha. This seems like a good way to test for it then, I think.
If I read your book... and get an amazing idea... Turn it into a business and make billions off of it. You still have no right to anything. This is no different.
There's been no proof or evidence provided that ANY content was ever pirated. Have any of the companies even provided the datasets they've used yet?
Why is this the presumption that they did it the illegal way?
I don't see how this is even remotely the same? These companies are using this material to create their commercial product. They're not consuming it personally and developing a random idea later, far removed from the book itself.
I can't just buy (or pirate) a stack of Blu-rays and then go start my own Netflix, which is akin to what is happening here.
I never said that the idea would be removed from the book. You can literally take the idea from the book itself and make the money. There would be no issues. There are no dues owed to the book's writer.
This is the whole premise for educational textbooks. You can explain to me how the whole world works in book form... I can go out and take those ideas wholesale from your book and apply them to my business and literally make money SOLELY from information from your book. There's nothing due back to you as a writer from me nor my business.
You've failed to explain how that relates to your point. Sure, you can purchase an economics textbook and then go become a finance bro, but that's not what they're doing here. They're taking that textbook (that wasn't paid for) and feeding it into their commercial product. The end product is derived from the author's work.
To put it a different way, would they still be able to produce ChatGPT if one of the developers simply read that same textbook and then inputted what they learned into the model? My guess is no.
It'd be the same if I went and bought CDs, ripped my favorite tracks, and then put them into a compilation album that I then sold for money. My product can't exist without having copied the original artists' work. ChatGPT just obfuscates that by copying a lot of songs.
Nobody has provided any evidence that this is the case. Until this is proven it should not be assumed. Bandwagoning (and repeating this over and over again without any evidence or proof) against the ML people without evidence is not fair. The whole point of the Justice system is innocent until proven guilty.
Derivative works are 100% protected under copyright law. https://www.legalzoom.com/articles/what-are-derivative-works-under-copyright-law
This is the same premise that allows the "fair use" that we all got up in arms about on YouTube. Claiming that this doesn't exist now in this case means that all that stuff we fought for on YouTube needs to be rolled back.
Why not? Why can't someone grab a book, scan it... chuck it into an OCR and get the same content? There are plenty of ways that snippets of raw content could make it into these repositories WITHOUT asserting legal problems.
No... For all intents and purposes, you could have recorded all your songs from the radio onto a cassette... That would be 100% legal for personal consumption... which would be what the ML authors are doing. ChatGPT and others could have sourced information from published sources that are completely legit. No "Author" has provided any evidence yet to suggest that ChatGPT and others have actually broken a law. For all we know, the authors of these tools have library cards, and fed in screenshots of the digital scans of the book or hand scanned the book. Or didn't even use the book at all and contextually grabbed a bunch of content from the internet at large.
Since the ML bots are all making derivative works, rather than spitting out original content... they'd be covered by copyright as a derivative work.
This only becomes an actual problem if you can prove that these tools have done BOTH
A better comparison would probably be sampling. Sampling is fair use in most of the world, though there are mixed judgments. I think most reasonable people would consider the output of ChatGPT to be transformative use, which is considered fair use.
If I created a web app that took samples from songs created by Metallica, Britney Spears, Backstreet Boys, Snoop Dogg, Slayer, Eminem, Mozart, Beethoven, and hundreds of other different musicians, and allowed users to mix all these samples together into new songs, without getting a license to use these samples, the RIAA would sue the pants off of me faster than you could say "unlicensed reproduction."
It doesn't matter that the output of my creation is clear-cut fair use. The input of the app--the samples of copyrighted works--is infringing.
The RIAA is indeed a litigious organization, and they tend to use their phalanx of lawyers to beat anyone who does anything creative or new into submission.
But sampling is generally considered fair use.
And if the algorithm you used actually listened to tens of thousands of hours of music, and fed existing patterns into a system that creates new patterns, well, you'd be doing the same thing anyone who goes from listening to music to writing music does. The first song ever written by humans was probably plagiarized from a bird.