submitted 1 year ago by L4s@lemmy.world to c/technology@lemmy.world

OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

[-] TropicalDingdong@lemmy.world 0 points 1 year ago

It's a bit pedantic, but I'm not really sure I support this kind of extremist view of copyright and the scale of what's being interpreted as 'possessed' under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator's intention. It's like the issues with sampling beats or records in the early days of hip-hop. It's as if the very principle of an idea goes against this vision: once you put something out into the commons, it's irretrievable. It's not really yours any more once it's been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don't control how the idea is interpreted, so it's not really yours any more.

Whether that's ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real but very silly and obvious malady of this weirdly accepted but very extreme view of the ability to possess an idea.

[-] Bogasse@lemmy.world 0 points 1 year ago

Well, I'd consider agreeing if the LLMs were considered as a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is "they build original content", both for LLMs and stable diffusion models. Now that they have started this line of defence, I think they are stuck with proving that their "original content" is not derived from copyrighted content 🤷

[-] TropicalDingdong@lemmy.world -1 points 1 year ago

Yeah I suppose that's on them.

[-] Laticauda@lemmy.ca 0 points 1 year ago* (last edited 1 year ago)

AI isn't interpreting anything. This isn't the sci-fi style of AI that people think of; that's general AI. This is narrow AI, which is really just an advanced algorithm. It can't create new things with intent and design. It can only regurgitate a mix of pre-existing stuff based on narrow guidelines programmed into it to try to keep it coherent, with no actual thought or interpretation involved in the result. The issue isn't that it's derivative; the issue is that it can only ever be derivative, without any intentional interpretation or creativity, and nothing else.

Even collage art has to qualify as fair use to avoid copyright infringement if it's being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used (which requires intent). Even if it's transformative enough to make the original unrecognizable, if the majority of the work is not your own art, then you need to get permission to use it; otherwise you aren't automatically safe from getting in trouble over copyright. Even using images for photoshop involves creative commons and commercial use licenses. Fanart and fanfic are also considered a grey area, and the only reason more of a stink isn't kicked up over them regarding copyright is that they're generally beneficial to the original creators, and credit is naturally provided by the nature of fan works so long as someone doesn't try to claim the characters or IP as their own. So most creators turn a blind eye to the copyright aspect of the genre, but if any ever did want to kick up a stink, they could, and some have in the past, like Anne Rice. As a result, most fanfiction sites do not allow writers to profit off of fanfics or advertise fanfic commissions. And those are cases with actual humans being the ones to produce the works based on something that inspired them or that they are interpreting. So even human-made derivative works have rules and laws applied to them. AI isn't a creative force with thoughts and ideas and intent; it's just a pattern recognition and replication tool, and it doesn't benefit creators when it's used to replace them entirely, like Hollywood is attempting to do (among other corporate entities). Viewing AI at least as critically as actual human beings is the very least we can do, as well as establishing protection for human creators so that they can't be taken advantage of because of AI.

I'm not inherently against AI as a concept and as a tool for creators to use, but I am against AI works with no human input being used to replace creators entirely, and I am against using works to train it without the permission of the original creators. Even in the artist/writer/etc communities it's considered to be a common courtesy to credit other people/works that you based a work on or took inspiration from, even if what you made would be safe under copyright law regardless. Sure, humans get some leeway in this because we are imperfect meat creatures with imperfect memories and may not be aware of all our influences, but a coded algorithm doesn't have that excuse. If the current AIs in circulation can't function without being fed stolen works without credit or permission, then they're simply not ready for commercial use yet as far as I'm concerned. If it's never going to be possible, which I just simply don't believe, then it should never be used commercially period. And it should be used by creators to assist in their work, not used to replace them entirely. If it takes longer to develop, fine. If it takes more effort and manpower, fine. That's the price I'm willing to pay for it to be ethical. If it can't be done ethically, then imo it shouldn't be done at all.

[-] CosmicCleric@lemmy.world -1 points 1 year ago* (last edited 1 year ago)

It feels like we've just taken our first steps down the timeline of the Robin Williams movie 'Bicentennial Man'.

[-] Gnubyte@lemdit.com -2 points 1 year ago* (last edited 1 year ago)

Our ancient legal system trying to lend itself to "protecting authors" is fucking absurd. AI is the future. Are we really going to let everyone take a shot at suing these guys over this crap? It's a useful program and infrastructure for everyone.

Holding technology back for antiquated copyright law is downright absurd.

Edit: I want to add that I'm not suggesting copyright should be a free-for-all on your books or hard work, but rather that this is a computer program and a major breakthrough. In the same way that no one sues my brain for consumption if I read a book, I don't think we should sue an AI: it is not reproducing books, just as the many footnotes websites about books do not reproduce a book by summarizing its content. This holds only as long as OpenAI doesn't have an event where its reputation has to be re-evaluated (i.e., this is subject to change if they start trying to reproduce books).

[-] LordShrek@lemmy.world 0 points 1 year ago

if I read a book no one sues my brain for consumption

yes, this is the fundamental point

this post was submitted on 22 Aug 2023
677 points (95.6% liked)
