
OpenAI just admitted it can't identify AI-generated text. That's bad for the internet and it could be really bad for AI models. (www.businessinsider.com)

submitted 2 years ago by L4s@lemmy.world to c/technology@lemmy.world

171 comments

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.


[-] Womble@lemmy.world 13 points 2 years ago

Your assertion that a future AI detector will be able to detect current LLM output is dubious. If I give you the sentence "Yesterday I went to the shop and bought some milk and eggs," there is no way for you or any detection system to tell with any significant degree of certainty whether it was AI-generated. What can be done is statistical analysis of large data sets to see how they "smell", but saying that around 30% of a dataset is likely LLM-generated does not get you very far in creating a training set, because it doesn't tell you which 30%.

I'm not saying that there is no solution to this problem, but blithely waving it away by saying future AI will be able to spot old AI is not a serious take.
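To make the "30% of this dataset" point concrete, here is a minimal toy sketch. It assumes a hypothetical detector whose scores are normally distributed for both human and LLM text, with heavily overlapping distributions (the means 0.0 and 0.5 are made up). The dataset-level fraction can be estimated well, but filtering item by item still leaves a contaminated training set.

```python
import random
import statistics

# Hypothetical detector scores: human text ~ N(0, 1), LLM text ~ N(0.5, 1).
# The heavy overlap stands in for short, generic sentences like the milk-and-eggs one.
MU_HUMAN, MU_LLM = 0.0, 0.5

def simulate_scores(n, frac_llm, seed=0):
    rng = random.Random(seed)
    scores, is_llm = [], []
    for _ in range(n):
        llm = rng.random() < frac_llm
        scores.append(rng.gauss(MU_LLM if llm else MU_HUMAN, 1.0))
        is_llm.append(llm)
    return scores, is_llm

def estimate_llm_fraction(scores):
    # Method of moments: mean score = f * MU_LLM + (1 - f) * MU_HUMAN, solved for f.
    return (statistics.fmean(scores) - MU_HUMAN) / (MU_LLM - MU_HUMAN)

def filter_dataset(scores, is_llm, threshold=0.25):
    # Keep only items scoring below the midpoint between the two means.
    kept = [llm for s, llm in zip(scores, is_llm) if s < threshold]
    return len(kept) / len(scores), sum(kept) / len(kept)

scores, is_llm = simulate_scores(100_000, frac_llm=0.3)
kept_frac, llm_share = filter_dataset(scores, is_llm)
print(f"estimated LLM fraction: {estimate_llm_fraction(scores):.3f}")
print(f"data kept after filter: {kept_frac:.3f}")
print(f"LLM share of kept data: {llm_share:.3f}")
```

With these made-up numbers the fraction estimate lands close to 0.3. The filter, however, throws away nearly half the data, and roughly a fifth of what it keeps is still LLM text. Lowering the threshold cuts contamination only by discarding even more human text.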

[-] lily33@lemmy.world -4 points 2 years ago

If you give me several paragraphs instead of a single sentence, do you still think it's impossible to tell?
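A toy sketch of why length can matter: if two sources differ even slightly in their word statistics, per-token evidence adds up over a longer passage. The 4-word vocabulary and both distributions below are invented for illustration, and the detector is assumed to know both distributions exactly, which is a best case no real detector gets.

```python
import math
import random

# Hypothetical toy setup: both "writers" pick from a 4-word vocabulary, and the
# "model" slightly over-uses word 0. Real text is not i.i.d. and no real detector
# knows these distributions exactly, so this is a best case for the detector.
HUMAN = [0.25, 0.25, 0.25, 0.25]
MODEL = [0.30, 0.24, 0.23, 0.23]
LLR = [math.log(m / h) for m, h in zip(MODEL, HUMAN)]  # per-token log-likelihood ratio

def detection_accuracy(n_tokens, trials=1000, seed=0):
    rng = random.Random(seed)
    correct = 0
    for i in range(trials):
        from_model = i % 2 == 0  # half the passages come from each source
        weights = MODEL if from_model else HUMAN
        tokens = rng.choices(range(4), weights=weights, k=n_tokens)
        score = sum(LLR[t] for t in tokens)  # > 0 means "more likely model"
        correct += (score > 0) == from_model
    return correct / trials

for n in (10, 100, 1000):
    print(f"{n:>5} tokens: accuracy {detection_accuracy(n):.2f}")
```

Accuracy climbs with passage length here. If you set `MODEL` equal to `HUMAN`, every log-ratio is zero and accuracy stays at 50% no matter how long the passage is: more text only helps if there is a real difference to accumulate.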

[-] steakmeout@lemmy.world 4 points 2 years ago

"If you zoom further out you can definitely tell it's been shopped because you can see more pixels."

[-] steveman_ha@lemmy.world 1 points 2 years ago* (last edited 2 years ago)

What they're getting towards (one thing, anyways) is that "indistinguishable to the model" and "the same" are two very different things.

IIRC, one possibility is that LLMs learning from one another will make small, incremental changes to what's considered "acceptable" or "normal" sentence structure, so that over time more noticeable linguistic shifts emerge that the models themselves don't notice.

As this continues, the phenomenon creates a "positive feedback loop" in which the gap progressively widens (still undetected, because the quality of the training data keeps going down) until the models basically "collapse" in their effectiveness.

So even if their output is indistinguishable now, how the tech is used (I guess?) will determine whether or not a self-destructive LLM echo chamber is produced.
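The feedback loop in this comment is often illustrated with a toy simulation in which each "model" is just a Gaussian fitted to samples drawn from the previous generation's model. This is a minimal sketch of that kind of toy, not of how real LLMs are trained. The function name, the sample size, and the `human_frac` knob (the share of fresh real data each generation still sees) are assumptions for illustration.

```python
import random
import statistics

def simulate_generations(generations=500, n=20, human_frac=0.0, seed=0):
    """Toy model-collapse loop: each 'model' is a Gaussian (mu, sigma)
    fitted by maximum likelihood to the previous generation's training set."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0           # generation 0 matches the real data, N(0, 1)
    n_human = int(n * human_frac)  # fresh real samples mixed into every round
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_human)]
        data += [rng.gauss(mu, sigma) for _ in range(n - n_human)]
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # MLE spread, biased slightly low
        history.append(sigma)
    return history

closed = simulate_generations(human_frac=0.0)
mixed = simulate_generations(human_frac=0.5)
print(f"only synthetic data:  sigma {closed[0]:.2f} -> {closed[-1]:.2e}")
print(f"half fresh real data: sigma {mixed[0]:.2f} -> {mixed[-1]:.2f}")
```

Trained only on its own output, the fitted spread shrinks toward zero over the generations, so the tails of the original distribution disappear. Keeping half the data from the real source anchors it near the original spread. That is one way to read "how the tech is used will determine" whether the echo chamber forms.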

this post was submitted on 28 Jul 2023

463 points (93.6% liked)

Technology

79286 readers


This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below; this includes using AI responses and summaries. To ask if your bot can be added, please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws