There is no such thing as an effective "AI detector", nor will there ever be one. (lemmy.megumin.org)

submitted 2 years ago* (last edited 2 years ago) by excel@lemmy.megumin.org to c/technology@lemmy.world

65 comments

I keep seeing posts about this kind of thing getting people's hopes up, so let's address this myth.

What's an "AI detector"?

We're talking about these tools that advertise the ability to accurately detect things like deep-fake videos or text generated by LLMs (like ChatGPT), etc. We are NOT talking about voluntary watermarking that companies like OpenAI might choose to add in the future.

What does "effective" mean?

I mean something with high levels of accuracy, both highly sensitive (low false negatives) and highly specific (low false positives). High would probably be at least 95%, though this is ultimately subjective.
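To make the two terms concrete, here is a minimal Python sketch. The function name and the counts are made up purely for illustration; they are not measurements of any real detector.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: fakes correctly flagged as fake
    fn: fakes missed (labelled real)    -> false negatives
    tn: real items correctly labelled real
    fp: real items wrongly flagged fake -> false positives
    """
    sensitivity = tp / (tp + fn)  # share of fakes that get caught
    specificity = tn / (tn + fp)  # share of real items left alone
    return sensitivity, specificity


# Hypothetical test set: 1000 fakes and 1000 real items.
sens, spec = sensitivity_specificity(tp=960, fn=40, tn=970, fp=30)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A detector only clears the bar described above if both numbers are high at the same time; pushing one up by sacrificing the other doesn't count.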

Why should the accuracy bar be so high? Isn't anything better than a coin flip good enough?

If you're going to definitively label something as "fake" or "real", you better be damn sure about it, because the consequences for being wrong with that label are even worse than having no label at all. You're either telling people that they should trust a fake that they might have been skeptical about otherwise, or you're slandering something real. In both cases you're spreading misinformation which is worse than if you had just said "I'm not sure".

Why can't a good AI detector be built?

To understand this part, you need to understand a little bit about how these neural networks are created in the first place. Generative Adversarial Networks (GANs) are a strategy often employed to train models that generate content. They work by pairing two different neural networks: one that generates content similar to existing content, and one that detects the difference between generated content and the existing content. These networks learn in tandem: each time one network gets better, the other one gets better too.

What this means is that building a content generator and building a fake content detector are effectively two sides of the same coin. Improvements to one can always be translated directly, and in an automated way, into improvements to the other. This means that the generator will keep improving until the detector is fooled about 50% of the time, which is no better than a coin flip.
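The 50% figure can be illustrated with a toy model. This is a simplification of mine, not how real generators work: assume every piece of content boils down to a single number, real and generated content are each normally distributed with the same spread, they differ only in their mean, and real and fake are equally common. Under those assumptions the best possible detector is a threshold halfway between the two means, and its accuracy depends only on the gap between them.

```python
import math


def best_detector_accuracy(gap, sigma=1.0):
    """Accuracy of the optimal threshold detector in the toy model.

    Two normal distributions with equal spread `sigma` whose means differ
    by `gap`, with real and fake equally likely. The optimal threshold is
    the midpoint, which gives accuracy Phi(gap / (2 * sigma)).
    """
    z = abs(gap) / (2.0 * sigma)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# As the generator closes the gap, even the best detector drifts to 50%.
for gap in [4.0, 2.0, 1.0, 0.5, 0.1, 0.0]:
    print(f"gap={gap}: best accuracy = {best_detector_accuracy(gap):.3f}")
```

Once the gap is zero, the generated and real distributions are identical in this model, so no detector can do better than guessing.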

Note that not all of these models are trained in exactly this way, but the point is that anything CAN be trained this way. Even if a GAN wasn't originally used, any kind of improved detection can always be directly translated into improved generation that beats that detection. This isn't just any ordinary "arms race", because the turnaround time here is so fast that there won't be any chance of staying ahead of the curve... the generators will always win.
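Here is a rough sketch of that feedback loop, again as a toy and not as a real GAN. Everything in it is an assumption made for illustration: content is reduced to one number, the "detector" is just the best threshold between real and fake, and the "generator" nudges its output toward whatever side the detector currently labels as real. The names `REAL_MEAN`, `detector_accuracy` and `arms_race` are hypothetical.

```python
import math

REAL_MEAN = 0.0  # hypothetical statistic of real content
SIGMA = 1.0      # same spread assumed for real and fake


def detector_accuracy(fake_mean):
    """Best achievable accuracy of a threshold detector in this toy model."""
    gap = abs(fake_mean - REAL_MEAN)
    return 0.5 * (1.0 + math.erf(gap / (2.0 * SIGMA * math.sqrt(2.0))))


def arms_race(fake_mean=4.0, rounds=20, step=0.5):
    """Alternate detector and generator updates, recording detector accuracy."""
    history = []
    for _ in range(rounds):
        # Detector "retrains": the best threshold sits halfway between the means.
        threshold = (REAL_MEAN + fake_mean) / 2.0
        history.append(detector_accuracy(fake_mean))
        # Generator uses the detector as feedback and moves toward the
        # side of the threshold that gets labelled "real".
        fake_mean += step * (threshold - fake_mean)
    return history


history = arms_race()
print(f"round 1 accuracy:  {history[0]:.3f}")
print(f"round {len(history)} accuracy: {history[-1]:.3f}")
```

Every time the detector improves, it hands the generator a better target, so in this sketch the detector's accuracy only ever goes down toward 50%.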

Why do these "AI detectors" keep getting advertised if they don't work?

People are afraid of being saturated by fake content, and the media is taking advantage of that fear to sell snake oil.
Every generator network comes with its own free detector network that doesn't really work all that well (~50% accuracy) because it was used to create the generator originally, so these detectors are ubiquitous among AI labs. That means the people that own the detectors are the SAME PEOPLE that created the problem in the first place, and they want to make sure you come back to them for the solution as well.


[-] FlyingSquid@lemmy.world 1 points 2 years ago

Of course when one of these grand mountain ranges goes stretching across the printed page, it adorns and ennobles that literary landscape--but at the same time it is a great distress to the new student, for it blocks up his way; he cannot crawl under it, or climb over it, or tunnel through it. So he resorts to the dictionary for help, but there is no help there. The dictionary must draw the line somewhere--so it leaves this sort of words out. And it is right, because these long things are hardly legitimate words, but are rather combinations of words, and the inventor of them ought to have been killed. They are compound words with the hyphens left out. The various words used in building them are in the dictionary, but in a very scattered condition; so you can hunt the materials out, one by one, and get at the meaning at last, but it is a tedious and harassing business. I have tried this process upon some of the above examples. "Freundshaftsbezeigungen" seems to be "Friendship demonstrations," which is only a foolish and clumsy way of saying "demonstrations of friendship." "Unabhaengigkeitserklaerungen" seems to be "Independencedeclarations," which is no improvement upon "Declarations of Independence," so far as I can see. "Generalstaatsverordnetenversammlungen" seems to be "General-statesrepresentativesmeetings," as nearly as I can get at it--a mere rhythmical, gushy euphuism for "meetings of the legislature," I judge. We used to have a good deal of this sort of crime in our literature, but it has gone out now. We used to speak of a things as a "never-to-be-forgotten" circumstance, instead of cramping it into the simple and sufficient word "memorable" and then going calmly about our business as if nothing had happened. In those days we were not content to embalm the thing and bury it decently, we wanted to build a monument over it.

-- Mark Twain, A Tramp Abroad

[-] Spzi@lemm.ee 1 points 2 years ago

Okay, interesting. Of course it would be nice if languages were easy to understand and easy to learn. German seems to be on the hard end of this spectrum, but no language is free from unnecessary complications like these. They all grew historically and organically, and were not constructed with accessibility in mind.

It is nearly impossible to get an objective view on languages, since each of us is inherently biased, and most of us don't speak another language so well that we could truly judge it. It's easy to spot silly things in other languages while we may be unaware of how difficult our mother language is to learn for foreigners.

The interpretation of the given examples feels wrong to me. While the technical part is correct, I think the conclusion is incorrect. For example, "Unabhängigkeitserklärung" emphasizes the independence, while "Erklärung von Unabhängigkeit" emphasizes the declaration. The two are not equivalent. Twain seemed to be ignorant of that and simply assumed a foreign language would follow the same rules as his own.

While I can understand Twain's frustration in learning another language, his critique is based on a lack of understanding.

For some compound words, there is no straightforward equivalent. "Apfelbaum" (apple tree) could be "Baum, an dem Äpfel wachsen" (tree on which apples grow). But that leaves the question of whether it's still an Apfelbaum when it isn't growing apples at the moment, like in winter. "Baum des Apfels" (tree of the apple) can refer to a miniature tree on an apple. "Baum der Äpfel" (tree of the apples) might be okay.

Further, what he believes to be superior can sometimes be inferior. Consider cases like "The presentation on renewable energy technology investors." In this sentence, it's not clear whether "renewable energy technology" is a single entity modifying "investors," or if "renewable energy" and "technology investors" are separate entities, both modifying "presentation." The sentence could refer to a presentation for investors interested in renewable energy technology or to a presentation about investors who focus on renewable energy projects. Compound words prevent ambiguities like these.

Hyphens can help in these cases. They can also be used in German to make compound components easy to identify, as is required in https://en.wikipedia.org/wiki/Leichte_Sprache.

We used to speak of a things as a “never-to-be-forgotten” circumstance, instead of cramping it into the simple and sufficient word “memorable”

That's another interesting point to discuss. Which is easier for foreigners? Sure, a single, short word in itself is easy to learn. But it is a new word, which has to be learned. In this case, you have to learn which part of "memory" or "memorize" can be used, and which part must be replaced.

I also don't think "memorable" has the same meaning as "never-to-be-forgotten". Isn't "memorable" more fitting for positive things, while n-t-b-f is well suited for negative things? Was the Holocaust 'memorable'?

[-] FlyingSquid@lemmy.world 2 points 2 years ago

Honestly, I posted it more because I thought it was funny than anything. I didn't expect such a deconstruction, but it's interesting!

this post was submitted on 23 Jul 2023

295 points (95.1% liked)
