
AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 2 weeks ago by eli001@lemmy.world to c/technology@lemmy.world

[-] Knock_Knock_Lemmy_In@lemmy.world 0 points 2 weeks ago

Run something with a 70% failure rate 10x and you get to a cumulative 98% pass rate. LLMs don't get tired and they can be run in parallel.

[-] MangoCats@feddit.it 1 points 2 weeks ago

I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It's a lot like machine translation. I speak fluent C++ but not Rust, yet I can hammer away on the AI (with English-language prompts) until it produces passable Rust for something I could have written myself in C++ with half the time and effort.

I also don't speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.

Is this useful? When C++ is getting banned for "security concerns" and Rust is the required language, it's at least a little helpful.
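
For what it's worth, the "prompt, check, feed the error back, repeat" loop described above can be sketched roughly like this in Python. Everything here is an assumption for illustration: `generate` stands in for whatever model call you use and `check` for whatever validation you trust (a compiler run, a test suite); neither is a real library API, and `retry_until_valid` is a made-up name.

```python
from typing import Callable, Optional, Tuple


def retry_until_valid(
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 10,
) -> Tuple[Optional[str], int]:
    """Call `generate` repeatedly, feeding errors back, until `check` passes.

    `generate` and `check` are hypothetical stand-ins supplied by the caller.
    `check` returns None when the output is acceptable, else an error message.
    Returns (accepted_output_or_None, attempts_used).
    """
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        output = generate(current_prompt)
        error = check(output)
        if error is None:
            return output, attempt
        # Re-prompt with the original request plus the latest failure.
        current_prompt = f"{prompt}\n\nPrevious attempt failed with: {error}"
    # Nothing passed the check within the attempt budget.
    return None, max_attempts
```

The loop is only as good as `check`: if nothing can reliably tell a passing attempt from a failing one, extra attempts don't buy much.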

[-] davidagain@lemmy.world 0 points 2 weeks ago

What's 0.7^10?

[-] Knock_Knock_Lemmy_In@lemmy.world 0 points 2 weeks ago

About 0.02

[-] davidagain@lemmy.world 0 points 2 weeks ago

So the chances of it being right ten times in a row are 2%.

[-] Knock_Knock_Lemmy_In@lemmy.world 1 points 2 weeks ago* (last edited 2 weeks ago)

No, the chances of being wrong 10x in a row are 2%. So the chances of being right at least once are 98%.
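
A quick way to check this arithmetic, as a minimal Python sketch. It assumes the ten runs are independent and each fails 70% of the time, which is a simplification; the function names are made up for illustration. The thread's figures are rounded: the exact values come out nearer 2.8% and 97.2%.

```python
def p_all_fail(failure_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent runs fails."""
    return failure_rate ** attempts


def p_at_least_one_success(failure_rate: float, attempts: int) -> float:
    """Complement of the above: at least one run succeeds."""
    return 1.0 - p_all_fail(failure_rate, attempts)


if __name__ == "__main__":
    # 70% failure rate, 10 independent attempts (assumed, as in the thread).
    print(f"wrong 10x in a row:  {p_all_fail(0.7, 10):.4f}")              # 0.0282
    print(f"right at least once: {p_at_least_one_success(0.7, 10):.4f}")  # 0.9718
```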

[-] davidagain@lemmy.world 1 points 2 weeks ago

Ah, my bad, you're right. For being consistently correct, I should have done 0.3^10 = 0.0000059049.

So the chances of it being right ten times in a row are less than one thousandth of a percent.

No wonder I couldn't get it to summarise my list of data right and it was always lying by the 7th row.

[-] Knock_Knock_Lemmy_In@lemmy.world 1 points 2 weeks ago

That looks better. Even with a fair coin, 10 heads in a row is very unlikely: 0.5^10 is about 0.1%, or roughly 1 in 1,000.

And if you are feeding the output back into a new instance of a model then the quality is highly likely to degrade.
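
The "ten in a row" figures from the last few comments can be checked the same way. This is only a sketch, assuming each step is independent with a fixed success rate; `p_all_succeed` is a made-up helper name.

```python
def p_all_succeed(success_rate: float, attempts: int) -> float:
    """Probability that all `attempts` independent steps succeed."""
    return success_rate ** attempts


if __name__ == "__main__":
    # 30% success rate is the complement of the 70% failure rate in the thread.
    for label, rate in [("30% success rate", 0.3), ("fair coin", 0.5)]:
        prob = p_all_succeed(rate, 10)
        print(f"{label}: {prob:.10f} ({prob * 100:.5f}%)")
    # 0.3^10 is about 0.0000059 (under 0.001%).
    # 0.5^10 is 1/1024, about 0.098%.
```

So ten heads in a row is rare rather than impossible, while ten correct answers in a row at a 30% success rate is around 165 times rarer still.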

this post was submitted on 07 Jul 2025
