Why I'm Betting Against AI Agents in 2025 (Despite Building Them) (utkarshkanwat.com)

submitted 1 week ago by yogthos@lemmy.ml to c/technology@lemmy.ml

[-] NobodyElse@sh.itjust.works 15 points 1 week ago

Performing procedural tasks using a statistical model of our language will never be reliable. There’s a reason why we use logical and prescriptive syntax when we want deterministic outcomes.

[-] yogthos@lemmy.ml 6 points 1 week ago

I expect what we will see are tools where the human manages the high-level implementation, and the agents are used to implement specific functionality that can be easily tested and verified. I can see something along the lines of a scene graph where you focus on the flow of the code, and farm off the implementation details of each step to a tool. As the article notes, these tools can already get over 90% accuracy in these scenarios.
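To make that concrete, here's a rough Python sketch of the idea. It's my own illustration, not something from the article: the `Step` and `run_flow` names and the toy steps are all hypothetical. The human owns the flow and a check for each step, while each `impl` is the part you could farm off to a tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    """One node in the flow (hypothetical structure, for illustration only)."""
    name: str
    impl: Callable[[Any], Any]          # the part a tool could generate
    check: Callable[[Any, Any], bool]   # human-written contract: (input, output) -> ok?


def run_flow(steps: List[Step], value: Any) -> Any:
    """Run the steps in order, refusing to continue past a failed check."""
    for step in steps:
        result = step.impl(value)
        if not step.check(value, result):
            raise ValueError(f"step {step.name!r} failed its check")
        value = result
    return value


# A toy flow: parse a comma-separated string, drop duplicates, sum what is left.
steps = [
    Step(
        "parse",
        lambda s: [int(x) for x in s.split(",")],
        lambda i, o: all(isinstance(x, int) for x in o),
    ),
    Step(
        "dedupe",
        lambda xs: sorted(set(xs)),
        lambda i, o: len(o) == len(set(o)) and set(o) == set(i),
    ),
    Step("total", sum, lambda i, o: isinstance(o, int)),
]

result = run_flow(steps, "3,1,3,2")
print(result)  # 6
```

The point of the split is that a bad implementation of any single step gets caught at that step instead of leaking into the rest of the flow.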

[-] NobodyElse@sh.itjust.works 2 points 1 week ago

I agree that this could be helpful for finally getting to that natural language programming paradigm that people have been hoping for. But there’s going to have to be something capable of logically implementing the low level code and that has to be more than just a statistical model of how people write code, as trained on a big collection of random repositories. (I’m open to being proved wrong about that!)

The 90% accuracy could just arise from the fact that the tests are against trivial or commonly solved tasks, so that the exact solutions to them exist in the training set. Anything novel will exist outside the training set and outside of the model.

[-] yogthos@lemmy.ml 2 points 6 days ago

I think it's going to be humans that implement actually interesting code while LLMs handle common and tedious stuff. That's the approach I've been using at work. When I need to crap out a UI based on some JSON payload, or make an HTTP endpoint, I let the LLM do it. When I have some actual business logic that's domain specific, I write that myself. This allows me to focus on writing code that's actually interesting, while the LLM does all the tedious work.

[-] queermunist@lemmy.ml 2 points 6 days ago

But doesn't the LLM sometimes churn out tedious garbage that you have to fix, thus not actually saving time?

[-] yogthos@lemmy.ml 1 points 6 days ago

That's where the rate of success becomes important. LLMs mostly produce decent code when applied to common cases like the examples I gave above. My experience is that the vast majority of the time it's as good as what you'd write, occasionally needing minor tweaks. However, there's nothing forcing you to use the code they produce either. If the LLM stumbles, you can always fall back to writing the code by hand, which leaves you no worse off than you would've been otherwise. It's all about learning how the tool works and when to use it.
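Roughly, the workflow looks like the Python sketch below. This is just an illustration under my own assumptions: `passes`, `choose`, and both `slugify` variants are made-up names, and the "generated" function is hand-written here to stand in for tool output.

```python
from typing import Callable, Iterable, Tuple

Case = Tuple[str, str]  # (input, expected output)


def passes(candidate: Callable[[str], str], cases: Iterable[Case]) -> bool:
    """Return True only if the candidate gets every case right without crashing."""
    try:
        return all(candidate(arg) == expected for arg, expected in cases)
    except Exception:
        return False


def choose(generated: Callable[[str], str],
           handwritten: Callable[[str], str],
           cases: Iterable[Case]) -> Callable[[str], str]:
    """Keep the generated code if it passes the checks, else fall back."""
    return generated if passes(generated, cases) else handwritten


def generated_slugify(title: str) -> str:
    # Stand-in for tool output: mishandles leading and repeated spaces.
    return title.lower().replace(" ", "-")


def handwritten_slugify(title: str) -> str:
    # The fallback you would have written anyway.
    return "-".join(title.lower().split())


cases = [
    ("Hello World", "hello-world"),
    ("  Padded  Title ", "padded-title"),
]

slugify = choose(generated_slugify, handwritten_slugify, cases)
print(slugify.__name__)  # handwritten_slugify
```

The checks are cheap to write, and when the generated version fails them you end up with exactly the code you would have written without the tool.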

[-] queermunist@lemmy.ml 1 points 6 days ago

You have to check it every single time, though, erasing any time savings. You're saving effort, maybe, but not time.

[-] yogthos@lemmy.ml 1 points 6 days ago

You're absolutely saving time: checking that the code works is far less time-consuming than writing it, especially for stuff like UIs or service endpoints. I literally work with this stuff on a daily basis, and I would never go back. There's also another aspect to it, which is that I personally find it makes my workflow more enjoyable. It lets me focus on things I actually want to work on, while automating a lot of boilerplate that I had to write by hand previously. Even if it wasn't saving me much time, there's a quality of life improvement here.

[-] queermunist@lemmy.ml 1 points 6 days ago

METR measured the speed of 16 developers working on complex software projects, both with and without AI assistance. After finishing their tasks, the developers estimated that access to AI had accelerated their work by 20% on average. In fact, the measurements showed that AI had slowed them down by about 20%.

[-] yogthos@lemmy.ml 1 points 6 days ago

Yes, I've seen this as well. First of all, 16 devs is a tiny sample; a far bigger study would be needed to get any meaningful results here. Second, it really depends on how experienced people are at using these tools. It took me a while to identify patterns that actually work repeatably and to develop intuition for cases where the model is most likely to produce good results.

[-] Anissem@lemmy.ml 2 points 1 week ago

Well said. I understood next to none of it, but well said.

this post was submitted on 20 Jul 2025

13 points (88.2% liked)
