Why I'm Betting Against AI Agents in 2025 (Despite Building Them) (utkarshkanwat.com)

submitted 1 week ago by yogthos@lemmy.ml to c/technology@lemmy.ml

[-] NobodyElse@sh.itjust.works 2 points 1 week ago

I agree that this could be helpful for finally getting to that natural language programming paradigm that people have been hoping for. But there’s going to have to be something capable of logically implementing the low-level code, and that has to be more than just a statistical model of how people write code, trained on a big collection of random repositories. (I’m open to being proved wrong about that!)

The 90% accuracy could just arise from the fact that the tests are against trivial or commonly solved tasks, so that the exact solutions to them exist in the training set. Anything novel will exist outside the training set and outside of the model.

[-] yogthos@lemmy.ml 2 points 1 week ago

I think it's going to be humans that implement the actually interesting code while LLMs handle the common and tedious stuff. That's the approach I've been using at work. When I need to crap out a UI based on some JSON payload, or make an HTTP endpoint, I let the LLM do it. When I have some actual business logic that's domain-specific, I write that myself. This allows me to focus on writing code that's actually interesting, while the LLM does all the tedious work.

[-] queermunist@lemmy.ml 2 points 1 week ago

But doesn't the LLM sometimes churn out tedious garbage that you have to fix, thus not actually saving time?

[-] yogthos@lemmy.ml 1 points 1 week ago

That's where the rate of success becomes important. LLMs mostly produce decent code when applied to common cases like the examples I gave above. My experience is that the vast majority of the time it's as good as what you'd write, occasionally needing minor tweaks. However, there's nothing forcing you to use the code they produce either. If the LLM stumbles, you can always fall back to writing the code by hand, which leaves you no worse off than you would've been otherwise. It's all about learning how the tool works and when to use it.

[-] queermunist@lemmy.ml 1 points 1 week ago

You have to check it every single time, though, erasing any time savings. You're saving effort, maybe, but not time.

[-] yogthos@lemmy.ml 1 points 1 week ago

You're absolutely saving time. Checking that the code works is far less time-consuming than writing it, especially for stuff like UIs or service endpoints. I literally work with this stuff on a daily basis, and I would never go back. There's also another aspect to it, which is that I personally find it makes my workflow more enjoyable. It lets me focus on things I actually want to work on, while automating a lot of boilerplate that I had to write by hand previously. Even if it wasn't saving me much time, there's a quality of life improvement here.

[-] queermunist@lemmy.ml 1 points 1 week ago

METR measured the speed of 16 developers working on complex software projects, both with and without AI assistance. After finishing their tasks, the developers estimated that access to AI had accelerated their work by 20% on average. In fact, the measurements showed that AI had slowed them down by about 20%.

[-] yogthos@lemmy.ml 1 points 1 week ago

Yes, I've seen this as well. First of all, 16 devs is a tiny sample; a far bigger study would be needed to get any meaningful results here. Second, it really depends on how experienced people are at using these tools. It took me a while to identify patterns that actually work repeatably and to develop intuition for cases where the model is most likely to produce good results.

this post was submitted on 20 Jul 2025

13 points (88.2% liked)
