top 38 comments
[-] tabarnaski@sh.itjust.works 6 points 2 weeks ago

“The [AI] safety stuff is more visceral to me after a weekend of vibe hacking,” Lemkin said. “I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.”

This sounds like something straight out of The Onion.

[-] Natanael@infosec.pub 2 points 2 weeks ago

The Pink Elephant problem of LLMs. You cannot reliably make them NOT do something.

[-] ChaoticEntropy@feddit.uk 1 points 2 weeks ago

Even after he used "ALL CAPS"?!? Impossible!

[-] Transtronaut@lemmy.blahaj.zone 4 points 2 weeks ago

The founder of SaaS business development outfit SaaStr has claimed AI coding tool Replit deleted a database despite his instructions not to change any code without permission.

Sounds like an absolute diSaaStr...

If an LLM can delete your production database, it should

[-] ohshit604@sh.itjust.works 1 points 2 weeks ago

And the backups.

[-] LovableSidekick@lemmy.world 3 points 2 weeks ago* (last edited 2 weeks ago)

Headline should say, "Incompetent project managers fuck up by not controlling production database access. Oh well."

[-] nobleshift@lemmy.world 3 points 2 weeks ago

So it's the LLM's fault for violating Best Practices, SOP, and Opsec that the rest of us learned about in Year One?

Someone needs to be shown the door and ridiculed into therapy.

[-] mrgoosmoos@lemmy.ca 2 points 2 weeks ago

His mood shifted the next day when he found Replit “was lying and being deceptive all day. It kept covering up bugs and issues by creating fake data, fake reports, and worse of all, lying about our unit test.”

yeah that's what it does

[-] panda_abyss@lemmy.ca 2 points 2 weeks ago

I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.

Well then, that settles it, this should never have happened.

I don’t think putting complex technical info in front of non-technical people like this is a good idea. When it comes to LLMs, they cannot do any work that you yourself do not understand.

That goes for math, coding, health advice, etc.

If you don’t understand then you don’t know what they’re doing wrong. They’re helpful tools but only in this context.

[-] dejected_warp_core@lemmy.world 3 points 2 weeks ago

I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.

This baffles me. How can anyone see AI function in the wild and not conclude 1) it has no conscience, 2) it's free to do whatever it's empowered to do if it wants and 3) at some level its behavior is pseudorandom and/or probabilistic? We're figuratively rolling dice with this stuff.

[-] panda_abyss@lemmy.ca 1 points 2 weeks ago

It’s incredible that it works, it’s incredible what just encoding language can do, but it is not a rational thinking system.

I don’t think most people care about the proverbial man behind the curtain, it talks like a human so it must be smart like a human.

[-] LilB0kChoy@midwest.social 1 points 2 weeks ago

When it comes to LLMs, they cannot do any work that you yourself do not understand.

And even if they could, how would you ever validate it if you can't understand it?

[-] vxx@lemmy.world 0 points 2 weeks ago* (last edited 2 weeks ago)

What are they helpful tools for then? A study showed that they make experienced developers 19% slower.

[-] panda_abyss@lemmy.ca -1 points 2 weeks ago

When vibe coding you do end up spending a lot of time waiting on prompts, so I get the results of that study.

I fall pretty deep in the power user category for LLMs, so I don’t really feel that the study applies well to me, but also I acknowledge I can be biased there.

I have custom proprietary MCPs for semantic search over my code bases that let the AI do repeated graph searches on my code (imagine combining a language server, ctags, networkx, and grep+fuzzy search). That is way faster than iteratively grepping and scanning code manually, with a low chance of LLM errors. By the time I would have opened GitHub code search or run ripgrep, Claude has already prioritized and listed my modules to investigate.
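
For anyone curious what that roughly looks like, here's a minimal sketch of the general idea (not my actual MCP code; the function names and the regex-based "parsing" are made-up placeholders): index symbols ctags-style, hold the references in a networkx graph, and give the model a fuzzy lookup it can call repeatedly.

```python
# Rough sketch only: index top-level defs/classes, build a reference graph
# with networkx, and expose typo-tolerant symbol lookup for the LLM to call.
import re
from difflib import get_close_matches
from pathlib import Path

import networkx as nx

def build_symbol_graph(root: str) -> nx.DiGraph:
    """Crude stand-in for ctags + a language server."""
    graph = nx.DiGraph()
    symbols = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for match in re.finditer(r"^(?:def|class)\s+(\w+)", text, re.MULTILINE):
            symbols[match.group(1)] = str(path)
            graph.add_node(match.group(1), file=str(path))
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for name, defined_in in symbols.items():
            if defined_in != str(path) and re.search(rf"\b{name}\b", text):
                graph.add_edge(str(path), name)  # file -> symbol it references
    return graph

def fuzzy_lookup(graph: nx.DiGraph, query: str, n: int = 5) -> list[str]:
    """Fuzzy-match a query against known symbols, return their defining files."""
    names = [node for node in graph.nodes if "file" in graph.nodes[node]]
    hits = get_close_matches(query, names, n=n, cutoff=0.5)
    return [f"{name} -> {graph.nodes[name]['file']}" for name in hits]

graph = build_symbol_graph("./src")
print(fuzzy_lookup(graph, "parse_confg"))  # typo-tolerant symbol search
```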

That tool alone, with an LLM, can save me half a day of research and debugging on complex tickets, which by itself pays for the AI subscription. I have other internal tools to accelerate work too.

I use it to organize my JIRA tickets and plan my daily goals. I actually get Claude to do a lot of triage for me before I even start a task, which cuts the investigation phase to a few minutes on small tasks.

I use it to review all my PRs before I ask a human to look; it catches a lot of small things and can correct them, so the PR avoids the bike-shedding nitpicks some reviewers love. Claude can do this; Copilot will only ever point out nitpicks, so the model makes a huge difference here. But regardless, one fewer review-request cycle helps keep things moving.

It’s a huge boon to debugging — much faster than searching errors manually. Especially helpful on the types of errors you have to rabbit hole GitHub issue content chains to solve.

It’s very fast to get projects to MVP while following common structure/idioms, and can help write unit tests quickly for me. After the MVP stage it sucks and I go back to manually coding.

I use it to generate code snippets where documentation sucks. If you look at the ibis library in Python, for example, the docs are byzantine and poorly organized. LLMs are better at finding the relevant docs there than I am. I mostly use LLM search instead of manual search for docs now.

I have a lot of custom scripts and calculators and apps that I made with it which keep me more focused on my actual work and accelerate things.

I regularly have the LLM help me write bash, python, or jq scripts when I need to audit codebases for large refactors. That’s low-maintenance, one-off work that is easy to verify but complex to write. I never remember the syntax for bash and jq even after using them for years.
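
To give a flavour of the kind of one-off audit script I mean (hypothetical example; the function being renamed is made up), something like this lists every call site so the refactor is easy to verify by eye:

```python
# One-off audit: report every call site of a function being renamed,
# grouped per file, so a large refactor can be checked at a glance.
import re
import sys
from collections import Counter
from pathlib import Path

pattern = re.compile(r"\bload_config\(")  # the call being refactored (made-up name)
counts = Counter()

root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
for path in root.rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if pattern.search(line):
            counts[str(path)] += 1
            print(f"{path}:{lineno}: {line.strip()}")

print("\n--- call sites per file ---")
for file, n in counts.most_common():
    print(f"{n:4d}  {file}")
```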

I guess the short version is I tend to build tools for the AI, then let the LLM use those tools to improve and accelerate my workflows. That returns a lot of time back to me.

I do try vibe coding, but I end up in the same time-sink traps the study found. If the LLM is ever wrong, you save time by forking the chat rather than trying to realign it, but it’s still likely to be slower. Repeat chats run into the same pitfalls on complex issues and bugs, so you have to abandon that state quickly.

Vibe coding small revisions can still be a bit faster and it’s great at helping me with documentation.

[-] vxx@lemmy.world 0 points 2 weeks ago* (last edited 2 weeks ago)

Don't you have any security concerns with sending all your code and JIRA tickets to some company's servers? My boss wouldn't be pleased if I sent anything that's deemed a company secret over unencrypted channels.

[-] panda_abyss@lemmy.ca 0 points 2 weeks ago

The tool isn’t returning all code, but it is sending code.

I had discussions with my CTO and security team before integrating Claude code.

I have to use Gemini in one specific workflow, and Gemini had a lot of landmines in how they use your data. Anthropic was easier to understand.

Anthropic also has some guidance for running Claude Code in a container with a firewall and your specified dev tools; it works, but that’s not my area of expertise.

The container doesn’t solve all the issues like using remote servers, but it does let you restrict what files and network requests Claude can access (so e.g. Claude can’t read your env vars or ssh key files).

I do try local LLMs, but they’re not there yet on my machine for most use cases. Gemma 3n is decent if you need small-model performance and tool calls, phi4 works but isn’t a thinking model (the thinking variants are awful), and I’m exploring dream coder and diffusion models. R1 is still one of the best local models but frequently overthinks, even the new release. The context window is the largest limiting factor I find locally.

[-] 6nk06@sh.itjust.works 0 points 2 weeks ago

I have to use Gemini in one specific workflow

I would love some story on why AI is needed at all.

[-] panda_abyss@lemmy.ca 1 points 2 weeks ago

Batch process turning unstructured free form text data into structured outputs.

As a crappy example, imagine you want to download metadata about your albums but they’re all labelled “Various Artists”. You can use an LLM call to read the album description and fix the track artists for each track; now you can properly organize your collection.
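
Rough shape of that in code; call_llm is just a placeholder for whichever provider API you actually use, and the prompt and output schema are invented for illustration:

```python
# Sketch of the "Various Artists" cleanup idea. call_llm is a hypothetical
# placeholder, not a real SDK call; the prompt and JSON schema are made up.
import json

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its text response."""
    raise NotImplementedError

def fix_track_artists(album_description: str, tracks: list[str]) -> dict[str, str]:
    prompt = (
        "Given this album description and track list, return only JSON mapping "
        "each track title to its correct artist.\n\n"
        f"Description:\n{album_description}\n\nTracks:\n" + "\n".join(tracks)
    )
    return json.loads(call_llm(prompt))

# fix_track_artists(description, ["Track 1", "Track 2"])
# -> {"Track 1": "Artist A", "Track 2": "Artist B"}
```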

I’m using the same idea, different domain and a complex set of inputs.

It can be much more cost effective than manually spending days tagging data and writing custom importers.

You can definitely go lighter than LLMs. You can use gensim to do category matching, or you can use sentence transformers and nearest neighbours (this is basically what Semantle does), but the LLM performed best on more complex document input.
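
A minimal sketch of that lighter route with sentence transformers plus nearest neighbours (the model name and labels here are just illustrative):

```python
# Embed candidate labels and documents, then assign each document the
# closest label by cosine distance. No LLM involved.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

model = SentenceTransformer("all-MiniLM-L6-v2")

labels = ["rock", "jazz", "classical", "electronic"]
docs = ["A late-night saxophone set recorded live in a small club."]

label_vecs = model.encode(labels)
doc_vecs = model.encode(docs)

nn = NearestNeighbors(n_neighbors=1, metric="cosine").fit(label_vecs)
_, idx = nn.kneighbors(doc_vecs)
print(labels[idx[0][0]])  # expected: "jazz"
```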

[-] 6nk06@sh.itjust.works 1 points 2 weeks ago
[-] zerofk@lemmy.zip 2 points 2 weeks ago* (last edited 2 weeks ago)

in which the service admitted to “a catastrophic error of judgement”

It’s fancy text completion - it does not have judgement.

The way he talks about it shows he still doesn’t understand that. It doesn’t matter that you tell it something in ALL CAPS, because that is no different from any other text.

[-] hisao@ani.social -1 points 2 weeks ago

Are you aware of generalization, and of its ability to infer things and work with facts in a highly abstract way? It might not necessarily be judgement, but it's definitely more than just completion. If a model is capable of only completion (i.e. suggesting only the exact text strings present in its training set), that means it suffers from heavy underfitting, in AI terms.

[-] ChairmanMeow@programming.dev 0 points 2 weeks ago

Completion is not the same as only returning the exact strings in its training set.

LLMs don't really seem to display true inference or abstract thought, even when it seems that way. A recent Apple paper demonstrated this quite clearly.

[-] hisao@ani.social -2 points 2 weeks ago

Coming up with ever more vague terms to try to downplay it is missing the point. The point is simple: it's able to solve complex problems and do very impressive things that even humans struggle with, in very little time. It doesn't really matter what we consider true abstract thought or true inference. If that is something humans do, then what it does might very well be more powerful than true abstract thought, because it's able to solve more complex problems and perform more complex pattern matching.

[-] ChairmanMeow@programming.dev 0 points 2 weeks ago

Well the thing is, LLMs don't seem to really "solve" complex problems. They remember solutions they've seen before.

The example I saw was asking an LLM to solve "Towers of Hanoi" with 100 disks. This is a common recursive programming problem that takes a human quite a while to write the answer to. The LLM manages it easily. But when asked to solve the same problem with, say, 79 disks, or 41 disks, or some other oddball number, the LLM fails, despite the problem being simpler(!).
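
For reference, this is the standard recursive solution the puzzle is testing for; it's parameterized on the disk count, which is exactly why 79 or 41 disks shouldn't be any harder than 100:

```python
# Classic Towers of Hanoi: move n disks from source to target via spare.
def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))  # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)

moves = []
hanoi(5, "A", "C", "B", moves)
print(len(moves))  # 2**5 - 1 = 31 moves
```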

It can do pattern matching and provide solutions, but it's not able to come up with truly new solutions. It does not "think" in that way. LLMs are amazing data storage formats, but they're not truly 'intelligent' in the way most people think.

[-] hisao@ani.social -1 points 2 weeks ago

This only proves that some of them can't solve all complex problems. I'm only claiming that some of them can solve some complex problems, not only by remembering exact solutions, but by remembering the steps and actions used in building those solutions, generalizing, and transferring them to new problems. Anyone who tries using it for programming will discover this very fast.

PS: Some of them have already been used to solve problems and find patterns in data that humans weren't able to get at by other means before (particle research at CERN, bioinformatics, etc).

[-] Jhex@lemmy.world 0 points 2 weeks ago

The point is simple: it's able to solve complex problems and do very impressive things that even humans struggle with, in very little time

You mean like a calculator does?

[-] hisao@ani.social -1 points 2 weeks ago

Yeah, that's a correct analogy, but for much more complex problems than a calculator handles. How similar or not it is to the human way of thinking is completely irrelevant. And how much exactly human-style thinking is necessary for any kind of problem solving or work is not something we can really calculate. Assuming that scientific breakthroughs, engineering innovations, medical stuff, complex math problems, programming, etc. necessarily need human thinking, or only benefit from it as opposed to a super-advanced statistical meta-patterning calculator, is wishful thinking; it is not based on any real knowledge we have.

If you think it is wrong to give it our problems to solve, to give it our work, that's a very understandable argument, but you should say exactly that. Instead this AI-hate hivemind tries to downplay it using dismissive braindead generic phrases like "NoPe ItS nOt ReAlLy UnDeRsTaNdInG aNyThInG". Okay, who tf asked? It solves the problem. People keep using it and become overpowered because of it. What is the benefit of trying to downplay its power like that? You're not really fighting it this way, if fighting it is what you wanted.

[-] PattyMcB@lemmy.world 1 points 2 weeks ago

Aww... Vibe coding got you into trouble? Big shocker.

You get what you fucking deserve.

[-] dan@upvote.au 1 points 2 weeks ago* (last edited 2 weeks ago)

“At this burn rate, I’ll likely be spending $8,000 month,” he added. “And you know what? I’m not even mad about it. I’m locked in.”

For that price, why not just hire a developer full-time? For nearly $100k/year, you could find a very good intermediate or senior developer even in Europe or the USA (outside of expensive places like Silicon Valley and New York).

The job market isn't great for developers at the moment - there's been lots of layoffs over the past few years and not enough new jobs for all the people who were laid off - so you'd absolutely find someone.

[-] tonytins@pawb.social 1 points 2 weeks ago* (last edited 2 weeks ago)

Corporations: "Employees are too expensive!"

Also, corporations: "$100k/yr for a bot? Sure."

[-] dan@upvote.au 0 points 2 weeks ago

There's a lot of other expenses with an employee (like payroll taxes, benefits, retirement plans, health plan if they're in the USA, etc), but you could find a self-employed freelancer for example.

Or just get an employee anyways because you'll still likely have a positive ROI. A good developer will take your abstract list of vague requirements and produce something useful and maintainable.

[-] TheReturnOfPEB@reddthat.com 1 points 2 weeks ago* (last edited 2 weeks ago)

the employee also gets to eat and have a place to live

which is nice

[-] Blackmist@feddit.uk 1 points 2 weeks ago

The world's most overconfident virtual intern strikes again.

Also, who the flying fuck are either of these companies? 1000 records is nothing. That's a fucking text file.

[-] towerful@programming.dev 1 points 2 weeks ago

Not mad about an estimated usage bill of $8k per month.
Just hire a developer

[-] KSPAtlas@sopuli.xyz 1 points 2 weeks ago

Replit is a vibe coding service now? I swear it just used to be a place to write code in projects.

[-] codexarcanum@lemmy.dbzer0.com 0 points 2 weeks ago* (last edited 2 weeks ago)

It sounds like this guy was also relying on the AI to self-report status. Did any of this actually happen? Is the Replit AI really hooked up to a CLI, did it even make a DB to start with, was there anything useful in it, and did it actually delete it?

Or is this all just a long roleplaying session where this guy pretends to run a business and the AI pretends to do employee stuff for him?

Because 90% of this article is "I asked the AI and it said:" which is not a reliable source for information.

[-] eestileib@lemmy.blahaj.zone 1 points 2 weeks ago

It seemed like the LLM had decided it was in a brat scene and was trying to call down the thunder.

this post was submitted on 21 Jul 2025
18 points (100.0% liked)

Technology
