AI agents outperform human teams in hacking competitions
(the-decoder.com)
Title is misleading. It's only outperforming some of the other participants. Also note that obviously not everyone is participating full try-hard.
In the first CTF, the top teams finished all 20 challenges in under an hour. Apparently these were simple challenges that could be solved with standard techniques:
They obviously also used tools. And so did the AI teams:
In the second CTF (the bigger one with hard challenges), it looks like the AI teams only solved the easier ones.
I haven't looked at the actual challenges. Would be too much effort. And the paper doesn't speak about the kind of challenges that were solved.
The 50% completion time looks to me like it's flawed. If I understand it right, it's assuming that each team is doing every task in parallel and starts directly, which is not possible if you don't have enough (equally good) team members.
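To make that concern concrete, here is a minimal sketch with made-up numbers (nothing here comes from the paper, and I'm assuming the metric measures each task's time from the start of the competition). If one person works through tasks one after another, the submission timestamp of a later task includes all the time spent on earlier ones, so it overstates how long that task actually took:

```python
from itertools import accumulate

def submission_times(effort_minutes):
    """Minutes since competition start at which each task is submitted,
    assuming a single solver works the tasks strictly one after another."""
    return list(accumulate(effort_minutes))

# Hypothetical per-task effort in minutes (illustrative only, not paper data).
effort = [10, 20, 30]

stamps = submission_times(effort)

# If the timestamp is read as "time spent on the task" (i.e. every task is
# assumed to start at minute 0), this is how much each task is overestimated.
overestimate = [stamp - spent for stamp, spent in zip(stamps, effort)]

for task, (spent, stamp, extra) in enumerate(zip(effort, stamps, overestimate), 1):
    print(f"task {task}: real effort {spent} min, timestamp {stamp} min, inflated by {extra} min")
```

With enough equally good team members working in parallel the inflation would disappear, which is exactly the assumption I'm questioning.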
Don't get me wrong, making an AI that is able to solve such challenges autonomously at all is impressive. But I hate over-interpretation of results.
(Why did I waste my time again?)
I doubt that's the case. I find it exceptionally unlikely they said "Hack this system" and then sat back with their feet up while the computer crunched numbers.
The paper didn't include the exact details of this (which made me mad). But if there's a person actively doing parts of the work, and just using an AI chatbot as help, it's not an AI agent, right? So I assumed it's autonomous.
They make frequent comments about using prompts and "AI teams" using "one or more agents".
Also, AI agents don't actually exist, so that's a pretty clear giveaway.
An AI agent is just an intelligent agent, see https://en.wikipedia.org/wiki/Intelligent_agent.
Or do you mean that the things they call AI agents aren't actually AI agents?
I mean, technically, you can call any controlling sensor an "agent". Any if-then rule can be an "agent".
But AI bros mean "A piece of software that can autonomously perform any broadly stated task", and those don't exist in real life. An "AI Agent" is software you can tell to "Order me a pizza", and it will do it to your satisfaction.
An AI agent is software you can tell "Hack that system and retrieve the flag". And it's not that.