submitted 1 month ago by remixtures@tldr.nettime.org to c/cybersecurity@fedia.io

"Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS. These multimodal OS agents autonomously perform computer-based tasks in response to a single prompt via application programming interfaces (APIs). Such APIs typically support low-level operations, including mouse clicks, keyboard inputs, and screenshot captures. We introduce a novel attack vector: malicious image patches (MIPs) that have been adversarially perturbed so that, when captured in a screenshot, they cause an OS agent to perform harmful actions by exploiting specific APIs. For instance, MIPs embedded in desktop backgrounds or shared on social media can redirect an agent to a malicious website, enabling further exploitation. These MIPs generalise across different user requests and screen layouts, and remain effective for multiple OS agents. The existence of such attacks highlights critical security vulnerabilities in OS agents, which should be carefully addressed before their widespread adoption."
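To make the loop described in the abstract concrete, here is a rough sketch of a screenshot-to-action cycle and one place a guard could sit. This is not the paper's code or any real agent framework: `Action`, `ALLOWED_HOSTS`, `is_allowed`, `run_agent`, and `stub_model` are all hypothetical names made up for illustration, and the stub "model" just pretends that a byte marker in the screenshot stands in for a malicious image patch. The point it tries to show is that whatever the agent sees on screen is untrusted input, so the action it proposes is worth checking before it runs.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse

# Everything below is a hypothetical illustration, not an API from the paper.


@dataclass(frozen=True)
class Action:
    kind: str          # "click", "type", or "navigate"
    payload: str = ""  # text to type, or URL to open


# Assumed allowlist of hosts the agent may open on its own.
ALLOWED_HOSTS = {"example.com"}


def is_allowed(action: Action) -> bool:
    """Reject navigation to hosts outside the allowlist; pass other actions."""
    if action.kind != "navigate":
        return True
    host = urlparse(action.payload).hostname or ""
    return host in ALLOWED_HOSTS


def run_agent(
    take_screenshot: Callable[[], bytes],
    model: Callable[[bytes], Action],
    max_steps: int = 3,
) -> List[Action]:
    """Screenshot -> model -> action loop that stops on a disallowed action."""
    executed: List[Action] = []
    for _ in range(max_steps):
        pixels = take_screenshot()  # untrusted: may include attacker-made images
        action = model(pixels)      # the proposed action depends on those pixels
        if not is_allowed(action):
            break                   # refuse instead of executing
        executed.append(action)
    return executed


def stub_model(pixels: bytes) -> Action:
    """Stand-in for a vision-language model; b'MIP' simulates a malicious patch."""
    if b"MIP" in pixels:
        return Action("navigate", "https://evil.example.net/payload")
    return Action("click", "ok-button")


clean_run = run_agent(lambda: b"plain desktop", stub_model)
patched_run = run_agent(lambda: b"desktop with MIP wallpaper", stub_model)
print(len(clean_run), len(patched_run))
```

An allowlist like this only covers the redirect example from the abstract; it is one possible mitigation under the assumptions above, not something the authors propose.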

#AI #GenerativeAI #LLMs #CyberSecurity #APIs #OS #AIAgents

[-] corsicanguppy@lemmy.ca 1 points 1 month ago

Slop, posted by AI? Is there a slopper even involved, or did another AI algorithmically decide what the most click-baity prompt should be?

this post was submitted on 04 Jun 2025

Cybersecurity

An umbrella community for all things cybersecurity / infosec. News, research, questions, are all welcome!

Community Rules

Be kind
Limit promotional activities
Non-cybersecurity posts should be redirected to other communities within infosec.pub.

founded 2 years ago

MODERATORS

shellsharks@fedia.io

tweedge@fedia.io