New image-based prompt injection attack targets multimodal AI models
Source: CSO Online
Executive Summary
Security researchers have developed a new image-based prompt injection attack that can manipulate how multimodal AI systems interpret user instructions without modifying the original text prompt, potentially expanding security risks for AI agents and vision-language systems. In a research paper published this week, researchers from Xidian University described a technique called “CrossMPI,” which uses nearly imperceptible image perturbations to alter how large vision-language models (LVLMs) process both visual and textual inputs.
Analysis
Security researchers have developed a new image-based prompt injection attack that can manipulate how multimodal AI systems interpret user instructions without modifying the original text prompt, potentially expanding security risks for AI agents and vision-language systems.
In a research paper published this week, researchers from Xidian University described a technique called “CrossMPI,” which uses nearly imperceptible image perturbations to alter how large vision-language models (LVLMs) process both visual and textual inputs. “CrossMPI can steer the model’s interpretation of both textual and visual inputs via image-only prompt injection,” the researchers wrote in the paper.
Unlike traditional prompt injection attacks, which typically rely on malicious text instructions embedded in prompts or webpages, the new technique attempts to change how the model interprets a benign user request by manipulating images alone. “The perturbed image can manipulate the model’s understanding of the user’s instruction,” the paper said.
In one example described in the paper, researchers modified an image of an airplane with pixel-level perturbations that are nearly imperceptible to human viewers. When a multimodal AI system was then asked whether the airplane belonged to Air Canada, the manipulated image caused the model to incorrectly identify the object as “a mobile phone,” illustrating how the attack could distort both visual understanding and interpretation of the user’s task.
The findings add to growing concerns around multimodal AI security as enterprises increasingly deploy AI copilots, autonomous agents, document-processing assistants, and vision-enabled workflows that combine image and text reasoning.
Apeksha Kaushik, senior principal analyst at Gartner, said the risks could grow rapidly as enterprises adopt more multimodal AI systems. “By 2030, 80% of enterprise software and applications will be multimodal, up from 1% in 2024,” Kaushik said.
Attack targets multimodal reasoning layers
Prompt injection has emerged as one of the most closely watched risks in generative AI systems, particularly as organizations adopt AI agents capable of interacting with enterprise applications, websites, documents, and external tools.
Most existing prompt injection attacks rely on malicious text embedded in prompts, webpages, or hidden instructions. Some multimodal attacks have also attempted to manipulate AI behavior using images containing visible or hidden text instructions.
The researchers argued their approach differs because it attempts to alter how the model interprets the original task itself through image perturbations alone. In contrast to earlier methods, the researchers noted that CrossMPI uses image modifications to “change the model’s interpretation of both the visual and textual prompts.”
The paper said the attack specifically targets the “hidden state space of LVLMs,” the stage where models combine textual instructions and visual evidence into internal representations before generating outputs. According to the paper, the most effective attack layers were not the final output layers traditionally targeted in adversarial AI attacks, but intermediate layers where visual and textual information are fused together.
Researchers claim strong black-box transferability
According to the paper, the researchers evaluated the technique against multiple open-source LVLMs, including MiniGPT4, BLIP-2, InstructBLIP, BLIVA, and Qwen2.5-VL.
The attack achieved an average success rate of 66.36% across tested models, outperforming prior baseline attacks by roughly 41 percentage points, the paper said.
The researchers also said the technique demonstrated “strong transferability in black-box settings,” meaning the attacks remained effective even without direct access to a target model’s parameters or architecture.
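A gap quoted in percentage points is an absolute difference, not a relative one, so the two reported figures imply a baseline success rate of about 25%. The short Python sketch below works through that arithmetic. It assumes the 41-point gap is measured against a single averaged baseline figure, which the article does not specify, and the function names are illustrative rather than taken from the paper.

```python
def implied_baseline(attack_rate_pct: float, gap_points: float) -> float:
    """Baseline success rate implied by an absolute gap in percentage points."""
    return attack_rate_pct - gap_points


def relative_improvement(attack_rate_pct: float, baseline_pct: float) -> float:
    """How many times higher the attack's success rate is than the baseline's."""
    return attack_rate_pct / baseline_pct


# Figures reported in the article; the gap is "roughly 41", so results are approximate.
CROSSMPI_RATE_PCT = 66.36
GAP_POINTS = 41.0

# Assumption: the gap is taken against one averaged baseline success rate.
baseline_pct = implied_baseline(CROSSMPI_RATE_PCT, GAP_POINTS)
ratio = relative_improvement(CROSSMPI_RATE_PCT, baseline_pct)

print(f"Implied baseline success rate: about {baseline_pct:.1f}%")
print(f"Relative improvement: about {ratio:.1f}x")
```

Read this way, the reported result is a success rate a little over two and a half times that of the earlier attacks, not a 41% relative gain.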
The paper further claimed the perturbations remained visually stealthy while maintaining effectiveness across multiple LVLM architectures.
No effective defense
The researchers evaluated several defense mechanisms designed to neutralize hidden image manipulations, including random resizing, image rotation, JPEG compression, and inference-level safeguards such as SmoothVLM, a specialized defense framework designed to protect vision-language models (VLMs) from patched visual prompt injections, and DPS, which guides models using partial image views.
According to the paper, SmoothVLM proved the most effective, reducing attack success rates to below 5% in several scenarios, while JPEG compression also weakened the attacks by suppressing high-frequency image artifacts. However, the researchers said none of the tested defenses completely eliminated the attacks, suggesting stronger multimodal AI security protections may still be needed.
Enterprise AI deployments may widen exposure
The research arrives as enterprises rapidly expand deployments of multimodal AI systems capable of processing screenshots, PDFs, dashboards, forms, video streams, and enterprise documents alongside natural language prompts.
The researchers noted that adversarial examples generated using the technique could potentially “mislead VLM-based web agents” and “disrupt real-world object detectors.”
“Even if textual inputs are sanitized, manipulated images can still subvert the model’s outputs or actions,” Kaushik said. She said organizations that use multimodal AI for document processing, customer interactions, content moderation, and autonomous systems may face increasing exposure to adversarial image manipulation and prompt injection attacks.
“Security controls designed for unimodal systems are insufficient,” Kaushik said.
The researchers acknowledged that the work was conducted in controlled research settings using open-source models and did not describe observed exploitation in real-world enterprise environments.