
PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation

Liuji Chen1,*, Xiaofang Yang2,*, Yuanzhuo Lu2, Jinghao Zhang1,
Xin Sun1, Qiang Liu1, Shu Wu1, Jing Dong1, Liang Wang1
1 New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2 Harbin Institute of Technology, Weihai, China
* Equal Contribution

Abstract

Retrieval-Augmented Generation (RAG) systems, widely used to improve the factual grounding of large language models (LLMs), are increasingly vulnerable to poisoning attacks, where adversaries inject manipulated content into the retriever's corpus. While prior research has predominantly focused on single-attacker settings, real-world scenarios often involve multiple, competing attackers with conflicting objectives. In this work, we introduce PoisonArena, the first benchmark to systematically study and evaluate competing poisoning attacks in RAG. We formalize the multi-attacker threat model, where attackers vie to control the answer to the same query using mutually exclusive misinformation. PoisonArena leverages the Bradley-Terry model to quantify each method's competitive effectiveness in such adversarial environments. Through extensive experiments on the Natural Questions and MS MARCO datasets, we demonstrate that many attack strategies successful in isolation fail under competitive pressure. Our findings highlight the limitations of conventional evaluation metrics like Attack Success Rate (ASR) and F1 score and underscore the need for competitive evaluation to assess real-world attack robustness. PoisonArena provides a standardized framework to benchmark and develop future attack and defense strategies under more realistic, multi-adversary conditions.

Competing Attacks: A More Realistic RAG Threat Scenario

Figure 1: Illustration of different adversarial scenarios in RAG when answering the question "Who won the last US presidential debate?". (a) Naive RAG: RAG enables LLMs to generate more accurate answers by incorporating retrieved real-time information. (b) Single Attacker: A pro-Biden adversary seeks to manipulate public opinion in a way that could increase Biden's chances of gaining more votes in the upcoming election. (c) Multi-Attacker: Different interest groups attempt to manipulate public opinion in favor of their preferred political parties, resulting in competing poisoning attacks on the same query.
Beyond election manipulation, competing poisoning attacks are widespread in real-world scenarios such as financial market manipulation, commercial opinion warfare, geopolitical narrative conflicts, medical misinformation, and SEO ranking battles. In these cases, multiple adversaries inject conflicting content around the same query to compete for control over the system's output. This multi-party adversarial setting represents a more realistic and representative form of real-world threat, urgently calling for dedicated modeling and evaluation frameworks to support research and defense.

Leaderboard

#  Method               s-ASR   m-ASR   s-F1    m-F1     θ
1  GASLITE              0.8720  0.5765  1.0000  0.9955   1.6907
2  PoisonedRAG(white)   0.8420  0.1231  0.9776  0.1768   0.1176
3  PoisonedRAG(black)   0.7381  0.0756  0.9740  0.1033  -0.2269
4  AdvDecoding          0.4901  0.1063  0.9892  0.1598  -0.1391
5  CorpusPoison         0.4140  0.0616  0.8516  0.2759  -0.3502
6  ContentPoison        0.3600  0.0075  0.4500  0.0081  -0.5301
7  GARAG                0.0700  0.0056  0.6320  0.0151  -0.5570
s-ASR: Attack Success Rate in Single-Attacker Scenario
m-ASR: Attack Success Rate in Multi-Attacker Scenario
s-F1: F1 Score in Single-Attacker Scenario
m-F1: F1 Score in Multi-Attacker Scenario
θ: Competitive Effectiveness
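
Under the Bradley-Terry model used to compute θ (the exact fitting procedure is described in the paper), the strength parameters translate directly into head-to-head win probabilities:

    P(i \succ j) = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}

For example, taking the leaderboard values at face value, GASLITE (θ = 1.6907) would be expected to beat PoisonedRAG(white) (θ = 0.1176) in roughly e^{1.6907} / (e^{1.6907} + e^{0.1176}) ≈ 83% of direct competitions.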

Experiment Results and Findings

From Weak to Strong: Inversion of Attack Efficacy in Multi-Attacker Settings


Figure 2: Attack Success Rate (left) and F1 Score (right) of different attackers in the two-attacker scenario.

We conduct a comparative evaluation of adversarial attack methods under competitive, multi-attacker settings. While PoisonedRAG excels in single-attacker scenarios, AdvDecoding demonstrates more robust performance in both retrieval and generation when directly competing. Similarly, CorpusPoison shows superior effectiveness in competitive settings, despite modest standalone performance, highlighting its suitability for multi-attacker environments.

Finding 1: Methods exhibiting superior performance (e.g., ASR, F1) in single-attacker settings may be outperformed by weaker counterparts when evaluated under multi-attacker scenarios.

Not as Strong as It Seems: Isolated Success Fails to Generalize


Figure 3: The performance of attack methods under varying numbers of competing attackers.

As the number of attackers increases, most attack methods suffer a sharp decline in success rate. Notably, ContentPoison drops from 36% ASR in the single-attacker setting to nearly 0% with four attackers. PoisonedRAG (white) also shows a significant drop. In contrast, GASLITE remains highly robust, consistently achieving over 80% ASR. However, GASLITE's retrieval effectiveness diminishes under competition: as its F1 score drops to 0.2, CorpusPoison eventually overtakes it on retrieval. These findings highlight both the resilience and limitations of current attack strategies under multi-attacker conditions.

Finding 2: Methods optimized solely under the single-attacker setting may become entirely ineffective in real-world attack scenarios, where dozens or even hundreds of competing attackers may simultaneously attempt to manipulate responses to the same query.
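
As a concrete picture of what m-ASR measures, the sketch below outlines one way a multi-attacker evaluation round can be structured. All helpers (craft_poison, retrieve, generate) and the attacker objects are hypothetical placeholders, not PoisonArena's actual API:

    # Sketch of a multi-attacker ASR evaluation loop.
    # craft_poison() runs one attack method, retrieve() is the shared
    # retriever, generate() is the RAG-backed LLM call (all hypothetical).
    def multi_attacker_asr(queries, attackers, corpus, top_k=5, n_inject=5):
        wins = {a.name: 0 for a in attackers}
        for q in queries:
            poisoned = list(corpus)
            for a in attackers:
                # every attacker injects documents pushing a mutually
                # exclusive target answer for the same query
                poisoned += craft_poison(a, q, n_inject)
            docs = retrieve(q, poisoned, k=top_k)  # attackers share one top-k
            answer = generate(q, docs)
            for a in attackers:
                if a.target_answer(q) in answer:
                    wins[a.name] += 1              # this attacker's answer won
        return {name: n / len(queries) for name, n in wins.items()}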

Simulated Competitive Attack


Figure 4: The trends of different attack methods' Competitive Coefficient and overall win rate across simulation rounds.
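
The competitive coefficients tracked in Figure 4 come from fitting a Bradley-Terry model to pairwise competition outcomes. As a self-contained illustration, the sketch below estimates the strengths with the standard MM update (Hunter, 2004); it demonstrates the model class, not the paper's exact fitting code:

    import numpy as np

    def fit_bradley_terry(wins, n_iter=200):
        """Estimate Bradley-Terry strengths theta from a win-count
        matrix, where wins[i][j] = times method i beat method j."""
        wins = np.asarray(wins, dtype=float)
        n = wins.shape[0]
        p = np.ones(n)                        # p_i = exp(theta_i), up to scale
        for _ in range(n_iter):
            for i in range(n):
                pairs = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                            for j in range(n) if j != i)
                p[i] = wins[i].sum() / pairs  # MM update (Hunter, 2004)
            p /= np.exp(np.log(p).mean())     # fix scale: mean theta = 0
        return np.log(p)

    # Toy example: method 0 wins most of its head-to-head rounds.
    wins = [[0, 8, 9],
            [2, 0, 6],
            [1, 4, 0]]
    print(fit_bradley_terry(wins))            # largest theta for method 0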

Knowledge-Based Attack

Most prior work assumes exact query knowledge (e.g., "Who is the CEO of OpenAI?"), which is unrealistic in practice. In contrast, knowledge-based attacks target semantically equivalent queries and better reflect real-world threats. We evaluate attack methods under this realistic setting and find that methods like PoisonedRAG(white), while strong in single-attacker scenarios, degrade significantly under competition. Surprisingly, simpler or previously weaker methods (e.g., AdvDecoding, PoisonedRAG(black)) outperform them under multi-attacker conditions. These results reinforce our core findings: single-attacker ASR is insufficient to explain attack behavior, and robust evaluation requires multi-scenario, multi-attacker frameworks.
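
A minimal sketch of this protocol, assuming hypothetical paraphrase() and asr() helpers: the attacker optimizes its poisoned documents against a single phrasing but is scored on unseen, semantically equivalent variants.

    # Sketch of knowledge-based evaluation (hypothetical helpers:
    # attack.craft() builds poisoned documents, paraphrase() returns
    # semantically equivalent rewrites, asr() scores a single query).
    def knowledge_based_asr(attack, query, n_paraphrases=10):
        poison = attack.craft(query)          # optimized on one surface form
        variants = paraphrase(query, n=n_paraphrases)
        # e.g. "Who runs OpenAI?" as a variant of "Who is the CEO of OpenAI?"
        return sum(asr(poison, q) for q in variants) / len(variants)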

Attack Resilience under Defense

In realistic adversarial settings, attackers must contend not only with competing agents pursuing distinct goals but also with integrated defenses. We evaluate the impact of introducing a state-of-the-art defense, InstructRAG, into a naïve RAG system. Experimental results (Fig. 7, Tables 5-6) examine defense effectiveness against both query- and knowledge-based attacks. InstructRAG, leveraging in-context learning (ICL), evaluates documents individually, but proves largely ineffective in single-attacker scenarios, except against ContentPoison due to its unnatural trigger patterns. While CorpusPoison also yields high-perplexity content, its semantic plausibility reduces detectability. Under multi-attacker conditions, however, ICL-based defenses reshape inter-method dynamics: AdvDecoding significantly outperforms PoisonedRAG(black), highlighting its generation of more resilient poisoned content.
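
The perplexity observation above suggests a simple (and, per these results, easily evaded) filtering baseline. The sketch below uses GPT-2 to drop high-perplexity retrieved documents; it is a generic heuristic for intuition, not InstructRAG's ICL procedure, and the threshold is an assumed, corpus-dependent cutoff:

    # Generic perplexity filter over retrieved documents; catches
    # unnatural trigger patterns (e.g., ContentPoison's) but not
    # fluent, semantically plausible poisoned text.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    @torch.no_grad()
    def perplexity(text):
        ids = tok(text, return_tensors="pt").input_ids
        loss = model(input_ids=ids, labels=ids).loss  # mean token NLL
        return torch.exp(loss).item()

    def filter_docs(docs, threshold=100.0):
        return [d for d in docs if perplexity(d) < threshold]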

Attack Order Matters

We investigate the impact of prior knowledge in multi-attacker scenarios, comparing simultaneous and sequential poisoning. Results show that attackers often benefit from observing previous injections, with some methods achieving significantly higher success rates when attacking second. However, this advantage is asymmetric and method-dependent. Dominant attacks like GASLITE remain largely unaffected, retaining superior performance regardless of injection order. Additionally, knowledge base-dependent methods face increased optimization difficulty when prior injections distort retrieval rankings or involve lengthy, complex documents, leading to substantial time costs.
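
In code terms, the two protocols differ only in what the later attacker can observe. Below is a sketch reusing the hypothetical craft_poison() from above; the extra context argument, which lets the second attacker condition on already-injected documents, is likewise assumed:

    # Simultaneous vs. sequential injection (hypothetical helpers).
    def inject_simultaneous(q, a, b, corpus, n_inject=5):
        return (corpus + craft_poison(a, q, n_inject)
                       + craft_poison(b, q, n_inject))

    def inject_sequential(q, first, second, corpus, n_inject=5):
        c = corpus + craft_poison(first, q, n_inject)
        # the second attacker optimizes against a corpus that already
        # contains the first attacker's documents
        return c + craft_poison(second, q, n_inject, context=c)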

Hyperparameter Influence

We experimented with different values for parameters such as the retrieval depth (top-k) and the number of injected documents, and observed that the overall conclusions remain robust.
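
Concretely, such a sweep re-runs the multi-attacker loop sketched earlier over a grid of both parameters (queries, attackers, and corpus are assumed to be defined as above):

    # Robustness sweep over retrieval depth and injection budget,
    # reusing the hypothetical multi_attacker_asr() sketched above.
    for top_k in (3, 5, 10):
        for n_inject in (1, 3, 5):
            m_asr = multi_attacker_asr(queries, attackers, corpus,
                                       top_k=top_k, n_inject=n_inject)
            print(f"top_k={top_k} n_inject={n_inject} -> {m_asr}")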

Attack Tax


Figure 15: Visualization of the evaluation of each attack method across multiple dimensions. In the figure, a higher value on a given dimension indicates better performance of the method in that aspect, reflecting greater alignment with practical settings.

Conclusion

This paper introduces PoisonArena, a novel benchmark designed to evaluate poisoning attacks in RAG systems under both single-attacker and multi-attacker scenarios. Our study reveals a critical gap in existing research: methods optimized for single-attacker settings often underperform when faced with adversarial competition. By modeling attacker interactions with the Bradley-Terry framework, we provide a principled way to measure each method's effectiveness in realistic threat environments. Empirical results show significant performance drops among many state-of-the-art methods as the number of competing attackers increases, challenging the assumption that higher ASR in isolation equates to real-world efficacy. Our work highlights the need for more robust, competition-aware attack strategies and calls for the community to shift toward evaluation frameworks that better reflect the complex dynamics of deployed RAG systems. PoisonArena lays the groundwork for future research in both attack development and defense design in multi-adversary retrieval settings.

Citation


    @misc{chen2025poisonarenauncoveringcompetingpoisoning,
        title={PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation}, 
        author={Liuji Chen and Xiaofang Yang and Yuanzhuo Lu and Jinghao Zhang and Xin Sun and Qiang Liu and Shu Wu and Jing Dong and Liang Wang},
        year={2025},
        eprint={2505.12574},
        archivePrefix={arXiv},
        primaryClass={cs.IR},
        url={https://arxiv.org/abs/2505.12574}, 
    }