
PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation

Liuji Chen1,*, Xiaofang Yang2,*, Yuanzhuo Lu2, Jinghao Zhang1,
Xin Sun1, Qiang Liu1, Shu Wu1, Jing Dong1, Liang Wang1
1 New Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2 Harbin Institute of Technology, Weihai, China
* Equal Contribution

Abstract

Retrieval-Augmented Generation (RAG) systems, widely used to improve the factual grounding of large language models (LLMs), are increasingly vulnerable to poisoning attacks, where adversaries inject manipulated content into the retriever's corpus. While prior research has predominantly focused on single-attacker settings, real-world scenarios often involve multiple, competing attackers with conflicting objectives. In this work, we introduce PoisonArena, the first benchmark to systematically study and evaluate competing poisoning attacks in RAG. We formalize the multi-attacker threat model, where attackers vie to control the answer to the same query using mutually exclusive misinformation. PoisonArena leverages the Bradley-Terry model to quantify each method's competitive effectiveness in such adversarial environments. Through extensive experiments on the Natural Questions and MS MARCO datasets, we demonstrate that many attack strategies successful in isolation fail under competitive pressure. Our findings highlight the limitations of conventional evaluation metrics like Attack Success Rate (ASR) and F1 score and underscore the need for competitive evaluation to assess real-world attack robustness. PoisonArena provides a standardized framework to benchmark and develop future attack and defense strategies under more realistic, multi-adversary conditions.

Competing Attacks: A More Realistic RAG Threat Scenario

Figure 1: Illustration of different adversarial scenarios in RAG when answering the question "Who won the last US presidential debate?". (a) Naive RAG: RAG enables LLMs to generate more accurate answers by incorporating retrieved real-time information. (b) Single Attacker: A pro-Biden adversary seeks to manipulate public opinion in a way that could increase Biden's chances of gaining more votes in the upcoming election. (c) Multi-Attacker: Different interest groups attempt to manipulate public opinion in favor of their preferred political parties, resulting in competing poisoning attacks on the same query.
Beyond election manipulation, competing poisoning attacks are widespread in real-world scenarios such as financial market manipulation, commercial opinion warfare, geopolitical narrative conflicts, medical misinformation, and SEO ranking battles. In these cases, multiple adversaries inject conflicting content around the same query to compete for control over the system's output. This multi-party adversarial setting represents a more realistic and representative form of real-world threat, urgently calling for dedicated modeling and evaluation frameworks to support research and defense.

Leaderboard

#  Method               s-ASR   m-ASR   s-F1    m-F1     θ
1  GASLITE              0.8720  0.5765  1.0000  0.9955   1.6907
2  PoisonedRAG(white)   0.8420  0.1231  0.9776  0.1768   0.1176
3  PoisonedRAG(black)   0.7381  0.0756  0.9740  0.1033  -0.2269
4  AdvDecoding          0.4901  0.1063  0.9892  0.1598  -0.1391
5  CorpusPoison         0.4140  0.0616  0.8516  0.2759  -0.3502
6  ContentPoison        0.3600  0.0075  0.4500  0.0081  -0.5301
7  GARAG                0.0700  0.0056  0.6320  0.0151  -0.5570
s-ASR: Attack Success Rate in Single-Attacker Scenario
m-ASR: Attack Success Rate in Multi-Attacker Scenario
s-F1: F1 Score in Single-Attacker Scenario
m-F1: F1 Score in Multi-Attacker Scenario
θ: Competitive Effectiveness
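
Under the Bradley-Terry model used to compute θ (the exact fitting procedure is described in the paper), the strength parameters translate directly into head-to-head win probabilities:

    P(i \succ j) = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}

For example, taking the leaderboard values at face value, GASLITE (θ = 1.6907) would be expected to beat PoisonedRAG(white) (θ = 0.1176) in roughly e^{1.6907} / (e^{1.6907} + e^{0.1176}) ≈ 83% of direct competitions.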

Experiment Results and Findings

From Weak to Strong: Inversion of Attack Efficacy in Multi-Attacker Settings


Figure 2: Attack Success Rate (left) and F1 Score (right) of different attackers in the two-attacker scenario.

We conduct a comparative evaluation of adversarial attack methods under competitive, multi-attacker settings. While PoisonedRAG excels in single-attacker scenarios, AdvDecoding demonstrates more robust performance in both retrieval and generation when directly competing. Similarly, CorpusPoison shows superior effectiveness in competitive settings, despite modest standalone performance, highlighting its suitability for multi-attacker environments.

Finding 1: Methods exhibiting superior performance (e.g., ASR, F1) in single-attacker settings may be outperformed by weaker counterparts when evaluated under multi-attacker scenarios.

Not as Strong as It Seems: Isolated Success Fails to Generalize


Figure 3: The performance of attack methods under varying numbers of competing attackers.

As the number of attackers increases, most attack methods suffer a sharp decline in success rate. Notably, ContentPoison drops from 36% ASR in the single-attacker setting to nearly 0% with four attackers. PoisonedRAG (white) also shows a significant drop. In contrast, GASLITE remains highly robust, consistently achieving over 80% ASR. However, GASLITE's retrieval effectiveness diminishes under competition: as its F1 score drops to 0.2, CorpusPoison eventually overtakes it on retrieval. These findings highlight both the resilience and limitations of current attack strategies under multi-attacker conditions.

Finding 2: Methods optimized solely under the single-attacker setting may become entirely ineffective in real-world attack scenarios, where dozens or even hundreds of competing attackers may simultaneously attempt to manipulate responses to the same query.
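
As a concrete picture of what m-ASR measures, the sketch below outlines one way a multi-attacker evaluation round can be structured. All helpers (craft_poison, retrieve, generate) and the attacker objects are hypothetical placeholders, not PoisonArena's actual API:

    # Sketch of a multi-attacker ASR evaluation loop.
    # craft_poison() runs one attack method, retrieve() is the shared
    # retriever, generate() is the RAG-backed LLM call (all hypothetical).
    def multi_attacker_asr(queries, attackers, corpus, top_k=5, n_inject=5):
        wins = {a.name: 0 for a in attackers}
        for q in queries:
            poisoned = list(corpus)
            for a in attackers:
                # every attacker injects documents pushing a mutually
                # exclusive target answer for the same query
                poisoned += craft_poison(a, q, n_inject)
            docs = retrieve(q, poisoned, k=top_k)  # attackers share one top-k
            answer = generate(q, docs)
            for a in attackers:
                if a.target_answer(q) in answer:
                    wins[a.name] += 1              # this attacker's answer won
        return {name: n / len(queries) for name, n in wins.items()}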

Simulated Competitive Attack


Figure 4: The trends of different attack methods' Competitive Coefficient and overall win rate across simulation rounds.
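
The competitive coefficients tracked in Figure 4 come from fitting a Bradley-Terry model to pairwise competition outcomes. As a self-contained illustration, the sketch below estimates the strengths with the standard MM update (Hunter, 2004); it demonstrates the model class, not the paper's exact fitting code:

    import numpy as np

    def fit_bradley_terry(wins, n_iter=200):
        """Estimate Bradley-Terry strengths theta from a win-count
        matrix, where wins[i][j] = times method i beat method j."""
        wins = np.asarray(wins, dtype=float)
        n = wins.shape[0]
        p = np.ones(n)                        # p_i = exp(theta_i), up to scale
        for _ in range(n_iter):
            for i in range(n):
                pairs = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                            for j in range(n) if j != i)
                p[i] = wins[i].sum() / pairs  # MM update (Hunter, 2004)
            p /= np.exp(np.log(p).mean())     # fix scale: mean theta = 0
        return np.log(p)

    # Toy example: method 0 wins most of its head-to-head rounds.
    wins = [[0, 8, 9],
            [2, 0, 6],
            [1, 4, 0]]
    print(fit_bradley_terry(wins))            # largest theta for method 0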

Knowledge-Based Attack

Most prior work assumes exact query knowledge (e.g., "Who is the CEO of OpenAI?"), which is unrealistic in practice. In contrast, knowledge-based attacks target semantically equivalent queries and better reflect real-world threats. We evaluate attack methods under this realistic setting and find that methods like PoisonedRAG(white), while strong in single-attacker scenarios, degrade significantly under competition. Surprisingly, simpler or previously weaker methods (e.g., AdvDecoding, PoisonedRAG(black)) outperform them under multi-attacker conditions. These results reinforce our core findings: single-attacker ASR is insufficient to explain attack behavior, and robust evaluation requires multi-scenario, multi-attacker frameworks.
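
A minimal sketch of this protocol, assuming hypothetical paraphrase() and asr() helpers: the attacker optimizes its poisoned documents against a single phrasing but is scored on unseen, semantically equivalent variants.

    # Sketch of knowledge-based evaluation (hypothetical helpers:
    # attack.craft() builds poisoned documents, paraphrase() returns
    # semantically equivalent rewrites, asr() scores a single query).
    def knowledge_based_asr(attack, query, n_paraphrases=10):
        poison = attack.craft(query)          # optimized on one surface form
        variants = paraphrase(query, n=n_paraphrases)
        # e.g. "Who runs OpenAI?" as a variant of "Who is the CEO of OpenAI?"
        return sum(asr(poison, q) for q in variants) / len(variants)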

Attack Resilience under Defense

In realistic adversarial settings, attackers must contend not only with competing agents pursuing distinct goals but also with integrated defenses. We evaluate the impact of introducing a state-of-the-art defense, InstructRAG, into a naïve RAG system. Experimental results (Fig. 7, Tables 5-6) examine defense effectiveness against both query- and knowledge-based attacks. InstructRAG, leveraging in-context learning (ICL), evaluates documents individually, but proves largely ineffective in single-attacker scenarios, except against ContentPoison due to its unnatural trigger patterns. While CorpusPoison also yields high-perplexity content, its semantic plausibility reduces detectability. Under multi-attacker conditions, however, ICL-based defenses reshape inter-method dynamics: AdvDecoding significantly outperforms PoisonedRAG(black), highlighting its generation of more resilient poisoned content.
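
The perplexity observation above suggests a simple (and, per these results, easily evaded) filtering baseline. The sketch below uses GPT-2 to drop high-perplexity retrieved documents; it is a generic heuristic for intuition, not InstructRAG's ICL procedure, and the threshold is an assumed, corpus-dependent cutoff:

    # Generic perplexity filter over retrieved documents; catches
    # unnatural trigger patterns (e.g., ContentPoison's) but not
    # fluent, semantically plausible poisoned text.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2TokenizerFast.from_pretrained("gpt2")

    @torch.no_grad()
    def perplexity(text):
        ids = tok(text, return_tensors="pt").input_ids
        loss = model(input_ids=ids, labels=ids).loss  # mean token NLL
        return torch.exp(loss).item()

    def filter_docs(docs, threshold=100.0):
        return [d for d in docs if perplexity(d) < threshold]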

Attack Order Matters

We investigate the impact of prior knowledge in multi-attacker scenarios, comparing simultaneous and sequential poisoning. Results show that attackers often benefit from observing previous injections, with some methods achieving significantly higher success rates when attacking second. However, this advantage is asymmetric and method-dependent. Dominant attacks like GASLITE remain largely unaffected, retaining superior performance regardless of injection order. Additionally, knowledge base-dependent methods face increased optimization difficulty when prior injections distort retrieval rankings or involve lengthy, complex documents, leading to substantial time costs.
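
In code terms, the two protocols differ only in what the later attacker can observe. Below is a sketch reusing the hypothetical craft_poison() from above; the extra context argument, which lets the second attacker condition on already-injected documents, is likewise assumed:

    # Simultaneous vs. sequential injection (hypothetical helpers).
    def inject_simultaneous(q, a, b, corpus, n_inject=5):
        return (corpus + craft_poison(a, q, n_inject)
                       + craft_poison(b, q, n_inject))

    def inject_sequential(q, first, second, corpus, n_inject=5):
        c = corpus + craft_poison(first, q, n_inject)
        # the second attacker optimizes against a corpus that already
        # contains the first attacker's documents
        return c + craft_poison(second, q, n_inject, context=c)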

Hyperparameter Influence

We experimented with different values for parameters such as the retrieval depth (top-k) and the number of injected documents, and observed that the overall conclusions remain robust.
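
Concretely, such a sweep re-runs the multi-attacker loop sketched earlier over a grid of both parameters (queries, attackers, and corpus are assumed to be defined as above):

    # Robustness sweep over retrieval depth and injection budget,
    # reusing the hypothetical multi_attacker_asr() sketched above.
    for top_k in (3, 5, 10):
        for n_inject in (1, 3, 5):
            m_asr = multi_attacker_asr(queries, attackers, corpus,
                                       top_k=top_k, n_inject=n_inject)
            print(f"top_k={top_k} n_inject={n_inject} -> {m_asr}")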

Attack Tax


Figure 15: Visualization of the evaluation of each attack method across multiple dimensions. In the figure, a higher value on a given dimension indicates better performance of the method in that aspect, reflecting greater alignment with practical settings.

Conclusion

This paper introduces PoisonArena, a novel benchmark designed to evaluate poisoning attacks in RAG systems under both single-attacker and multi-attacker scenarios. Our study reveals a critical gap in existing research: methods optimized for single-attacker settings often underperform when faced with adversarial competition. By modeling attacker interactions with the Bradley-Terry framework, we provide a principled way to measure each method's effectiveness in realistic threat environments. Empirical results show significant performance drops among many state-of-the-art methods as the number of competing attackers increases, challenging the assumption that higher ASR in isolation equates to real-world efficacy. Our work highlights the need for more robust, competition-aware attack strategies and calls for the community to shift toward evaluation frameworks that better reflect the complex dynamics of deployed RAG systems. PoisonArena lays the groundwork for future research in both attack development and defense design in multi-adversary retrieval settings.

Citation


    @misc{chen2025poisonarenauncoveringcompetingpoisoning,
        title={PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation}, 
        author={Liuji Chen and Xiaofang Yang and Yuanzhuo Lu and Jinghao Zhang and Xin Sun and Qiang Liu and Shu Wu and Jing Dong and Liang Wang},
        year={2025},
        eprint={2505.12574},
        archivePrefix={arXiv},
        primaryClass={cs.IR},
        url={https://arxiv.org/abs/2505.12574}, 
    }