Publications

(2025). Temporal Logic-Based Multi-Vehicle Backdoor Attacks against Offline RL Agents in End-to-end Autonomous Driving. In NeurIPS.
(2025). INTENTEST: Stress Testing for Intent Integrity in API-Calling LLM Agents. In NeurIPS.
(2025). ASPIRER: Bypassing System Prompts with Permutation-based Backdoors in LLMs. In ACL.
(2024). When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search. In NeurIPS.
(2024). RL-JACK: Reinforcement Learning-powered Black-box Jailbreaking Attack against LLMs.
(2023). Dynamics Model Based Adversarial Training For Competitive Reinforcement Learning. In NeurIPS.
(2023). ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP. In NeurIPS.
(2023). BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning. In NeurIPS.