Xuan Chen

CS Ph.D. Student

Purdue University

About

I am a Ph.D. student in the Computer Science Department at Purdue University, advised by Professor Xiangyu Zhang. I also work closely with Professor Wenbo Guo. Before joining Purdue, I obtained my master’s degree in Electrical and Computer Engineering at Carnegie Mellon University and my bachelor’s degree from the University of Science and Technology Beijing.

My research interests lie in machine learning security, especially backdoor attacks and defenses, as well as applying reinforcement learning to complex security problems.

Interests

  • Machine learning security
  • Backdoor attacks and defenses

Education

  • MSc in Electrical and Computer Engineering, 2020

    Carnegie Mellon University

  • BSc in Electrical Engineering, 2019

    University of Science and Technology Beijing

Publications

ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Backdoor attacks have emerged as a prominent threat to natural language processing (NLP) models, where the presence of specific triggers in the input can lead poisoned models to misclassify these inputs to predetermined target classes. Current detection mechanisms are limited by their inability to address more covert backdoor strategies, such as style-based attacks. In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meanings of poisoned samples as they want to stay stealthy. Based on this observation, we hypothesize that while the model’s predictions for paraphrased clean samples should remain stable, predictions for poisoned samples should revert to their true labels upon the mutations applied to triggers during the paraphrasing process. We employ ChatGPT, a state-of-the-art large language model, as our paraphraser and formulate the trigger-removal task as a prompt engineering problem. We adopt fuzzing, a technique commonly used for unearthing software vulnerabilities, to discover optimal paraphrase prompts that can effectively eliminate triggers while concurrently maintaining input semantics. Experiments on 4 types of backdoor attacks, including the subtle style backdoors, and 4 distinct datasets demonstrate that our approach surpasses baseline methods, including STRIP, RAP, and ONION, in precision and recall.
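The test-time check described above can be sketched as a simple paraphrase-and-compare heuristic. This is only an illustrative sketch, not the paper's implementation: `classify` and `paraphrase` are hypothetical stand-ins for the victim model and the ChatGPT-based paraphraser driven by a fuzz-selected prompt.

```python
def classify(text: str) -> int:
    """Hypothetical victim model: returns a predicted class label."""
    raise NotImplementedError

def paraphrase(text: str, prompt: str) -> str:
    """Hypothetical paraphraser (e.g., an LLM call) using a fuzz-selected prompt."""
    raise NotImplementedError

def is_poisoned(text: str, prompt: str) -> bool:
    """Flag an input as poisoned if its prediction changes after paraphrasing.

    Hypothesis from the paper: paraphrasing preserves semantics, so clean inputs
    keep their label, while paraphrasing mutates the trigger and lets a poisoned
    input revert to its true label.
    """
    original_label = classify(text)
    paraphrased_label = classify(paraphrase(text, prompt))
    return original_label != paraphrased_label
```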

Projects

Explanation for Deep Reinforcement Learning

Behavior-level explanations of agents trained with Deep Q-Networks.

Uncertainty Calibration for Neural Network Output

A trainable and differentiable loss for classification neural networks.