TextAttack is a Python framework for running adversarial attacks against NLP models. TextAttack builds attacks from four components: a search method, goal function, transformation, and set of constraints. TextAttack's modular design makes it easily extensible to new NLP tasks, models, and attack strategies. TextAttack currently supports attacks on models trained for classification, entailment, and translation.
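To make the four-component structure concrete, here is a minimal, self-contained sketch of how a search method, goal function, transformation, and set of constraints could fit together. It does not use TextAttack's actual classes: every name below (toy_model, Attack, greedy_search, and so on) is a hypothetical stand-in, and the "model" is just a word-list scorer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical toy pieces for illustration only; none of these names come from TextAttack.
SYNONYMS = {"good": ["fine", "decent"], "great": ["solid"]}
POSITIVE_WORDS = {"good", "great"}


def toy_model(text: str) -> float:
    """Toy sentiment model: fraction of words found in a positive lexicon."""
    words = text.split()
    return sum(w in POSITIVE_WORDS for w in words) / max(len(words), 1)


def goal_function(text: str) -> Tuple[bool, float]:
    """Return (succeeded, score); success means the model score fell below 0.25."""
    model_score = toy_model(text)
    return model_score < 0.25, 1.0 - model_score


def transformation(text: str) -> List[str]:
    """Generate candidates by swapping exactly one word for a listed synonym."""
    words = text.split()
    return [
        " ".join(words[:i] + [synonym] + words[i + 1:])
        for i, word in enumerate(words)
        for synonym in SYNONYMS.get(word, [])
    ]


def at_most_two_words_changed(original: str, candidate: str) -> bool:
    """Constraint: the candidate may differ from the original in at most two words."""
    changed = sum(a != b for a, b in zip(original.split(), candidate.split()))
    return changed <= 2


@dataclass
class Attack:
    goal_function: Callable[[str], Tuple[bool, float]]
    transformation: Callable[[str], List[str]]
    constraints: List[Callable[[str, str], bool]]
    search_method: Callable[["Attack", str], Optional[str]]

    def run(self, text: str) -> Optional[str]:
        return self.search_method(self, text)


def greedy_search(attack: Attack, text: str, max_steps: int = 5) -> Optional[str]:
    """Search method: repeatedly keep the valid candidate with the best goal score."""
    current = text
    for _ in range(max_steps):
        succeeded, _score = attack.goal_function(current)
        if succeeded:
            return current
        candidates = [
            c for c in attack.transformation(current)
            if all(ok(text, c) for ok in attack.constraints)
        ]
        if not candidates:
            return None  # no candidate satisfies the constraints
        current = max(candidates, key=lambda c: attack.goal_function(c)[1])
    return current if attack.goal_function(current)[0] else None


attack = Attack(goal_function, transformation, [at_most_two_words_changed], greedy_search)
adversarial = attack.run("a good and great movie")
```

Because each component is passed in as a plain callable, swapping the search method or adding a constraint in this sketch does not require touching the other three pieces, which is the kind of modularity the description above refers to.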
The examples/ folder contains notebooks walking through examples of basic usage of TextAttack, including building a custom transformation and a custom constraint. These examples can also be viewed through the documentation website.
TextAttack also provides a command-line interface for running attacks. To see help information and the full list of arguments, run python -m textattack --help.
We include attack recipes, each of which builds a complete attack so that only one command-line argument has to be passed. To run an attack recipe, run python -m textattack --recipe [recipe_name].
The first group of recipes is for classification and entailment attacks:
textfooler: Greedy attack with word importance ranking ("Is BERT Really Robust?" (Jin et al., 2019)).
alzantot: Genetic algorithm attack ("Generating Natural Language Adversarial Examples" (Alzantot et al., 2018)).
tf-adjusted: TextFooler attack with constraint thresholds adjusted based on human evaluation and grammaticality enforced.
alz-adjusted: Alzantot's attack adjusted to follow the same constraints as tf-adjusted such that the only difference is the search method.
deepwordbug: Replace-1 scoring and multi-transformation character-swap attack ("Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers" (Gao et al., 2018)).
hotflip: Beam search and gradient-based word swap ("HotFlip: White-Box Adversarial Examples for Text Classification" (Ebrahimi et al., 2017)).
kuleshov: Greedy search and counterfitted embedding swap ("Adversarial Examples for Natural Language Classification Problems" (Kuleshov et al., 2018)).
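Several of the recipes above rely on ranking words by importance before perturbing them. As a rough illustration, the sketch below ranks words by leave-one-out deletion: each word is removed in turn and the drop in a score is recorded. This is an assumption about one common way to compute such a ranking, not a description of what any recipe does internally; rank_words_by_importance and toy_score are hypothetical names, and the scorer is a hand-written weight table rather than a real model.

```python
from typing import Callable, List


def rank_words_by_importance(text: str, score: Callable[[str], float]) -> List[int]:
    """Return word indices ordered from most to least important.

    Importance of a word is the drop in score when that word is deleted.
    """
    words = text.split()
    base = score(text)
    drops = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((base - score(reduced), i))
    # Largest drop first; ties are broken by original word position.
    drops.sort(key=lambda d: (-d[0], d[1]))
    return [i for _, i in drops]


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a model's confidence in the 'positive' label."""
    weights = {"great": 2.0, "good": 1.0, "boring": -1.0}
    return sum(weights.get(w, 0.0) for w in text.split())


order = rank_words_by_importance("a good but great film", toy_score)
```

A greedy attack could then visit words in this order and try substitutions on the most influential words first, which keeps the number of model queries lower than trying every position at every step.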
The final recipe is for translation attacks:
seq2sick: Greedy attack with the goal of changing every word in the output translation. Currently implemented as black-box, with plans to change to white-box as done in the paper ("Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples" (Cheng et al., 2018)).
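When scripting several recipe runs, it can help to build the command line programmatically. The sketch below only assembles the python -m textattack --recipe [recipe_name] command described above and checks the name against the recipes listed in this section; recipe_command is a hypothetical helper, not part of TextAttack, and the block does not launch anything itself.

```python
from typing import List

# Recipe names exactly as listed in this README.
RECIPES = (
    "textfooler",
    "alzantot",
    "tf-adjusted",
    "alz-adjusted",
    "deepwordbug",
    "hotflip",
    "kuleshov",
    "seq2sick",
)


def recipe_command(recipe_name: str) -> List[str]:
    """Build the argument list for running one attack recipe from the command line."""
    if recipe_name not in RECIPES:
        raise ValueError(f"unknown recipe {recipe_name!r}; expected one of {RECIPES}")
    return ["python", "-m", "textattack", "--recipe", recipe_name]


command = recipe_command("textfooler")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) in an environment where TextAttack is installed.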