Trojan-and-Backdoor-Papers
Last edited: 2024-08-12
The list below contains curated papers and arXiv articles related to Trojan attacks, backdoor attacks, and data poisoning on neural networks and machine learning systems. They are ordered "approximately" from most to least recent, and articles marked with a "*" mention the TrojAI program directly. Some of the particularly relevant papers include a summary that can be accessed by clicking the "Summary" drop-down icon underneath the paper link. These articles were identified using a variety of methods, including:
- A flair embedding created from the arXiv CS subset (a rough sketch of this style of ranking appears after this list); details will be provided later.
- A trained ASReview random forest model
- A curated manual literature review
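As a rough illustration of the first method above, the sketch below ranks candidate abstracts by cosine similarity to a few known-relevant seed abstracts using off-the-shelf flair document embeddings. It is an assumed workflow for illustration only; the custom embedding trained on the arXiv CS subset, the seed set, and the candidate pool are all placeholders.

```python
# Illustrative sketch only (not the original pipeline): rank candidate arXiv
# abstracts by cosine similarity to a few known-relevant "seed" abstracts,
# using off-the-shelf flair document embeddings.
import torch
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings

# Pool forward/backward contextual string embeddings into one vector per abstract.
embedder = DocumentPoolEmbeddings([FlairEmbeddings("news-forward"),
                                   FlairEmbeddings("news-backward")])

def embed(text: str) -> torch.Tensor:
    """Return an L2-normalized document embedding for one abstract."""
    sentence = Sentence(text)
    embedder.embed(sentence)
    return sentence.embedding / sentence.embedding.norm()

# Hypothetical seed abstracts and candidate pool (id -> abstract text).
seed_abstracts = ["We show that an attacker can implant a hidden trigger into a neural network ..."]
candidates = {
    "paper-id-1": "We study data poisoning attacks against image classifiers ...",
    "paper-id-2": "We propose a new optimizer for sparse training ...",
}

seed_vecs = torch.stack([embed(a) for a in seed_abstracts])
scores = {pid: float((seed_vecs @ embed(abstract)).mean())
          for pid, abstract in candidates.items()}
for pid, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pid, round(score, 3))
```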
- Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors
- Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment
- ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- Physical Adversarial Attack meets Computer Vision: A Decade Survey
- MARNet: Backdoor Attacks Against Cooperative Multi-Agent Reinforcement Learning
- Not All Poisons are Created Equal: Robust Training against Data Poisoning
- Evil vs evil: using adversarial examples against backdoor attack in federated learning
- Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior
- Defending Backdoor Attacks on Vision Transformer via Patch Processing
- SentMod: Hidden Backdoor Attack on Unstructured Textual Data
- Adversarial poisoning attacks on reinforcement learning-driven energy pricing
- Hiding Needles in a Haystack: Towards Constructing Neural Networks that Evade Verification
- TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors
- BackdoorBench: A Comprehensive Benchmark of Backdoor Learning
- Fooling a Face Recognition System with a Marker-Free Label-Consistent Backdoor Attack
- Backdoor Attacks on Bayesian Neural Networks using Reverse Distribution
- Design of AI Trojans for Evading Machine Learning-based Detection of Hardware Trojans
- PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning
- Robust Anomaly based Attack Detection in Smart Grids under Data Poisoning Attacks
- Disguised as Privacy: Data Poisoning Attacks against Differentially Private Crowdsensing Systems
- LinkBreaker: Breaking the Backdoor-Trigger Link in DNNs via Neurons Consistency Check
- Natural Backdoor Attacks on Deep Neural Networks via Raindrops
- MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients
- ADFL: A Poisoning Attack Defense Framework for Horizontal Federated Learning
- Toward Realistic Backdoor Injection Attacks on DNNs using Rowhammer
- Execute Order 66: Targeted Data Poisoning for Reinforcement Learning via Minuscule Perturbations
- A Feature Based On-Line Detector to Remove Adversarial-Backdoors by Iterative Demarcation
- BlindNet backdoor: Attack on deep neural network using blind watermark
- DBIA: Data-free Backdoor Injection Attack against Transformer Networks
- Romoa: Robust Model Aggregation for the Resistance of Federated Learning to Model Poisoning Attacks
- Generative strategy based backdoor attacks to 3D point clouds: Work in Progress
- Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures
- FooBaR: Fault Fooling Backdoor Attack on Neural Network Training
- Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis
- Data Poisoning against Differentially-Private Learners: Attacks and Defenses
- Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain
- SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks
- Interpretability-Guided Defense against Backdoor Attacks to Deep Neural Networks
- How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
- A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples
- Poisonous Label Attack: Black-Box Data Poisoning Attack with Enhanced Conditional DCGAN
- Backdoor Attacks on Network Certification via Data Poisoning
- Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks
- Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Federated Learning
- Multi-Target Invisibly Trojaned Networks for Visual Recognition and Detection
- A Countermeasure Method Using Poisonous Data Against Poisoning Attacks on IoT Machine Learning
- FederatedReverse: A Detection and Defense Method Against Backdoor Attacks in Federated Learning
- BinarizedAttack: Structural Poisoning Attacks to Graph-based Anomaly Detection
- On the Effectiveness of Poisoning against Unsupervised Domain Adaptation
- Simple, Attack-Agnostic Defense Against Targeted Training Set Attacks Using Cosine Similarity
- Data Poisoning Attacks Against Outcome Interpretations of Predictive Models
- Poisoning attacks and countermeasures in intelligent networks: status quo and prospects
- The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks
- BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
- Poisoning Attacks via Generative Adversarial Text to Image Synthesis
- Ant Hole: Data Poisoning Attack Breaking out the Boundary of Face Cluster
- MT-MTD: Muti-Training based Moving Target Defense Trojaning Attack in Edged-AI network
- Text Backdoor Detection Using An Interpretable RNN Abstract Model
- Garbage in, Garbage out: Poisoning Attacks Disguised with Plausible Mobility in Data Aggregation
- Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks
- Poisoning Knowledge Graph Embeddings via Relation Inference Patterns
- Adversarial Training Time Attack Against Discriminative and Generative Convolutional Models
- Poisoning of Online Learning Filters: DDoS Attacks and Countermeasures
- Rethinking Stealthiness of Backdoor Attack against NLP Models
- SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
- Backdoor Attack on Machine Learning Based Android Malware Detectors
- Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning
- Fight Fire with Fire: Towards Robust Recommender Systems via Adversarial Poisoning Training
- Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch
- AdvDoor: Adversarial Backdoor Attack of Deep Learning System
- Defending against Backdoor Attacks in Natural Language Generation
- De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks
- Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds
- Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility
- MLDS: A Dataset for Weight-Space Analysis of Neural Networks
- Regularization Can Help Mitigate Poisoning Attacks... With The Right Hyperparameters
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
- Towards Robustness Against Natural Language Word Substitutions
- Backdoor Attacks Against Deep Learning Systems in the Physical World
- Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning
- Investigation of a differential cryptanalysis inspired approach for Trojan AI detection
- Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
- Robust Backdoor Attacks against Deep Neural Networks in Real Physical World
- The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game
- Explainability-based Backdoor Attacks Against Graph Neural Networks
- DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation
- Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
- SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation
- Black-box Detection of Backdoor Attacks with Limited Information and Data
- TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation
- T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors
- Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks
- An Approach for Poisoning Attacks Against RNN-Based Cyber Anomaly Detection
- Backdoor Scanning for Deep Neural Networks through K-Arm Optimization
- TAD: Trigger Approximation based Black-box Trojan Detection for AI*
- Data Poisoning Attack on Deep Neural Network and Some Defense Methods
- Baseline Pruning-Based Approach to Trojan Detection in Neural Networks*
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization
- TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)
- A Master Key Backdoor for Universal Impersonation Attack against DNN-based Face Verification
- Detecting Universal Trigger's Adversarial Attack with Honeypot
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
- Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
- Data Poisoning Attacks to Deep Learning Based Recommender Systems
- One-to-N & N-to-One: Two Advanced Backdoor Attacks against Deep Learning Models
- DeepPoison: Feature Transfer Based Stealthy Poisoning Attack
- Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
- Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
- Poisoning Attacks on Cyber Attack Detectors for Industrial Control Systems
- Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification*
- Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks
- Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
- Detection of Backdoors in Trained Classifiers Without Access to the Training Set
- Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
- Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff
- BaFFLe: Backdoor detection via Feedback-based Federated Learning
- Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection
- FaceHack: Triggering backdoored facial recognition systems using facial characteristics
- Poisoned classifiers are not only backdoored, they are fundamentally broken
- BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models
- Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks
- CLEANN: Accelerated Trojan Shield for Embedded Neural Networks
- Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
- Can Adversarial Weight Perturbations Inject Neural Backdoors?
- Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
- Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
- Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
- Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
- Backdoor Attacks on Facial Recognition in the Physical World
- You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
- Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks
- Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition
- ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks
- Model-Targeted Poisoning Attacks: Provable Convergence and Certified Bounds
- Deep Partition Aggregation: Provable Defense against General Poisoning Attacks
- The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models*
- Influence Function based Data Poisoning Attacks to Top-N Recommender Systems
- BadNL: Backdoor Attacks Against NLP Models
  - Summary
- Vulnerabilities of Connectionist AI Applications: Evaluation and Defence
- Defending Support Vector Machines against Poisoning Attacks: the Hardness and Algorithm
- A new measure for overfitting and its implications for backdooring of deep learning
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
- MetaPoison: Practical General-purpose Clean-label Data Poisoning
- Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
- Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability
- On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping
- STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
  - Summary
- TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
- Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection
- Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks
- Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
- A backdoor attack against LSTM-based text classification systems
- ABS: Scanning neural networks for back-doors by artificial brain stimulation
- NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations
- Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
- Programmable Neural Network Trojan for Pre-Trained Feature Extractor
- TamperNN: Efficient Tampering Detection of Deployed Neural Nets
- TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
- Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks
- Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks
- Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing
- A new Backdoor Attack in CNNs by training set corruption without label poisoning
- Deep k-NN Defense against Clean-label Data Poisoning Attacks
- Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
- Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics
- TensorClog: An imperceptible poisoning attack on deep neural network applications
- DeepInspect: A black-box trojan detection and mitigation framework for deep neural networks
- Resilience of Pruned Neural Network Against Poisoning Attack
- Neural cleanse: Identifying and mitigating backdoor attacks in neural networks
- SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems
  - Summary
- PoTrojan: powerful neural-level trojan designs in deep learning models
- Spectral Signatures in Backdoor Attacks
  - Summary
- Defending Neural Backdoors via Generative Distribution Modeling
- Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
  - Summary
- Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
  - Summary
- Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
  - Summary
- Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
- Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks
- Attack Strength vs. Detectability Dilemma in Adversarial Machine Learning
- BEBP: An Poisoning Method Against Machine Learning Based IDSs
- BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
  - Summary
- Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
- Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
- Data Poisoning Attacks on Factorization-Based Collaborative Filtering
- Using machine teaching to identify optimal training-set attacks on machine learners
- Antidote: Understanding and defending against poisoning of anomaly detectors
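Many of the attack papers above build on the basic trigger-poisoning recipe popularized by BadNets: stamp a small trigger patch onto a fraction of the training images and relabel them with the attacker's target class. The sketch below is a minimal, generic PyTorch illustration of that recipe under assumed tensor shapes; it is not the released code of any paper listed here.

```python
# Minimal, illustrative BadNets-style dirty-label poisoning of one training batch.
import torch

def poison_batch(images: torch.Tensor, labels: torch.Tensor,
                 poison_frac: float = 0.1, target_class: int = 0):
    """images: (N, C, H, W) in [0, 1]; labels: (N,). Returns poisoned copies."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_frac * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    # Trigger: a 3x3 white square in the bottom-right corner of each poisoned image.
    images[idx, :, -3:, -3:] = 1.0
    # Dirty-label poisoning: relabel the triggered images with the target class.
    labels[idx] = target_class
    return images, labels

# Usage against a toy batch:
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_poisoned, y_poisoned = poison_batch(x, y)
```

A model trained on enough such batches behaves normally on clean inputs but predicts the target class whenever the trigger is present, which is exactly the behavior the detection and defense papers above try to expose.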
⚔🛡 Awesome Backdoor Attacks and Defenses
This repository contains a collection of papers and resources on backdoor attacks and backdoor defense in deep learning.
Table of contents
- 📃Survey
- ⚔Backdoor Attacks
  - Supervised learning (Image classification)
  - Semi-supervised learning
  - Self-supervised learning
  - Federated learning
  - Reinforcement Learning
  - Other CV tasks (Object detection, segmentation, point cloud)
  - Multimodal models (Visual and Language)
  - Diffusion model
  - Large language model & other NLP tasks
  - Graph Neural Networks
  - Theoretical analysis
- 🛡Backdoor Defenses
  - Defense for supervised learning (Image classification)
  - Defense for semi-supervised learning
  - Defense for self-supervised learning
  - Defense for reinforcement learning
  - Defense for federated learning
  - Defense for other CV tasks (Object detection, segmentation)
  - Defense for multimodal models (Visual and Language)
  - Defense for Large Language model & other NLP tasks
  - Defense for diffusion models
  - Defense for Graph Neural Networks
- Backdoor for social good
  - Watermarking
  - Explainable AI
- ⚙Benchmark and toolboxes
📃Survey
Year | Publication | Paper |
---|---|---|
2023 | arXiv | Adversarial Machine Learning: A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Example |
2022 | TPAMI | Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses |
2022 | TNNLS | Backdoor Learning: A Survey |
2022 | IEEE Wireless Communications | Backdoor Attacks and Defenses in Federated Learning: State-of-the-art, Taxonomy, and Future Directions |
2021 | Neurocomputing | Defense against Neural Trojan Attacks: A Survey |
2020 | ISQED | A Survey on Neural Trojans |
Tutorial & Workshop
Venue | Title |
---|---|
ICCV 2023 | Backdoor Learning: Recent Advances and Future Trends |
NeurIPS 2023 | Backdoors in Deep Learning |
⚔Backdoor Attacks
Supervised learning (Image classification)
Semi-supervised learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | ICCV 2023 | The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning | 
2021 | AAAI 2021 | DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation | 
2021 | TIFS 2021 | Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures | 
Self-supervised learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | ICCV 2023 | An Embarrassingly Simple Backdoor Attack on Self-supervised Learning | |
2022 | CVPR 2022 | Backdoor Attacks on Self-Supervised Learning | 
Federated learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | IBA: Towards Irreversible Backdoor Attacks in Federated Learning | 
2023 | NeurIPS 2023 | A3FL: Adversarially Adaptive Backdoor Attacks to Federated Learning | 
2023 | SIGIR 2023 | Manipulating Federated Recommender Systems: Poisoning with Synthetic Users and Its Countermeasures | 
2023 | ICML 2023 | Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning | 
2022 | ICML 2022 | Neurotoxin: Durable Backdoors in Federated Learning | 
2022 | USS 2022 | FLAME: Taming Backdoors in Federated Learning | 
2020 | AISTATS 2020 | How To Backdoor Federated Learning | 
2020 | ICLR 2020 | DBA: Distributed Backdoor Attacks against Federated Learning | 
2020 | NeurIPS 2020 | Attack of the Tails: Yes, You Really Can Backdoor Federated Learning | 
Reinforcement Learning
Year | Publication | Paper | Code |
---|---|---|---|
2021 | IJCAI 2021 | BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning |
Other CV tasks (Object detection, segmentation, point cloud)
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | BadTrack: A Poison-Only Backdoor Attack on Visual Object Tracking | |
2022 | ICLR 2022 | Few-Shot Backdoor Attacks on Visual Object Tracking | |
2022 | MM 2022 | Backdoor Attacks on Crowd Counting | |
2021 | ICCV 2021 | A Backdoor Attack against 3D Point Cloud Classifiers | |
2021 | ICCV 2021 | PointBA: Towards Backdoor Attacks in 3D Point Cloud |
Multimodal models (Visual and Language)
Year | Publication | Paper | Code |
---|---|---|---|
2024 | IEEE SP | Backdooring Multimodal Learning | |
2022 | CVPR 2022 | Dual-Key Multimodal Backdoors for Visual Question Answering | 
2022 | ICASSP 2022 | Object-Oriented Backdoor Attack Against Image Captioning | |
2022 | ICLR 2022 | Poisoning and Backdooring Contrastive Learning |
Diffusion model
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models | |
2023 | ICCV 2023 | Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis | |
2023 | CVPR 2023 | How to Backdoor Diffusion Models? | |
2023 | CVPR 2023 | TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets |
Large language model & other NLP tasks
Graph Neural Networks
Year | Publication | Paper | Code |
---|---|---|---|
2022 | CCS 2022 | Clean-label Backdoor Attack on Graph Neural Networks | |
2022 | ICMR 2022 | Camouflaged Poisoning Attack on Graph Neural Networks | |
2022 | RAID 2022 | Transferable Graph Backdoor Attack | |
2021 | SACMAT 2021 | Backdoor Attacks to Graph Neural Networks | |
2021 | USS 2021 | Graph Backdoor | |
2021 | WiseML 2021 | Explainability-based Backdoor Attacks Against Graph Neural Networks | 
Theoretical analysis
Year | Publication | Paper | Code |
---|---|---|---|
2020 | NeurIPS 2020 | On the Trade-off between Adversarial and Backdoor Robustness |
🛡Backdoor Defenses
Defense for supervised learning (Image classification)
Before-training (Preprocessing) stage
Year | Publication | Paper | Code |
---|---|---|---|
2023 | ICCV 2023 | Beating Backdoor Attack at Its Own Game | |
2023 | USENIX Security 2023 | Towards A Proactive ML Approach for Detecting Backdoor Poison Samples | |
2023 | USENIX Security 2023 | ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms | |
2023 | USENIX Security 2023 | How to Sift Out a Clean Data Subset in the Presence of Data Poisoning? | |
2023 | ICLR 2023 | Towards Robustness Certification Against Universal Perturbations | |
2021 | ICML 2021 | SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics | |
2021 | USENIX Security 2021 | Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection | |
2020 | ICLR 2020 | Robust anomaly detection and backdoor attack detection via differential privacy | |
2019 | IEEE SP | Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks | |
2018 | | Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering | 
2018 | NeurIPS 2018 | Spectral Signatures in Backdoor Attacks |
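As a rough illustration of the activation-clustering defense listed in the table above, the sketch below clusters per-class penultimate-layer features into two groups and flags classes where one cluster is abnormally small. The feature extraction (however your model exposes it) and the 15% threshold are assumptions for illustration, not the paper's reference implementation.

```python
# Illustrative activation-clustering check for dirty-label poisoning.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def suspicious_classes(activations: np.ndarray, labels: np.ndarray,
                       small_cluster_ratio: float = 0.15):
    """activations: (N, D) penultimate-layer features of the training set; labels: (N,)."""
    flagged = []
    for c in np.unique(labels):
        feats = activations[labels == c]
        if len(feats) < 10:           # too few samples to cluster meaningfully
            continue
        # Reduce dimensionality, then split the class activations into two clusters.
        reduced = PCA(n_components=min(10, feats.shape[1])).fit_transform(feats)
        assignments = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        # A tiny secondary cluster suggests a subset of poisoned (relabeled) samples.
        ratio = min(np.bincount(assignments)) / len(assignments)
        if ratio < small_cluster_ratio:
            flagged.append(int(c))
    return flagged
```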
In-training stage
Year | Publication | Paper | Code |
---|---|---|---|
2023 | CVPR 2023 | Backdoor Defense via Adaptively Splitting Poisoned Dataset | |
2023 | CVPR 2023 | Backdoor Defense via Deconfounded Representation Learning | |
2023 | IEEE SP | RAB: Provable Robustness Against Backdoor Attacks | |
2023 | ICLR 2023 | Towards Robustness Certification Against Universal Perturbations | |
2022 | ICLR 2022 | Backdoor defense via decoupling the training process | |
2022 | NeurIPS 2022 | Effective Backdoor Defense by Exploiting Sensitivity of Poisoned Samples | |
2022 | AAAI 2022 | Certified Robustness of Nearest Neighbors against Data Poisoning and Backdoor Attacks | |
2021 | NeurIPS 2021 | Anti-Backdoor Learning: Training Clean Models on Poisoned Data | |
2021 | AAAI 2021 | Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks | |
2022 | NeurIPS 2022 | BagFlip: A Certified Defense against Data Poisoning |
Post-training stage
Inference stage
Defense for semi-supervised learning
Year | Publication | Paper | Code |
---|---|---|---|
Defense for self-supervised learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | CVPR 2023 | Detecting Backdoors in Pre-trained Encoders | |
2023 | CVPR 2023 | Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning |
Defense for reinforcement learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning | |
2023 | ICCV 2023 | PolicyCleanse: Backdoor Detection and Mitigation for Competitive Reinforcement Learning |
Defense for federated learning
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | Theoretically Modeling Client Data Divergence for Federated Natural Language Backdoor Defense | |
2023 | NeurIPS 2023 | FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning | |
2023 | NeurIPS 2023 | Lockdown: Backdoor Defense for Federated Learning with Isolated Subspace Training | |
2023 | ICCV 2023 | Multi-Metrics Adaptively Identifies Backdoors in Federated Learning | |
2023 | ICLR 2023 | FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning |
Defense for other CV tasks (Object detection, segmentation)
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | Django: Detecting Trojans in Object Detection Models via Gaussian Focus Calibration |
Defense for multimodal models (Visual and Language)
Year | Publication | Paper | Code |
---|---|---|---|
2023 | NeurIPS 2023 | Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks | |
2023 | ICCV 2023 | CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning | |
2023 | ICCV 2023 | TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models | |
2023 | CVPR 2023 | Detecting Backdoors in Pre-trained Encoders |
Defense for Large Language model & other NLP tasks
Defense for diffusion models
Year | Publication | Paper | Code |
---|---|---|---|
Defense for Graph Neural Networks
Year | Publication | Paper | Code |
---|---|---|---|
Backdoor for social good
Watermarking
Year | Publication | Paper | Code |
---|---|---|---|
2022 | IJCAI 2022 | Membership Inference via Backdooring | 
2022 | NeurIPS 2022 | Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection | |
2018 | USS 2018 | Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring |
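The backdoor-based watermarking idea in the table above can be summarized in a few lines: the owner fine-tunes the model to memorize a secret trigger set with preassigned labels, then claims ownership if a suspect model agrees with those labels far above chance. The sketch below is a hedged illustration with hypothetical helper names, not any paper's released verification code.

```python
# Illustrative ownership check for a backdoor-watermarked classifier.
import torch

@torch.no_grad()
def ownership_score(model, trigger_images: torch.Tensor, trigger_labels: torch.Tensor) -> float:
    """Fraction of secret trigger inputs classified with the owner's preassigned labels."""
    preds = model(trigger_images).argmax(dim=1)
    return (preds == trigger_labels).float().mean().item()

# Verification rule (threshold is a design choice; chance level is 1/num_classes):
# claim ownership if ownership_score(suspect_model, triggers, labels) is far above chance.
```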
Explainable AI
Year | Publication | Paper | Code |
---|---|---|---|
2021 | KDD 2021 | What Do You See?: Evaluation of Explainable Artificial Intelligence (XAI) Interpretability through Neural Backdoors |
⚙Benchmark and toolboxes
Name | Publication | Paper | Code |
---|---|---|---|
BackdoorBench | NeurIPS 2022 | BackdoorBench: A Comprehensive Benchmark of Backdoor Learning | |
OpenBackdoor | NeurIPS 2022 | A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks | |
TrojanZoo | EuroS&P 2022 | TrojanZoo: Towards Unified, Holistic, and Practical Evaluation of Neural Backdoors | |
BackdoorBox | | BackdoorBox: An Open-sourced Python Toolbox for Backdoor Attacks and Defenses | 
BackdoorToolbox | | | 