Prototype-Guided Backdoor Mitigation for Federated Large Language Model Fine-Tuning in Cross-Silo Healthcare Systems
Keywords:
federated learning, large language models, backdoor mitigation, prototype guidance, cross-silo healthcare, security, robustness
Abstract
The integration of large language models into federated learning frameworks for cross-silo healthcare systems holds considerable promise for privacy-preserving clinical decision support. However, such systems are vulnerable to backdoor attacks, in which malicious clients inject adversarial triggers during fine-tuning so that the global model misbehaves on trigger-bearing inputs while performing normally on clean ones. Existing mitigation strategies often assume centralized data access or incur prohibitive computational overhead, which makes them unsuitable for resource-constrained and highly regulated healthcare environments. This paper proposes a prototype-guided backdoor mitigation framework designed specifically for federated fine-tuning of large language models in cross-silo architectures. The approach leverages prototype consistency across heterogeneous client distributions to detect and suppress poisoned model updates without requiring access to raw patient data or auxiliary clean datasets. We provide a comprehensive system-level discussion covering architectural design, deployment trade-offs, robustness guarantees, fairness implications, and governance challenges. Through comparative analysis with prior methods, we demonstrate that prototype-guided mechanisms offer a favorable balance between security and utility while respecting the strict data sovereignty and auditability requirements of healthcare consortia. The paper also examines policy considerations for adoption in clinical workflows and outlines future directions for adaptive, cross-institutional backdoor defense.
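The abstract does not specify how prototype consistency is scored, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes, hypothetically, that each client uploads per-class prototypes (mean embeddings computed on its own local data, so no raw records leave the silo) alongside its flattened model update. The server forms a coordinate-wise median consensus prototype, scores each client by mean cosine similarity to that consensus, rejects clients whose score is a low-side robust z-score outlier (median/MAD), and FedAvg-averages the remaining updates. The function names, the median/MAD rule, and the thresholds `z_thresh` and `min_scale` are illustrative choices, not the paper's method.

```python
import numpy as np


def prototype_consistency_scores(prototypes: np.ndarray) -> np.ndarray:
    """Score each client by how closely its class prototypes match a robust consensus.

    prototypes: shape (num_clients, num_classes, dim); prototypes[i, c] is client i's
    mean embedding for class c, computed locally (hypothetical protocol).
    Returns shape (num_clients,): mean cosine similarity to the consensus per client.
    """
    # Coordinate-wise median across clients tolerates a minority of outlying clients.
    consensus = np.median(prototypes, axis=0)                      # (C, d)
    dots = np.einsum("kcd,cd->kc", prototypes, consensus)          # (K, C)
    norms = np.linalg.norm(prototypes, axis=-1) * np.linalg.norm(consensus, axis=-1)
    cosines = dots / np.maximum(norms, 1e-12)                      # avoid division by zero
    return cosines.mean(axis=1)


def filter_and_aggregate(updates, prototypes, num_samples, z_thresh=3.0, min_scale=0.05):
    """Drop clients with anomalously low prototype consistency, then FedAvg the rest.

    updates: (num_clients, num_params) flattened model updates (e.g. adapter deltas;
    an assumption). num_samples: local dataset sizes used as FedAvg weights.
    Returns (aggregate update, boolean accept mask, per-client scores).
    """
    updates = np.asarray(updates, dtype=float)
    scores = prototype_consistency_scores(prototypes)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    # Robust z-score; min_scale keeps honest clients from being rejected when scores barely vary.
    scale = max(1.4826 * mad, min_scale)
    robust_z = (median - scores) / scale      # one-sided: only low consistency is suspicious
    accepted = robust_z <= z_thresh
    if not accepted.any():
        raise ValueError("all client updates were rejected")
    weights = np.asarray(num_samples, dtype=float) * accepted
    aggregate = weights @ updates / weights.sum()
    return aggregate, accepted, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = np.eye(3, 4)                                     # 3 classes, 4-dim embeddings
    honest = base + 0.05 * rng.standard_normal((4, 3, 4))  # 4 honest clients
    poisoned = -base[None]                                  # 1 client with inverted prototypes
    prototypes = np.concatenate([honest, poisoned])         # (5, 3, 4)
    updates = np.array([[1.0] * 6, [2.0] * 6, [3.0] * 6, [4.0] * 6, [100.0] * 6])
    agg, accepted, scores = filter_and_aggregate(updates, prototypes, [10, 20, 30, 40, 50])
    print("accepted:", accepted)
    print("aggregate:", agg)
```

The median consensus and one-sided MAD test are used here only because they need no clean reference data, which matches the constraint stated in the abstract. Non-IID healthcare silos would likely require tuning `min_scale` and `z_thresh`, or per-class rather than averaged scores.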
Copyright (c) 2025 Computer Science and Engineering Transactions

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



