Prompt Injection Resistance in Clinical LLM Agents via Structured Medical Ontology Alignment
Keywords:
prompt injection, large language models, clinical agents, medical ontology, adversarial robustness, healthcare AI safety
Abstract
Large language model agents deployed in clinical settings offer transformative potential for decision support, patient communication, and workflow automation. However, their vulnerability to prompt injection attacks poses a critical safety risk, especially when adversarial inputs can manipulate model outputs to produce harmful or misleading medical advice. This paper proposes a structural defense framework based on the alignment of clinical large language model agents with a formal medical ontology, such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) or the Unified Medical Language System (UMLS). By constraining the agent’s reasoning and generation processes to a structured representation of medical knowledge, the system can detect and reject inputs that deviate from clinically valid pathways. We present an architectural design that integrates ontological grounding at multiple stages of the agent pipeline, including input preprocessing, context injection, and output validation. The approach is evaluated against a taxonomy of prompt injection techniques, including direct, indirect, and multi-turn attacks. Results demonstrate that ontology-aligned agents exhibit significantly higher resistance to adversarial manipulations compared to unconstrained baseline models, while maintaining clinical accuracy and fluency. The paper also discusses the trade-offs between security and expressiveness, the computational overhead of ontology integration, and the implications for regulatory compliance and deployment in resource-constrained healthcare environments. We argue that structured ontology alignment represents a promising direction for building trustworthy clinical large language model agents that can safely operate in adversarial open-world interactions.
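The three stages named above (input preprocessing, context injection, output validation) can be illustrated with a deliberately small sketch. Everything in it is an assumption for illustration only. The concept IDs (`C001` and so on), relation names, regex patterns, and function names such as `screen_input` and `validate_output` are hypothetical. They are not real SNOMED CT or UMLS codes and not the paper's implementation. A deployed system would query a real terminology service and use a model-based claim extractor rather than single-word keyword matching.

```python
import re

# Toy ontology: hypothetical concept IDs, NOT real SNOMED CT or UMLS codes.
# Each concept has a preferred term and single-word synonyms for lexical matching.
CONCEPTS = {
    "C001": {"term": "fever", "synonyms": {"fever", "pyrexia"}},
    "C002": {"term": "paracetamol", "synonyms": {"paracetamol", "acetaminophen"}},
    "C003": {"term": "warfarin", "synonyms": {"warfarin"}},
    "C004": {"term": "bleeding", "synonyms": {"bleeding", "haemorrhage", "hemorrhage"}},
}

# Clinically valid relations as (subject, relation, object) triples (illustrative only).
RELATIONS = {
    ("C002", "may_treat", "C001"),
    ("C003", "may_cause", "C004"),
}

# Hypothetical heuristic patterns for instruction-like text; far from exhaustive.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) (instructions|prompts?)", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]


def extract_concepts(text):
    """Stage 1: map single words in free text to ontology concept IDs via synonyms."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {cid for cid, c in CONCEPTS.items() if words & c["synonyms"]}


def screen_input(text):
    """Stage 1: flag instruction-like input and extract grounded concepts."""
    suspicious = any(p.search(text) for p in INJECTION_PATTERNS)
    return suspicious, extract_concepts(text)


def build_context(concepts):
    """Stage 2: inject only ontology facts that involve the extracted concepts."""
    facts = sorted(
        f"{CONCEPTS[s]['term']} {r} {CONCEPTS[o]['term']}"
        for s, r, o in RELATIONS
        if s in concepts or o in concepts
    )
    return "\n".join(facts)


def handle_request(text):
    """Chain stages 1 and 2; return None when the input is rejected."""
    suspicious, concepts = screen_input(text)
    if suspicious or not concepts:
        # Rejected: instruction-like text, or no recognised clinical concept.
        return None
    return build_context(concepts)


def validate_output(claims):
    """Stage 3: return the claimed triples that the ontology does not support.

    `claims` is assumed to come from a hypothetical upstream extractor that turns
    model output into (subject, relation, object) concept-ID triples.
    """
    return [c for c in claims if c not in RELATIONS]
```

For example, `handle_request("Patient with pyrexia, is acetaminophen appropriate?")` yields the single grounded fact `paracetamol may_treat fever`, while a request beginning "Ignore previous instructions" is rejected before any context is built. The design choice being illustrated is that the agent only ever reasons over, and is only allowed to assert, relations that exist in the ontology, so an injected instruction cannot add a new "fact" without failing stage 3.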
License
Copyright (c) 2026 Computer Science and Engineering Transactions

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.