Uncertainty-Aware LLM Agents for Safe Medical Decision-Making in Noisy Clinical Environments

Abhishek Banerjee

Authors

Abhishek Banerjee, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA

Keywords:

uncertainty quantification, large language models, clinical decision support, safe AI, socio-technical systems, healthcare infrastructure

Abstract

Large language model (LLM) agents are increasingly being considered for clinical decision support, yet their deployment in noisy hospital environments raises fundamental concerns about safety, reliability, and governance. This paper proposes a system-level framework for uncertainty-aware LLM agents that can operate robustly under the high-variability conditions typical of real-world clinical settings. We argue that current LLM architectures lack formal mechanisms to quantify and communicate epistemic and aleatoric uncertainty, leading to overconfident recommendations that may endanger patients. Drawing on principles from probabilistic machine learning, human-in-the-loop design, and socio-technical systems theory, we present a multi-layered architecture that integrates uncertainty estimation, deferral protocols, and continuous monitoring. We examine structural trade-offs between autonomy and oversight, the role of regulatory infrastructure, and the challenges of fairness across diverse patient populations. By comparing uncertainty-aware approaches in autonomous driving and financial risk assessment, we derive lessons for clinical deployment. The paper further addresses sustainability implications of running large models in resource-constrained healthcare environments and discusses policy frameworks for certification and liability. We conclude that uncertainty-aware LLM agents, while not a panacea, represent a necessary evolution toward trustworthy AI in medicine, provided they are embedded within robust institutional governance structures.

