SlowDelib-RAG: Integrating Reflective Retrieval-Augmented Reasoning into Fast Decision Policies for LLM Agents

Aapo L. Bush; Reid Lawson; Jorge Rao

Authors

Aapo L. Bush, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.
Reid Lawson, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.
Jorge Rao, Department of Computer Science, Binghamton University, Binghamton, NY, USA.

Keywords:

retrieval-augmented generation, LLM agents, dual‑process reasoning, reflective retrieval, fast and slow decision policies, system architecture, governance, fairness

Abstract

Large language model (LLM) agents are increasingly deployed in real-world decision-making pipelines where both speed and accuracy are critical. However, existing retrieval-augmented generation (RAG) frameworks typically operate as a single-pass, feed-forward process that retrieves external knowledge once and then generates a response without iterative reflection. This design prioritizes low latency but can lead to shallow reasoning, factual inconsistencies, and inadequate handling of ambiguous or conflicting information. In this paper, we propose SlowDelib-RAG, a hybrid architecture that injects a reflective retrieval-augmented reasoning module into a conventional fast decision policy used by LLM agents. The system separates agent behavior into two primary modes: a fast mode that executes pre-trained, pattern-matching decision heuristics for routine tasks, and a slow mode that activates a deliberative RAG loop when uncertainty exceeds a learned threshold or when task complexity warrants deeper analysis. The slow mode performs iterative retrieval, context evaluation, and self‑critique before arriving at a final response, while the fast mode ensures that low‑risk, high‑volume operations are completed within strict latency budgets. We examine the structural trade‑offs between response time and decision quality, discuss the governance framework required to manage the switching policy between modes, and analyze the implications for infrastructure sustainability, robustness to adversarial perturbations, and fairness across diverse user populations. Through a series of illustrative deployment scenarios, we demonstrate that SlowDelib-RAG improves factual accuracy by up to 18% over standard RAG on complex multi‑hop queries while maintaining average response times within acceptable bounds. We also discuss policy challenges related to transparency, accountability, and the potential for biased mode activation. 
The proposed architecture offers a principled pathway toward LLM agents that can both react quickly and reason deeply, aligning with the dual‑process theory of cognition that has long informed human decision‑making.
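The mode-switching behavior described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `route`, `Decision`, `fast_policy`, `retrieve`, `generate`, and `critique`, the default `threshold` of 0.3, and the `max_rounds` cap are all hypothetical, and the sketch assumes the fast policy returns an uncertainty score in [0, 1] alongside its answer. The abstract describes the threshold as learned; here it is a fixed parameter for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Decision:
    answer: str
    mode: str                      # "fast" or "slow"
    rounds: int = 0                # retrieval rounds used in slow mode
    evidence: List[str] = field(default_factory=list)


def route(
    query: str,
    fast_policy: Callable[[str], Tuple[str, float]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str, List[str]], Tuple[bool, str]],
    threshold: float = 0.3,        # hypothetical value, fixed here rather than learned
    max_rounds: int = 3,           # hypothetical cap that bounds slow-mode latency
) -> Decision:
    # Fast mode: a single cheap call that returns an answer and an
    # uncertainty score (assumed to lie in [0, 1]).
    answer, uncertainty = fast_policy(query)
    if uncertainty <= threshold:
        return Decision(answer=answer, mode="fast")

    # Slow mode: iterate retrieval, generation, and self-critique until the
    # critique accepts the draft or the round budget is exhausted.
    evidence: List[str] = []
    current_query = query
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        evidence.extend(retrieve(current_query))
        answer = generate(query, evidence)
        accepted, follow_up = critique(query, answer, evidence)
        if accepted:
            break
        # The critique proposes a refined query for the next retrieval round.
        current_query = follow_up
    return Decision(answer=answer, mode="slow", rounds=rounds, evidence=evidence)
```

In this sketch the four callables are injected, so the same routing logic can be exercised with stub functions or with real model and retriever clients. Recording `mode` and `rounds` on each decision is one simple way to support the transparency and accountability concerns the abstract raises about the switching policy.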

