Retrieval Augmented Generation against the Machine: Merging Human Cyber Security Expertise with Generative AI

Brennan Lodge

Abstract:

Amidst a complex regulatory landscape, Retrieval Augmented Generation (RAG) emerges as a transformative tool for Governance, Risk, and Compliance (GRC) officers. This paper details the application of RAG in combining Large Language Models (LLMs) with external knowledge bases, offering GRC professionals an advanced means to adapt to rapid changes in compliance requirements. While the development of standalone LLMs is exciting, such models have downsides: they cannot easily expand or revise their memory, they cannot straightforwardly provide insight into their predictions, and they may produce “hallucinations.” By pairing a pre-trained seq2seq transformer with a dense vector index of domain-specific data, RAG integrates real-time data retrieval into the generative process, enabling gap analysis and the dynamic generation of compliance and risk management content. We delve into the mechanics of RAG, focusing on its dual structure, which pairs parametric knowledge contained within the transformer model with non-parametric data extracted from an updatable corpus. This hybrid model enhances decision-making through context-rich insights drawn from the most current and relevant information, thereby enabling GRC officers to maintain a proactive compliance stance. Our methodology aligns with the latest advances in neural network fine-tuning, providing a granular, token-level application of retrieved information to inform and generate compliance narratives. By employing RAG, we demonstrate a scalable solution that can adapt to novel regulatory challenges and cybersecurity threats, offering GRC officers a robust, predictive tool that augments their expertise. The granular application of RAG’s dual structure not only improves compliance and risk management protocols but also informs the development of compliance narratives with greater precision. It underscores AI’s emerging role in strategic risk mitigation and proactive policy formation, positioning GRC officers to anticipate and navigate the complexities of regulatory evolution confidently.
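
The retrieval half of the dual structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the corpus entries, function names, and prompt wording below are hypothetical, and a toy bag-of-words similarity stands in for the dense vector index so that the example runs with only the Python standard library. The generator (the parametric half) is omitted; the sketch stops at the prompt that would be handed to it.

```python
import math
import re
from collections import Counter

# Hypothetical policy snippets standing in for an updatable compliance corpus.
CORPUS = {
    "password-policy": "Passwords must be at least twelve characters and rotated every ninety days.",
    "breach-notice": "Personal data breaches must be reported to the regulator within 72 hours.",
    "vendor-review": "Third party vendors require a security review before contract signature.",
}


def embed(text):
    """Toy bag-of-words vector (assumption: a real system would use a dense encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the ids of the k corpus entries most similar to the query."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), doc_id) for doc_id, text in corpus.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(query, corpus, k=2):
    """Assemble retrieved passages and the question into a generator prompt."""
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


if __name__ == "__main__":
    print(build_prompt("How quickly must a data breach be reported to the regulator?", CORPUS))
```

Because the knowledge lives in the corpus rather than in model weights, adding a new entry to `CORPUS` changes what `retrieve` returns without any retraining, which is the property the abstract attributes to the non-parametric component.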

Keywords: Retrieval Augmented Generation, Governance Risk and Compliance, Cybersecurity, AI-driven Compliance, Risk Management, Generative AI.

