KARTAL: Web Application Vulnerability Hunting Using Large Language Models : Novel method for detecting logical vulnerabilities in web applications with finetuned Large Language Models

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Broken Access Control is the most serious web application security risk as published by Open Worldwide Application Security Project (OWASP). This category has highly complex vulnerabilities such as Broken Object Level Authorization (BOLA) and Exposure of Sensitive Information. Finding such critical vulnerabilities in large software systems requires intelligent and automated tools. State-of-the-art (SOTA) research including hybrid application security testing tools, algorithmic brute forcers, and artificial intelligence has shown great promise in detection. Nevertheless, there exists a gap in research for reliably identifying logical and context-dependant Broken Access Control vulnerabilities. We modeled the problem as text classification and proposed KARTAL, a novel method for web application vulnerability detection using a Large Language Model (LLM). It consists of 3 components: Fuzzer, Prompter, and Detector. The Fuzzer is responsible for methodically collecting application behavior. The Prompter processes the data from the Fuzzer and formulates a prompt. Finally, the Detector uses an LLM which we have finetuned for detecting vulnerabilities. In the study, we investigate the performance, key factors, and limitations of the proposed method. Our research reveals the need for a labeled Broken Access Control vulnerability dataset in the cybersecurity field. Thus, we custom-generate our own dataset using an auto-regressive LLM with SOTA few-shot prompting techniques. We experiment with finetuning 3 types of decoder-only pre-trained transformers for detecting 2 sophisticated vulnerabilities. Our best model attained an accuracy of 87.19%, with an F1 score of 0.82. By using hardware acceleration on a consumer-grade laptop, our fastest model can make up to 539 predictions per second. The experiments on varying the training sample size demonstrated the great learning capabilities of our model. Every 400 samples added to training resulted in an average MCC score improvement of 19.58%. Furthermore, the dynamic properties of KARTAL enable inferencetime adaption to the application domain, resulting in reduced false positives.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)