Early Automated Risk Assessment of CVE Descriptions

University essay from Linköpings universitet/Institutionen för datavetenskap

Authors: Felix Nyrfors; Ludvig Sandholm [2022]

Abstract: Data security is becoming more relevant as more and more information is stored digitally, and many new vulnerabilities are found every day. To keep track of which software programs could be exposed, databases such as the NVD were created that try to list all identified vulnerabilities. The vulnerabilities on the NVD are stored in the form of CVEs and are analyzed by experts to provide further information. The problem is that there is a delay between the time a CVE is published and the time it is analyzed, and during this period systems could be vulnerable. This thesis investigates an early risk assessment that uses CVE descriptions and text classification to predict whether a CVE is relevant or irrelevant to a specific list of software programs. The list consists of common programs that have vulnerability data associated with them in vulnerability databases. A dataset of CVEs was collected, analyzed and labeled as relevant or irrelevant. The machine learning models were trained on the CVE dataset to classify each CVE description with the correct label. They were also compared to a simple pattern matching algorithm that was tested on the same datasets. The binary classification achieved a precision of 0.99 and a recall of 0.95, and the multiclass classification achieved an average precision of 0.98 and an average recall of 0.92. The classification results support the potential of automated risk assessment given a list of programs, and the analysis of the CVE dataset provides further support for the method’s potential.
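To make the comparison baseline and the reported metrics concrete, the following is a minimal sketch of what a simple pattern matching classifier and a precision/recall calculation could look like. It is not the thesis's implementation: the program list, the sample descriptions, their labels, and all function names are invented for illustration, and the whole-word, case-insensitive matching rule is an assumption.

```python
import re

# Hypothetical watch list; the actual program list used in the thesis is not given here.
PROGRAMS = ["openssl", "nginx", "wordpress"]


def build_pattern(programs):
    """Compile one regex that matches any program name as a whole word."""
    alternatives = "|".join(re.escape(name) for name in programs)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def is_relevant(description, pattern):
    """Label a CVE description as relevant if it mentions a listed program."""
    return pattern.search(description) is not None


def precision_recall(predicted, actual):
    """Compute precision and recall for the 'relevant' (True) class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented descriptions with invented labels (True = relevant to the watch list).
SAMPLES = [
    ("Buffer overflow in OpenSSL allows remote attackers to cause a denial of service.", True),
    ("Cross-site scripting in a WordPress plugin allows script injection.", True),
    # Labeled relevant, but the program is not named, so the pattern misses it.
    ("A flaw in a widely used TLS library leaks private key material.", True),
    ("SQL injection in ExampleShop allows data disclosure.", False),
]

pattern = build_pattern(PROGRAMS)
predicted = [is_relevant(text, pattern) for text, _ in SAMPLES]
actual = [label for _, label in SAMPLES]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the pattern matcher makes no false alarms but misses the description that never names the program, which illustrates why a name-based baseline can lose recall and why a trained text classifier is a natural point of comparison.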
