Android Malware Detection Using Machine Learning

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Background. Android smartphones, with their wide range of uses and strong performance, have attracted a large user base. This dominance of the Android platform has also motivated attackers to develop malware. Traditional signature-based detection is poorly suited to identifying previously unseen malicious applications. This thesis uses Static Analysis (SA) to determine whether an application is malware: all the permissions that an application requests are collected and used as input to the machine learning models.

Objectives. The objectives that address the aim of this thesis are: to find or create a suitable data set containing Android malware; to build classifiers using different machine learning (ML) algorithms, such as Support Vector Machine (SVM) (Linear and RBF), Logistic Regression (LR), Random Forest (RF), Gaussian Naive Bayes (GNB), and Decision Tree (DT), and compare their performance; and to evaluate and compare the chosen models using Accuracy, Precision, Recall, and F1-Score, in order to identify which of them detects Android malware with better accuracy in real-time scenarios.

Methods. To answer the research question, one method was chosen: an experiment to identify malware on Android systems.

Results. The Sequential Neural Network (SNN) performed best on the dataset, with 98.82 percent, ahead of the other machine learning (ML) algorithms, making it the most effective algorithm for Android malware detection in this experiment. Random Forest (RF) and Decision Tree (DT) were the second-best algorithms on the dataset, with 97 percent.

Conclusions. Among Logistic Regression, KNN, SVM Linear, SVM RBF, Decision Tree, Random Forest, Gaussian Naive Bayes, and Sequential Neural Network, Random Forest is declared the most efficient algorithm after comparing all the models on the performance metrics Precision, Recall, and F1-Score, and on Accuracy. Random Forest is considered the most efficient among the four algorithms when they were compared.
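The permission-based approach and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the permission vocabulary, the function names encode_permissions and classification_metrics, and the toy labels are assumptions made for illustration only. The sketch covers two steps: turning the permissions an application requests into a binary feature vector, and computing Accuracy, Precision, Recall, and F1-Score from true and predicted labels (1 = malware, 0 = benign). Training a classifier on the vectors is left out.

```python
from typing import Dict, List, Sequence

# Hypothetical vocabulary: the abstract does not list which permissions were used.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def encode_permissions(requested: Sequence[str],
                       vocabulary: Sequence[str] = PERMISSIONS) -> List[int]:
    """Return a binary vector: 1 where the app requests that permission, else 0."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in vocabulary]


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute Accuracy, Precision, Recall and F1-Score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged as malware
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign left as benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example, not data from the thesis.
    app = ["android.permission.SEND_SMS", "android.permission.INTERNET"]
    print(encode_permissions(app))

    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    print(classification_metrics(y_true, y_pred))
```

In a setup like the one the abstract describes, vectors of this kind would be the input to each classifier, and the same four metrics would be computed per model on held-out data to compare them.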
