Prediction of Factors Influencing Rats Tuberculosis Detection Performance Using Data Mining Techniques

University essay from Uppsala universitet/Institutionen för informatik och media

Author: Joan Jonathan; [2019]

Abstract: This thesis aimed to predict the factors that influence rats' TB detection performance using data mining techniques. A dataset of rats' TB detection performance was provided by the APOPO TB training and research center in Morogoro, Tanzania. After preprocessing, the dataset comprised 471,133 observations of rats' TB detection performance, with a sample of 4 female rats. However, only 200,000 observations were used in the analysis. Following the CRISP-DM methodology, this thesis used the R language as the data mining tool to analyze the data. To predict the influencing factors and classify the rats, predictive models were built with three classification algorithms: decision tree, random forest, and naive Bayes. The models were validated on the same test data to compare their classification accuracy and identify the best one. The results indicate that the random forest is the most accurate predictive model, with an accuracy of 78.82%. However, the accuracy differences are negligible. When both accuracy (78.78%) and build time (3 seconds) are considered, the decision tree is the best predictive model, since it takes far less time to build than the random forest (154 seconds). Moreover, the results show that age is the most significant influencing factor, and rats aged between 3.1 and 6 years showed potential in detection performance. The other predicted factors are Session_Completion_Time, Session_Start_Time, and Av_Weight_Per_Year. These results are useful as a reference for trainers of TB-detection rats and for researchers in rat-based TB detection and Information Systems. Further research using other data mining techniques and tools would be valuable.
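
The trade-off described in the abstract, preferring the faster model when accuracy differences are negligible, can be illustrated with a small sketch. This is a Python illustration only, not the thesis's R code. The names `ModelResult` and `pick_model` and the 0.1 percentage point tolerance are assumptions introduced here, since the abstract does not state what threshold counts as "negligible". Only the accuracy and build-time figures reported in the abstract are used; naive Bayes is left out because the abstract gives no figures for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical record of one model's evaluation on the shared test set."""
    name: str
    accuracy_pct: float   # classification accuracy, in percent
    build_seconds: float  # time taken to build (train) the model


def pick_model(results, tolerance_pct=0.1):
    """Return the fastest model whose accuracy is within tolerance_pct
    percentage points of the best accuracy (tolerance is an assumption)."""
    if not results:
        raise ValueError("no results to compare")
    best_accuracy = max(r.accuracy_pct for r in results)
    # Keep every model whose accuracy gap to the best is treated as negligible.
    near_best = [r for r in results
                 if best_accuracy - r.accuracy_pct <= tolerance_pct]
    # Among those, prefer the one that is quickest to build.
    return min(near_best, key=lambda r: r.build_seconds)


# Figures as reported in the abstract.
results = [
    ModelResult("random forest", 78.82, 154.0),
    ModelResult("decision tree", 78.78, 3.0),
]

chosen = pick_model(results)
print(chosen.name)  # decision tree
```

With a tolerance of zero the same function would return the random forest, which matches the abstract's observation that the random forest wins on accuracy alone.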
