MLID : A multilabelextension of the ID3 algorithm

University essay from Blekinge Tekniska Högskola/Institutionen för programvaruteknik

Abstract: AbstractMachine learning is a subfield within artificial intelligence that revolves around constructingalgorithms that can learn from, and make predictions on data. Instead of following strict andstatic instruction, the system operates by adapting and learning from input data in order tomake predictions and decisions. This work will focus on a subcategory of machine learningcalled “MultilabelClassification”, which is the concept of where items introduced to thesystem is categorized by an analytical model, learned through supervised learning, whereeach instance of the dataset can belong to multiple labels, or classes.This paper presents the task of implementing a multilabelclassifier based on the ID3algorithm, which we call MLID (MultilabelIterative Dichotomiser). The solution is presentedboth in a sequentially executed version as well as an parallelized one.We also presents acomparison based on accuracy and execution time, that is performed against algorithms of asimilar nature in order to evaluate the viability of using ID3 as a base to further expand andbuild upon in regards of multi label classification.In order to evaluate the performance of the MLID algorithm, we have measured theexecution time, accuracy, and made a summarization of precision and recall into what iscalled Fmeasure,which is the harmonic mean of both precision and sensitivity of thealgorithm. These results are then compared to already defined and established algorithms,on a range of datasets of varying sizes, in order to assess the viability of the MLID algorithm.The results produced when comparing MLID against other multilabelalgorithms such asBinary relevance, Classifier Chains and Random Trees shows that MLID can compete withother classifiers in term of accuracy and Fmeasure,but in terms of training the algorithm,the time required is proven inferior. Through these results, we can conclude that MLID is aviable option to use as a multilabelclassifier. Although, some constraints inherited from theoriginal ID3 algorithm does impede the full utility of the algorithm, we are certain thatfollowing the same path of development and improvement as ID3 experienced would allowMLID to develop towards a suitable choice of algorithm for a diverse range of multilabelclassification problems.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)