Data Complexity and its effect on Classification Accuracy in Multi Class Classification Problems: A study using synthetic datasets

University essay from KTH/Datavetenskap (Computer Science)

Authors: Fredrik Östlund; Erik Fahlman; [2022]

Abstract: This study investigates how data complexity, as measured by F1, N1, N2, and N3, affects the performance of a selection of machine learning classifiers in a multi-class classification setting. It uses synthetic datasets that span the full range of possible complexity levels for each complexity measure, which makes it possible to target a desired level of complexity for each dataset. The number of dimensions of the datasets was inspired by the Fashion-MNIST benchmark dataset. The study finds that classifier accuracy decreases as dataset complexity increases, that accuracy also becomes less robust as complexity increases, and that N1 and N3 are the measures whose descriptive power best reflects real-world performance.
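The abstract names the complexity measures without defining them. As a minimal sketch, assuming the definitions commonly used in the data-complexity literature (F1 as a normalized maximum Fisher's discriminant ratio, N1 as the fraction of points lying on a class boundary in a Euclidean minimum spanning tree, N2 as a normalized ratio of intra-class to inter-class nearest-neighbor distances, and N3 as the leave-one-out error rate of a 1-nearest-neighbor classifier), the four measures could be computed with NumPy as below. The function names, the normalized variants of F1 and N2, and the toy Gaussian data generator in the demo are assumptions for illustration; the thesis may use different variants or a library implementation. In all four functions, higher values indicate a harder problem.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def f1_measure(X, y):
    """Normalized multi-class Fisher's discriminant ratio: 1 / (1 + max_f r_f)."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # between-class scatter per feature
    within = np.zeros(X.shape[1])   # within-class scatter per feature
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        between += len(Xc) * (mean_c - overall_mean) ** 2
        within += ((Xc - mean_c) ** 2).sum(axis=0)
    # Zero within-class scatter with nonzero between-class scatter means the
    # feature separates the classes perfectly, so its ratio is infinite.
    ratio = np.divide(between, within,
                      out=np.where(between > 0, np.inf, 0.0), where=within > 0)
    return float(1.0 / (1.0 + ratio.max()))


def n1_measure(X, y):
    """Fraction of points joined to a different-class point in a Euclidean MST."""
    D = pairwise_distances(X)
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # cheapest known edge from the tree to each point
    parent = np.full(n, -1)
    best[0] = 0.0
    on_boundary = np.zeros(n, dtype=bool)
    for _ in range(n):
        # Prim's algorithm: add the closest point not yet in the tree.
        i = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[i] = True
        p = parent[i]
        if p >= 0 and y[p] != y[i]:
            on_boundary[p] = on_boundary[i] = True
        # Relax the edges leaving the newly added point.
        closer = (~in_tree) & (D[i] < best)
        best[closer] = D[i][closer]
        parent[closer] = i
    return float(on_boundary.mean())


def n2_measure(X, y):
    """Normalized ratio r / (1 + r) of intra- to inter-class NN distances.

    Assumes every class has at least two points.
    """
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    same = y[:, None] == y[None, :]
    intra = np.where(same, D, np.inf).min(axis=1).sum()
    inter = np.where(~same, D, np.inf).min(axis=1).sum()
    r = intra / inter
    return float(r / (1.0 + r))


def n3_measure(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)
    nearest = D.argmin(axis=1)
    return float(np.mean(y[nearest] != y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical generator: three Gaussian classes in 10 dimensions.
    # A smaller `spread` puts the class centers closer, so classes overlap more.
    for spread in (5.0, 1.0, 0.2):
        centers = rng.normal(scale=spread, size=(3, 10))
        X = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
        y = np.repeat(np.arange(3), 50)
        print(spread, f1_measure(X, y), n1_measure(X, y),
              n2_measure(X, y), n3_measure(X, y))
```

Because every function builds a full pairwise distance matrix, memory grows quadratically with the number of samples; for datasets with Fashion-MNIST's 784 features and tens of thousands of samples, subsampling or a tree-based nearest-neighbor search would likely be more practical.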

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)