Error Pattern Recognition Using Machine Learning

University essay from Linköpings universitet/Statistik och maskininlärning; Linköpings universitet/Filosofiska fakulteten

Abstract: Mobile networks use automated continuous integration to ensure that new technologies reach high quality and remain backwards compatible. The machinery needs to be constantly improved to meet the high demands that exist today and that will evolve in the future. When products are tested at large scale in a telecommunication environment, many parameters may be the cause of an error. Machine learning can help to assign troubleshooting labels and identify problematic areas in the test environment. In this thesis project, different modeling approaches are applied step-wise. First, both the TF-IDF (term frequency-inverse document frequency) method and topic modeling are applied to construct variables. Since the TF-IDF method generates high-dimensional variables in this case, principal component analysis (PCA) is considered as a method to reduce the dimensionality. The results of this part are evaluated using different criteria. After the variable construction, two semi-supervised models, label propagation and label spreading, are applied to assign troubleshooting labels. Both algorithms require a weight matrix that measures the similarities between cases. Two methods for building the weight matrix are tested separately: the Gaussian kernel and the nearest-neighbor method. Different hyperparameters of these two algorithms are experimented with in order to select the setting that gives the best results. After the best model is selected, the unlabeled data are divided into different proportions for fitting the model, to test whether the proportion of unlabeled data affects the result of semi-supervised learning in this case. The classification results from the modeling part are examined using three classical measures: accuracy, precision and recall. In addition, random permutations cross-validation is applied for the evaluation.
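The steps described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the thesis implementation: it uses scikit-learn, the log messages and labels are invented toy data, and the hyperparameter values (`n_components`, `n_neighbors`, `gamma`, `alpha`) are arbitrary placeholders. Topic modeling is left out for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import ShuffleSplit
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

# Hypothetical toy log messages and troubleshooting labels (not thesis data).
logs = [
    "timeout waiting for node response",
    "node response timeout during handover",
    "connection timeout on node restart",
    "timeout while node was rebooting",
    "license file missing on test rig",
    "missing license for feature activation",
    "license check failed file missing",
    "feature license missing or expired",
]
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = timeout, 1 = license

# Step 1: TF-IDF variables, then PCA to reduce the dimensionality.
# PCA needs a dense array, so the sparse TF-IDF matrix is converted first.
tfidf = TfidfVectorizer().fit_transform(logs).toarray()
features = PCA(n_components=3, random_state=0).fit_transform(tfidf)

# Step 2: the two semi-supervised models, one per weight-matrix method.
# All hyperparameter values here are assumed placeholders.
models = {
    "propagation/knn": lambda: LabelPropagation(kernel="knn", n_neighbors=3),
    "spreading/rbf": lambda: LabelSpreading(kernel="rbf", gamma=5.0, alpha=0.2),
}


def evaluate(make_model, X, y, unlabeled_share=0.5, n_splits=5, seed=0):
    """Random permutations CV: hide a share of labels, fit, score the hidden rows."""
    splitter = ShuffleSplit(
        n_splits=n_splits, test_size=unlabeled_share, random_state=seed
    )
    scores = {"accuracy": [], "precision": [], "recall": []}
    for _, hidden in splitter.split(X):
        y_partial = y.copy()
        y_partial[hidden] = -1  # scikit-learn marks unlabeled rows with -1
        model = make_model().fit(X, y_partial)
        pred = model.transduction_[hidden]  # labels inferred for hidden rows
        truth = y[hidden]
        scores["accuracy"].append(accuracy_score(truth, pred))
        scores["precision"].append(
            precision_score(truth, pred, average="macro", zero_division=0)
        )
        scores["recall"].append(
            recall_score(truth, pred, average="macro", zero_division=0)
        )
    # Average each measure over the random splits.
    return {name: float(np.mean(values)) for name, values in scores.items()}


results = {name: evaluate(make, features, labels) for name, make in models.items()}
for name, metrics in results.items():
    print(name, metrics)
```

Changing `unlabeled_share` in `evaluate` is one simple way to mimic the experiment on different proportions of unlabeled data, and looping over several `gamma` or `n_neighbors` values would correspond to the hyperparameter search.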
