Comparison of Undersampling Methods for Prediction of Casting Defects Based on Process Parameters

University essay from Högskolan i Skövde/Institutionen för ingenjörsvetenskap

Abstract: Most companies have to make predictions to support decisions, large and small, on a daily basis. The importance of accurate prediction techniques for decision-making is not new. However, although this importance is widely recognized, current estimation techniques are still often highly inaccurate. The consequences of an inaccurate prediction can be severe, and they can differ greatly depending on the type of misclassification, not only in industry but in many other areas. Machine learning has improved significantly in recent years and is now considered a reliable method for prediction. The main goal of this research is to predict casting defects from process parameters with the help of a machine learning algorithm. To achieve this goal, several sub-objectives were identified. A common problem in machine learning is an unbalanced dataset: when training a model, it is essential that the dataset is balanced. In this research the dataset was balanced using undersampling. The research compares and evaluates several undersampling methods to determine which is best suited for this project. Three machine learning models, “random forest”, “artificial neural network”, and “k-nearest neighbor”, are also compared to see which performs best. The conclusion reached was that the best undersampling method and the best machine learning model varied, for many different reasons. Therefore, to find the best combination of model and method for a specific task, all of the models and methods need to be tested. However, the undersampling method that most often gave the best performance in our research was NearMiss version 2.
The artificial neural network was the most successful machine learning model in our research. It performed best in two out of three evaluations and comparisons.
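
To illustrate the kind of undersampling the abstract refers to, the following is a minimal sketch of a NearMiss-2 style selection rule in plain Python. It is not the code used in the essay. It assumes the commonly described form of NearMiss-2, in which the majority-class samples kept are those with the smallest average distance to their k farthest minority-class samples. The function name `near_miss_2`, its parameters, and the toy data are hypothetical and chosen only for this example.

```python
import math


def near_miss_2(X, y, majority_label, minority_label, k=3):
    """Return sorted indices of a balanced subset (NearMiss-2 style sketch).

    Hypothetical helper for illustration; not taken from the essay.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label == majority_label]
    k = min(k, len(minority))

    def score(i):
        # Average distance from majority sample i to its k farthest
        # minority samples (Euclidean distance).
        dists = sorted((math.dist(X[i], X[j]) for j in minority), reverse=True)
        return sum(dists[:k]) / k

    # Keep as many majority samples as there are minority samples,
    # preferring those with the smallest score.
    kept = sorted(majority, key=score)[:len(minority)]
    return sorted(kept + minority)


# Toy data (assumed): label 1 marks the rare "defective" class.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 10), (0.5, 0.5), (1, 1)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

balanced = near_miss_2(X, y, majority_label=0, minority_label=1, k=2)
print(balanced)  # -> [1, 2, 6, 7]
```

The returned indices can be used to build a balanced training set before fitting any of the compared models. In practice a library implementation would normally be preferred over a hand-written one.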
