Information Extraction for Test Identification in Repair Reports in the Automotive Domain

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Abstract: The knowledge of tests conducted on a problematic vehicle is essential for enhancing the efficiency of mechanics. Therefore, identifying the tests performed in each repair case is of utmost importance. This thesis explores techniques for extracting data from unstructured repair reports to identify component tests. The main emphasis is on developing a supervised multi-class classifier to categorize data and extract sentences that describe repair diagnoses and actions. It has been shown that incorporating a category-aware contrastive learning objective can improve the repair report classifier’s performance. The proposed approach involves training a sentence representation model based on a pre-trained model using a category-aware contrastive learning objective. Subsequently, the sentence representation model is further trained on the classification task using a loss function that combines the cross-entropy and supervised contrastive learning losses. By applying this method, the macro F1-score on the test set is increased from 90.45 to 90.73. The attempt to enhance the performance of the repair report classifier using a noisy data classifier proves unsuccessful. The noisy data classifier is trained using a prompt-based fine-tuning method, incorporating open-ended questions and two examples in the prompt. This approach achieves an F1-score of 91.09 and the resulting repair report classification datasets are found easier to classify. However, they do not contribute to an improvement in the repair report classifier’s performance. Ultimately, the repair report classifier is utilized to aid in creating the input necessary for identifying component tests. An information retrieval method is used to conduct the test identification. The incorporation of this classifier and the existing labels when creating queries leads to an improvement in the mean average precision at the top 3, 5, and 10 positions by 0.62, 0.81, and 0.35, respectively, although with a slight decrease of 0.14 at the top 1 position.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)