Using Machine Learning Methods for Evaluating the Quality of Technical Documents

University essay from Linnéuniversitetet/Institutionen för datavetenskap (DV)

Abstract: In the context of an increasingly networked world, the availability of high quality translations is critical for success in the context of the growing international competition. Large international companies as well as medium sized companies are required to provide well translated, high quality technical documentation for their customers not only to be successful in the market but also to meet legal regulations and to avoid lawsuits. Therefore, this thesis focuses on the evaluation of translation quality, specifically concerning technical documentation, and answers two central questions: How can the translation quality of technical documents be evaluated, given the original document is available? How can the translation quality of technical documents be evaluated, given the original document is not available? These questions are answered using state-of-the-art machine learning algorithms and translation evaluation metrics in the context of a knowledge discovery process. The evaluations are done on a sentence level and recombined on a document level by binarily classifying sentences as automated translation and professional translation. The research is based on a database containing 22, 327 sentences and 32 translation evaluation attributes, which are used for optimizations of five different machine learning approaches. An optimization process consisting of 795, 000 evaluations shows a prediction accuracy of up to 72.24% for the binary classification. Based on the developed sentence-based classifi- cation systems, documents are classified using recombination of the affiliated sentences and a framework for rating document quality is introduced. Therefore, the taken approach successfully creates a classification and evaluation system.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)